Nigel Daley wrote:
> So shouldn't fixing this test to conform to the new
model in HADOOP-1134
> be the concern of the patch for HADOOP-1134?
Yes, but, as it stands, this patch would silently stop
working correctly
once HADOOP-1134 is committed. It should instead be written
in a more
robust way, that can survive expected changes. Relying on
HDFS using
ChecksumFileSystem isn't as reliable as an explicit
constructor that
says "I want an unchecksummed FileSystem."
> As it stand, I can't run
> NNBench at scale without using a raw file system, which
is what this
> patch is intended to allow.
It seems strange to disable things in an undocumented and
unsupported
way in order to get a benchmark to complete. How does that
prove
scalability? Rather, leaving NNBench alone seems like a
strong argument
for implementing HADOOP-1134 sooner.
Still, if you want to be able to disable checksums, for
benchmarks or
whatever, we can permit that, but should do so explicitly.
> HADOOP-928 caused this test to use a
> ChecksumFileSystem and subsequently we saw our
"read" TPS metric plummet
> from 20,000 to a couple hundred.
Ah, NNBench used the 'raw' methods before, which was kind of
sneaky on
its part, since it didn't benchmark the typical user
experience.
Although the namenode performance should only halve at worst
with
checksums as currently implemented, no?
> Let's get our current benchmark back on track before we
commit
> HADOOP-1134 (which will likely take a while before it
is "Patch
> Available").
I'd argue that we should fix the benchmark to accurately
reflect what
users see, so that we see real improvement when HADOOP-1134
is
committed. That would make it a more useful and realistic
benchmark.
However if you believe that a checksum-free benchmark is
still useful, I
think it should be more future-proof.
Doug
|