On Fri, May 11, 2007 at 05:23:43AM +0100, Olly Betts wrote:
> > I want the top speed during indexing and searches, and I do not care about
> > smallest database. I think most of users feel the same. If "gzip -9" makes
> > the indexing slightly slower, remove it. *smile*
>
> The thing is that smaller is often faster. Once I/O becomes the
> limiting factor, compression will speed things up. CPU speeds have
> increased faster than storage speeds over time, so this is likely to
> be more true than it ever was!

This is hugely important, and is something that a lot of people
miss. It doesn't make a huge amount of difference when you're dealing
with small data sets (say, less than half the size of core), but then
the delta cost should be fairly minimal. Once you get into moderately
large data sets (say two to four times core), you're going to start
hurting very badly if you're wasting time transferring data
suboptimally (*). Even if you can stack enough disks to get maximum
fibre speed, you're still only managing a few gig per second; given
your core will be a minimum of 8G these days, cutting down your
storage size becomes really important. (And that's assuming that only
one machine has access to the fabric, when it's more likely to be
shared...)
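
To put rough numbers on that, here is a back-of-envelope sketch. Only
the 8G core and the "four times core" data set come from the text
above; the 2 GB/s link speed, the 2:1 compression ratio and the 5 GB/s
decompression rate are made-up figures for illustration, and the model
assumes I/O and decompression overlap so that the slower stage
dominates.

```python
def read_time_seconds(data_gb, io_gb_per_s, ratio=1.0, decompress_gb_per_s=None):
    """Estimate the time to get data_gb of uncompressed data into memory.

    ratio is uncompressed size / compressed size (1.0 means no compression).
    decompress_gb_per_s is the rate at which the CPU emits uncompressed
    output; None means decompression cost is ignored.
    Assumption: I/O and decompression are pipelined, so the slower of the
    two stages sets the total time.
    """
    # Only the compressed bytes have to cross the I/O link.
    io_time = (data_gb / ratio) / io_gb_per_s
    if decompress_gb_per_s is None:
        return io_time
    # The CPU has to produce every uncompressed byte.
    cpu_time = data_gb / decompress_gb_per_s
    return max(io_time, cpu_time)


core_gb = 8              # from the text: "a minimum of 8G"
data_gb = 4 * core_gb    # from the text: data set of four times core

# Hypothetical figures: 2 GB/s link, 2:1 compression, 5 GB/s decompression.
raw = read_time_seconds(data_gb, io_gb_per_s=2.0)
packed = read_time_seconds(data_gb, io_gb_per_s=2.0, ratio=2.0,
                           decompress_gb_per_s=5.0)

print(f"uncompressed: {raw:.1f}s, compressed: {packed:.1f}s")
```

Under those assumptions the compressed read takes half as long, because
the link is still the bottleneck. If the decompressor were slower than
the link, the CPU term would dominate instead and compression would
stop paying for itself.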

David Braben has an interesting graph that backs this up (admittedly
from the point of view of consoles). It's *more* important to get
decent compression on your data than it was in the days of Elite and
Exile!

(*) I have a tiresome anecdote about inefficient data transfer over
NFSv3 versus NFSv4 bringing our data centre to a standstill.

J
--
/--------------------------------------------------------------------------
James Aylett
xapian.org
james tartarus.org
uncertaintydivision.org
_______________________________________________
Xapian-discuss mailing list
Xapian-discuss lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss