On Wed, Dec 19, 2007 at 02:32:52PM -0800, Kevin Duraj wrote:
> In my case having 100-500GB data on hard disk, the data cannot fit
> into memory and using two databases is two times slower than using
> single database.

Are you spindle-restricted here? Just a thought.

I don't actually know how the matcher deals with multiple databases
right now, but I suspect it does it in a sort of pseudo-parallel
fashion [1], in which case putting two databases behind the same
re-seek bottleneck (one disk head seeking back and forth between them)
is going to utterly destroy performance in a way that wouldn't happen
if you laid out your data differently across the available platters
(e.g. each database on its own spindle). Figuring out the profile of
this kind of thing is a pain, because you often have to write your own
analysis tools :-/

[1] I'm sure Olly or Richard can jump in here, but I'm assuming this
because if you fill up the candidate mset from both databases
concurrently, then I think you're *probably* going to run for less
time: the minimum weight needed to get into the candidate mset
probably has more chance of drifting up faster (assuming the two
databases are roughly equally relevant to your query). Lots of caveats
there, and my assumption may be wrong anyway.
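
To make the footnote's reasoning concrete, here is a toy model in
Python. It is a sketch under stated assumptions, not Xapian's matcher:
each "database" is just a list of made-up document weights, the
candidate mset is a size-k min-heap, and `fill_candidate_mset`,
`sequential` and `interleaved` are hypothetical names. It records the
minimum weight a document must beat to enter the candidate mset after
each document is seen, so you can compare how fast that threshold
climbs when the databases are read one after the other versus
round-robin.

```python
import heapq
from itertools import chain, zip_longest


def fill_candidate_mset(weights, k):
    """Toy candidate mset: keep the k highest weights seen so far.

    Returns (top-k weights sorted descending, threshold after each doc).
    The threshold is 0.0 until k candidates have been collected.
    """
    heap = []      # min-heap; heap[0] is the weakest current candidate
    history = []
    for w in weights:
        if len(heap) < k:
            heapq.heappush(heap, w)
        elif w > heap[0]:
            # w beats the weakest candidate, so it replaces it
            heapq.heapreplace(heap, w)
        history.append(heap[0] if len(heap) == k else 0.0)
    return sorted(heap, reverse=True), history


def sequential(db_a, db_b):
    """Hypothetical order: all of db_a, then all of db_b."""
    return list(chain(db_a, db_b))


def interleaved(db_a, db_b):
    """Hypothetical "pseudo-parallel" order: alternate between databases."""
    out = []
    for a, b in zip_longest(db_a, db_b):
        if a is not None:
            out.append(a)
        if b is not None:
            out.append(b)
    return out
```

For example, with `db_a = [1.0, 2.0, 1.5, 0.5]`, `db_b = [3.0, 2.5,
4.0, 0.2]` and `k = 2`, both orders end with the same top two weights,
`[4.0, 3.0]`, but the threshold reaches 2.5 after four documents when
interleaved and only after six when sequential. That is a property of
these made-up numbers, not a general result; real query cost also
depends on disk seeks, posting-list order and weight bounds.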
J
--
/--------------------------------------------------------------------------
James Aylett
xapian.org
james tartarus.org
uncertaintydivision.org
_______________________________________________
Xapian-discuss mailing list
Xapian-discuss lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss