Thread: Re: Re: Re: get_docid over multi-database search




United Kingdom
2007-12-20 08:12:30
On Wed, Dec 19, 2007 at 02:32:52PM -0800, Kevin Duraj wrote:

> In my case having 100-500GB data on hard disk, the data cannot fit
> into memory and using two databases is two times slower than using
> single database.

Are you spindle-restricted here? Just a thought.

I don't actually know how the matcher deals with multiple databases
right now, but I suspect it does it in a sort of pseudo-parallel [1],
in which case putting two databases behind the same re-seek bottleneck
is going to utterly destroy performance in a way that wouldn't happen
if you laid out your data differently onto the available platters.
Figuring out the profile of this kind of thing is a pain, because you
often have to write your own analysis tools :-/

[1] I'm sure Olly or Richard can jump in here, but I'm assuming this
because if you fill up the candidate mset from both databases
concurrently then I think you're *probably* going to run for less
time, because your minimum-weight to get into the candidate mset
probably has more chance of drifting up faster (assuming the two
databases are roughly equally relevant to your query). Lots of caveats
there, and my assumption may be wrong anyway.
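
The reasoning in [1] can be sketched with a toy model. This is not
Xapian's matcher, only an illustration under loud assumptions: each
"database" is just a list of random weights, the candidate mset is a
size-k min-heap, and the names offer, shared_mset and separate_msets
are hypothetical. It compares one mset fed from both databases in turn
against one mset per database merged at the end, and counts how many
candidates get past the minimum weight in each case.

```python
import heapq
import random
from itertools import zip_longest


def offer(heap, k, weight):
    """Offer a weight to a size-k min-heap; return True if it was admitted."""
    if len(heap) < k:
        heapq.heappush(heap, weight)
        return True
    if weight > heap[0]:  # heap[0] is the current minimum weight
        heapq.heapreplace(heap, weight)
        return True
    return False


def interleave(db_a, db_b):
    """Yield weights from both lists in turn (the 'pseudo-parallel' case)."""
    for a, b in zip_longest(db_a, db_b):
        if a is not None:
            yield a
        if b is not None:
            yield b


def shared_mset(db_a, db_b, k):
    """One candidate mset filled from both databases concurrently."""
    heap, admitted = [], 0
    for weight in interleave(db_a, db_b):
        admitted += offer(heap, k, weight)
    return sorted(heap, reverse=True), admitted


def separate_msets(db_a, db_b, k):
    """One candidate mset per database, merged after both have run."""
    merged, admitted = [], 0
    for db in (db_a, db_b):
        heap = []
        for weight in db:
            admitted += offer(heap, k, weight)
        merged.extend(heap)
    return heapq.nlargest(k, merged), admitted


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed; the weights are made-up data
    db_a = [rng.random() for _ in range(5000)]
    db_b = [rng.random() for _ in range(5000)]
    print("shared mset admissions:  ", shared_mset(db_a, db_b, 10)[1])
    print("separate mset admissions:", separate_msets(db_a, db_b, 10)[1])
```

Both approaches end with the same top-k weights, and the shared mset
never admits more candidates than the separate ones, because its
minimum weight is computed over everything seen so far rather than over
one database only. The model says nothing about disk seeks, which is
the other half of the argument above.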

J

-- 
/--------------------------------------------------------------------------
  James Aylett                                                  xapian.org
  jamestartarus.org                               uncertaintydivision.org

_______________________________________________
Xapian-discuss mailing list
Xapian-discusslists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss
