Thread: Re: Update of "SolrPerformanceFactors" by paulsundling




Re: Update of "SolrPerformanceFactors" by paulsundling
2007-08-24 15:06:43
On 8/24/07, Apache Wiki <wikidiffsapache.org> wrote:
> + Using an [EmbeddedSolr] for indexing can be over 50% faster than one
> + using XML messages that are posted.

Paul, were the documents posted one-per-message, or did you try
multiple (like 50 to 100) per message?  If one per message, the best
way to increase performance is to have multiple threads adding docs.
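
[Editor's illustration] A minimal sketch of the "multiple per message" option, assuming documents are simple field-name/value maps. It builds one XML <add> message holding a whole batch instead of one message per document. The class and method names (BatchAddMessage, buildAdd, escape) are hypothetical and not part of Solr; posting the string to the update handler is left out.

```java
import java.util.List;
import java.util.Map;

public class BatchAddMessage {
    // Escape the characters that are special in XML text and attribute values.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build a single <add> message containing every document in the batch.
    // Each map is one document; keys are field names, values are field values.
    public static String buildAdd(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            sb.append("<doc>");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                sb.append("<field name=\"").append(escape(field.getKey())).append("\">")
                  .append(escape(field.getValue()))
                  .append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }
}
```

With 50 to 100 documents per call, the per-request overhead is paid once per batch rather than once per document.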

I'd be curious to know how a single CSV file would clock in as well...

-Yonik

RE: Update of "SolrPerformanceFactors" by paulsundling
2007-08-24 19:24:35
Sorry, I replied to a subset of that question on the user list.  I'll
include my whole message, since it also relates to a past topic on this
list (Time for a cleaner API):

The embedded approach is at http://wiki.apache.org/solr/EmbeddedSolr

For my testing I have a tunable setting for the number of records to
submit per batch and used 10.  Both approaches committed after every
1000 records, which is also tunable.
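
[Editor's illustration] The batching scheme described above can be sketched roughly as follows, with the batch size and commit interval as tunable constructor arguments. The Sink interface is a hypothetical stand-in for either the embedded or the XML-posting path; none of these names come from Solr or from the test code discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingIndexer {
    /** Hypothetical destination for batches; not a Solr interface. */
    public interface Sink {
        void submit(List<String> batch);
        void commit();
    }

    private final Sink sink;
    private final int batchSize;    // records per submit, e.g. 10
    private final int commitEvery;  // records between commits, e.g. 1000
    private final List<String> pending = new ArrayList<String>();
    private int sinceCommit = 0;

    public BatchingIndexer(Sink sink, int batchSize, int commitEvery) {
        this.sink = sink;
        this.batchSize = batchSize;
        this.commitEvery = commitEvery;
    }

    /** Queue one record and submit the batch once it is full. */
    public void add(String record) {
        pending.add(record);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Submit whatever is pending, then commit if the interval was reached. */
    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.submit(new ArrayList<String>(pending));
        sinceCommit += pending.size();
        pending.clear();
        if (sinceCommit >= commitEvery) {
            sink.commit();
            sinceCommit = 0;
        }
    }

    /** Submit any partial batch and commit records not yet committed. */
    public void close() {
        flush();
        if (sinceCommit > 0) {
            sink.commit();
            sinceCommit = 0;
        }
    }
}
```

Keeping both values as parameters makes it easy to rerun the same comparison with larger batches or less frequent commits.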

A custom Lucene implementation I helped implement was even faster than
embedded, using a ramdrive as a double buffer.  However, that did require
a much larger memory footprint.

The embedded classes have little to no documentation and almost look like
stub implementations, but they work well.

While this project will succeed in large part because of how easy it is to
integrate with non-Java clients, I would actually like to see this
project be more Java friendly, like a reference indexing implementation.
There are a lot of tools that could be more widely useful, like
SimplePostTool.

With a few API changes it could be used for the demo as well as a useful
library.  Instead I extended it and then had to abandon that and resort to
cut-and-paste reuse in the end.  The functionality was 95% there, but it
just needed API tweaks to make it usable.  It also seems unusual to expose
fields directly instead of using accessors in the Java code.
Accessors can give a lot of flexibility that field access doesn't have.
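
[Editor's illustration] A small sketch of the kind of flexibility accessors allow, assuming a hypothetical command object with a single id value. Neither class below is a Solr class; they only contrast a public field with a setter that can validate and normalize its input.

```java
public class AccessorSketch {
    /** Hypothetical command exposing a public field: any value can be stored. */
    public static class FieldCommand {
        public String id;
    }

    /** Hypothetical command using accessors: the setter controls what is stored. */
    public static class AccessorCommand {
        private String id;

        public String getId() {
            return id;
        }

        public void setId(String id) {
            // Validation and normalization can be added here later
            // without changing any calling code.
            if (id == null || id.trim().isEmpty()) {
                throw new IllegalArgumentException("id must not be empty");
            }
            this.id = id.trim();
        }
    }
}
```

With the field version, a check like this would have to be repeated at every place that assigns the field.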

It would also be nice to be able to get Java objects back besides XML and
JSON, like an Embedded equivalent for search.  That way you could
integrate more easily with Spring MVC, etc.  There may also be some
performance gains there.

Paul Sundling


-----Original Message-----
From: yseeleygmail.com [mailto:yseeleygmail.com] On Behalf Of Yonik Seeley
Sent: Friday, August 24, 2007 1:07 PM
To: solr-devlucene.apache.org
Subject: Re: [Solr Wiki] Update of "SolrPerformanceFactors" by paulsundling


On 8/24/07, Apache Wiki <wikidiffsapache.org> wrote:
> + Using an [EmbeddedSolr] for indexing can be over 50% faster than one
> + using XML messages that are posted.

Paul, were the documents posted one-per-message, or did you try multiple
(like 50 to 100) per message?  If one per message, the best way to
increase performance is to have multiple threads adding docs.

I'd be curious to know how a single CSV file would clock in at as
well...

-Yonik


Re: Update of "SolrPerformanceFactors" by paulsundling
2007-08-24 19:36:25
On 8/24/07, Sundling, Paul <paul.sundlingsonyconnect.com> wrote:
> It also seems unusual
> exposing fields directly instead of using accessors in the Java code.
> Accessors can give a lot of flexibility that field access doesn't
> have.

If you are referring to the UpdateCommand classes, I absolutely agree.
My fault... but I never intended for those to be public API.  I'm not
sure if it's too late to change them - the update processor patch
really makes them more public, so after the next Solr release it will
be even harder to change.

-Yonik
