On 31-Aug-07, at 7:13 AM, Yonik Seeley wrote:
> On 8/30/07, Chris Hostetter <hossman_lucene fucit.org> wrote:
>>> example/solr/conf/solrconfig.xml:
>>>
>>> <maxBufferedDocs>1000</maxBufferedDocs>
>>>
>>> Anyone else think that this might be a tad high? Lucene ships
>>> with MBD==10.
>>
>> A lot of the settings in the original example config/schema came
>> from one particular index we had at CNET ... I think it would make
>> sense to change almost any settings that have a hardcoded default
>> in code to match the hardcoded default.
>
> I don't think Solr should necessarily use the same defaults as Lucene.
> An MBD of 10 performed much worse for the average Solr collection.
> In the next release, I think the default should be to flush by memory
> (prob at 32MB level) since it will give good performance at reasonable
> memory usage regardless of document size.
I agree that flush by memory is the best option, but I wasn't sure if
we'd end up doing a release before Lucene 2.3 or not.

In general I am okay with the Solr defaults being different from
Lucene defaults (I'd expect library code to be more conservative).

1000, though, is really big for decent-sized docs (like web pages
with lots of metadata fields), especially since things like token
filters for the buffered docs get kept around until they are flushed
(WhitespaceTokenizer, for instance, allocates 1.2 kB of buffers per
instance). 100, perhaps (assuming it isn't moot)?
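
As a rough sketch of what that change would look like in
example/solr/conf/solrconfig.xml (the value 100 is only the suggestion
above, not a benchmarked number, and I'm assuming the setting sits in
the <indexDefaults> section as in the current example config):

```xml
<indexDefaults>
  <!-- Assumption: lowered from 1000 so fewer documents, and the
       per-document tokenizer/filter buffers that go with them, are
       held in memory before a flush. 100 is a suggestion only. -->
  <maxBufferedDocs>100</maxBufferedDocs>
</indexDefaults>
```

Other settings in that section would stay as they are; if flush by
memory becomes available, this value may not matter.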
-Mike