Thread: RE: Embedded about 50% faster for indexing
RE: Embedded about 50% faster for indexing | 2007-08-27 14:44:15
Whether embedded Solr should give me a performance boost or not, it did.
I'm not surprised, since it skips XML parsing, although you never know
for sure where cycles are used until you profile.
I tried doing more records per post (200) and it was actually slightly
slower and seemed to require more memory. This makes sense because you
have to take up more memory for the StringBuilder to store the much
larger XML. For 10,000 it was much slower. For that size I would need
to use XML streaming or something to make it work.
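
A minimal sketch of the streaming idea mentioned above, written in Python rather than Java purely for brevity. It assumes records arrive as dictionaries of field name to value (the field names in the usage note are made up) and that the target is Solr's XML update message (`<add>` wrapping `<doc>` elements). Instead of building one large string for the whole post, it yields the message piece by piece, so memory use does not grow with the number of records per post.

```python
from xml.sax.saxutils import escape, quoteattr


def iter_add_xml(records):
    """Yield an <add> update message piece by piece instead of as one big string.

    `records` is any iterable of dicts mapping field name to value.
    """
    yield "<add>"
    for record in records:
        yield "<doc>"
        for name, value in record.items():
            # quoteattr() adds the surrounding quotes and escapes the name;
            # escape() handles &, < and > in the field value.
            yield "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        yield "</doc>"
    yield "</add>"
```

Each piece can be encoded and written to the request body as it is produced, for example `for chunk in iter_add_xml([{"id": 1, "title": "a < b"}]): out.write(chunk.encode("utf-8"))`, where `out` is a hypothetical writable stream. Whether this actually helps at 10,000 records per post would still need to be measured.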
The Solr WAR was on the same machine, so the only network overhead was
from using loopback.
Paul Sundling
-----Original Message-----
From: climbingrose [mailto:climbingrose gmail.com]
Sent: Monday, August 27, 2007 12:22 AM
To: solr-user lucene.apache.org
Subject: Re: Embedded about 50% faster for indexing
Haven't tried the embedded server but I think I have to agree with Mike.
We're currently sending 2000 job batches to SOLR server and the amount
of time required to transfer documents over http is insignificant
compared with the time required to index them. So I do think unless you
are sending documents one by one, embedded SOLR shouldn't give you much
more performance boost.
On 8/25/07, Mike Klaas <mike.klaas gmail.com> wrote:
>
> On 24-Aug-07, at 2:29 PM, Wu, Daniel wrote:
>
> >> -----Original Message-----
> >> From: yseeley gmail.com [mailto:yseeley gmail.com] On Behalf Of
> >> Yonik Seeley
> >> Sent: Friday, August 24, 2007 2:07 PM
> >> To: solr-user lucene.apache.org
> >> Subject: Re: Embedded about 50% faster for indexing
> >>
> >> One thing I'd like to avoid is everyone trying to embed just for
> >> performance gains. If there is really that much difference, then we
> >> need a better way for people to get that without resorting to Java
> >> code.
> >>
> >> -Yonik
> >>
> >
> > Theoretically and practically, embedded solution will be faster than
> > going through http/xml.
>
> This is only true if the http interface adds significant overhead to
> the cost of indexing a document, and I don't see why this should be
> so, as indexing is relatively heavyweight. setting up the connection
> could be expensive, but this can be greatly mitigated by sending more
> than one doc per http request, using persistent connections, and
> threading.
>
> -Mike
>
--
Regards,
Cuong Hoang

RE: Embedded about 50% faster for indexing | 2007-08-27 16:35:17
Sorry, I got mixed up with the numbers: it was faster with 200 records
(2:37:17) than with 10 records (3:21:36), but still definitely slower
than embedded (2:10:23), and it requires a larger memory footprint.
The embedded run and the post run with 10 records were run with 64M, but
the later run with 200 records was run with 128M. I'm not sure if
embedded would benefit much from more memory. Also, with 10K, double
buffering with threading and even higher memory would help greatly.
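
For reference, a short sketch that converts the three timings quoted above to seconds and compares them. The labels are shorthand for the runs described in this message. Posting 10 records at a time took about 1.55 times as long as embedded, and posting 200 at a time about 1.21 times as long. The comparison is not strictly like for like, since the 200-record run had 128M and the other two had 64M.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Wall-clock times as quoted in the message above.
timings = {
    "post, 10 records": "3:21:36",
    "post, 200 records": "2:37:17",
    "embedded": "2:10:23",
}

embedded = to_seconds(timings["embedded"])
for label, hms in timings.items():
    ratio = to_seconds(hms) / embedded
    print(f"{label}: {to_seconds(hms)} s, {ratio:.2f}x the embedded time")
```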
In the end, indexing performance probably isn't that critical unless
you're talking about many millions of records and you need full indexes
often. So generally this is moot, but it's still interesting.
Paul Sundling

Re: Embedded about 50% faster for indexing | Canada | 2007-08-27 19:49:39
On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote:
> Whether embedded solr should give me a performance boost or not, it
> did.
>
> I'm not surprised, since it skips XML parsing. Although you never
> know where cycles are used for sure until you profile.

It certainly is possible that XML parsing dwarfs indexing, but I'd
expect that only to occur under very light analysis and field storage
workloads.

> I tried doing more records per post (200) and it was actually slightly
> slower and seemed to require more memory. This makes sense because
> you have to take up more memory for the StringBuilder to store the much
> larger XML. For 10,000 it was much slower. For that size I would
> need to use XML streaming or something to make it work.
>
> The solr war was on the same machine, so network overhead was only
> from using loopback.

The big question is still your connection handling strategy: are you
using persistent http connections? Are you indexing from multiple
threads?
cheers,
-Mike
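
A rough sketch of the three mitigations discussed in this thread (several docs per request, persistent connections, threading), in Python for brevity. The host, port, `/solr/update` path, batch size, and worker count are all assumptions to be adjusted, and the function names are made up for illustration. Each worker thread keeps its own `http.client.HTTPConnection` open and reuses it for every batch it sends.

```python
import http.client
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of up to `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def make_poster(host, port, path="/solr/update"):
    """Return a function that POSTs one XML body, reusing one connection per thread.

    The host, port and path are assumptions; adjust them to the actual server.
    """
    local = threading.local()

    def post(xml_body):
        if not hasattr(local, "conn"):
            # Created once per thread and kept open between calls.
            local.conn = http.client.HTTPConnection(host, port)
        local.conn.request("POST", path, body=xml_body.encode("utf-8"),
                           headers={"Content-Type": "text/xml; charset=utf-8"})
        response = local.conn.getresponse()
        response.read()  # drain the response so the connection can be reused
        return response.status

    return post


def index_all(xml_docs, post, batch_size=200, workers=4):
    """Send <doc> strings in batches of `batch_size` from `workers` threads."""
    bodies = ("<add>" + "".join(batch) + "</add>"
              for batch in batched(xml_docs, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map submits every batch up front, so all bodies are held in
        # memory at once; a bounded queue would be needed for very large inputs.
        return list(pool.map(post, bodies))
```

Usage would look like `index_all(docs, make_poster("localhost", 8983))`, where `docs` is an iterable of already-serialized `<doc>...</doc>` strings. Passing the posting function in as a parameter also makes it easy to swap in a stub when timing the batching logic separately from the network.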