Thread: RE: dataset parameters suitable for lucene application




RE: dataset parameters suitable for lucene application
2007-09-26 10:52:39
My experiences so far with this level of data have been good.

Number of records: Maxed out at 8.8 million
Database size: friggin huge (100+ GB)
Index size: ~24 GB

1) It took me about a day to index 8 million docs using a non-optimized
program I wrote. It's non-optimized in the sense that it's not
multi-threaded. It batched together groups of about 5,000 docs at a time
to be indexed.

2) Search times for a basic search are almost always sub-second. If we
toss in some faceting, it takes a little longer, but I've hardly ever
seen it go above 1-2 seconds even with the most advanced queries.
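
For anyone who wants to try the batching approach from point 1, here is a minimal Python sketch. It is not the program described above; the names `batched`, `index_all`, and `index_batch` are hypothetical, and `index_batch` stands in for whatever actually sends a group of documents to the indexer.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(docs: Iterable[T], size: int = 5000) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_all(
    docs: Iterable[T],
    index_batch: Callable[[List[T]], None],
    size: int = 5000,
) -> int:
    """Send documents to `index_batch` one group at a time.

    `index_batch` is a hypothetical callback, for example a function that
    posts the group to the search server. Returns the number of documents
    handed over.
    """
    total = 0
    for batch in batched(docs, size):
        index_batch(batch)
        total += len(batch)
    return total
```

The loop is single-threaded, like the program described in point 1, so it only illustrates the grouping, not any tuning.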

Hope that helps.


Charlie

____________________________________________

-----Original Message-----
From: Law, John [mailto:John.Law@il.proquest.com]
Sent: Wednesday, September 26, 2007 9:28 AM
To: solr-user@lucene.apache.org
Subject: dataset parameters suitable for lucene application

I am new to the list and new to Lucene and Solr. I am considering Lucene
for a potential new application and need to know how well it scales.

Following are the parameters of the dataset.

Number of records: 7+ million
Database size: 13.3 GB
Index Size:  10.9 GB

My questions are simply:

1) Approximately how long would it take Lucene to index these documents?
2) What would the approximate retrieval time be (i.e. search response
time)?

Can someone provide me with some informed guidance in this regard?

Thanks in advance,
John

______________________________________________
John Law
Director, Platform Management
ProQuest
789 Eisenhower Parkway
Ann Arbor, MI 48106
734-997-4877
john.law@il.proquest.com
www.proquest.com
www.csa.com

ProQuest... Start here.




Re: dataset parameters suitable for lucene application
2007-09-26 11:49:27
By "maxed out" do you mean that Solr's performance became unacceptable
beyond 8.8M records, or that you only had 8.8M records to index? If
the former, can you share the particular symptoms?

On 9/26/07, Charlie Jackson <Charlie.Jackson@cision.com> wrote:
> My experiences so far with this level of data have been good.
>
> Number of records: Maxed out at 8.8 million
> Database size: friggin huge (100+ GB)
> Index size: ~24 GB
>
> 1) It took me about a day to index 8 million docs using a non-optimized
> program I wrote. It's non-optimized in the sense that it's not
> multi-threaded. It batched together groups of about 5,000 docs at a time
> to be indexed.
>
> 2) Search times for a basic search are almost always sub-second. If we
> toss in some faceting, it takes a little longer, but I've hardly ever
> seen it go above 1-2 seconds even with the most advanced queries.
>
> Hope that helps.
>
>
> Charlie

RE: dataset parameters suitable for lucene application
2007-09-26 11:58:31
My experience so far:
About 200k documents were indexed in 90 mins (including db time), and the
index size is 200 MB. Querying a keyword on all string fields (30) takes
0.3-1 sec; querying a keyword on one field takes tens of milliseconds.
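
To compare the indexing figures reported in this thread, the arithmetic can be sketched in Python as below. This is only a back-of-the-envelope estimate: it assumes "about a day" means 24 hours, that throughput scales linearly with record count, and that the 7 million record dataset behaves like the ones measured, none of which the thread confirms. The function names are made up for the example.

```python
def docs_per_second(doc_count: int, elapsed_seconds: float) -> float:
    """Average indexing throughput over a whole run."""
    return doc_count / elapsed_seconds


def estimated_hours(doc_count: int, rate: float) -> float:
    """Hours needed to index `doc_count` docs at `rate` docs/second."""
    return doc_count / rate / 3600


# 8 million docs in about a day (assumed here to be 24 hours).
charlie_rate = docs_per_second(8_000_000, 24 * 3600)

# 200k records in 90 minutes, including database time.
xuesong_rate = docs_per_second(200_000, 90 * 60)

print(f"{charlie_rate:.1f} docs/s")  # 92.6 docs/s
print(f"{xuesong_rate:.1f} docs/s")  # 37.0 docs/s

# Linear extrapolation to the 7 million records in the original question.
print(f"{estimated_hours(7_000_000, charlie_rate):.1f} h")  # 21.0 h
print(f"{estimated_hours(7_000_000, xuesong_rate):.1f} h")  # 52.5 h
```

The two rates come from different hardware, document sizes, and pipelines, so the range is a rough bracket rather than a prediction.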






Re: dataset parameters suitable for lucene application
2007-10-02 17:18:22
Hi There,

Would you mind if I pasted your data onto the wiki page at

http://wiki.apache.org/solr/SolrPerformanceData

I think it would be helpful to get some more numbers on that page, to
help people decide if Solr is the right application for them.

Thanks,
Chris Harris, new Solr user

On 9/26/07, Xuesong Luo <xluo@successfactors.com> wrote:
> My experience so far:
> About 200k documents were indexed in 90 mins (including db time), and the
> index size is 200 MB. Querying a keyword on all string fields (30) takes
> 0.3-1 sec; querying a keyword on one field takes tens of milliseconds.
