List Info

Thread: Basic Question in Lucene Indexing.




Basic Question in Lucene Indexing.
country flaguser name
United States
2007-04-12 12:12:42
I have one million records to index, each of which have
"Tiltle",
"Desciption" and "Identifier". If take
each document and try to index these
fields my program was very slow. So I took 100,000 records
and get the value
of these fields, add them to the addDocument() method. Then
I use the Index
writer to write this document. So by doing this looks like
it creates only
one document id and have all contents in that.I repeat this
writing for
700,000 records so 70 doc ids are craeted in total. Till now
no
issue(atleast I assumed)

Then I tried to search the for some value, I was getting
Hits whose length
would be some number say 21 and when i try to retrieve the
documents
assuming all 21 documents have matches they actually dont
have, so whats
happening is, it just gets the docs from same document id.
Luke was helpful
in finding this issue. Later I took just around 20 records
and tried to
index then separately and tried to retrieve and it worked
fine. 

Now my major issue is when I try to open index 700,000
times, it will be
really very slow. I am wondering what is the ideal way to do
this.

Thanks in Advance.
-- 
View this message in context: http://www.nabble.com/Basic-Ques
tion-in-Lucene-Indexing.-tf3566940.html#a9964017
Sent from the Lucene - Java Users mailing list archive at
Nabble.com.


------------------------------------------------------------
---------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org


Re: Basic Question in Lucene Indexing.
user name
2007-04-12 12:27:45
Don't do that <G> Why are you trying to open the index
700,000 times?
During indexing or searching? In either case, there's no
reason
to. You should be able to open the index and keep it open as
long
as you want.

I still don't understand why you can't index the records
individually,
but I'll assume you've proven it to your own satisfaction.
But I
will mention that if you were opening/closing your index
each
time, *that* was your problem....

Erick.

On 4/12/07, Lokeya <lokeyagmail.com> wrote:
>
>
> I have one million records to index, each of which have
"Tiltle",
> "Desciption" and "Identifier". If
take each document and try to index
> these
> fields my program was very slow. So I took 100,000
records and get the
> value
> of these fields, add them to the addDocument() method.
Then I use the
> Index
> writer to write this document. So by doing this looks
like it creates only
> one document id and have all contents in that.I repeat
this writing for
> 700,000 records so 70 doc ids are craeted in total.
Till now no
> issue(atleast I assumed)
>
> Then I tried to search the for some value, I was
getting Hits whose length
> would be some number say 21 and when i try to retrieve
the documents
> assuming all 21 documents have matches they actually
dont have, so whats
> happening is, it just gets the docs from same document
id. Luke was
> helpful
> in finding this issue. Later I took just around 20
records and tried to
> index then separately and tried to retrieve and it
worked fine.
>
> Now my major issue is when I try to open index 700,000
times, it will be
> really very slow. I am wondering what is the ideal way
to do this.
>
> Thanks in Advance.
> --
> View this message in context:
> http://www.nabble.com/Basic-Ques
tion-in-Lucene-Indexing.-tf3566940.html#a9964017
> Sent from the Lucene - Java Users mailing list archive
at Nabble.com.
>
>
>
------------------------------------------------------------
---------
> To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
> For additional commands, e-mail: java-user-helplucene.apache.org
>
>
[1-2]

about | contact  Other archives ( Real Estate discussion Medical topics )