Thread: Get the TokenStream of an indexed but unstored field




Get the TokenStream of an indexed but unstored field
2007-08-03 04:18:24
Hi,

I indexed a large number of large documents, but I did not index the
documents themselves.
Now I am interested in getting the vector (i.e., the terms indexed and
their frequencies) of that indexed but unstored field.
doc.getField(fieldname) returns null.
How can I get the data? It must be there, since it's part of the index,
or am I wrong?

I would be grateful for a quick answer (I need to submit data for a
conference this weekend).
thanks,
Nir.
-- 
View this message in context: http://www.nabble.com/Get-the-TokenStream-of-an-indexed-but-unstored-field-tf4211430.html#a11980001
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: Get the terms and frequency vector of an indexed but unstored field
2007-08-03 06:36:48
You can use IndexReader.getTermFreqVectors(int n) to get all terms and
their frequencies for document n. Make sure that when you create the
index, you choose to store them by specifying a Field.TermVector option.
Check out http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
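
For orientation, a term frequency vector for one field of one document boils down to two parallel arrays: the distinct terms in sorted order and how often each occurs. In the Lucene 2.x-era API this is what TermFreqVector.getTerms() and getTermFrequencies() return, provided the field was added with a Field.TermVector option such as Field.TermVector.YES. The sketch below is plain Java with no Lucene dependency; the class name TermFrequencySketch and its fields are hypothetical, and it only models the shape of the data, not how Lucene stores it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a per-document term frequency vector.
// Not a Lucene class: it only mirrors the "parallel arrays" shape.
class TermFrequencySketch {
    final String[] terms; // distinct terms, in ascending order
    final int[] freqs;    // freqs[i] is the count of terms[i]

    TermFrequencySketch(String... tokens) {
        // A TreeMap keeps the terms sorted while we count them.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            freqs[i] = counts.get(terms[i]);
        }
    }

    public static void main(String[] args) {
        TermFrequencySketch vector =
                new TermFrequencySketch("to", "be", "or", "not", "to", "be");
        // Prints: be 2, not 1, or 1, to 2 (one pair per line)
        for (int i = 0; i < vector.terms.length; i++) {
            System.out.println(vector.terms[i] + " " + vector.freqs[i]);
        }
    }
}
```

The point of the model is that the vector is per document and per field, which is why it can be read back even when the field text itself was not stored.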



tierecke wrote:
> 
> Hi,
> 
> I indexed a large number of large documents, but I did not store the
> documents themselves, just indexed them.
> Now I am interested in getting the vector (i.e., the terms indexed and
> their frequencies) of that indexed but unstored field.
> doc.getField(fieldname) returns null.
> How can I get the data? It must be there, since it's part of the index,
> or am I wrong?
> 
> I would be grateful for a quick answer (I need to submit data for a
> conference this weekend).
> thanks,
> Nir.
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Get the terms and frequency vector of an indexed but unstored field
2007-08-03 08:35:22
Thanks a lot, that works 100%!
Fortunately, I did use the flag telling Lucene to store the term
frequency vector. Otherwise, I'd have to reindex 77 GB right now...


Re: Get the TokenStream of an indexed but unstored field
2007-08-03 09:15:06
<<<I indexed a large number of large documents, but
I did not index the documents themselves.>>>

This is really confusing since it's self-contradictory. Could you
post the lines where you do the document.add() for the fields in
question?

Best
Erick

Re: Get the TokenStream of an indexed but unstored field
2007-08-03 09:29:50
I fixed my question later. I meant that I did not STORE the documents
themselves.
Anyway, the issue is already solved, thanks to testn.
But there are new questions that are hard (for me).
Thanks a lot!



Re: Get the terms and frequency vector of an indexed but unstored field
2007-11-06 02:51:01
Hi,
If we did not set this flag while indexing, is there any other way to
get this information, i.e., the TermFreqVector for a document?



Re: Get the terms and frequency vector of an indexed but unstored field
2007-11-06 05:35:38
On 6 Nov 2007, at 09:51, Shailendra Mudgal wrote:

> Hi,
> If while indexing we have not set this flag, then is there any other
> way to get this info, i mean the TermFreqVector for a document ??

See TermVectorAccessor in JIRA:

http://issues.apache.org/jira/browse/LUCENE-1016

The highlighter also has some ad hoc code for extracting the data from
the inverted index using TermEnum and TermDocs. It can, however, take
quite some time.

-- 
karl
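
The slow path Karl mentions can be pictured as follows: without a stored term vector, the only per-document term data is spread across the postings of every term, so rebuilding one document's vector means visiting each term and checking whether the document appears in its postings. The sketch below is plain Java, not Lucene code; the class TermVectorFromPostings, its methods, and the nested-map layout are hypothetical stand-ins for the roles TermEnum (iterate terms) and TermDocs (iterate documents and frequencies per term) play. It is not the code from LUCENE-1016 or from the highlighter.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an inverted index: term -> (docId -> frequency).
class TermVectorFromPostings {

    // Index one document by counting each token into the postings map.
    static void addDocument(Map<String, Map<Integer, Integer>> postings,
                            int docId, String... tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Rebuild one document's term -> frequency map by scanning every term.
    static Map<String, Integer> rebuild(Map<String, Map<Integer, Integer>> postings,
                                        int docId) {
        Map<String, Integer> vector = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> entry : postings.entrySet()) {
            Integer freq = entry.getValue().get(docId);
            if (freq != null) {
                vector.put(entry.getKey(), freq);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
        addDocument(postings, 0, "lucene", "index", "lucene");
        addDocument(postings, 1, "index", "vector");
        System.out.println(rebuild(postings, 0)); // prints {index=1, lucene=2}
    }
}
```

The loop touches every term in the index to answer a question about a single document, which is why this approach can take a long time on a large index.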


