|
Thread: Get the TokenStream of an indexed but unstored field
|
|
| Get the TokenStream of an indexed but unstored field |
  United States |
2007-08-03 04:18:24 |
Hi,
I indexed a large number of large documents, but I did not index the
documents themselves.
Now I am interested in getting the vector (i.e., the terms indexed and
their frequencies) of that indexed but unstored field.
doc.getField(fieldname) returns null.
How can I get the data? It must be there, since it's part of the index,
or am I wrong?
I would be grateful for a quick answer (I need to submit data for a
conference this weekend).
Thanks,
Nir.
--
View this message in context: http://www.nabble.com/Get-the-TokenStream-of-an-indexed-but-unstored-field-tf4211430.html#a11980001
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
|
|
| Re: Get the terms and frequency vector of an indexed but unstored field |
  United States |
2007-08-03 06:36:48 |
You can use IndexReader.getTermFreqVectors(int n) to get all terms and
their frequencies for document n. When you create the index, make sure
you ask Lucene to store term vectors by specifying a Field.TermVector
option on the field.
Check out http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
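
To make the distinction between "indexed", "stored" and "term vector" concrete, here is a minimal sketch in plain Java. It is a toy model only, not Lucene's implementation or API: the class ToyIndex and its methods are hypothetical names invented for illustration. It shows why asking for the stored value of an unstored field gives null, while the per-document term/frequency vector is still available if it was kept at indexing time. In the Lucene 2.x API, the corresponding indexing-time switch is a Field.TermVector value such as Field.TermVector.YES passed to the Field constructor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model for illustration only; this is NOT Lucene's implementation or API. */
final class ToyIndex {
    // Inverted index: term -> (docId -> frequency). Always built when a field is indexed.
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
    // Stored field values: docId -> original text. Filled only when store == true.
    private final Map<Integer, String> storedText = new HashMap<>();
    // Per-document term vectors: docId -> (term -> frequency). Filled only when termVector == true.
    private final Map<Integer, Map<String, Integer>> termVectors = new HashMap<>();

    void addDocument(int docId, String text, boolean store, boolean termVector) {
        // Very naive analysis: lowercase and split on non-word characters.
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        // Indexing always updates the term -> documents direction.
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new TreeMap<>()).put(docId, e.getValue());
        }
        if (store) {
            storedText.put(docId, text);
        }
        if (termVector) {
            termVectors.put(docId, freqs);
        }
    }

    /** Returns the original text, or null when the field was indexed but not stored. */
    String getStoredText(int docId) {
        return storedText.get(docId);
    }

    /** Returns term -> frequency for one document, or null when no term vector was kept. */
    Map<String, Integer> getTermFreqVector(int docId) {
        return termVectors.get(docId);
    }
}
```

With this model, a document added with store=false and termVector=true returns null from getStoredText but a full term/frequency map from getTermFreqVector, which mirrors the situation described in the question.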
tierecke wrote:
>
> Hi,
>
> I indexed a large number of large documents, but I did not store the
> documents themselves, just indexed them.
> Now I am interested in getting the vector (i.e., the terms indexed and
> their frequencies) of that indexed but unstored field.
> doc.getField(fieldname) returns null.
> How can I get the data? It must be there, since it's part of the index,
> or am I wrong?
>
> I would be grateful for a quick answer (I need to submit data for a
> conference this weekend).
> Thanks,
> Nir.
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe lucene.apache.org
For additional commands, e-mail: java-user-help lucene.apache.org
|
|
| Re: Get the terms and frequency vector of an indexed but unstored field |
  United States |
2007-08-03 08:35:22 |
Thanks a lot, that works 100%!
Fortunately, I did set the flag telling Lucene to store the term
frequency vector. Otherwise, I'd have to re-index 77GB right now...
|
|
| Re: Get the TokenStream of an indexed but unstored field |

|
2007-08-03 09:15:06 |
<<<I indexed a large number of large documents, but I did not index the
document themselves.>>>
This is really confusing since it's self-contradictory. Could you post
the lines where you do the document.add() for the fields in question?
Best
Erick
|
|
| Re: Get the TokenStream of an indexed but unstored field |
  United States |
2007-08-03 09:29:50 |
I fixed my question later. I meant I did not STORE the documents
themselves.
Anyway, the issue is already solved, thanks to testn.
But there are new questions that are hard (for me).
Thanks a lot!
|
|
| Re: Get the terms and frequency vector of an indexed but unstored field |

|
2007-11-06 02:51:01 |
Hi,
If we did not set this flag while indexing, is there any other way to
get this information, i.e. the TermFreqVector for a document?
|
|
| Re: Get the terms and frequency vector of an indexed but unstored field |
  Netherlands |
2007-11-06 05:35:38 |
On 6 Nov 2007, at 09:51, Shailendra Mudgal wrote:
> Hi,
> If we did not set this flag while indexing, is there any other way to
> get this information, i.e. the TermFreqVector for a document?
See TermVectorAccessor in JIRA:
http://issues.apache.org/jira/browse/LUCENE-1016
The highlighter also has some ad hoc code for extracting the data from
the inverted index using TermEnum and TermDocs. It can, however, take
quite some time.
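
The idea of rebuilding a document's term frequencies from the inverted index, and why it is slow, can be sketched in plain Java as follows. This is a toy model under stated assumptions, not the code from LUCENE-1016 or from the highlighter: the class TermVectorFromPostings and the method rebuild are hypothetical names, and the inverted index is modelled as a simple map from term to (document id to frequency). In Lucene itself, the same walk would enumerate terms with a TermEnum and look up each term's postings with a TermDocs.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model for illustration only; this is NOT Lucene code. */
final class TermVectorFromPostings {
    private TermVectorFromPostings() {}

    /**
     * Rebuilds term -> frequency for a single document from a toy inverted index.
     *
     * @param postings term -> (docId -> frequency of that term in that document)
     * @param docId    the document whose term vector should be reconstructed
     */
    static Map<String, Integer> rebuild(Map<String, Map<Integer, Integer>> postings, int docId) {
        Map<String, Integer> vector = new TreeMap<>();
        // Every term in the whole index has to be visited, because the index
        // is organised by term, not by document.
        for (Map.Entry<String, Map<Integer, Integer>> entry : postings.entrySet()) {
            Integer freq = entry.getValue().get(docId);
            if (freq != null) {
                vector.put(entry.getKey(), freq);
            }
        }
        return vector;
    }
}
```

The cost of this loop grows with the number of distinct terms in the entire index rather than with the size of the one document, which is consistent with the warning that this approach can take quite some time on a large index.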
--
karl
|
|
|
|