|
Thread: Lucene newbie question - Term Positions
|
|
| Lucene newbie question - Term Positions |

|
2007-10-07 11:22:03 |
Hello,
I have created a simple Lucene 2.2 index. I want to list all the terms and
their positions in a document. How can I do it?
Could you please provide some sample code?
Thanks!
|
|
| Re: Lucene newbie question - Term Positions |

|
2007-10-07 11:38:56 |
I suspect that this is more work than you think, not to mention
very slow. This is just due to the nature of an inverted index....
To see what I mean, get a copy of Luke and have it
reconstruct one of your documents and you'll see what the
performance is like.
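To make the point about inverted indexes concrete, here is a minimal
sketch in plain Java (no Lucene involved). The class name
InvertedIndexSketch and its methods are hypothetical, and the layout
(term -> document -> positions) is a simplification, not how Lucene
stores its postings. It only illustrates why rebuilding one document
means walking every term in the index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> (document id -> positions of the term).
final class InvertedIndexSketch {
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Index a document by splitting on whitespace and lowercasing.
    void add(int docId, String text) {
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) {
                continue; // empty input yields one empty token
            }
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Rebuild one document. Note that the loop visits EVERY term in the
    // index, because the index is keyed by term, not by document.
    List<String> reconstruct(int docId) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> entry : postings.entrySet()) {
            List<Integer> positions = entry.getValue().get(docId);
            if (positions == null) {
                continue; // term does not occur in this document
            }
            for (int pos : positions) {
                byPosition.put(pos, entry.getKey());
            }
        }
        return new ArrayList<>(byPosition.values());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "the quick brown fox");
        index.add(1, "the lazy dog");
        System.out.println(index.reconstruct(1)); // prints [the, lazy, dog]
    }
}
```

The cost of reconstruct() grows with the number of distinct terms in
the whole index, not with the length of the one document you asked for.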
I think Luke has all the example code you could ask for; that's
the place I'd look first. See:
http://lucene.apache.org/java/docs/contributions.html
Why do you want to do this, and is it really necessary? You
could think about storing the entire document, then when you
needed to count terms, just using one of the tokenizers and
counting them yourself....
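As a rough illustration of the "tokenize the stored text yourself"
route, the sketch below uses a plain regular-expression split as a
stand-in for a real Lucene tokenizer. The class TermPositionLister and
the method positionsOf are hypothetical names, and the splitting rule
(lowercase, break on anything that is not a-z or 0-9) is an assumption
that will not match what any particular analyzer does:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TermPositionLister {
    // Map each term to the token positions at which it occurs,
    // in order of first appearance.
    static Map<String, List<Integer>> positionsOf(String text) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // leading punctuation produces one empty token
            }
            result.computeIfAbsent(token, t -> new ArrayList<>()).add(position++);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {to=[0, 4], be=[1, 5], or=[2], not=[3]}
        System.out.println(positionsOf("To be, or not to be"));
    }
}
```

The size of each list is the term count; the lists themselves are the
positions.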
Best
Erick
|
|
| Re: Lucene newbie question - Term Positions |
  Sweden |
2007-10-07 11:37:17 |
On 7 Oct 2007, at 18:38, Erick Erickson wrote:
> I suspect that this is more work than you think, not to mention
> very slow. This is just due to the nature of an inverted
> index....
>
> To see what I mean, get a copy of Luke and have it
> reconstruct one of your documents and you'll see what the
> performance is like.
Also, I recently posted this transparent code for TermVectorMapper
that will build the term vector space model if it was not cached
(Field.TermVector.NO):
https://issues.apache.org/jira/secure/attachment/12366959/LUCENE-1016.txt
--
karl
|
|
| Re: Lucene newbie question - Term Positions |

|
2007-10-07 11:48:50 |
Hi Erick,
Thanks for the quick reply. My index does not return any hits when I
search for certain phrases. I am very sure that the indexed documents do
have those phrases in them.
Therefore I just want to list all the terms and their positions for a given
document, to make sure that the indexed document does have those terms
indexed in the correct order.
I did check with Luke and came up with the following code, which does not seem
to be working! positions.next() returns false. Do you see anything
wrong in this code?
    Directory dir = FSDirectory.getDirectory(args[0]);
    IndexReader reader = IndexReader.open(dir);
    TermPositions positions = reader.termPositions();
    while(positions.next())
    {
        positions.nextPosition();
        positions.nextPosition();
        byte b[] = positions.getPayload(null, 0);
        System.out.println(b);
    }
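A likely reason next() returns false, assuming the Lucene 2.2 API:
IndexReader.termPositions() with no argument returns an unpositioned
enumerator, which has to be pointed at a term with seek() before it
yields anything. The sketch below walks every term with
IndexReader.terms() and seeks the enumerator to each one. The class
name ListTermPositions and the second command-line argument (the
internal document number) are assumptions made for illustration, and
the code has not been run against the index discussed here:

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermPositions;

// Sketch against the Lucene 2.2 API: print every term of one document
// together with the positions at which it was indexed.
public class ListTermPositions {
    public static void main(String[] args) throws IOException {
        IndexReader reader = IndexReader.open(args[0]);   // path to the index directory
        int targetDoc = Integer.parseInt(args[1]);        // internal Lucene document number (assumed input)
        TermEnum terms = reader.terms();                  // all terms, across all fields
        TermPositions positions = reader.termPositions(); // unpositioned until seek()
        try {
            while (terms.next()) {
                Term term = terms.term();
                positions.seek(term);                     // point the enumerator at this term
                // Jump to the target document, if this term occurs in it.
                if (positions.skipTo(targetDoc) && positions.doc() == targetDoc) {
                    StringBuilder line =
                            new StringBuilder(term.field() + ":" + term.text() + " ->");
                    int freq = positions.freq();          // occurrences in this document
                    for (int i = 0; i < freq; i++) {
                        line.append(' ').append(positions.nextPosition());
                    }
                    System.out.println(line);
                }
            }
        } finally {
            positions.close();
            terms.close();
            reader.close();
        }
    }
}
```

Two further points about the original loop: nextPosition() should be
called at most freq() times per document, and getPayload() is only
meaningful if payloads were written at index time. As Erick notes, this
scan touches every term in the index, so expect it to be slow on
anything large.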
|
|
| Re: Lucene newbie question - Term Positions |

|
2007-10-09 08:02:27 |
I certainly applaud your effort to dig in and find out what's going on!
However, I suspect you'll get farther faster by trying one of several
tactics:
1> Post the indexing code and the searching code in snippet form. This
   kind of issue is usually a problem with analyzers. That is, perhaps
   you're using one analyzer for indexing and a different one for
   searching. Or you've made a typo in, say, the field name. Or....
   Phrases certainly work for many people <G>.
2> Just let Luke reconstruct the document in question for you and inspect
   the reconstructed document. You can cut-n-paste the contents of a
   field into an editor and just search......
3> Back out any complex analyzers you're using and just go with
   something like SimpleAnalyzer. Once that's working, work up
   from there. A unit test and/or small self-contained program
   will work well for you here.
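To show how an analyzer mismatch (tactic 1) can make a phrase vanish,
here is a small self-contained sketch in plain Java. It does not use
Lucene: the two tokenizer methods are hypothetical stand-ins for "an
analyzer that lowercases" and "an analyzer that only splits on
whitespace", and phraseMatches is a simplified model of a phrase query
(the phrase terms must appear consecutively in the indexed tokens):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class AnalyzerMismatchSketch {
    // Stand-in for an analyzer that lowercases and splits on non-alphanumerics.
    static List<String> lowercasing(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stand-in for an analyzer that splits on whitespace and keeps case.
    static List<String> whitespaceOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Simplified phrase query: the phrase tokens must occur consecutively.
    static boolean phraseMatches(List<String> indexed, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(indexed, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> indexed = lowercasing("Apache Lucene Term Positions");
        // Different analyzers at index and search time: prints false
        System.out.println(phraseMatches(indexed, whitespaceOnly("Term Positions")));
        // Same analyzer on both sides: prints true
        System.out.println(phraseMatches(indexed, lowercasing("Term Positions")));
    }
}
```

The document plainly contains the phrase, yet the first query finds
nothing, because "Term" was never indexed; only "term" was.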
Best
Erick
|
|