On Jul 18, 2007, at 6:27 PM, Nathan Kurz wrote:
> I've rearranged my responses to emphasize our agreement.
Well done! ;)
>> Lastly, Posting and PostingList also happen to align well with IR
>> theory, making for what seems to me a more coherent conceptual OO
>> model than Lucene's TermDocs/TermPositions.
>
> Yes, I agree. I'm not intimately familiar with Lucene's model except
> through your descriptions of it, but Posting and PostingList
> inherently make sense.
In the TermDocs/TermPositions model, a posting's attributes (doc, freq,
positions, payload) are exposed as methods on the iterator itself:
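    // TermDocs: doc number and frequency are accessors on the iterator.
    while (termDocs.next()) {
        System.out.println("DOC: " + termDocs.doc());
        System.out.println("FREQ: " + termDocs.freq());
    }

    // TermPositions adds position and payload accessors to the same iterator.
    while (termPositions.next()) {
        System.out.println("DOC: " + termPositions.doc());
        int freq = termPositions.freq();
        System.out.println("FREQ: " + freq);
        while (freq-- > 0) {
            // Positions are pulled one at a time.
            int position = termPositions.nextPosition();
            System.out.println("POS: " + position);
            if (termPositions.isPayloadAvailable()) {
                // A null buffer makes Lucene allocate a new byte[].
                byte[] payload = termPositions.getPayload(null, 0);
                printPayloadSomeHow(payload); // placeholder helper
            }
        }
    }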
There isn't an object which represents a posting.
Another significant difference is that Lucene iterates over positions
one at a time via nextPosition(), while KS loads them all into memory
at once.
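For comparison, here is a rough Java sketch of the posting-as-object model. The classes `Posting` and `PostingList` and their fields are hypothetical, loosely modeled on the concepts discussed here rather than taken from any real API. Each call to `next()` refills a single Posting object, and all positions for the current document arrive as one array instead of through repeated nextPosition() calls.

```java
import java.util.Arrays;

// Hypothetical posting object: one document's entry for a single term.
class Posting {
    int docNum;
    int freq;
    int[] positions = new int[0]; // every position for this doc, loaded at once

    @Override
    public String toString() {
        return "doc=" + docNum + " freq=" + freq + " pos=" + Arrays.toString(positions);
    }
}

// Hypothetical iterator over in-memory postings for one term.
class PostingList {
    private final int[] docs;
    private final int[][] positionsByDoc;
    private final Posting posting = new Posting(); // reused for every doc
    private int tick = -1;

    PostingList(int[] docs, int[][] positionsByDoc) {
        this.docs = docs;
        this.positionsByDoc = positionsByDoc;
    }

    // Advance to the next document; returns false when exhausted.
    boolean next() {
        if (++tick >= docs.length) {
            return false;
        }
        posting.docNum = docs[tick];
        posting.positions = positionsByDoc[tick]; // whole array in one step
        posting.freq = posting.positions.length;
        return true;
    }

    Posting getPosting() {
        return posting;
    }
}

class PostingDemo {
    public static void main(String[] args) {
        PostingList list = new PostingList(new int[] {3, 7}, new int[][] {{0, 4}, {2}});
        while (list.next()) {
            System.out.println(list.getPosting());
        }
    }
}
```

Running `PostingDemo` prints one line per document, starting with `doc=3 freq=2 pos=[0, 4]`.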
>> * a write method
>> * a read method
>> * a make_scorer method
>> * a TermScorer subclass that overrides Scorer_Tally
>
> Here's where we separate a little. I'd like to make it even simpler,
> and require only that it define a read method (and presumably a write
> method, although I've thought very little about that side).
Yes, you could do that. Presumably the subclass would interpret the
same postings file data differently from the parent class.
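A minimal sketch of a subclass that overrides only read(), assuming a hypothetical record layout of plain 32-bit ints, `<doc, freq, <position>{freq}>`, and hypothetical Java stand-ins for MatchPosting and ScorePosting. Both classes consume exactly the same bytes; the subclass differs only in what it keeps.

```java
import java.nio.ByteBuffer;

// Hypothetical on-disk record, plain 32-bit ints: <doc, freq, <position>{freq}>
class MatchPosting {
    int docNum;

    // Consumes a whole record but keeps only the doc number.
    void read(ByteBuffer in) {
        docNum = in.getInt();
        int freq = in.getInt();
        for (int i = 0; i < freq; i++) {
            in.getInt(); // position read and discarded to stay aligned
        }
    }
}

// Same bytes, different interpretation: freq and positions are kept.
class ScorePosting extends MatchPosting {
    int freq;
    int[] positions = new int[0];

    @Override
    void read(ByteBuffer in) {
        docNum = in.getInt();
        freq = in.getInt();
        positions = new int[freq];
        for (int i = 0; i < freq; i++) {
            positions[i] = in.getInt();
        }
    }
}

// Hypothetical helper that writes one record in the layout above.
class PostingCodec {
    static ByteBuffer encode(int docNum, int... positions) {
        ByteBuffer buf = ByteBuffer.allocate(4 * (2 + positions.length));
        buf.putInt(docNum).putInt(positions.length);
        for (int p : positions) {
            buf.putInt(p);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }
}
```

Because MatchPosting still walks past the positions, either class leaves the buffer at the start of the next record.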
> A new scorer could be defined to make use of new information in a new
> Posting, but this would be optional.
You're right. In general that would work, provided that the subclass
faithfully fulfilled the parent class's interface.
> A subclassed Posting can continue to use the Scorer used by its
> parent. Thus if ScorePosting is a descendant of MatchPosting,
> MatchPostingScorer can call ScorePosting->read() and end up with a
> Posting it can handle.
I can't think of a reason why this wouldn't work. Boilerplater
implements only single inheritance, a very limited OO model. There's a
little trickiness in there -- RichPosting's file format doesn't
"inherit" from ScorePosting's, for instance...

    ScorePosting: <doc, freq, shared_boost, <position>+>+
    RichPosting:  <doc, freq, <position, boost>+>+

... and the generated posting->impact would presumably differ (that's
the whole point of RichPosting, after all). But the C structs would be
compatible.
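A sketch of how the two layouts above could decode into compatible in-memory fields, assuming 32-bit ints and floats and a hypothetical impact formula (the sum of the boosts that apply to each position). These are Java stand-ins, not the C structs themselves; the point is only that RichPosting's read() parses a different file format yet fills exactly the fields ScorePosting declares.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the shared struct layout.
class ScorePosting {
    int docNum;
    int freq;
    int[] positions = new int[0];
    float impact; // hypothetical: sum of the boosts applying to each position

    // <doc, freq, shared_boost, <position>+>
    void read(ByteBuffer in) {
        docNum = in.getInt();
        freq = in.getInt();
        float sharedBoost = in.getFloat();
        positions = new int[freq];
        for (int i = 0; i < freq; i++) {
            positions[i] = in.getInt();
        }
        impact = sharedBoost * freq; // one boost shared by every position
    }
}

// Different on-disk format, identical in-memory fields.
class RichPosting extends ScorePosting {
    // <doc, freq, <position, boost>+>
    @Override
    void read(ByteBuffer in) {
        docNum = in.getInt();
        freq = in.getInt();
        positions = new int[freq];
        impact = 0f;
        for (int i = 0; i < freq; i++) {
            positions[i] = in.getInt();
            impact += in.getFloat(); // each position carries its own boost
        }
    }
}
```

A scorer that reads only docNum, freq, positions, and impact would work unchanged with either class.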
>> The intent is that each Posting subclass will have a fixed
>> association with a corresponding TermScorer subclass. You're not
>> supposed to be able to override that association without additional
>> subclassing.
>
> This I don't like. I can see how you got here, but I think there is
> a better solution: the TermScorers depend only on the format of the
> Posting struct, and Posting->read() is the sole point of conversion
> from Index as file to Posting as object.
Well put. You've persuaded me.
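One way to sketch that arrangement, assuming hypothetical Java classes: each scorer declares the posting struct it reads, decoding stays inside read() (omitted here), and a subclass can be paired with its parent's scorer. The bound `PostingScorer<? super P>` is what lets MatchPostingScorer consume ScorePostings; the sqrt weighting is an arbitrary placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical posting structs; read() is omitted, only the fields matter here.
class MatchPosting {
    int docNum;

    MatchPosting(int docNum) {
        this.docNum = docNum;
    }
}

class ScorePosting extends MatchPosting {
    int freq;

    ScorePosting(int docNum, int freq) {
        super(docNum);
        this.freq = freq;
    }
}

// A scorer depends on the struct it reads, not on which class produced it.
interface PostingScorer<P extends MatchPosting> {
    float score(P posting);
}

class MatchPostingScorer implements PostingScorer<MatchPosting> {
    @Override
    public float score(MatchPosting posting) {
        return 1.0f; // plain match: every hit scores the same
    }
}

class ScorePostingScorer implements PostingScorer<ScorePosting> {
    @Override
    public float score(ScorePosting posting) {
        return (float) Math.sqrt(posting.freq); // hypothetical tf weighting
    }
}

class ScorerDemo {
    // Any scorer written for P or one of its ancestors can score P's postings.
    static <P extends MatchPosting> List<Float> scoreAll(
            List<P> postings, PostingScorer<? super P> scorer) {
        List<Float> scores = new ArrayList<>();
        for (P posting : postings) {
            scores.add(scorer.score(posting));
        }
        return scores;
    }
}
```

With this shape, a new Posting subclass needs its own scorer only if it wants to use its new fields.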
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch