On Sep 6, 2007, at 12:17 AM, Nathan Kurz wrote:
>> One of your inventions is Scorer_Advance. I like it as a substitute
>> for Scorer_Next, and it might be worth a global search and replace
>> since that method isn't public yet. However, in your code it
>> appears to be a substitute for Scorer_Skip_To.
>
> I'm hoping to collapse those two down to a single function.
That would be very nice. I tried to pull that off, but I ran into
some problem. I don't remember what it was, though. :(
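
For illustration, here is a minimal sketch of how "next" and "skip to" might collapse into one method. It is not KinoSearch code: the ListScorer class, its advance() signature, and the convention that a call always moves forward at least one doc are all assumptions made for the example.

```python
class ListScorer:
    """Hypothetical scorer over a sorted list of doc numbers."""

    def __init__(self, doc_nums):
        self.doc_nums = sorted(doc_nums)
        self.idx = -1  # positioned before the first doc

    def doc(self):
        """Return the doc number the scorer is currently positioned on."""
        return self.doc_nums[self.idx]

    def advance(self, target=None):
        """Move forward at least one doc.

        With no target this behaves like "next". With a target it behaves
        like "skip to": it stops on the first doc >= target.
        Returns True if the scorer landed on a doc, False when exhausted.
        """
        self.idx += 1  # assumed convention: always consume at least one doc
        if target is not None:
            while (self.idx < len(self.doc_nums)
                   and self.doc_nums[self.idx] < target):
                self.idx += 1
        return self.idx < len(self.doc_nums)


scorer = ListScorer([3, 7, 12, 20])
scorer.advance()    # acts as "next": lands on the first doc
scorer.advance(10)  # acts as "skip to": lands on the first doc >= 10
```

Whether a call with a target equal to the current doc should stay put or move on is exactly the kind of detail that can make merging the two methods awkward; the sketch simply picks one answer.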
> Yes, I think that PhraseScorer should use a subscorer and not
> PostingLists. That said, it may be simpler to restrict complexity of
> that subscorer at least temporarily so that we don't have to start
> with a fully recursive phrase scorer.
>
> Something like allowing:
> PhraseScorer -> AndScorer -> [TermScorer TermScorer TermScorer]
Good plan. Can we do that now, in isolation from the rest of the
changes?
>> I think similar reasoning led you to Match and me to Tally.
>
> Well, that and the hope that if I paralleled Match and Tally you'd
> like the idea better.
Heh.
>>> The trickiness (and I don't like trickiness) is that each Match is
>>> allowed to contain either an array of positions, or an array of
>>> Match structs:
>>
>> I doubt that's necessary. Just create a default wrapper at the
>> lowest level. That's how TermScorer does things presently.
>
> I fear the trickiness is still necessary at some level, but I think
> I've managed to hide it in a place you'll like better. Essentially,
> I'm going to propose two main subclasses for Scorer, MultiScorer and
> MatchScorer. MultiScorers contain a public VArray of other Scorers,
> while MatchScorers contain a public Match struct.
Interesting. Do you end up with more subscorers than before?
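
To make the proposed split concrete, here is a rough sketch of the two subclasses as described in the quoted text. It is an illustration only: the Python classes, the plain list standing in for a VArray, the fields of Match, and the leaf_matches() helper are all assumptions, not the actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical stand-in for the Match struct: positions within one doc."""
    positions: list = field(default_factory=list)


class Scorer:
    """Common base class (assumed empty for this sketch)."""


class MatchScorer(Scorer):
    """Leaf scorer exposing a public Match, e.g. a TermScorer."""

    def __init__(self, positions):
        self.match = Match(list(positions))


class MultiScorer(Scorer):
    """Composite scorer exposing its children publicly, e.g. an AndScorer."""

    def __init__(self, children):
        self.children = list(children)  # a list stands in for the VArray


def leaf_matches(scorer):
    """Walk the tree and return every Match reachable from scorer, left to right."""
    if isinstance(scorer, MatchScorer):
        return [scorer.match]
    matches = []
    for child in scorer.children:
        matches.extend(leaf_matches(child))
    return matches


# AndScorer -> [TermScorer TermScorer TermScorer], as in the layout quoted earlier.
and_scorer = MultiScorer([MatchScorer([35]), MatchScorer([36]), MatchScorer([37])])
matches = leaf_matches(and_scorer)
```

In this arrangement the either-or lives in the class hierarchy rather than inside Match itself: a scorer either has children or has a Match, never both.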
>> This variable name violates my "avoid overload overload" rule.
>> "field" has a very specific meaning in the context of KS and this
>> isn't it.
>
> I agree with you in general, but I thought this was the specific
> meaning. It's removed from Match in my new incarnation, but would
> you prefer it to be called: 'index_field', 'field_num'?
field_num.
"field", when it's used at all, means "field name". It used to mean
a Field object -- before I killed that class -- and that's still the
place it holds in concept-space.
>> This was the driving factor behind the ScoreProx class.
>
> I've forgotten the details, but I came to the conclusion that
> ScoreProx was at odds with Rich Positions, and that to allow a
> Proximity type scorer to use Positions specific weights some wider
> interface was needed.
I'm having trouble visualizing this. I wish there was a way to
divide and conquer this problem more effectively.
>> Collation of positions gets complicated when these scorers are
>> nested.
>
> It's possible we are defining terms differently here, but my current
> plan is that there never will be any collation. Instead, the
> MultiScorers (AndScorer, OrScorer) will allow their children's Match
> structs to be accessed directly.
I think you need collation for the PhraseScorer. Say you're
iterating over positions in several subscorers. You have position 35
and 36; now you need 37. If you haven't kept track of where each
subscorer is at, you'll have to start from scratch with each one.
If you don't, and the subscorer has multiple subscorers itself, you
might miss something.
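
The 35, 36, 37 case can be sketched as follows. This is a simplified illustration, not the PhraseScorer itself: it assumes each subscorer boils down to a flat, sorted list of positions within a single doc and field, and the phrase_starts() function and its cursor list are hypothetical names.

```python
def phrase_starts(position_lists):
    """Return every start position s such that list i contains s + i.

    Each list must be sorted ascending. One cursor per list records how far
    that list has been consumed, so no list is rescanned from the beginning.
    """
    starts = []
    if not position_lists or any(not p for p in position_lists):
        return starts
    cursors = [0] * len(position_lists)
    candidate = position_lists[0][0]
    while True:
        matched = True
        for i, positions in enumerate(position_lists):
            wanted = candidate + i
            # Resume from the saved cursor instead of starting from scratch.
            while cursors[i] < len(positions) and positions[cursors[i]] < wanted:
                cursors[i] += 1
            if cursors[i] == len(positions):
                return starts  # this term has no more positions
            if positions[cursors[i]] > wanted:
                # Overshot: the earliest start still possible is implied
                # by the position this term is now sitting on.
                candidate = positions[cursors[i]] - i
                matched = False
                break
        if matched:
            starts.append(candidate)
            candidate += 1


# Three terms; the phrase occurs once, at positions 35, 36, 37.
found = phrase_starts([[3, 35, 50], [36, 51, 60], [37, 70]])
```

The sketch sidesteps the hard cases mentioned in this thread: with nested subscorers, positions from multiple fields, or sub-phrases of different lengths, a single integer cursor per subscorer is no longer enough.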
> I tried to pursue collation at one point, and gave up: positions
> from multiple fields, phrases of different lengths.
Yes, this is the same problem that thwarted me in my first go.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
KinoSearch mailing list
KinoSearch rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch