Thread: Re: passing positions

Re: passing positions
2007-09-07 02:53:25
On Sep 6, 2007, at 12:17 AM, Nathan Kurz wrote:

>> One of your inventions is Scorer_Advance.  I like it as a substitute
>> for Scorer_Next, and it might be worth a global search and replace
>> since that method isn't public yet.  However, in your code it
>> appears to be a substitute for Scorer_Skip_To.
>
> I'm hoping to collapse those two down to a single function.

That would be very nice.  I tried to pull that off, but I ran into
some problem.  I don't remember what it was, though.  :(

> Yes, I think that PhraseScorer should use a subscorer and not
> PostingLists.
> That said, it may be simpler to restrict complexity of that subscorer
> at least temporarily so that we don't have to start with a fully
> recursive phrase scorer.
>
> Something like allowing:
> PhraseScorer -> AndScorer -> [TermScorer TermScorer TermScorer]

Good plan.  Can we do that now, in isolation from the rest of the
changes?

>> I think similar reasoning led you to Match and me to Tally.
>
> Well, that and the hope that if I paralleled Match and Tally you'd
> like the idea better.

Heh.

>>> The trickiness (and I don't like trickiness) is that each Match is
>>> allowed to contain either an array of positions, or an array of
>>> Match structs:
>>
>> I doubt that's necessary.  Just create a default wrapper at the
>> lowest level.  That's how TermScorer does things presently.
>
> I fear the trickiness is still necessary at some level, but I think
> I've managed to hide it in a place you'll like better.  Essentially,
> I'm going to propose two main subclasses for Scorer, MultiScorer and
> MatchScorer.  MultiScorers contain a public VArray of other Scorers,
> while MatchScorers contain a public Match struct.

Interesting.  Do you end up with more subscorers than before?
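
For readers following along, the MultiScorer/MatchScorer split proposed in the quote above might look roughly like the sketch below.  It is an illustration only, written in Python rather than KinoSearch's C.  The class layout, the Match fields (doc_num, positions), and the matches() helper are all assumptions made for the example, not code from either author.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Match:
    """Hypothetical match record: one document and its hit positions."""
    doc_num: int = 0
    positions: List[int] = field(default_factory=list)


class Scorer:
    """Common base class; real scorers would also iterate and score."""


class MatchScorer(Scorer):
    """Leaf scorer (e.g. a TermScorer) exposing a public Match."""

    def __init__(self, doc_num: int, positions: List[int]) -> None:
        self.match = Match(doc_num, list(positions))


class MultiScorer(Scorer):
    """Compound scorer (e.g. AndScorer, OrScorer) exposing its children."""

    def __init__(self, children: List[Scorer]) -> None:
        self.children = list(children)  # stands in for the public VArray

    def matches(self) -> List[Match]:
        """Collect the children's Match objects by reference, not by copy."""
        found: List[Match] = []
        for child in self.children:
            if isinstance(child, MultiScorer):
                found.extend(child.matches())
            elif isinstance(child, MatchScorer):
                found.append(child.match)
        return found


# Assumed example data: two term scorers under one AND node.
foo = MatchScorer(doc_num=7, positions=[35, 90])
bar = MatchScorer(doc_num=7, positions=[36])
and_scorer = MultiScorer([foo, bar])
```

In this sketch a parent never copies position data; it only hands back references to the Match objects its leaves already own, which is one reading of "allow their children's Match structs to be accessed directly".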

>> This variable name violates my "avoid overload overload" rule.
>> "field" has a very specific meaning in the context of KS and this
>> isn't it.
>
> I agree with you in general, but I thought this was the specific
> meaning.  It's removed from Match in my new incarnation, but would
> you prefer it to be called:  'index_field', 'field_num'?

field_num.

"field", when it's used at all, means "field name".  It used to mean
a Field object -- before I killed that class -- and that's still the
place it holds in concept-space.

>> This was the driving factor behind the ScoreProx class.
>
> I've forgotten the details, but I came to the conclusion that
> ScoreProx was at odds with Rich Positions, and that to allow a
> Proximity type scorer to use Positions specific weights some wider
> interface was needed.

I'm having trouble visualizing this.  I wish there was a way to
divide and conquer this problem more effectively.

>> Collation of positions gets complicated when these scorers are
>> nested.
>
> It's possible we are defining terms differently here, but my current
> plan is that there never will be any collation.  Instead, the
> MultiScorers (AndScorer, OrScorer) will allow their children's Match
> structs to be accessed directly.

I think you need collation for the PhraseScorer.  Say you're
iterating over positions in several subscorers.  You have positions 35
and 36; now you need 37.  If you haven't kept track of where each
subscorer is at, you'll have to start from scratch with each one.
If you don't, and the subscorer has multiple subscorers itself, you
might miss something.
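
The bookkeeping described here (remembering where each subscorer is so nothing has to be rescanned) can be sketched as below.  This is a minimal Python illustration, not KinoSearch code.  It assumes each phrase term contributes a sorted list of positions for a single document, and phrase_starts is a hypothetical name.  It covers only flat term lists, not nested subscorers.

```python
from typing import List


def phrase_starts(term_positions: List[List[int]]) -> List[int]:
    """Return start positions where term i occurs at start + i.

    term_positions[i] is the sorted position list for the i-th phrase
    term within one document. One cursor per term only ever moves
    forward, so no list is rescanned from the beginning.
    """
    if not term_positions or any(not p for p in term_positions):
        return []
    cursors = [0] * len(term_positions)
    starts: List[int] = []
    for start in term_positions[0]:
        matched = True
        for i in range(1, len(term_positions)):
            positions = term_positions[i]
            target = start + i
            # Advance this term's cursor to the first position >= target.
            while cursors[i] < len(positions) and positions[cursors[i]] < target:
                cursors[i] += 1
            if cursors[i] == len(positions):
                return starts  # this term is exhausted; no more phrases
            if positions[cursors[i]] != target:
                matched = False
                break
        if matched:
            starts.append(start)
    return starts
```

With the numbers from the paragraph above, three terms at positions [35, 50], [36, 60] and [37, 52] give a single phrase starting at 35.  The cursors for the second and third terms stay where they stopped when the next candidate start (50) is examined.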

> I tried to pursue collation at one point, and gave up: positions
> from multiple fields, phrases of different lengths.

Yes, this is the same problem that thwarted me in my first go.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearchrectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch


Re: passing positions
2007-09-07 22:24:58
On 9/7/07, Marvin Humphrey <marvinrectangular.com> wrote:
> > Something like allowing:
> > PhraseScorer -> AndScorer -> [TermScorer TermScorer TermScorer]
>
> Good plan.  Can we do that now, in isolation from the rest of the
> changes?

It's possible with your greater familiarity with the current code that
you could do so, but I haven't found a comfortable way to do it.

> > I'm going to propose two main subclasses for Scorer, MultiScorer and
> > MatchScorer.  MultiScorers contain a public VArray of other Scorers,
> > while MatchScorers contain a public Match struct.
>
> Interesting.  Do you end up with more subscorers than before?

I'm not done yet, but I think it will end up the same or fewer than
before.  The parent classes are new, but ANDOR and ANDNOT go away,
replaced by simple combinations of And, Or, and Not.  Not is new (and
not yet done), but BooleanScorer goes.  More importantly, I think
things like PhraseScorer and its unborn ilk will be simpler as they
won't have to duplicate the low-level work.
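
As a rough illustration of replacing a dedicated ANDNOT scorer with a combination of And and Not, here is a small Python sketch.  Everything in it is an assumption made for the example: the classes work on whole sets of document numbers rather than streaming iterators, the docs(max_doc) method and the and_not helper are hypothetical, and none of it is taken from the actual KinoSearch code.

```python
from typing import List, Sequence


class TermScorer:
    """Hypothetical leaf scorer: matches a fixed set of doc numbers."""

    def __init__(self, docs: Sequence[int]) -> None:
        self._docs = sorted(set(docs))

    def docs(self, max_doc: int) -> List[int]:
        return [d for d in self._docs if d < max_doc]


class AndScorer:
    """Matches docs that every child matches."""

    def __init__(self, children: Sequence) -> None:
        self.children = list(children)

    def docs(self, max_doc: int) -> List[int]:
        if not self.children:
            return []
        common = set(self.children[0].docs(max_doc))
        for child in self.children[1:]:
            common &= set(child.docs(max_doc))
        return sorted(common)


class OrScorer:
    """Matches docs that at least one child matches."""

    def __init__(self, children: Sequence) -> None:
        self.children = list(children)

    def docs(self, max_doc: int) -> List[int]:
        seen = set()
        for child in self.children:
            seen |= set(child.docs(max_doc))
        return sorted(seen)


class NotScorer:
    """Matches every doc below max_doc that the child does not match."""

    def __init__(self, child) -> None:
        self.child = child

    def docs(self, max_doc: int) -> List[int]:
        excluded = set(self.child.docs(max_doc))
        return [d for d in range(max_doc) if d not in excluded]


def and_not(required, prohibited) -> AndScorer:
    """ANDNOT expressed as And(required, Not(prohibited))."""
    return AndScorer([required, NotScorer(prohibited)])


# Assumed example data.
a = TermScorer([1, 3, 5, 7])
b = TermScorer([3, 7, 9])
```

Here and_not(a, b).docs(10) keeps the documents matched by a but not by b.  A real Not scorer would need some cheaper way to enumerate "everything else" than walking range(max_doc).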

> > It's removed from Match in my new incarnation, but would
> > you prefer it to be called:  'index_field', 'field_num'?
>
> field_num.

Agreed to be better, and changed.

> "field", when it's used at all, means "field name".  It used to mean
> a Field object -- before I killed that class -- and that's still the
> place it holds in concept-space.

At one point I asked my grandfather about some directions he gave me
based on "the road where the bridge is out", and was surprised to learn
he'd never actually seen the bridge in the half-century he'd been in
that area, and that it must have washed out before he was born.
That was just how people referred to that road.

> I'm having trouble visualizing this.  I wish there was a way to
> divide and conquer this problem more effectively.

I haven't found it yet.  I think it might be possible, though, at
least in pieces.  I'm hoping to get it working independent from the
existing code, and then work with you to integrate it.  Hopefully once
I have a working model (ie, once I have the ocean at a rolling boil),
the ways it can be incrementally incorporated will become clearer.

> I think you need collation for the PhraseScorer.  Say you're
> iterating over positions in several subscorers.  You have positions 35
> and 36; now you need 37.  If you haven't kept track of where each
> subscorer is at, you'll have to start from scratch with each one.
> If you don't, and the subscorer has multiple subscorers itself, you
> might miss something.

We might just be defining 'collation' differently.  I agree that one
needs to keep track of the current position within each Scorer, but I
think this can be done with a pointer rather than a copy.  I'll fire
off another message with my current version of PhraseScorer so you can
see what I'm doing.  Likely we mean the same thing but are just
describing it differently.

Nathan Kurz
nateverse.com


