Thread: API for subclassing Scorer (was adding a proximity scorer)




API for subclassing Scorer (was adding a proximity scorer)
2007-06-16 16:31:46
On Jun 16, 2007, at 12:05 AM, Nathan Kurz wrote:

>> Ideally, our discussion will result in an improvement upon that
>> scheme that will allow you to write your ORScorer subclass without
>> touching BoilerPlater.  Something like this:
>>
>>    package MyORScorer;
>>    use base qw( KinoSearch::Search::ORScorer );
>>
>>    __PACKAGE__->register_c_method( tally => 'my_tally' );
>>
>>    use Inline C => <<'END_C';
>>
>>    kino_Tally*
>>    my_tally(kino_OrScorer *self) {
>>        /* ... */
>>    }
>>
>>    END_C
> That seems like a great goal.  For now I'm happy writing C.

OK, check.  But I want to make it easier for you to maintain your
Scorer subclass, and I want to make it easier for other people to
write them.

> Perhaps more useful for most people would be the ability to
> override a BoilerPlated C method with a Perl function, with it
> automatically wrapped in just enough C to push the args.

Yes, I absolutely agree we should do that.

In this particular case, adding Perl's function call overhead to
Scorer_Tally() would be a disaster for search-time performance,
because it's inner loop code.  But that's not true everywhere, and
for rapid prototyping taking the performance hit would be acceptable.

> You aren't already doing this anywhere, are you?

No, but the prime candidate would be Similarity->length_norm.

> Personally, though, I'd probably rather see a greater split between
> the Perl and the C. I love them both individually, but I'd be more
> comfortable with a standard C library (libidf?) with a Perl wrapper
> and a clearly defined boundary.

This is clearly the direction that KS is headed.

   ...  ...

Let's design the ideal API for subclassing Scorer, then work
backwards to implement it and see how close we can get.

   * It should be possible to implement a Scorer class entirely in
     Perl and have KS use it.  (Schema and FieldSpec sort of work
     this way.)
   * It should be possible to override individual methods used by
     a Scorer implemented in C with wrapped Perl subroutines.
   * It should be possible to override individual methods used by
     a Scorer implemented in C with C functions, as in the code
     block at the top of this post.  (This is fairly easy.)
   * It should be possible to add additional Perl member variables
     to a Scorer implemented in C.
   * It should be possible to add additional C member variables to
     a Scorer implemented in C.
   * It _must_ be possible to upgrade KS without encountering binary
     compatibility problems such as reordered vtables or object
     structs.
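
To make the C-level items in the list above concrete, here is a minimal
sketch of one possible shape, not KinoSearch's actual implementation.
Every name in it (Scorer, MethodSlot, Scorer_register_method,
Scorer_find_method, MyORScorer, boost) is hypothetical.  It assumes
methods are looked up by name rather than by a fixed vtable offset, so
that reordering the slots in a later release would not break a subclass
compiled earlier, and it adds a C member variable by embedding the
parent struct as the first member of the subclass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is KinoSearch's real API. */
typedef struct Scorer Scorer;
typedef float (*method_t)(Scorer *self);

/* One named slot in a method table.  Callers resolve methods by name,
 * so the order of the slots is not part of the binary interface. */
typedef struct {
    const char *name;
    method_t    func;
} MethodSlot;

#define MAX_METHODS 8

struct Scorer {
    MethodSlot methods[MAX_METHODS];
    size_t     num_methods;
    float      base_score;
};

/* Find a method by name; returns NULL if it is not defined. */
method_t Scorer_find_method(const Scorer *self, const char *name) {
    for (size_t i = 0; i < self->num_methods; i++) {
        if (strcmp(self->methods[i].name, name) == 0) {
            return self->methods[i].func;
        }
    }
    return NULL;
}

/* Replace an existing method or append a new one.
 * Returns 0 on success, -1 if the table is full. */
int Scorer_register_method(Scorer *self, const char *name, method_t func) {
    for (size_t i = 0; i < self->num_methods; i++) {
        if (strcmp(self->methods[i].name, name) == 0) {
            self->methods[i].func = func;
            return 0;
        }
    }
    if (self->num_methods == MAX_METHODS) { return -1; }
    self->methods[self->num_methods].name = name;
    self->methods[self->num_methods].func = func;
    self->num_methods++;
    return 0;
}

/* Stock behaviour: the tally is just the base score. */
float default_tally(Scorer *self) { return self->base_score; }

void Scorer_init(Scorer *self, float base_score) {
    self->num_methods = 0;
    self->base_score  = base_score;
    Scorer_register_method(self, "tally", default_tally);
}

/* Subclass: parent embedded first, plus one extra C member variable. */
typedef struct {
    Scorer base;
    float  boost;
} MyORScorer;

/* Override: scale the base score by the subclass's own member. */
float my_tally(Scorer *self) {
    MyORScorer *me = (MyORScorer *)self;  /* valid: base is the first member */
    return self->base_score * me->boost;
}

void MyORScorer_init(MyORScorer *self, float base_score, float boost) {
    Scorer_init(&self->base, base_score);
    self->boost = boost;
    Scorer_register_method(&self->base, "tally", my_tally);
}

/* Usage: resolve the method once, outside the inner loop, then call it. */
float demo_tally(void) {
    MyORScorer s;
    MyORScorer_init(&s, 2.0f, 3.0f);
    method_t tally = Scorer_find_method(&s.base, "tally");
    assert(tally != NULL);
    return tally(&s.base);
}
```

Two caveats about this sketch.  The string lookup is meant to happen
once, at setup time; calling it per document would add overhead to
inner-loop code.  And while name-based lookup sidesteps reordered
method tables, embedding the parent struct does not protect against a
change in the parent's size or layout, so the object-struct half of the
last bullet would need something more, such as opaque handles with
accessor functions.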

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




_______________________________________________
KinoSearch mailing list
KinoSearchrectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch


Re: API for subclassing Scorer (was adding a proximity scorer)
2007-06-17 04:08:01
On 6/16/07, Marvin Humphrey <marvinrectangular.com> wrote:
> > Personally, though, I'd probably rather see a greater split between
> > the Perl and the C. I love them both individually, but I'd be more
> > comfortable with a standard C library (libidf?) with a Perl wrapper
> > and a clearly defined boundary.
>
> This is clearly the direction that KS is headed.

From the outside, I'm not sure that this is clear.  Currently, the C
code (which I take to be proto-Lucy) seems very intimately tied to the
KinoSearch (and Lucene, and presumably Ferret) class hierarchies, and
the boundaries between the layers seem pretty malleable. Not that this
is a direction you want to go, but I'd be more comfortable with a
standard procedural C (hard to override) library with bindings that
allow the object hierarchy to be created in Perl or whatever.

Without prejudice, I can see why you've taken the route you have, but
I'd hesitate to call it standard.  I think a worthwhile question is
to ask whether an outsider considering implementing full-text-search
in another language would find it advantageous to link to your library
rather than implementing just the parts they felt they needed.  For
example (in a direction I've considered) if I were designing a search
component using an Apache module done purely in C, would I link to
this?

> Let's design the ideal API for subclassing Scorer, then work
> backwards to implement it and see how close we can get.

Probably only semantics, but I'd start by defining the problem a
little differently:  the goal is to allow someone to easily change the
way in which scoring happens.  Subclassing the existing Scorer is one
way to do this, but making the scoring procedure simple and clear
enough that they can implement their own Scorer should be a priority.
And making it possible to change the operation of an existing class
without subclassing is nice too.  That said...

>    * It should be possible to implement a Scorer class entirely in
>      Perl and have KS use it.  (Schema and FieldSpec sort of work
>      this way.)

Yes, that would be useful.  Even having examples of the code done in
Perl would be useful to make it easier to understand what's happening.
If the default KinoSearch could be those Perl examples selectively
overridden with C using the same mechanism that a user would use to
customize, that would be fantastic.

>    * It should be possible to override individual methods used by
>      a Scorer implemented in C with wrapped Perl subroutines.

This would be impressive.  I'd agree this would be ideal, but I'd be
willing to make this a lower priority --- the kind of thing one
designs well enough to make possible in the future but doesn't
implement right now.  Are there examples of this in other software
that could be used as a pattern?

>    * It should be possible to override individual methods used by
>      a Scorer implemented in C with C functions, as in the code
>      block at the top of this post.  (This is fairly easy.)

Yes, this seems like appropriate fruit.  In addition to the inline
approach, I'd like to see it possible to load an external shared
library and use a method in that. If possible, I'd also like to see it
possible to override the method directly in the base class (or
perhaps one instance of it), rather than only in the subclass.

Currently, it's often difficult to get your subclass to be actually
used.  Thus I'd also like the code to avoid hardcoded constructors,
and provide a similar override mechanism to call your custom subclass
constructor:

   KinoSearch::Search::BooleanScorer->override('newORScorer', 'MyORScorer_new');

Which is to say, hardcoded constructors should become class methods.

>    * It should be possible to add additional Perl member variables
>      to a Scorer implemented in C.
>    * It should be possible to add additional C member variables to
>      a Scorer implemented in C.

I can see why you are interested in an inside-out object model.  I
wasn't familiar with it before you mentioned it.  I can see why it's
appealing, but it's still too new for me to evaluate.  At first
glance, this seems like it would be complex.

>    * It _must_ be possible to upgrade KS without encountering binary
>      compatibility problems such as reordered vtables or object
>      structs.

I'm sure you've thought about this part much more than I have.  Do you
mean that it must be possible to upgrade the Perl portion only while
leaving the C portion untouched?  Or vice versa?  Or both?  (Perhaps
this is obvious --- I'm getting tired.)

Have a good night,

Nathan Kurz
nateverse.com


