Thread: Searching subset of documents




Searching subset of documents
Rusty Conover
2006-06-02 00:11:04

On Jun 1, 2006, at 12:45 PM, Olly Betts wrote:


> The place to start is common/postlist.h - this is the base class which
> you need to derive from.  Some methods have defaults in that header,
> others aren't really relevant for a postlist always used for filtering
> - all the weight methods can just give a weight of zero.  If there's
> no way to more usefully implement skip_to(), just use a while loop to
> call next() if we aren't already at the required position.
>
> Then you "just" need to get this postlist created in the right place
> inside Enquire and hooked into the query tree!
>
> For this to be worth adding to the release tree, I think it would need
> to be wrapped in an API akin to that of MatchDecider - an API class
> which can be used to implement the customisation you want by
> subclassing.  But you don't need that to prove the idea works - you can
> just jury rig it inside Enquire.

Hi Olly & the rest of xapian-devel,

Thanks for your help so far.  I'm a bit stuck on the problem of how to properly expose this new class, called ExternalSourcePostList, to public users of the Xapian API.  I've created matcher/externalsourcepostlist.cc and matcher/externalsourcepostlist.h.

It seems the header "postlist.h" isn't installed when Xapian is installed, so there needs to be some housekeeping to allow this class to be used without showing its internal bits.  This I'd like a little help with.

I've attached my source files for the new class to this email.  I've kept it really simple, just expecting an array of Xapian::docids to be passed to the constructor, where they are copied and sorted, and the rest of the iterator functions are implemented correctly as far as I can tell.
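
A minimal sketch of the kind of class described here, for illustration only; it is not the attached code. The class name SortedDocidList, the plain docid typedef standing in for Xapian::docid, and the de-duplication step are all assumptions, and the class is deliberately not derived from the internal PostList so that it compiles on its own. skip_to() uses the fallback Olly suggested: loop on next() unless already at the required position.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for Xapian::docid so the sketch compiles without Xapian headers.
typedef unsigned docid;

// Hypothetical filter-only posting list over a fixed set of document ids.
class SortedDocidList {
    std::vector<docid> ids;
    std::size_t pos;
    bool started;

  public:
    // Copy the ids, then sort them so iteration is in ascending docid order.
    // Dropping duplicates is an assumption of this sketch.
    SortedDocidList(const docid* begin, const docid* end)
        : ids(begin, end), pos(0), started(false) {
        std::sort(ids.begin(), ids.end());
        ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    }

    // The list starts positioned before the first entry.
    void next() {
        if (!started) {
            started = true;
        } else if (pos < ids.size()) {
            ++pos;
        }
    }

    // Fallback skip_to(): call next() until we reach did or run off the end.
    // Does nothing if we are already at or past did.
    void skip_to(docid did) {
        if (!started) next();
        while (!at_end() && ids[pos] < did) next();
    }

    bool at_end() const { return started && pos >= ids.size(); }

    docid get_docid() const {
        assert(started && !at_end());
        return ids[pos];
    }

    // A pure filter contributes nothing to the score.
    double get_weight() const { return 0.0; }
};
```

With ids {42, 7, 19, 7}, the first next() lands on 7 and a following skip_to(10) lands on 19.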

Thanks again,

Rusty
--
Rusty Conover
InfoGears Inc.


Searching subset of documents
Olly Betts
2006-06-02 01:01:13
On Thu, Jun 01, 2006 at 06:11:04PM -0600, Rusty Conover wrote:
> Thanks for your help so far.  I'm a bit stuck on the problem of how to
> properly expose this new class, called ExternalSourcePostList, to
> public users of the Xapian API.  I've created
> matcher/externalsourcepostlist.cc and matcher/externalsourcepostlist.h.
>
> It seems the header "postlist.h" isn't installed when Xapian is
> installed, so there needs to be some housekeeping to allow this class
> to be used without showing its internal bits.  This I'd like a little
> help with.

The PostList class is an internal detail currently, and I think probably
should stay that way.  We do want to expose a similar interface to that
which PostList currently has, but we don't need all the methods of
PostList and it may be unhelpful to limit changes to PostList's
interface by exposing it directly.

Having had this circulating in the recesses of my brain for a few hours,
I think the best way to fit this into the external API would probably
look something like this in use (names are the first that came to mind,
so could no doubt be improved upon):

    class MySQLFilter : public Xapian::ExternalPostingSource {
        // any private SQL-related data
      public:
        // ctor
        // dtor
        // size reporting methods
        // next()
        // skip_to()
        // at_end()
        // optionally weight/max_weight, defaulting to unweighted
    };

Then:

    Xapian::QueryParser qp;
    // configure qp
    Xapian::Query query = qp.parse_query(query_string);
    Xapian::Query sql_filter(new MySQLFilter(/* some parameters */));
    query = Xapian::Query(Xapian::Query::OP_FILTER, query, sql_filter);

And then use query as usual...

> I've attached my source files for the new class to this email.  I've
> kept it really simple, just expecting an array of Xapian::docids to
> be passed to the constructor, where they are copied and sorted, and
> the rest of the iterator functions are implemented correctly as far
> as I can tell.

Not a bad way to go for testing, but for real world use sucking
everything into an array doesn't scale so well, and sorting scales even
worse.  I'd try to arrange that the ids come out of SQL sorted, and
stream them through the "MySQLFilter" class.  The matcher may be able to
terminate early, in which case you'll never need the tail end of the ids
from SQL (sorry if this is obvious - a lot of people don't seem to
appreciate this trick is even possible!)
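
A rough sketch of that streaming idea, under stated assumptions rather than as a real implementation: the class name StreamingFilter and the fetch callback are hypothetical, with the callback standing in for stepping a SQL cursor over a result set ordered by id. Nothing is buffered or sorted on the C++ side, so rows the matcher never asks for are never fetched.

```cpp
#include <functional>
#include <utility>

// Stand-in for Xapian::docid so the sketch compiles without Xapian headers.
typedef unsigned docid;

// Hypothetical streaming filter. `fetch` writes the next id into its argument
// and returns false once the result set is exhausted. The ids must arrive in
// ascending order (for example via ORDER BY in the SQL query).
class StreamingFilter {
    std::function<bool(docid&)> fetch;
    docid current;  // 0 means "not started"; valid Xapian docids start at 1
    bool finished;

  public:
    explicit StreamingFilter(std::function<bool(docid&)> fetch_row)
        : fetch(std::move(fetch_row)), current(0), finished(false) {}

    // Pull exactly one more id from the source.
    void next() {
        if (!finished && !fetch(current)) finished = true;
    }

    // Discard ids below did; relies on the source being sorted.
    void skip_to(docid did) {
        while (!finished && current < did) next();
    }

    bool at_end() const { return finished; }
    docid get_docid() const { return current; }
};
```

If the caller stops after the first few matches, the remaining rows are simply never requested from the source.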

Cheers,
    Olly

_______________________________________________
Xapian-devel mailing list
Xapian-devellists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
Searching subset of documents
Rusty Conover
2006-06-02 04:42:30
On Jun 1, 2006, at 7:01 PM, Olly Betts wrote:

> On Thu, Jun 01, 2006 at 06:11:04PM -0600, Rusty Conover wrote:
>> Thanks for your help so far.  I'm a bit stuck on the problem of how to
>> properly expose this new class, called ExternalSourcePostList, to
>> public users of the Xapian API.  I've created
>> matcher/externalsourcepostlist.cc and matcher/externalsourcepostlist.h.
>>
>> It seems the header "postlist.h" isn't installed when Xapian is
>> installed, so there needs to be some housekeeping to allow this class
>> to be used without showing its internal bits.  This I'd like a little
>> help with.
>
> The PostList class is an internal detail currently, and I think probably
> should stay that way.  We do want to expose a similar interface to that
> which PostList currently has, but we don't need all the methods of
> PostList and it may be unhelpful to limit changes to PostList's
> interface by exposing it directly.
>
> Having had this circulating in the recesses of my brain for a few hours,
> I think the best way to fit this into the external API would probably
> look something like this in use (names are the first that came to mind,
> so could no doubt be improved upon):
>
>     class MySQLFilter : public Xapian::ExternalPostingSource {
>         // any private SQL-related data
>       public:
>         // ctor
>         // dtor
>         // size reporting methods
>         // next()
>         // skip_to()
>         // at_end()
>         // optionally weight/max_weight, defaulting to unweighted
>     };
>

Olly,

What's the right way to specify the inheritance for this interface so
that it can be passed to the Query constructor?  So far this is the
interface I'm envisioning:

class ExternalPostingSource {
  public:
    ExternalPostingSource();
    virtual ~ExternalPostingSource();

    /** Advance to the next document in the source. */
    virtual ExternalPostingSource *next(Xapian::weight w_min);

    /** Advance to the first document with an id of did or greater. */
    virtual ExternalPostingSource *skip_to(Xapian::docid did,
                                           Xapian::weight w_min);

    virtual bool at_end();

    virtual Xapian::docid get_docid();
    virtual std::string get_description();
};

Thanks,

Rusty
--
Rusty Conover
CTO, InfoGears Inc.
Web: http://www.infogears.com
Phone: 406-587-5432




Searching subset of documents
James Aylett
2006-06-02 09:15:12
On Fri, Jun 02, 2006 at 02:01:13AM +0100, Olly Betts wrote:

>     class MySQLFilter : public Xapian::ExternalPostingSource;
>
>     Xapian::QueryParser qp;
>     // configure qp
>     Xapian::Query query = qp.parse_query(query_string);
>     Xapian::Query sql_filter(new MySQLFilter(/* some parameters */));

Surely the type will be Xapian::ExternalPostingSource, not
Xapian::Query?

>     query = Xapian::Query(Xapian::Query::OP_FILTER, query, sql_filter);

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  jamestartarus.org                               uncertaintydivision.org

Searching subset of documents
Rusty Conover
2006-06-02 10:34:38
On Jun 2, 2006, at 3:15 AM, James Aylett wrote:

> On Fri, Jun 02, 2006 at 02:01:13AM +0100, Olly Betts wrote:
>
>>     class MySQLFilter : public Xapian::ExternalPostingSource;
>>
>>     Xapian::QueryParser qp;
>>     // configure qp
>>     Xapian::Query query = qp.parse_query(query_string);
>>     Xapian::Query sql_filter(new MySQLFilter(/* some parameters */));
>
> Surely the type will be Xapian::ExternalPostingSource, not
> Xapian::Query?
>
>>     query = Xapian::Query(Xapian::Query::OP_FILTER, query, sql_filter);

Olly and others,

Please disregard my previous email.  After some hours of hacking I got
it all working, I think, including a nice wrapper in Perl with
Search::Xapian.  I'll try to clean it up and send a diff later today
after I finish testing it for a while.

I did have to hack the Xapian::Query object to add a new OP type,
OP_EXTERNAL_POST_LIST, and store a reference to the implementation of
the ExternalPostingSource interface in the Internal class.  I then made
some changes in localmatch.cc so that postlist_from_query just creates a
new instance of ExternalSourcePostList with the reference to the
implementation provided to the Query constructor.

Thanks for all of your help,

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com




Searching subset of documents
Olly Betts
2006-06-02 13:43:15
On Fri, Jun 02, 2006 at 04:34:38AM -0600, Rusty Conover wrote:
>
> On Jun 2, 2006, at 3:15 AM, James Aylett wrote:
>
> >On Fri, Jun 02, 2006 at 02:01:13AM +0100, Olly Betts wrote:
> >
> >>    class MySQLFilter : public Xapian::ExternalPostingSource;
> >>
> >>    Xapian::QueryParser qp;
> >>    // configure qp
> >>    Xapian::Query query = qp.parse_query(query_string);
> >>    Xapian::Query sql_filter(new MySQLFilter(/* some parameters */));
> >
> >Surely the type will be Xapian::ExternalPostingSource, not
> >Xapian::Query?

No, not if Xapian::Query has a constructor such as:

Xapian::Query::Query(Xapian::ExternalPostingSource *);

> Please disregard my previous email.  After some hours of hacking I got
> it all working, I think, including a nice wrapper in Perl with
> Search::Xapian.  I'll try to clean it up and send a diff later today
> after I finish testing it for a while.

Cool.

> I did have to hack the Xapian::Query object to add a new OP type,
> OP_EXTERNAL_POST_LIST, and store a reference to the implementation of
> the ExternalPostingSource interface in the Internal class.  I then made
> some changes in localmatch.cc so that postlist_from_query just creates a
> new instance of ExternalSourcePostList with the reference to the
> implementation provided to the Query constructor.

That sounds about right.  I suspect a new OP_* code isn't necessarily
required since you could just check whether the ExternalPostingSource
pointer is non-NULL, but if it's cleaner to have a new OP_* code I
doubt it's a problem.
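
To make the pointer-check alternative concrete, here is a small sketch. Everything in it is a hypothetical stand-in, not the real Xapian::Query, its Internal class, or postlist_from_query: Query, PostListKind and postlist_kind_for are names invented for illustration, and the source pointer is assumed not to be owned by the query.

```cpp
#include <string>

// Hypothetical stand-ins only: none of these are the real Xapian classes.
struct ExternalPostingSource {
    virtual ~ExternalPostingSource() {}
};

class Query {
    std::string term;               // used by ordinary term queries
    ExternalPostingSource* source;  // non-NULL only for external-source queries

  public:
    explicit Query(const std::string& t) : term(t), source(0) {}
    explicit Query(ExternalPostingSource* s) : term(), source(s) {}

    const std::string& get_term() const { return term; }
    ExternalPostingSource* get_source() const { return source; }
};

enum PostListKind { TERM_POSTLIST, EXTERNAL_POSTLIST };

// Stand-in for the decision made when building posting lists from a query:
// the pointer itself says which kind of leaf this is, so no extra OP_* code
// needs to be consulted.
PostListKind postlist_kind_for(const Query& q) {
    return q.get_source() ? EXTERNAL_POSTLIST : TERM_POSTLIST;
}
```

The trade-off is the one mentioned above: a dedicated OP_* code makes the leaf type explicit wherever operators are switched on, while the pointer check avoids adding a new operator.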

Cheers,
    Olly

Searching subset of documents
James Aylett
2006-06-02 14:01:51
On Fri, Jun 02, 2006 at 02:43:15PM +0100, Olly Betts wrote:

> > >Surely the type will be Xapian::ExternalPostingSource, not
> > >Xapian::Query?
> 
> No, not if Xapian::Query has a constructor such as:
> 
> Xapian::Query::Query(Xapian::ExternalPostingSource *);

Aha - I've suddenly seen what you're thinking of internally, which was
different to what I'd imagined. Cool

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  jamestartarus.org                               uncertaintydivision.org
