Thread: Composition of NEAR and OR




Composition of NEAR and OR
user name
2006-11-15 10:50:13
The following piece of code triggers an 'unimplemented' exception with the
message:
 "Can't use NEAR/PHRASE with a subexpression containing NEAR or PHRASE"

      Xapian::Query or1(Xapian::Query::OP_OR,
                        Xapian::Query("one"),
                        Xapian::Query("two"));
      Xapian::Query or2(Xapian::Query::OP_OR,
                        Xapian::Query("three"),
                        Xapian::Query("four"));
      Xapian::Query near(Xapian::Query::OP_NEAR, or1, or2);

I can't decide by looking at the code in omqueryinternal.cc if this is
intentional or not. In debug mode, it does trigger the NEAR or PHRASE
assertion at the top of flatten_subqs(), which gets called at some point
for the query:
    ((one NEAR 2 three) OR (one NEAR 2 four))

which does not seem right or needed.

Is this "(x or y) near (z or t)" query supposed to work or not? I'm
willing to try and fix it if it should work, but this area of the xapian
code certainly does not suffer from an excess of comments...

Regards,

J.F. Dockes


_______________________________________________
Xapian-devel mailing list
Xapian-devel@lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
Composition of NEAR and OR
user name
2006-11-15 18:28:15
On Wed, Nov 15, 2006 at 11:50:13AM +0100, Jean-Francois Dockes wrote:
> The following piece of code triggers an 'unimplemented' exception with the
> message:
>  "Can't use NEAR/PHRASE with a subexpression containing NEAR or PHRASE"
>
>       Xapian::Query or1(Xapian::Query::OP_OR,
>                         Xapian::Query("one"),
>                         Xapian::Query("two"));
>       Xapian::Query or2(Xapian::Query::OP_OR,
>                         Xapian::Query("three"),
>                         Xapian::Query("four"));
>       Xapian::Query near(Xapian::Query::OP_NEAR, or1, or2);
>
> I can't decide by looking at the code in omqueryinternal.cc if this is
> intentional or not.

It looks like it will flatten "(one OR two) NEAR three", but not a query
with an OR subquery on both sides.

Looking at the history of this code, it's been essentially the same
since revision 3194 (over 5 years ago) when Richard created this file.
It looks like this was mostly restructuring, and there's similar code
in omquery.cc (but not in its own method) prior to this, but I think
that has the same behaviour as currently.  Looks like I originally
wrote it (over 6 years ago!)

I'm not sure this flattening is really the best way to handle this -
fixing NearPostList to handle non-LeafPostLists would be more efficient
I think.  I think all that really needs is a PositionList subclass which
can return (in order) all the positions in any of a list of
PositionLists, which isn't too hard.
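
For illustration, the merging idea described above could be sketched as a
k-way merge over sorted position vectors. This is only a toy under stated
assumptions: Positions and MergedPositions are hypothetical names, the code
does not use or mirror Xapian's actual PositionList interface, and it
assumes each input list is already sorted in ascending order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a position list: term positions within one
// document, sorted ascending.  This is NOT Xapian's PositionList interface.
using Positions = std::vector<unsigned>;

// Yields, in ascending order, every position that occurs in any of the
// input lists (duplicates reported once), using a min-heap of cursors.
class MergedPositions {
    using Cursor = std::pair<unsigned, std::size_t>;  // (position, list index)
    std::vector<Positions> lists;
    std::vector<std::size_t> next_idx;                // read offset per list
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;

  public:
    explicit MergedPositions(std::vector<Positions> in)
        : lists(std::move(in)), next_idx(lists.size(), 0) {
        for (std::size_t i = 0; i < lists.size(); ++i)
            if (!lists[i].empty()) heap.push({lists[i][0], i});
    }

    bool at_end() const { return heap.empty(); }

    // Returns the smallest unread position and advances past it.
    unsigned next() {
        assert(!at_end());
        unsigned pos = heap.top().first;
        // Pop every cursor sitting on `pos` so a shared position is
        // returned only once, then re-queue each list's following entry.
        while (!heap.empty() && heap.top().first == pos) {
            std::size_t i = heap.top().second;
            heap.pop();
            if (++next_idx[i] < lists[i].size())
                heap.push({lists[i][next_idx[i]], i});
        }
        return pos;
    }
};
```

Merging the lists {1, 5, 9} and {2, 5, 7} this way yields 1, 2, 5, 7, 9, so
a proximity check could treat "one OR two" as if it were a single term with
that combined position list.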

> In debug mode, it does trigger the NEAR or PHRASE
> assertion at the top of flatten_subqs(), which gets called at some point
> for the query:
>     ((one NEAR 2 three) OR (one NEAR 2 four))
>
> which does not seem right or needed,

That must be from a recursive call, since the only non-recursive call
happens only for NEAR or PHRASE.

> Is this "(x or y) near (z or t)" query supposed to work or not ? I'm
> willing to try and fix it if it should work, but this area of the xapian
> code certainly does not suffer from an excess of comments ...

I think it should be supported, but whether the current code is meant to
support it I'm less clear about!  I suspect we might have decided to
handle the easier case with a leaf query on one side of the NEAR/PHRASE
"for now".

However, you could try just returning from flatten_subqs if op isn't
OP_NEAR or OP_PHRASE and see if that does the job.  Calling
"get_description" on the restructured query should show if it worked
or not.

Cheers,
    Olly

Composition of NEAR and OR
user name
2006-11-15 20:24:55
Olly Betts writes:
 > I'm not sure this flattening is really the best way to handle this -
 > fixing NearPostList to handle non-LeafPostLists would be more efficient
 > I think.  I think all that really needs is a PositionList subclass which
 > can return (in order) all the positions in any of a list of
 > PositionLists, which isn't too hard.

About the NearPostList code, I've tried to read phrasepostlist.cc, and
there is one thing at least which I don't understand, which is
how/when/whether the 'terms' postlists get positioned to the right
document before read_position_list() is called for each (in
phrasepostlist.cc, NearPostList::test_doc()). Which tends to indicate that
I'm not quite ready to write a new PositionList subclass...

 > However, you could try just returning from flatten_subqs if op isn't
 > OP_NEAR or OP_PHRASE and see if that does the job.  Calling
 > "get_description" on the restructured query should show if it worked
 > or not.

I'll try this tomorrow (and have another pass at trying to understand the
code now that I know it's not explicitly designed not to support this).

Thanks.
J.F. Dockes

Composition of NEAR and OR
user name
2006-11-15 20:53:28
On Wed, Nov 15, 2006 at 09:24:55PM +0100, Jean-Francois Dockes wrote:
> Olly Betts writes:
>  > I'm not sure this flattening is really the best way to handle this -
>  > fixing NearPostList to handle non-LeafPostLists would be more efficient
>  > I think.  I think all that really needs is a PositionList subclass which
>  > can return (in order) all the positions in any of a list of
>  > PositionLists, which isn't too hard.
>
> About the NearPostList code, I've tried to read phrasepostlist.cc, and
> there is one thing at least which I don't understand, which is
> how/when/whether the 'terms' postlists get positioned to the right
> document before read_position_list() is called for each (in
> phrasepostlist.cc, NearPostList::test_doc()).

What happens is that the PostList is positioned on each document which
matches an AND query, and then test_doc() is called.  See SelectPostList
(parent class of NearPostList and PhrasePostList) for where this
happens.
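
As a rough picture of that pattern (not the real SelectPostList), the toy
below walks candidate documents produced by an AND and keeps only those
which pass a per-document test. ToySelect and its members are hypothetical
names chosen for this sketch, and the candidates are a plain vector of
docids rather than a PostList.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy filter in the spirit of the description above.  Names are
// illustrative only; this is not Xapian's SelectPostList.
class ToySelect {
    std::vector<unsigned> candidates;        // docids matched by the AND
    std::function<bool(unsigned)> test_doc;  // e.g. a proximity check
    std::size_t pos = 0;
    bool started = false;

  public:
    ToySelect(std::vector<unsigned> c, std::function<bool(unsigned)> t)
        : candidates(std::move(c)), test_doc(std::move(t)) {}

    // Advance to the next candidate which passes test_doc.
    // Returns false once the candidates are exhausted.
    bool next() {
        if (started) ++pos; else started = true;
        while (pos < candidates.size() && !test_doc(candidates[pos])) ++pos;
        return pos < candidates.size();
    }

    // Docid of the current match; only valid after next() returned true.
    unsigned get_docid() const {
        assert(pos < candidates.size());
        return candidates[pos];
    }
};
```

The point of the sketch is only the ordering: the source is positioned on a
candidate first, and the test runs afterwards on that same document.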

>  > However, you could try just returning from flatten_subqs if op isn't
>  > OP_NEAR or OP_PHRASE and see if that does the job.  Calling
>  > "get_description" on the restructured query should show if it worked
>  > or not.
>
> I'll try this tomorrow (and have another pass at trying to understand the
> code now that I know it's not explicitly designed not to support this).

Cool.

Cheers,
    Olly

Composition of NEAR and OR
user name
2006-11-16 12:38:22
Olly Betts writes:
 > What happens is that the PostList is positioned on each document which
 > matches an AND query, and then test_doc() is called.  See SelectPostList
 > (parent class of NearPostList and PhrasePostList) for where this
 > happens.

Because the "source" and "terms" postlists are references to an AndPostlist
and its components, the "terms" lists get positioned automagically when
next() is called on source? Or what?


I think that I have fixed the NEAR distribution code. It's probably not
optimal but it seems to work for me.

The trick as I see it was that flatten_subqs() must not be called
recursively on the object itself *which is not a NEAR query* anymore after
the first transformation.

flatten_subqs() is called on each of the subqueries instead, after the
transformation.
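
A minimal sketch of that distribution on a toy query tree follows. It is
not the patch and not flatten_subqs() itself: Q, leaf(), node(), flatten()
and describe() are hypothetical names, the NEAR window is left out, and
nested NEAR/PHRASE operands are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy query tree, purely illustrative; not Xapian::Query::Internal.
struct Q {
    enum Op { LEAF, OR, NEAR } op;
    std::string term;      // used when op == LEAF
    std::vector<Q> subqs;  // used otherwise
};

Q leaf(const std::string& t) { return Q{Q::LEAF, t, {}}; }
Q node(Q::Op op, std::vector<Q> subqs) { return Q{op, "", std::move(subqs)}; }

// Rewrites a NEAR with OR operands into an OR of NEARs over leaves.
// A non-NEAR node is left as it is, apart from recursing into its children.
void flatten(Q& q) {
    if (q.op != Q::NEAR) {
        for (Q& sub : q.subqs) flatten(sub);
        return;
    }
    for (std::size_t i = 0; i < q.subqs.size(); ++i) {
        if (q.subqs[i].op != Q::OR) continue;
        assert(!q.subqs[i].subqs.empty());
        // One copy of the NEAR per alternative of the OR operand.
        std::vector<Q> alternatives;
        for (const Q& alt : q.subqs[i].subqs) {
            Q copy = q;
            copy.subqs[i] = alt;
            alternatives.push_back(copy);
        }
        q = node(Q::OR, std::move(alternatives));
        // q is now an OR rather than a NEAR, so the remaining work is done
        // on each of its NEAR subqueries instead of on q itself.
        for (Q& sub : q.subqs) flatten(sub);
        return;
    }
}

// Renders the tree with explicit parentheses, for checking the result.
std::string describe(const Q& q) {
    if (q.op == Q::LEAF) return q.term;
    const char* sep = (q.op == Q::OR) ? " OR " : " NEAR ";
    std::string out = "(";
    for (std::size_t i = 0; i < q.subqs.size(); ++i) {
        if (i) out += sep;
        out += describe(q.subqs[i]);
    }
    return out + ")";
}
```

Applied to "(one OR two) NEAR (three OR four)", describe() on the flattened
tree gives
"(((one NEAR three) OR (one NEAR four)) OR ((two NEAR three) OR (two NEAR four)))".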

I have a patch against the 0.9 svn branch which is mostly comments. I also
left in place the two tracing fprintf() calls (which you'll probably want to
remove before possibly committing), just in case you want to see the thing
in action.

There is also a small program with a few test cases.

Here are both links (can't remember if this list accepts attachments...):
 http://www.recoll.org/xapian/xapNearDistrib.patch
 http://www.recoll.org/xapian/xapNearDistrib.cpp

Regards,
jf


Composition of NEAR and OR
user name
2006-11-16 16:47:51
On Thu, Nov 16, 2006 at 01:38:22PM +0100, Jean-Francois Dockes wrote:
> Olly Betts writes:
>  > What happens is that the PostList is positioned on each document which
>  > matches an AND query, and then test_doc() is called.  See SelectPostList
>  > (parent class of NearPostList and PhrasePostList) for where this
>  > happens.
>
> Because the "source" and "terms" postlists are references to an AndPostlist
> and its components, the "terms" lists get positioned automagically when
> next() is called on source ? Or what ?

Oh, I see why you are confused!

The terms vector is set up when the NearPostList/PhrasePostList is
constructed - it simply contains the pointers to *the same* PostLists
which the AndPostList tree uses, but in the original query order
(AndPostList is reordered so that the least frequent terms are checked
first, as that will generally minimise the work done).  So when the
AndPostList is advanced, all the PostLists in terms are advanced too,
because they're just the same PostLists!
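
The sharing can be shown with a small toy, assuming shared pointers to
hypothetical leaf objects. ToyLeaf, and_order() and and_skip_to() are
invented for this sketch and are not Xapian classes or functions; term
frequency is approximated by the length of the docid list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical leaf posting list with a cursor (not Xapian's LeafPostList).
struct ToyLeaf {
    std::vector<unsigned> docids;  // sorted ascending
    std::size_t cursor = 0;
    void skip_to(unsigned did) {
        while (cursor < docids.size() && docids[cursor] < did) ++cursor;
    }
    bool at_end() const { return cursor >= docids.size(); }
    unsigned docid() const { assert(!at_end()); return docids[cursor]; }
};

using LeafPtr = std::shared_ptr<ToyLeaf>;

// Returns the leaves in the order an AND would check them, rarest first.
// Only the pointers are copied, so the cursors are shared with `terms`.
std::vector<LeafPtr> and_order(const std::vector<LeafPtr>& terms) {
    std::vector<LeafPtr> order = terms;
    std::stable_sort(order.begin(), order.end(),
                     [](const LeafPtr& a, const LeafPtr& b) {
                         return a->docids.size() < b->docids.size();
                     });
    return order;
}

// Moves every leaf to the first docid >= did which all of them contain.
// Returns false if there is no such document.
bool and_skip_to(std::vector<LeafPtr>& order, unsigned did) {
    for (;;) {
        bool agreed = true;
        for (auto& leaf : order) {
            leaf->skip_to(did);
            if (leaf->at_end()) return false;
            if (leaf->docid() != did) {
                did = leaf->docid();
                agreed = false;
            }
        }
        if (agreed) return true;
    }
}
```

After and_skip_to() succeeds on the reordered vector, reading docid()
through the query-order vector gives the same document for every term,
since both vectors point at the same leaf objects.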

> The trick as I see it was that flatten_subqs() must not be called
> recursively on the object itself *which is not a NEAR query* anymore after
> the first transformation.
>
> flatten_subqs() is called on each of the subqueries instead, after the
> transformation.

That sounds about right.

> Here are both links (can't remember if this list accepts attachments...):
>  http://www.recoll.org/xapian/xapNearDistrib.patch
>  http://www.recoll.org/xapian/xapNearDistrib.cpp

Thanks, I'll take a look.

I think text attachments are currently accepted, though for non-trivial
patches I've started to put them on the web and post a link instead.
Partly to avoid filling up subscribers' mailboxes, and also because it
can be tricky to get a true copy of the patch from web-based list
archives (e.g. spam protection of email addresses can modify a patch).

Cheers,
    Olly
