Thread: Created: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause




Created: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
user name
2006-04-04 10:33:43
Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
-------------------------------------------------------------------

         Key: LUCENE-538
         URL: http://issues.apache.org/jira/browse/LUCENE-538
     Project: Lucene - Java
        Type: Bug

  Components: Search  
    Versions: 1.9    
 Environment: Ubuntu Linux, java version 1.5.0_04
    Reporter: Helen Warren
 Attachments: TestMultiSearchWildCard.java

We are searching across multiple indices using a
MultiSearcher. There seems to be a problem when we use a
WildcardQuery to exclude documents from the result set. I
attach a set of unit tests illustrating the problem.

In these tests, we have two indices. Each index contains a
set of documents with fields for 'title', 'section' and
'index'. The final aim is to do a keyword search, across
both indices, on the title field and be able to exclude
documents from certain sections (and their subsections)
using a WildcardQuery on the section field.
 
For example, return documents from both indices which have
the string 'xyzpqr' in their title but which do not lie in
the news section or its subsections (section = /news/*).

The first unit test (testExcludeSectionsWildCard) fails
trying to do this. If we relax any of the constraints made
above, the tests pass:
 
* Don't use WildcardQuery, but pass in the news section and
  its child section to exclude explicitly
  (testExcludeSectionsExplicit).
* Exclude results from just one section, not its children
  too, i.e. don't use WildcardQuery (testExcludeSingleSection).
* Do use WildcardQuery, and exclude a section and its
  children, but use just one index, thereby using the simple
  IndexReader and IndexSearcher objects
  (testExcludeSectionsOneIndex).
* Try the boolean MUST clause rather than MUST_NOT with the
  WildcardQuery, i.e. only include results from the /news/
  section and its children.
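All of these variants revolve around one query: the title must contain 'xyzpqr' and the section must not match /news/*. As a minimal plain-Java sketch of the result the first test expects across two indices (this is not the attached test and does not use Lucene), the following assumes a hypothetical `Doc` class, made-up titles and section paths, and a simplified matcher that only understands `*` (Lucene's WildcardQuery also supports `?`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class IntendedSemantics {

    // Hypothetical stand-in for an indexed document with 'title' and 'section' fields.
    static final class Doc {
        final String title;
        final String section;

        Doc(String title, String section) {
            this.title = title;
            this.section = section;
        }
    }

    // Simplified wildcard match: '*' matches any run of characters, everything else is literal.
    static boolean wildcardMatches(String pattern, String term) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append(".*");
            if (!parts[i].isEmpty()) regex.append(Pattern.quote(parts[i]));
        }
        return term.matches(regex.toString());
    }

    // Title MUST contain titleTerm and section MUST_NOT match excludedPattern, over every index.
    static List<Doc> search(List<List<Doc>> indices, String titleTerm, String excludedPattern) {
        List<Doc> hits = new ArrayList<>();
        for (List<Doc> index : indices) {
            for (Doc d : index) {
                boolean must = Arrays.asList(d.title.split("\\s+")).contains(titleTerm);
                boolean mustNot = wildcardMatches(excludedPattern, d.section);
                if (must && !mustNot) hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> index1 = Arrays.asList(
                new Doc("xyzpqr one", "/news/"), new Doc("xyzpqr two", "/business/"));
        List<Doc> index2 = Arrays.asList(
                new Doc("xyzpqr three", "/news/sport/"), new Doc("xyzpqr four", "/sport/"));
        for (Doc d : search(Arrays.asList(index1, index2), "xyzpqr", "/news/*")) {
            System.out.println(d.title + " in " + d.section);
        }
    }
}
```

With this sample data only the /business/ and /sport/ documents are returned; both /news/ documents are excluded by the pattern.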

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Commented: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
user name
2006-04-04 20:54:43
    [ http://issues.apache.org/jira/browse/LUCENE-538?page=comments#action_12373180 ]

paul.elschot commented on LUCENE-538:
-------------------------------------

With this code in doSearch():

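		System.err.println("Executing query: " + overallQuery);
		// Rewrite against the reader first ('reader' is defined elsewhere in the
		// test; per the later comment it is a MultiReader over both indexes).
		Query qrw = overallQuery.rewrite(reader);
		System.err.println("rewritten      : " + qrw);
		// Search with the already-rewritten query.
		Hits results = searcher.search(qrw);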

the test passes.

During searcher.search(), the query is rewritten once more
under the covers. I don't know why rewriting the overallQuery
twice does not work; this may be a bug.

Anyway, there should be no need to rewrite it explicitly.

For convenience, I put the test in package
org.apache.lucene.search, so I could run the test with:
ant -Dtestcase=TestMultiSearchWildCard test

Regards,
Paul Elschot



Commented: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
user name
2006-11-21 19:17:03
    [ http://issues.apache.org/jira/browse/LUCENE-538?page=comments#action_12451744 ]

Michael Busch commented on LUCENE-538:
--------------------------------------

The reason for this problem is how the MultiSearcher
rewrites queries. It calls rewrite() on every Searchable and
then combines the rewritten queries.

And here is the bug:
Let's say we have the query +a -b* and two Searchables. The
dictionary of the first Searchable's index has two
expansions for b*, so calling rewrite on the first
Searchable results in the query +a -(b1 b2). However, the
dictionary of the second Searchable's index has no
expansions for b*, so the second rewritten query is +a -().
To combine these two queries, the MultiSearcher creates a
new BooleanQuery and adds both rewritten queries as SHOULD
clauses, so the combined query looks like:
(+a -(b1 b2)) (+a -()). This query is used to search both
indexes. Now all documents that contain 'a' are found,
because the negative clause within the second SHOULD clause
is empty. That's why too many results from the first index
are returned: the -b* no longer has any effect.

The workaround Paul suggested works because it calls
rewrite on a MultiReader instead of on the MultiSearcher.
Then b* is expanded using the merged dictionaries of both
indexes. So this workaround simply hides the problem in
MultiSearcher.
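Paul's workaround can be modeled the same way. In this hypothetical plain-Java sketch (names and data are illustrative, not Lucene's API), `expandAcross` stands in for rewriting against a MultiReader: the wildcard is expanded once against the union of both indexes' terms, and that single rewritten query is then evaluated on each index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MergedDictionaryRewrite {

    // Hypothetical document model: a document is just its set of terms.
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    // Expand "prefix*" against the terms of ALL indexes at once, mimicking a
    // rewrite against a reader that sees the merged dictionary.
    @SafeVarargs
    static Set<String> expandAcross(String prefix, List<Set<String>>... indexes) {
        Set<String> expansions = new TreeSet<>();
        for (List<Set<String>> index : indexes) {
            for (Set<String> d : index) {
                for (String term : d) {
                    if (term.startsWith(prefix)) expansions.add(term);
                }
            }
        }
        return expansions;
    }

    // Count documents in one index matching "+required -(prohibited...)".
    static int count(String required, Set<String> prohibited, List<Set<String>> index) {
        int n = 0;
        for (Set<String> d : index) {
            if (d.contains(required) && Collections.disjoint(d, prohibited)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Set<String>> index1 = Arrays.asList(doc("a", "b1"), doc("a", "b2"), doc("a"));
        List<Set<String>> index2 = Arrays.asList(doc("a"), doc("a", "c"));

        // One expansion shared by both indexes: [b1, b2]
        Set<String> prohibited = expandAcross("b", index1, index2);
        System.out.println("rewritten: +a -" + prohibited);
        System.out.println("index1 hits: " + count("a", prohibited, index1));
        System.out.println("index2 hits: " + count("a", prohibited, index2));
    }
}
```

With the merged expansion `[b1, b2]`, the first index yields 1 hit and the second 2, which is what `+a -b*` should return. As the comment notes, this only sidesteps the per-Searchable rewrite rather than fixing it.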
