Thread: Created: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause

Created: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
2006-04-04 10:33:43

Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
-------------------------------------------------------------------
Key: LUCENE-538
URL: http://issues.apache.org/jira/browse/LUCENE-538
Project: Lucene - Java
Type: Bug
Components: Search
Versions: 1.9
Environment: Ubuntu Linux, java version 1.5.0_04
Reporter: Helen Warren
Attachments: TestMultiSearchWildCard.java
We are searching across multiple indices using a
MultiSearcher. There seems to be a problem when we use a
WildcardQuery to exclude documents from the result set. I
attach a set of unit tests illustrating the problem.
In these tests, we have two indices. Each index contains a
set of documents with fields for 'title', 'section' and
'index'. The final aim is to do a keyword search, across
both indices, on the title field and be able to exclude
documents from certain sections (and their subsections)
using a WildcardQuery on the section field.
e.g. return documents from both indices which have the
string 'xyzpqr' in their title but which do not lie
in the news section or its subsections (section = /news/*).
The first unit test (testExcludeSectionsWildCard) fails
trying to do this.
If we relax any of the constraints above, the tests pass:
* Don't use WildcardQuery, but pass in the news section and
its child section to exclude explicitly
(testExcludeSectionsExplicit).
* Exclude results from just one section, not its children
too, i.e. don't use WildcardQuery (testExcludeSingleSection).
* Do use WildcardQuery, and exclude a section and its
children, but use just one index, thereby using the simple
IndexReader and IndexSearcher objects
(testExcludeSectionsOneIndex).
* Use the boolean MUST clause rather than MUST_NOT with the
WildcardQuery, i.e. only include results from the /news/
section and its children.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Commented: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
2006-04-04 20:54:43

[ http://issues.apache.org/jira/browse/LUCENE-538?page=comments#action_12373180 ]
paul.elschot commented on LUCENE-538:
-------------------------------------
With this code in doSearch():

    System.err.println("Executing query: " + overallQuery);
    // Rewrite explicitly against the reader before searching.
    Query qrw = overallQuery.rewrite(reader);
    System.err.println("rewritten : " + qrw);
    Hits results = searcher.search(qrw);

the test passes.
During searcher.search(), the query is rewritten once more,
under the covers.
I don't know why rewriting the overallQuery twice does not
work; this may be a bug.
Anyway, there should be no need to rewrite it explicitly.
For convenience, I put the test in package
org.apache.lucene.search,
so I could run the test by:
ant -Dtestcase=TestMultiSearchWildCard test
Regards,
Paul Elschot

Commented: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
2006-11-21 19:17:03

[ http://issues.apache.org/jira/browse/LUCENE-538?page=comments#action_12451744 ]
Michael Busch commented on LUCENE-538:
--------------------------------------
The reason for this problem is how the MultiSearcher
rewrites queries. It calls rewrite() on all Searchables and
then combines the rewritten queries.
And here is the bug:
Let's say we have the query +a -b* and two Searchables. The
dictionary of the first Searchable's index has two
expansions for b*, so calling rewrite on the first
Searchable results in the query +a -(b1 b2). However, the
dictionary of the second Searchable's index does not have
any expansions, so the second rewritten query is +a -(). To
combine these two queries, the MultiSearcher now creates a
new BooleanQuery and adds both rewritten queries as SHOULD
clauses, so the combined query looks like: (+a -(b1 b2)) (+a
-()). This query is used to search in both indexes. So now
all documents that contain 'a' are found, because the
negative clause within the second SHOULD clause is empty.
That's why too many results from the first index are
returned: the -b* no longer has any effect.
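
The effect of the combined query can be illustrated with a toy model in plain Java. This is only a sketch of the boolean logic described above, not Lucene code: the class CombinedRewriteDemo and its methods are made up for illustration, and a document is modelled as a plain set of terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the combined query (+a -(b1 b2)) (+a -()); hypothetical names, not Lucene API. */
class CombinedRewriteDemo {

    /** Models "+required -(prohibited...)": the required term must be present, no prohibited term may be. */
    static boolean matches(Set<String> doc, String required, Set<String> prohibited) {
        if (!doc.contains(required)) {
            return false;
        }
        for (String term : prohibited) {
            if (doc.contains(term)) {
                return false;
            }
        }
        return true;
    }

    /** Models the SHOULD combination: a document matches if any per-index rewrite matches it. */
    static boolean matchesCombined(Set<String> doc, String required,
                                   List<Set<String>> prohibitedPerIndex) {
        for (Set<String> prohibited : prohibitedPerIndex) {
            if (matches(doc, required, prohibited)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A document in the first index that contains both 'a' and 'b1'.
        Set<String> doc = new HashSet<>(Arrays.asList("a", "b1"));

        // b* expands to {b1, b2} in the first index and to nothing in the second.
        Set<String> firstRewrite = new HashSet<>(Arrays.asList("b1", "b2"));
        Set<String> secondRewrite = new HashSet<>();
        List<Set<String>> rewrites = Arrays.asList(firstRewrite, secondRewrite);

        System.out.println(matches(doc, "a", firstRewrite));     // false: excluded by -(b1 b2)
        System.out.println(matchesCombined(doc, "a", rewrites)); // true: let through by (+a -())
    }
}
```

The document should be excluded by -b*, yet the clause with the empty negative part accepts it.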
The workaround Paul suggested works because it calls
rewrite on the MultiReader instead of the MultiSearcher. Then
the b* is expanded using the merged dictionaries from both
indexes. So this workaround simply hides the problem in
MultiSearcher.
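
The difference between expanding b* per index and expanding it against the merged dictionaries can be sketched as follows. Again this is plain Java under stated assumptions, not how Lucene implements it: MergedExpansionDemo, expand and expandMerged are hypothetical names, a term dictionary is modelled as a sorted set of strings, and only trailing-* patterns are handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of wildcard expansion per index versus over merged dictionaries; not Lucene API. */
class MergedExpansionDemo {

    /** Expands a trailing-* pattern (given as its prefix, e.g. "b" for b*) against one dictionary. */
    static SortedSet<String> expand(String prefix, SortedSet<String> dictionary) {
        SortedSet<String> expansions = new TreeSet<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix itself.
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            expansions.add(term);
        }
        return expansions;
    }

    /** Expands the prefix against the union of all dictionaries, as a merged view would. */
    static SortedSet<String> expandMerged(String prefix, List<SortedSet<String>> dictionaries) {
        SortedSet<String> merged = new TreeSet<>();
        for (SortedSet<String> dictionary : dictionaries) {
            merged.addAll(dictionary);
        }
        return expand(prefix, merged);
    }

    public static void main(String[] args) {
        SortedSet<String> firstIndex = new TreeSet<>(Arrays.asList("a", "b1", "b2"));
        SortedSet<String> secondIndex = new TreeSet<>(Arrays.asList("a"));

        System.out.println(expand("b", firstIndex));   // [b1, b2]
        System.out.println(expand("b", secondIndex));  // []
        System.out.println(expandMerged("b", Arrays.asList(firstIndex, secondIndex))); // [b1, b2]
    }
}
```

With the merged expansion there is a single rewritten query, +a -(b1 b2), so no clause with an empty negative part is ever produced.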