
Thread: BooleanWeight.normalize(float) doesn't normalize prohibited clauses?




BooleanWeight.normalize(float) doesn't normalize prohibited clauses?
user name
2006-05-10 23:39:36
I'm looking into some of the issues with LUCENE-557 and it seems that a
lot of them are triggered by the way BooleanWeight.normalize is
implemented...

    public void normalize(float norm) {
      norm *= getBoost();                         // incorporate boost
      for (int i = 0 ; i < weights.size(); i++) {
        BooleanClause c = (BooleanClause)clauses.elementAt(i);
        Weight w = (Weight)weights.elementAt(i);
        if (!c.isProhibited())
          w.normalize(norm);
      }
    }

...since prohibited clauses aren't normalized, their sub-weights don't get
their weights set properly, which means that when the Explanation is
calculated, they tend to result in an Explanation with a value of "0" ...
and since they are prohibited, the Explanation for BooleanQuery thinks
that is a good thing.

Does anyone know why normalize ignores the prohibited clauses?  was that
just intended to be an optimization (save time calculating stuff for
clauses we don't care about scoring in depth) ... ?



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribelucene.apache.org
For additional commands, e-mail: java-dev-helplucene.apache.org

BooleanWeight.normalize(float) doesn't normalize prohibited clauses?
user name
2006-05-11 06:56:33
On Thursday 11 May 2006 01:39, Chris Hostetter wrote:
> 
> I'm looking into some of the issues with LUCENE-557 and it seems that a
> lot of them are triggered by the way BooleanWeight.normalize is
> implemented...
>
>     public void normalize(float norm) {
>       norm *= getBoost();                         // incorporate boost
>       for (int i = 0 ; i < weights.size(); i++) {
>         BooleanClause c = (BooleanClause)clauses.elementAt(i);
>         Weight w = (Weight)weights.elementAt(i);
>         if (!c.isProhibited())
>           w.normalize(norm);
>       }
>     }
>
> ...since prohibited clauses aren't normalized, their sub-weights don't get
> their weights set properly, which means that when the Explanation is
> calculated, they tend to result in an Explanation with a value of "0" ...
> and since they are prohibited, the Explanation for BooleanQuery thinks
> that is a good thing.
>
> Does anyone know why normalize ignores the prohibited clauses?  was that
> just intended to be an optimization (save time calculating stuff for
> clauses we don't care about scoring in depth) ... ?

A prohibited clause will never occur in any matching document, so it
will never need to take part in any score value calculation.

Regards,
Paul Elschot


BooleanWeight.normalize(float) doesn't normalize prohibited clauses?
user name
2006-05-11 07:49:27
: > Does anyone know why normalize ignores the prohibited clauses?  was that
: > just intended to be an optimization (save time calculating stuff for
: > clauses we don't care about scoring in depth) ... ?
:
: A prohibited clause will never occur in any matching document, so it
: will never need to take part in any score value calculation.

that's true ... but when calculating an Explanation, the only way
BooleanWeight.explain has to determine whether or not a clause matched is
by looking at the value of its Explanation -- if the clause has never
been normalized, its Explanation may be inaccurate.  As i mentioned...

: > ...since prohibited clauses aren't normalized, their sub-weights don't get
: > their weights set properly, which means that when the Explanation is
: > calculated, they tend to result in an Explanation with a value of "0" ...
: > and since they are prohibited, the Explanation for BooleanQuery thinks
: > that is a good thing.

To elaborate: if a prohibited clause is a TermQuery, and that TermQuery
isn't normalized, it will return an Explanation with a value of 0.0f for
both matches and non-matches.  When BooleanWeight.explain sees that the
Explanation for the prohibited clause has a value of 0.0f, it assumes that
meant the clause didn't match.

In essence, BooleanWeight.explain relies on a precondition that
BooleanWeight.normalize doesn't establish.
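
To make that precondition concrete, here is a minimal toy sketch in Java.
It is not Lucene's TermWeight: ToyTermWeight, explainValue and
looksLikeNonMatch are hypothetical names, and the only thing carried over
from the description above is the assumption that a weight's value stays
at 0.0f until normalize() has been called on it.

```java
// Toy model only: these classes are hypothetical and are not Lucene code.
final class ToyTermWeight {
    private final float idf;
    private float queryWeight; // stays 0.0f until normalize() is called

    ToyTermWeight(float idf) {
        this.idf = idf;
    }

    void normalize(float norm) {
        queryWeight = idf * norm;
    }

    // Value an explanation would report for one document.
    float explainValue(boolean docMatches, float fieldWeight) {
        return docMatches ? queryWeight * fieldWeight : 0.0f;
    }
}

final class ProhibitedClauseSketch {
    // The value-based test described above: 0.0f is read as "did not match".
    static boolean looksLikeNonMatch(float explanationValue) {
        return explanationValue == 0.0f;
    }

    public static void main(String[] args) {
        ToyTermWeight skipped = new ToyTermWeight(2.0f);    // never normalized
        ToyTermWeight normalized = new ToyTermWeight(2.0f);
        normalized.normalize(0.5f);
        // Both weights are asked about a document that really matches.
        System.out.println(looksLikeNonMatch(skipped.explainValue(true, 1.5f)));
        System.out.println(looksLikeNonMatch(normalized.explainValue(true, 1.5f)));
    }
}
```

In this toy, the weight that was never normalized reports 0.0f for a
matching document, so the value-based test takes it for a non-match; the
normalized weight does not have that problem.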

The attachments in LUCENE-557 demonstrate this problem.  In particular,
take a look at the testBQ* functions in TestSimpleExplanations.java



-Hoss



BooleanWeight.normalize(float) doesn't normalize prohibited clauses?
user name
2006-05-11 18:46:20
On Thursday 11 May 2006 09:49, Chris Hostetter wrote:
> 
> : > Does anyone know why normalize ignores the prohibited clauses?  was that
> : > just intended to be an optimization (save time calculating stuff for
> : > clauses we don't care about scoring in depth) ... ?
> :
> : A prohibited clause will never occur in any matching document, so it
> : will never need to take part in any score value calculation.
>
> that's true ... but when calculating an Explanation, the only way
> BooleanWeight.explain has to determine whether or not a clause matched is
> by looking at the value of its Explanation -- if the clause has never
> been normalized, its Explanation may be inaccurate.  As i mentioned...

If class Explanation would have a boolean attribute indicating whether
or not there was a match, the Explanation for BooleanQuery could
simply use this value from the Explanation of the prohibited clause.

Regards,
Paul Elschot


BooleanWeight.normalize(float) doesn't normalize prohibited clauses?
user name
2006-05-11 19:51:04
: If class Explanation would have a boolean attribute indicating whether
: or not there was a match, the Explanation for BooleanQuery could
: simply use this value from the Explanation of the prohibited clause.

I've definitely thought about that a lot initially.  But my gut reaction
was to try and fix the broken explain methods using the current
limitations of the Explanation class to reduce the size of the patch.

Unfortunately there are still some cases that can't be solved without that
information, ie...

     "w1 w2^0.0"    (testBQ12 from the bug)
     "+w1^0.0 w2"   (testBQ18 from the bug)

In the first case, documents which match both terms get their score
divided in half because the explain method can't tell the score of 0.0
is because of the boost, so the coord factor gets applied by mistake.

In the second case, the explain method assumes a total failure even
if a document matches both terms because it got a 0.0 score from a
required clause.

Other than the (somewhat obscure) cases where a clause has a boost of 0.0,
I managed to fix all of the BooleanQuery explain bugs by normalizing all
clauses (even if they are prohibited) and fixing some poor assumptions in
the explain method itself.
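
The "w1 w2^0.0" case can be illustrated with a toy calculation.  This is
only a sketch: ClauseResult and CoordSketch are hypothetical names, not
Lucene classes, and the coord factor is assumed to have the simple form
overlap / maxOverlap.

```java
// Toy calculation only: hypothetical classes, not Lucene code.
final class ClauseResult {
    final boolean matched; // whether the clause really matched the document
    final float value;     // the clause's score contribution (0.0f when its boost is 0.0)

    ClauseResult(boolean matched, float value) {
        this.matched = matched;
        this.value = value;
    }
}

final class CoordSketch {
    // Assumed coord shape: the fraction of clauses that matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Counts a clause as matched only if its value is non-zero.
    static float scoreInferringMatchFromValue(ClauseResult[] clauses) {
        float sum = 0.0f;
        int overlap = 0;
        for (ClauseResult c : clauses) {
            if (c.value != 0.0f) {
                overlap++;
                sum += c.value;
            }
        }
        return sum * coord(overlap, clauses.length);
    }

    // Counts a clause as matched according to an explicit flag.
    static float scoreUsingMatchFlag(ClauseResult[] clauses) {
        float sum = 0.0f;
        int overlap = 0;
        for (ClauseResult c : clauses) {
            if (c.matched) {
                overlap++;
                sum += c.value;
            }
        }
        return sum * coord(overlap, clauses.length);
    }

    public static void main(String[] args) {
        // Both optional clauses match; the second has a boost of 0.0.
        ClauseResult[] bothMatch = {
            new ClauseResult(true, 0.8f),
            new ClauseResult(true, 0.0f)
        };
        System.out.println(scoreInferringMatchFromValue(bothMatch));
        System.out.println(scoreUsingMatchFlag(bothMatch));
    }
}
```

With the value-based test the zero-boost clause is not counted, so coord
is 1/2 and the result is half of what the flag-based variant computes.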

I'm going to set BooleanQuery aside for a little bit and focus on some of
the other query classes, but here's what i had in mind for changing the
Explanation class, if anyone sees any problems please let me know...

1) Add the following to Explanation...

   Boolean match = null;
   public void setMatch(boolean b) { match = new Boolean(b); }
   public Boolean getMatch() { return match; }
   public boolean isMatch() {
     return (null != match) ? match.booleanValue() : (0.0f < getValue());
   }

2) change Explanation.toString and toHtml to have something along the
lines of ...

    if (null != match)
       buffer.append("Definite "+(match.booleanValue()?"":"NON-")+"match");
    else
       buffer.append("Assuming Match");

3) change all explain implementations in lucene core to call setMatch when
they call setValue.

4) change BooleanWeight.explain to call isMatch on the sub-explanations
when testing prohibited/required clauses.

5) change all of my Explanation tests to call isMatch.

...this would be backwards compatible for any non-core Query classes out
there, and (as far as i can figure) be no worse than the current behavior
of testing an explanation.getValue() == 0.0f  (since that's the fallback
inside of isMatch())



	thoughts?


-Hoss
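
The Explanation changes proposed in the message above can be sketched as
a small standalone class.  This is only an illustration under stated
assumptions: MatchAwareExplanation and describeMatch are hypothetical
names (the proposal adds the fields to Lucene's existing Explanation
class), and Boolean.valueOf is used here where the message uses
new Boolean.

```java
// Standalone sketch of the proposal; MatchAwareExplanation is a hypothetical
// name and is not Lucene's Explanation class.
final class MatchAwareExplanation {
    private float value;
    private Boolean match = null; // null = match state was never set explicitly

    float getValue() { return value; }
    void setValue(float v) { value = v; }

    void setMatch(boolean b) { match = Boolean.valueOf(b); }

    // Exact state: TRUE, FALSE, or null when nobody called setMatch.
    Boolean getMatch() { return match; }

    // Best guess: explicit flag if present, otherwise fall back to the value.
    boolean isMatch() {
        return (null != match) ? match.booleanValue() : (0.0f < getValue());
    }

    // Text along the lines suggested for toString/toHtml.
    String describeMatch() {
        if (null != match) {
            return "Definite " + (match.booleanValue() ? "" : "NON-") + "match";
        }
        return "Assuming Match";
    }
}

final class MatchAwareExplanationDemo {
    public static void main(String[] args) {
        MatchAwareExplanation zeroBoost = new MatchAwareExplanation();
        zeroBoost.setValue(0.0f); // e.g. a clause with a boost of 0.0
        zeroBoost.setMatch(true); // but the document did match
        System.out.println(zeroBoost.isMatch() + " / " + zeroBoost.describeMatch());
    }
}
```

A legacy explanation that never calls setMatch still works through the
value-based fallback in isMatch(), while getMatch() returning null lets
callers see that the answer was only inferred.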



BooleanWeight.normalize(float) doesn't normalize prohibited clauses?
user name
2006-05-11 21:23:32
On Thursday 11 May 2006 21:51, Chris Hostetter wrote:
> 
> : If class Explanation would have a boolean attribute indicating whether
> : or not there was a match, the Explanation for BooleanQuery could
> : simply use this value from the Explanation of the prohibited clause.
>
> I've definitely thought about that a lot initially.  But my gut reaction
> was to try and fix the broken explain methods using the current
> limitations of the Explanation class to reduce the size of the patch.
>
> Unfortunately there are still some cases that can't be solved without that
> information, ie...
>
>      "w1 w2^0.0"    (testBQ12 from the bug)
>      "+w1^0.0 w2"   (testBQ18 from the bug)
>
> In the first case, documents which match both terms get their score
> divided in half because the explain method can't tell the score of 0.0
> is because of the boost, so the coord factor gets applied by mistake.
>
> In the second case, the explain method assumes a total failure even
> if a document matches both terms because it got a 0.0 score from a
> required clause.
>
> Other than the (somewhat obscure) cases where a clause has a boost of 0.0,
> I managed to fix all of the BooleanQuery explain bugs by normalizing all
> clauses (even if they are prohibited) and fixing some poor assumptions in
> the explain method itself.
>
> I'm going to set BooleanQuery aside for a little bit and focus on some of
> the other query classes, but here's what i had in mind for changing the
> Explanation class, if anyone sees any problems please let me know...
>
> 1) Add the following to Explanation...
>
>    Boolean match = null;

As for the thoughts question below: this is java-dev, not c-dev


>    public void setMatch(boolean b) { match = new Boolean(b); }
>    public Boolean getMatch() { return match; }
>    public boolean isMatch() {
>      return (null != match) ? match.booleanValue() : (0.0f < getValue());
>    }

As long as there is no match, there will be no score, and no score could
also be represented by NaN, so one might by default initialize the score
value to NaN, drop setMatch() and isMatch() above, and have only:

public Boolean getMatch() { return ! Float.isNaN(score); }

But I'm not yet sure whether that would work in all cases.
Is it possible/thinkable for a (sub)query to have a score value for a
document, but no match against the same document?
 
> 2) change Explanation.toString and toHtml to have something along the
> lines of ...
>
>     if (null != match)
>        buffer.append("Definite "+(match.booleanValue()?"":"NON-")+"match");
>     else
>        buffer.append("Assuming Match");
>
> 3) change all explain implementations in lucene core to call setMatch when
> they call setValue.

That would be avoided by having getMatch() only. Once setMatch is called,
getMatch would return false, except when setMatch is given a NaN, but
that is probably not done in the current Lucene code.

>
> 4) change BooleanWeight.explain to call isMatch on the sub-explanations
> when testing prohibited/required clauses.

Or call getMatch(), whichever is implemented. This makes explaining the
score of a BooleanQuery much more natural than it is now.
It might even become practical to use the explain() methods of the scorers
that BooleanScorer2 is using. Only ConjunctionScorer would need
an implementation of explain() in that case.

> 5) change all of my Explanation tests to call isMatch.

Yes.

> ...this would be backwards compatible for any non-core Query classes out
> there, and (as far as i can figure) be no worse than the current behavior
> of testing an explanation.getValue() == 0.0f  (since that's the fallback
> inside of isMatch())

With the implementation above, the current code would have to be
changed for the case when a 0.0f score value is used to indicate no match
in an explanation: in that case no call to setValue() should be done.

> 
> 
> 
> 	thoughts?

null for false: long time no see...

Regards,
Paul Elschot


BooleanWeight.normalize(float) doesn't normalize prohibited clauses?
user name
2006-05-11 22:12:27
: >    Boolean match = null;
:
: As for the thoughts question below: this is java-dev, not c-dev

i could not for the life of me understand this comment until i got to the
end of your message...

: null for false: long time no see...

...i'm not trying to use null for false, i'm using null to indicate that
whether or not a match occurred has not been explicitly specified -- it can
only be inferred from the "value" of the explanation.  "true" means a
definitive match, and "false" means a definitive non-match.

: As long as there is no match, there will be no score, and no score could
: also be represented by NaN, so one might by default initialize the score
: value to NaN, drop setMatch() and isMatch() above, and have only:
:
: public Boolean getMatch() { return ! Float.isNaN(score); }

I assume by "score" you mean "value" (Explanations don't have a score
attribute, just a value attribute).  I don't want to go down the road of
assuming a match based on some special value of the value -- that's the
cause of the current problems.  NaN is admittedly a better choice than
"0.0", but it's still a value that could conceivably come up when scoring
a document for some as yet non-existent Scorer.

what i really want is for the Explanation class to precisely model the
same information as a Scorer returns for each doc...

   1) if scorer.doc() would ever return X, then the Explanation for that X
      should have a boolean indicating a "match"
   2) whatever value is returned by scorer.score() when scorer.doc() is
      returning X should be what the Explanation for X returns when you
      call getValue().

: But I'm not yet sure whether that would work in all cases.
: Is it possible/thinkable for a (sub)query to have a score value for a
: document, but no match against the same document?

I'm not sure if it can ever exist, since you currently can't ask a Scorer
for the score of a document unless the document matches, but the converse
is certainly true: a document can match on a query but have a score of 0,
or less than 0, or NaN ... that's what i'm trying to deal with, i want to
be able to model all of those cases in an Explanation object.
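
A small sketch of why any sentinel value falls short for those cases.
SentinelSketch and its two methods are hypothetical names that only
restate the two conventions discussed above (0.0 means no match, NaN
means no match); neither is an existing Lucene method.

```java
// Hypothetical helper comparing the two sentinel conventions from the thread.
final class SentinelSketch {
    // Current convention: a value of 0.0 is taken to mean "no match".
    static boolean matchIfNonZero(float value) {
        return value != 0.0f;
    }

    // Suggested alternative: a value of NaN is taken to mean "no match".
    static boolean matchIfNotNaN(float value) {
        return !Float.isNaN(value);
    }

    public static void main(String[] args) {
        // Values that a document which really matches might conceivably get.
        float[] valuesOfMatchingDocs = { 0.0f, -1.5f, Float.NaN };
        for (float v : valuesOfMatchingDocs) {
            System.out.println(v
                + " nonZero=" + matchIfNonZero(v)
                + " notNaN=" + matchIfNotNaN(v));
        }
    }
}
```

Each convention misreads at least one of the three values as a
non-match, which is the argument for carrying the match state in a
separate field instead of encoding it in the value.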

: That would be avoided by having getMatch() only. Once setMatch is called,
: getMatch would return false, except when setMatch is given a NaN, but
: that is probably not done in the current Lucene code.

Right.  currently *most* Explanations get a 0.0 value set if it's a non
match (some of them don't work at all for non-matches) .. which is why if
an explicit match boolean isn't specified, I want the fall back assumption
to be based on whether the value is 0.0 -- because that's the current
method for determining a match, and it will work with legacy custom Query
types people may have written which aren't in the lucene code base.

: > 4) change BooleanWeight.explain to call isMatch on the sub-explanations
: > when testing prohibited/required clauses.
:
: Or call getMatch(), whichever is implemented. This makes explaining the

i want to implement both ... getMatch() existing for people that want to
know the exact state of the match (and will check for null to determine if
the exact state is unknown) and isMatch for people who want the "best
guess" behavior ... which is now encapsulated in a method, instead of it
just being "convention"



: It might even become practical to use the explain() methods of the scorers
: that BooleanScorer2 is using. Only ConjunctionScorer would need
: an implementation of explain() in that case.

you lost me there ... BooleanScorer2 doesn't implement explain (it throws
UnsupportedOperationException).  As far as i can tell almost none of the
Scorers in lucene core have good explain(int) implementations.  most throw
UnsupportedOperationException, and the ones that don't tend to not be
correct (DisjunctionSumScorer for example never sets the value, so under
the current semantics its explanations always indicate failure)

Fortunately, I've yet to see a single Weight that delegates to its Scorer
for building an Explanation.

: > ...this would be backwards compatible for any non-core Query classes out
: > there, and (as far as i can figure) be no worse than the current behavior
: > of testing an explanation.getValue() == 0.0f  (since that's the fallback
: > inside of isMatch())
:
: With the implementation above, the current code would have to be
: changed for the case when a 0.0f score value is used to indicate no match
: in an explanation: in that case no call to setValue() should be done.

I'm not following you ... even if existing code doesn't call setValue, the
additions i'm suggesting would assume "no match" unless setMatch was
called -- but it would expose the fact that this was only an assumption
for anyone who wanted to know (which isn't currently possible)



-Hoss



Explaining a filter; Scorer extending Matcher; (was: BooleanWeight.normalize(float) doesn't normali
user name
2006-05-21 12:15:45
On Friday 12 May 2006 00:12, Chris Hostetter wrote:
> 
> : >    Boolean match = null;
> :
> : As for the thoughts question below: this is java-dev, not c-dev
>
...
>
> ...i'm not trying to use null for false, i'm using null to indicate that
> whether or not a match occurred has not been explicitly specified -- it can
> only be inferred from the "value" of the explanation.  "true" means a
> definitive match, and "false" means a definitive non-match.
>
> : As long as there is no match, there will be no score, and no score could
> : also be represented by NaN, so one might by default initialize the score
> : value to NaN, drop setMatch() and isMatch() above, and have only:
> :
> : public Boolean getMatch() { return ! Float.isNaN(score); }
>
> I assume by "score" you mean "value" (Explanations don't have a score
> attribute, just a value attribute).  I don't want to go down the road of
> assuming a match based on some special value of the value -- that's the
> cause of the current problems.  NaN is admittedly a better choice than
> "0.0", but it's still a value that could conceivably come up when scoring
> a document for some as yet non-existent Scorer.
>
> what i really want is for the Explanation class to precisely model the
> same information as a Scorer returns for each doc...
>
>    1) if scorer.doc() would ever return X, then the Explanation for that X
>       should have a boolean indicating a "match"
>    2) whatever value is returned by scorer.score() when scorer.doc() is
>       returning X should be what the Explanation for X returns when you
>       call getValue().
>
> : But I'm not yet sure whether that would work in all cases.
> : Is it possible/thinkable for a (sub)query to have a score value for a
> : document, but no match against the same document?
>
> I'm not sure if it can ever exist, since you currently can't ask a Scorer
> for the score of a document unless the document matches, but the converse
> is certainly true: a document can match on a query but have a score of 0,
> or less than 0, or NaN ... that's what i'm trying to deal with, i want to
> be able to model all of those cases in an Explanation object.

Having a score value without a match is not normally possible when searching
a query, but for Filter this is actually the normal case: a Filter may match a
document, but it does not provide a score value.

> 
> : That would be avoided by having getMatch() only. Once setMatch is called,
> : getMatch would return false, except when setMatch is given a NaN, but
> : that is probably not done in the current Lucene code.
>
> Right.  currently *most* Explanations get a 0.0 value set if it's a non
> match (some of them don't work at all for non-matches) .. which is why if
> an explicit match boolean isn't specified, I want the fall back assumption
> to be based on whether the value is 0.0 -- because that's the current
> method for determining a match, and it will work with legacy custom Query
> types people may have written which aren't in the lucene code base.
>
> : > 4) change BooleanWeight.explain to call isMatch on the sub-explanations
> : > when testing prohibited/required clauses.
> :
> : Or call getMatch(), whichever is implemented. This makes explaining the
>
> i want to implement both ... getMatch() existing for people that want to
> know the exact state of the match (and will check for null to determine if
> the exact state is unknown) and isMatch for people who want the "best
> guess" behavior ... which is now encapsulated in a method, instead of it
> just being "convention"

In case Explanation is also to explain what a Filter does, it would need to
have both a match flag and a score value.

At the moment I'm trying to implement filters by refactoring Scorer to have an
abstract superclass Matcher that could also become a superclass for filter
implementations (instead of DocNrSkipper).

This Matcher class has all methods of Scorer that are not using score
values: doc(), next(), skipTo(docNr).
An explain() method is also useful in such a Matcher, but it has
no score value available, it only knows whether or not a document matches.

The implementation also has an interface MatcherProvider (instead of
SkipFilter):
   public Matcher getMatcher(IndexReader reader) throws ioe;
Filter could implement MatcherProvider as an alternative to the
SkipFilter1.patch here:
http://issues.apache.org/jira/browse/LUCENE-328

Any thoughts on whether such a Matcher would be preferable to
a DocNrSkipper that only has this method:
  int nextDocNr(int docNr)
?

Regards,
Paul Elschot
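
The Matcher/Scorer split described in the message above might look
roughly like the following.  This is a hedged sketch, not the code from
the patch: all class names carry a Toy prefix or are otherwise
hypothetical, the IOException declarations and the IndexReader-based
MatcherProvider are left out to keep it self-contained, and skipTo is
assumed to always advance at least once and stop at the first document
at or beyond the target.

```java
// Hypothetical sketch of the proposed split; not the classes from the patch.
abstract class ToyMatcher {
    // Current document number; valid after next() or skipTo() returned true.
    abstract int doc();

    // Advance to the next matching document; false when there is none.
    abstract boolean next();

    // Advance to the first match at or beyond target (moves at least once).
    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (target > doc());
        return true;
    }
}

// A scorer is a matcher that can additionally score the current document.
abstract class ToyScorer extends ToyMatcher {
    abstract float score();
}

// Filter-like matcher over an ascending list of document numbers.
final class SortedDocsMatcher extends ToyMatcher {
    private final int[] docs;
    private int pos = -1;

    SortedDocsMatcher(int[] docs) {
        this.docs = docs;
    }

    int doc() {
        return docs[pos];
    }

    boolean next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length;
    }
}

// Wraps any matcher and gives every matching document the same score.
final class ConstantToyScorer extends ToyScorer {
    private final ToyMatcher matcher;
    private final float constant;

    ConstantToyScorer(ToyMatcher matcher, float constant) {
        this.matcher = matcher;
        this.constant = constant;
    }

    int doc() { return matcher.doc(); }
    boolean next() { return matcher.next(); }
    boolean skipTo(int target) { return matcher.skipTo(target); }
    float score() { return constant; }
}
```

The matcher knows only whether a document matches; anything that needs a
score value goes through the scorer subclass.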


Explaining a filter; Scorer extending Matcher; (was: BooleanWeight.normalize(float) doesn't normali
user name
2006-05-21 19:06:32
"Any thoughts on whether such a Matcher would be preferable to
a DocNrSkipper that only has this method:
  int nextDocNr(int docNr)
?"

As far as I can comprehend, it makes a lot of sense to decouple Scoring
from Matching (of course their intermixing as well).  This would
practically mean that complete Query execution mechanics could be reused
instead of making a "parallel world" of non-scoring
queries/ChainedFilters...

Please correct me if my understanding is wrong:
- It would enable one Clause in BooleanQuery to be "pure boolean" (aka
  constant scoring)
- it would permit Filter-s to be combined without "external helpers" like
  ChainedFilter by using Matcher only, by purely using existing Query


DocNrSkipper and such a Matcher are semantically the same, it could work
just the other way round as well, where Matcher implements DocNrSkipper.
DocNrSkipper just skips docs, Matcher can do a bit more, so the other
option could even be ok? Anyhow, sounds like a style question







Explaining a filter; Scorer extending Matcher; (was: BooleanWeight.normalize(float) doesn't normali
user name
2006-05-21 21:04:02
On Sunday 21 May 2006 21:06, eks dev wrote:
> 
> "Any thoughts on whether such a Matcher would be preferable to
> a DocNrSkipper that only has this method:
>   int nextDocNr(int docNr)
> ?"
>
> As far as I can comprehend, it makes a lot of sense to decouple Scoring
> from Matching (of course their intermixing as well).  This would
> practically mean that complete Query execution mechanics could be reused
> instead of making a "parallel world" of non-scoring
> queries/ChainedFilters...

Query execution is always done by a Scorer.

>
> Please correct me if my understanding is wrong:
> - It would enable one Clause in BooleanQuery to be "pure boolean" (aka
>   constant scoring)

No, for the reason above. There already is a ConstantScoreQuery that
has a constructor that takes a Filter argument.

> - it would permit Filter-s to be combined without "external helpers" like
>   ChainedFilter by using Matcher only, by purely using existing Query

Same situation. However, it might be possible to refactor the Matcher
logic out of the various scorers used by boolean query to be reused for
boolean operations on filter-like things, but I wouldn't expect much to
be gained from that.

>
>
> DocNrSkipper and such a Matcher are semantically the same, it could work
> just the other way round as well, where Matcher implements DocNrSkipper.
> DocNrSkipper just skips docs, Matcher can do a bit more, so the other
> option could even be ok? Anyhow, sounds like a style question

A Matcher needs a little bit of dynamic state in the form of the current
document number, and DocNrSkipper does not have that, so they
are not completely equivalent.
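
A rough sketch of that difference, under stated assumptions: the thread
only gives the signature int nextDocNr(int docNr), so the semantics used
here (return the smallest matching document number at or above docNr, or
-1 when none remain) are an assumption, and ToyDocNrSkipper,
SortedArraySkipper and SkipperBackedMatcher are hypothetical names.

```java
// Hypothetical stand-in for DocNrSkipper. Assumed semantics: return the
// smallest matching doc number >= docNr, or -1 if there is none.
interface ToyDocNrSkipper {
    int nextDocNr(int docNr);
}

// Stateless skipper over an ascending list of document numbers.
final class SortedArraySkipper implements ToyDocNrSkipper {
    private final int[] docs;

    SortedArraySkipper(int[] docs) {
        this.docs = docs;
    }

    public int nextDocNr(int docNr) {
        for (int d : docs) {
            if (d >= docNr) {
                return d;
            }
        }
        return -1;
    }
}

// Matcher-style wrapper: the skipper plus the current document number.
final class SkipperBackedMatcher {
    private final ToyDocNrSkipper skipper;
    private int current = -1;          // the dynamic state the skipper lacks
    private boolean exhausted = false;

    SkipperBackedMatcher(ToyDocNrSkipper skipper) {
        this.skipper = skipper;
    }

    int doc() {
        return current;
    }

    boolean next() {
        return advance(current + 1);
    }

    boolean skipTo(int target) {
        // Never move backwards: always go past the current document.
        return advance(Math.max(target, current + 1));
    }

    private boolean advance(int from) {
        if (exhausted) {
            return false;
        }
        int d = skipper.nextDocNr(from);
        if (d < 0) {
            exhausted = true;
            return false;
        }
        current = d;
        return true;
    }
}
```

The wrapper adds nothing but the current position, which is the "little
bit of dynamic state" that separates the two abstractions.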

As for the style, I have no preference either way. Doing a filtered search
in IndexSearcher takes only a few lines more code than a non-filtered search
in any case.

Regards,
Paul Elschot

