Re: [Xapian-devel] [Xapian-commits] 10821: trunk/xapian-core/ trunk/xapian-core/api/




2008-07-13 04:38:45
Re: 10821: trunk/xapian-core/ trunk/xapian-core/api/
On Mon, Jul 07, 2008 at 08:06:32AM +0100, Richard Boulton wrote:
> Olly Betts wrote:
> > On Sun, Jul 06, 2008 at 11:57:40PM +0100, richard wrote:
> >> api/omenquire.cc: When calculating percentages, round to the
> >> nearest integer, rather than rounding down.  There was a FIXME
> >> about this, but no explanation of why it hadn't already been
> >> done, and I can see no bad side effects so far.  The most obvious
> >> positive effect is that queries which should get precisely 100%
> >> will no longer be assigned 99% due to rounding errors.
> > 
> > Well, one issue is that queries which shouldn't get precisely 100% now
> > can...
> > 
> > I don't know how common an issue that is, but then I don't know how
> > common the issue you mention is either.
> 
> The test case I committed yesterday suffered from this problem for me,
> and I've certainly seen it before (generally with large queries), but I
> couldn't guess at a rate at which it occurs.

I can't reproduce this issue with the patch reversed, and it makes
handling of percentage cutoff inconsistent - setting the cutoff to n%
doesn't return documents which would have got n% by being rounded up.

So I've reversed it for now (and added a testcase pctcutoff3 to show the
issue, which failed with the patch applied).
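
To make the inconsistency concrete, here's a minimal sketch.  The helper
names are hypothetical and this is not the code in api/omenquire.cc; it
also assumes the cutoff is applied to the unrounded weight, as a fraction
of the maximum weight:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only, not Xapian's actual code.

// Percentage reported for a document when rounding down.
int percent_floor(double weight, double max_weight) {
    assert(max_weight > 0.0);
    return static_cast<int>(std::floor(weight * 100.0 / max_weight));
}

// Percentage reported for a document when rounding to nearest.
int percent_nearest(double weight, double max_weight) {
    assert(max_weight > 0.0);
    return static_cast<int>(std::floor(weight * 100.0 / max_weight + 0.5));
}

// Assumed cutoff check: a cutoff of n% becomes a minimum weight of
// n/100 of the maximum, compared against the unrounded weight.
bool passes_cutoff(double weight, double max_weight, int cutoff_percent) {
    assert(max_weight > 0.0);
    return weight >= max_weight * cutoff_percent / 100.0;
}
```

With a weight of 79.75 out of a maximum of 100, percent_nearest() reports
80 but passes_cutoff() with a cutoff of 80 rejects the document, whereas
percent_floor() reports 79, which agrees with the cutoff.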

> I don't think it's unreasonable to return 100% for a document which
> matches well enough to get 99.5%; and it's certainly more reasonable
> than returning 99% for a document which actually got 99.999999%.
> 
> I suppose we could instead round up only very slightly, so that a
> document needed to get at least 99.9999% or so to be returned with 100%.

If it's going to be a threshold, we should pick one appropriate for the
rounding errors that can happen rather than something arbitrary.  Or else
ensure that "matches all documents" is handled specially such that
rounding isn't an issue (which I thought already happened as a short cut
actually, so I'm not sure how we can get rounding errors here - excess
precision on x86 maybe?)

If we're going to round, we need to fix how the percentage cut-off is
handled by the matcher to account for the 0.5% shift.
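
Roughly what I mean by accounting for the shift, as a sketch only
(min_weight_for_cutoff is a made-up name, and I'm assuming the cutoff is
turned into a minimum weight relative to the maximum weight):

```cpp
#include <cassert>

// Hypothetical helper for illustration only, not Xapian's actual code.
// Returns the minimum unrounded weight a document needs so that its
// *displayed* percentage is at least cutoff_percent.
double min_weight_for_cutoff(double max_weight, int cutoff_percent,
                             bool round_to_nearest) {
    assert(max_weight > 0.0);
    assert(cutoff_percent >= 0 && cutoff_percent <= 100);
    // With round-to-nearest, anything from (n - 0.5)% upwards displays
    // as n% or more, so the threshold moves down by half a percent.
    double threshold = cutoff_percent - (round_to_nearest ? 0.5 : 0.0);
    if (threshold < 0.0) threshold = 0.0;
    return max_weight * threshold / 100.0;
}
```

So with a maximum weight of 100 and a cutoff of 80%, the minimum weight is
80 when rounding down but 79.5 when rounding to nearest.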

Do you have a repeatable testcase where this happens?

Cheers,
    Olly

_______________________________________________
Xapian-devel mailing list
Xapian-devel@lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
