Thread: Re: Fwd: Decouple Filter from BitSet: API change and xml query parser




Re: Fwd: Decouple Filter from BitSet: API change and xml query parser
United Kingdom
2007-08-10 06:12:00
>>Could someone give me a clue as to why the test case
TestRemoteCachingWrapperFilter fails with the patch applied?

Regardless of the reasons for this particular test failure,
this code is not safe in other ways which the test cases
don't test for.

To restate the issue: Matcher is not designed to be
threadsafe, so CachingWrapperFilter (or any other existing
caching strategy) cannot simply be changed to cache Matchers
in place of the existing scheme of caching bitsets (which
all Lucene code currently uses in a thread-safe manner).
Bitsets have no notion of a cursor (required for "next"
methods), whereas Matcher does, which spoils its potential
for reuse or shared use. The remoting test code you refer to
uses your modified CachingWrapperFilter, which now caches
Matchers instead of BitSets, so I would anticipate thread
safety issues if the tests actually tried to share or reuse
the same Matcher.
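
The cursor problem can be shown without real threads. The sketch below is illustrative only: `CursorMatcher` and `SharedCursorDemo` are hypothetical names, not classes from the patch, and the only assumption is that a Matcher keeps a current position that `next()` advances.

```java
import java.util.BitSet;

// Hypothetical stand-in for a Matcher: it carries a cursor, so every
// call to next() changes its state.
class CursorMatcher {
    private final BitSet bits; // read-only here, safe to share
    private int doc = -1;      // per-instance cursor

    CursorMatcher(BitSet bits) { this.bits = bits; }

    // Advances to the next set bit; returns false when exhausted.
    boolean next() {
        doc = bits.nextSetBit(doc + 1);
        return doc >= 0;
    }

    int doc() { return doc; }
}

class SharedCursorDemo {
    // Two consumers alternately pull from ONE cached cursor, as could
    // happen if a cache handed the same Matcher to two searches.
    // Each consumer sees only part of the matches.
    static int[] interleavedCounts(CursorMatcher shared) {
        int a = 0, b = 0;
        while (true) {
            if (!shared.next()) break;
            a++;
            if (!shared.next()) break;
            b++;
        }
        return new int[] {a, b};
    }

    // Each consumer builds its own cursor over the shared bits and
    // therefore sees every match.
    static int countWithOwnCursor(BitSet cached) {
        CursorMatcher m = new CursorMatcher(cached);
        int n = 0;
        while (m.next()) n++;
        return n;
    }
}
```

With three matching documents, neither consumer of the shared cursor sees all three, while a cursor created per use does.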

>>Finally, are DocIdSet and DocIdSetIterator currently
part of Lucene? I don't know how to go about these.

These are two of the names I gave to a notional set of 3
services that I outlined here:

 https://issues.apache.org/jira/browse/LUCENE-584#action_12518642

I introduced this terminology to the discussion because:
1) It describes 2 services that are currently combined in
Matcher and that I feel need to be separated.
2) It uses a more generic description of the services
offered, which is useful when considering other applications
of them (e.g. category counts and filtering logic can both
use cached sets of doc IDs; DocIdSet seemed to describe the
service more generically than "Matcher").
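
A sketch of what the separation might look like. The interface shapes below are assumed for illustration and are not copied from the JIRA comment; `BitSetDocIdSet` is a hypothetical implementation.

```java
import java.util.BitSet;

// Assumed shape: a cacheable set of document IDs with no cursor state.
interface DocIdSet {
    DocIdSetIterator iterator(); // hands out a new cursor on every call
}

// Assumed shape: a single-use cursor over a DocIdSet.
interface DocIdSetIterator {
    boolean next();
    int doc();
}

// Hypothetical BitSet-backed implementation.
class BitSetDocIdSet implements DocIdSet {
    private final BitSet bits;

    BitSetDocIdSet(BitSet bits) { this.bits = bits; }

    public DocIdSetIterator iterator() {
        return new DocIdSetIterator() {
            private int doc = -1;

            public boolean next() {
                doc = bits.nextSetBit(doc + 1);
                return doc >= 0;
            }

            public int doc() { return doc; }
        };
    }

    // A second application of the same cached set: category counts.
    int cardinality() { return bits.cardinality(); }
}
```

The set can be cached and shared, while each search or count asks it for its own iterator.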

I'm happy to drop use of these terms from this discussion if
you feel they are not useful.

Cheers
Mark


----- Original Message ----
From: Paul Elschot <paul.elschotxs4all.nl>
To: java-devlucene.apache.org
Sent: Friday, 10 August, 2007 8:45:09 AM
Subject: Fwd: Decouple Filter from BitSet: API change and
xml query parser

Taking this to java-dev only.

As I said at the jira issue, I'd like to have all test cases
pass again, and I'm not happy with the current version of
the patch to the xml query parser either.

Some test cases currently fail, maybe because they use RMI
and the new version of Filter does not serialize well: the
result of getMatcher() is not serializable.
It should be possible to fix this by moving Filter to
BitSetFilter in these cases, see also below.
The problem is that I don't know how to do this because I
have never used java RMI myself.
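
The suspected failure mode can be reproduced in isolation. RMI marshals arguments and results with Java serialization, so a Serializable object that holds a non-serializable field fails when it is sent. In this sketch `NotSerializableCursor`, `CachingHolder` and `SerializationCheck` are hypothetical stand-ins, not classes from the patch, and whether this is the actual cause of the test failure is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Matcher that is not Serializable.
class NotSerializableCursor {
    int doc = -1;
}

// Hypothetical stand-in for a Serializable filter caching such an object.
class CachingHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    NotSerializableCursor cached; // non-transient, so it is written out
}

class SerializationCheck {
    // Returns true if the object graph can be written with Java
    // serialization, false if some member is not Serializable.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An empty `CachingHolder` serializes; one whose `cached` field is populated does not. Marking such a field `transient` is one common way around this.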

Could someone give me a clue as to why the test case
TestRemoteCachingWrapperFilter fails with the patch applied?

As for the API change, how to move from the current:

public abstract class Filter {
  public abstract BitSet bits(IndexReader reader);
}

to:

public abstract class Filter {
  public abstract Matcher getMatcher(IndexReader reader);
}

The patch proposes to do this by moving all current use of
Filter to BitSetFilter:

public abstract class BitSetFilter extends Filter {
  public abstract BitSet bits(IndexReader reader);
}


Would it be good to have an intermediate version of Filter
like this one:

public abstract class Filter {
  /** @deprecated use class BitSetFilter instead */
  public BitSet bits(IndexReader reader) { return null; }
  public abstract Matcher getMatcher(IndexReader reader);
}


Finally, are DocIdSet and DocIdSetIterator currently part of
Lucene?
I don't know how to go about these.


Regards,
Paul Elschot








----------  Forwarded Message  ----------

Subject: [jira] Commented: (LUCENE-584) Decouple Filter from
BitSet
Date: Friday 10 August 2007 01:15
From: "Mark Harwood (JIRA)" <jiraapache.org>
To: java-devlucene.apache.org


    
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518868 ]

Mark Harwood commented on LUCENE-584:
-------------------------------------

OK, I appreciate caching may not be a top priority in this
proposal, but I have live systems in production that use
XMLQueryParser and the existing core facilities for caching.
As it stands, this proposal breaks this functionality (see
"FIXME" in contrib's CachedFilterBuilder and my concerns
over the use of the non-thread-safe Matcher in the core
class CachingWrapperFilter).

I am obviously concerned by this and keen to help shape a
solution which preserves the existing capabilities while
adding your new functionality. I'm not sure I share your
view that support for caching can be treated as a separate
issue to be dealt with at a later date. There are a large
number of changes proposed in this patch, and if the design
does not at least consider future caching issues now, I
suspect much will have to be reworked later. The change I
can envisage most clearly is expressed in my concern that
the DocIdSet and DocIdSetIterator services I outlined are
combined in Matcher as it stands now; these functions will
have to be separated.

Cheers
Mark

> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt,
>                      Matcher1-ground-20070730.patch,
>                      Matcher2-default-20070730.patch,
>                      Matcher3-core-20070730.patch,
>                      Matcher4-contrib-misc-20070730.patch,
>                      Matcher5-contrib-queries-20070730.patch,
>                      Matcher6-contrib-xml-20070730.patch,
>                      Some Matchers.zip
>
>
> 
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet
> {
>   public boolean get(int index);
> }
>
> It would be useful if the method =Filter.bits()= returned an
> abstract interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the
> user's privileges, only a small portion of the index is
> actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and
> waste lots of memory. It would be desirable to have an
> alternative BitSet implementation with smaller memory
> footprint.
> Though it _is_ possible to derive classes from
> =java.util.BitSet=, it was obviously not designed for that
> purpose.
> That's why I propose to use an interface instead. The default
> implementation could still delegate to =java.util.BitSet=.
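
As an illustration of the kind of alternative the issue description asks for, here is a minimal sketch of a sparse implementation. `AbstractBitSet` is the interface proposed above; `SortedIntArrayBitSet` is a hypothetical name and is not part of any attached patch.

```java
import java.util.Arrays;

// Interface as proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Hypothetical sparse implementation: stores only the set doc IDs,
// sorted, and answers get() with a binary search.
class SortedIntArrayBitSet implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntArrayBitSet(int[] docs) {
        sortedDocs = docs.clone(); // defensive copy, then sort
        Arrays.sort(sortedDocs);
    }

    public boolean get(int index) {
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }

    // Size of the stored IDs alone: 4 bytes per doc ID.
    int payloadBytes() { return 4 * sortedDocs.length; }
}
```

For 1,000 visible documents in an index of 10 million, the array holds roughly 4 KB of IDs, whereas a `java.util.BitSet` whose highest set bit lies near the end of the index needs roughly 10,000,000 / 8 = 1.25 MB. The trade-off is that `get()` becomes a logarithmic lookup rather than a constant-time one.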

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.


------------------------------------------------------------
---------
To unsubscribe, e-mail: java-dev-unsubscribelucene.apache.org
For additional commands, e-mail: java-dev-helplucene.apache.org











     


Re: Fwd: Decouple Filter from BitSet: API change and xml query parser
Netherlands
2007-08-10 11:31:02

On Friday 10 August 2007 13:12, mark harwood wrote:
> >>Could someone give me a clue as to why the test case
> >>TestRemoteCachingWrapperFilter fails with the patch applied?
>
> Regardless of the reasons for this particular test failure,
> this code is not safe in other ways which the test cases
> don't test for.
>
> To restate the issue: Matcher is not designed to be
> threadsafe

A Matcher is almost a Scorer; the only difference is that it
does not have a score() method. Scorers are not threadsafe:
they are used once during a query search. The intention is
to use Matchers in the same way: once during a query search,
in case no score value is needed.
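
The relationship described here can be sketched as a small class hierarchy. The declarations below are assumed for illustration and may differ from the patch; `RangeScorer` and `MatchCounter` are hypothetical.

```java
// Assumed shape: iterates over matching doc IDs, nothing more.
abstract class Matcher {
    abstract boolean next(); // advance to the next matching doc
    abstract int doc();      // current doc ID, valid after next() returned true
}

// Assumed shape: a Matcher plus score().
abstract class Scorer extends Matcher {
    abstract float score();
}

// Toy scorer matching docs 0..limit-1 with a constant score.
class RangeScorer extends Scorer {
    private final int limit;
    private int doc = -1;

    RangeScorer(int limit) { this.limit = limit; }

    boolean next() { return ++doc < limit; }

    int doc() { return doc; }

    float score() { return 1.0f; }
}

class MatchCounter {
    // Needs only the Matcher part; score() is never called.
    // The matcher is consumed by this call and not reused.
    static int count(Matcher m) {
        int n = 0;
        while (m.next()) n++;
        return n;
    }
}
```

Code that only needs to know which documents match can accept a Matcher and is then usable with any Scorer as well.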

> and CachingWrapperFilter (or any other example of existing
> caching strategies) cannot therefore simply be changed to
> cache Matchers in place of the existing scheme of caching
> bitsets (which are currently used in a thread-safe manner by
> all Lucene code). Bitsets don't offer the notion of a cursor
> (required for "next" methods) while Matcher does which
> spoils its potential for reuse/shared use.

The idea is not to cache the Matchers, but the underlying
data structure.
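
A minimal sketch of that idea, assuming a BitSet as the cached structure. `CachingBits` is a hypothetical class and not the patched CachingWrapperFilter; the reader type is left generic so the example is self-contained.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.PrimitiveIterator;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical cache: one BitSet per reader, a fresh cursor per search.
class CachingBits<R> {
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    private final Function<R, BitSet> compute;
    int computations = 0; // exposed only to make cache hits observable

    CachingBits(Function<R, BitSet> compute) { this.compute = compute; }

    // Synchronized so that concurrent searches share one computed BitSet.
    synchronized BitSet bits(R reader) {
        BitSet b = cache.get(reader);
        if (b == null) {
            b = compute.apply(reader);
            computations++;
            cache.put(reader, b);
        }
        return b;
    }

    // The cursor is created per call and never cached.
    PrimitiveIterator.OfInt newCursor(R reader) {
        return bits(reader).stream().iterator();
    }
}
```

Two searches against the same reader trigger a single computation but each gets an independent cursor, so the cursor state is never shared.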

> The remoting test code you refer to uses your modified
> CachingWrapperFilter which has swapped Matchers for BitSets
> and so I would anticipate thread safety issues if the tests
> actually tried to share/reuse the same Matcher.

Thanks for taking a look at the code.
I'll change the CachingWrapperFilter to use a BitSetFilter,
and then hopefully more test cases will pass.
 
> >>Finally, are DocIdSet and DocIdSetIterator currently
> >>part of Lucene? I don't know how to go about these.
>
> These are two of the names I gave to a notional set of 3
> services that I outlined here:
>
>  https://issues.apache.org/jira/browse/LUCENE-584#action_12518642
>
> I introduced this terminology to the discussion because:
> 1) It describes 2 services that are currently combined in
> Matcher that I feel need to be separated

The idea of Matcher is that it is a Scorer without a score()
method,
and no more.

> 2) It uses a more generic description of the services
> offered that can be useful when considering other
> applications of the services (e.g. category count and
> filtering logic both can use cached sets of doc IDs.
> DocIdSet seemed to describe the service more generically
> than "Matcher")
>
> I'm happy to drop use of these terms from this discussion
> if you feel they are not useful.

I think that DocIdSet has the role of the underlying data
structure that
would be cached, and that DocIdSetIterator is something very
close
to Matcher or even the same thing.

Which brings me to another question: which data structure
would
you like to have cached for filtering in the xml query
parser?
I think initially BitSet would do nicely, but one could also
take
the opportunity to use more compact data structures when
possible.


Finally, one of the example classes I gave is incomplete,
see below.
I wrote:
> 
...
> As for the API change, how to move from the current:
>
> public abstract class Filter {
>   public abstract BitSet bits(IndexReader reader);
> }
>
> to:
>
> public abstract class Filter {
>   public abstract Matcher getMatcher(IndexReader reader);
> }
>
> The patch proposes to do this by moving all current use of
> Filter to BitSetFilter:
>
> public abstract class BitSetFilter extends Filter {
>   public abstract BitSet bits(IndexReader reader);

   // BitSetFilter also has:

   public Matcher getMatcher(IndexReader reader) {
      return DefaultMatcher.defaultMatcher(bits(reader));
   }

> }
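
A self-contained sketch of this adapter pattern follows. `StubReader`, `Matcher`, `Filter` and `BitSetFilter` are simplified stand-ins declared here only for illustration (the real classes come from Lucene and the patch), the anonymous Matcher takes the place of DefaultMatcher, and `EvenDocsFilter` is a hypothetical legacy filter.

```java
import java.util.BitSet;

// Simplified stand-ins, assumed for illustration only.
interface StubReader {}

interface Matcher {
    boolean next();
    int doc();
}

abstract class Filter {
    abstract Matcher getMatcher(StubReader reader);
}

abstract class BitSetFilter extends Filter {
    abstract BitSet bits(StubReader reader);

    // Derived from bits(), so an existing BitSet-based filter only
    // needs to change its superclass.
    @Override
    Matcher getMatcher(StubReader reader) {
        final BitSet bits = bits(reader);
        return new Matcher() {
            private int doc = -1;

            public boolean next() {
                doc = bits.nextSetBit(doc + 1);
                return doc >= 0;
            }

            public int doc() { return doc; }
        };
    }
}

// Hypothetical legacy filter: matches even doc IDs below maxDoc.
class EvenDocsFilter extends BitSetFilter {
    private final int maxDoc;

    EvenDocsFilter(int maxDoc) { this.maxDoc = maxDoc; }

    @Override
    BitSet bits(StubReader reader) {
        BitSet b = new BitSet(maxDoc);
        for (int i = 0; i < maxDoc; i += 2) b.set(i);
        return b;
    }
}
```

Callers written against the new `getMatcher()` method work with `EvenDocsFilter` even though it only implements `bits()`.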

Regards,
Paul Elschot


> 
> Would it be good to have an intermediate version of Filter
> like this one:
>
> public abstract class Filter {
>   /** @deprecated use class BitSetFilter instead */
>   public BitSet bits(IndexReader reader) { return null; }
>   public abstract Matcher getMatcher(IndexReader reader);
> }
> 
> 
...
> 
> 
> Regards,
> Paul Elschot
