We have an application that indexes documents which can exist in
several (at least 2) languages.
We use one SolrCore per language, with the same field names in each
schema (but different stopwords, synonyms & stemmers); the benefits for
content maintenance outweigh (at least) the added complexity.
Using EN & FR as an example: a document always exists in EN as the
reference, and some of them (not all) are translated into FR; the same
unique document id is used for the reference & the translation.
If a user performs a query in FR, both FR documents and EN documents
are searched.
FR docs are searched first; the same query is then run against EN,
after removing from the EN document set the documents returned by the
FR query. That is, if document id 'AZ123' is retrieved through the FR
query, it can't be retrieved by the EN query. Removing the ids of the
FR-returned documents from the EN searchable document set guarantees
that the 2 result sets are disjoint.
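
To make the intended behaviour concrete, here is a minimal sketch in
plain Java of the result-level invariant described above. It is only an
illustration: the class and method names are hypothetical, it works on
already-retrieved id lists rather than on Solr document sets, and in the
real handler the exclusion would happen before the EN query runs, not
afterwards.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Solr: shows the FR-first, EN-second rule.
class DisjointLanguageResults {

    // Returns the EN ids that were NOT already returned by the FR query,
    // preserving the EN ranking order.
    static List<String> enOnly(List<String> frIds, List<String> enIds) {
        Set<String> alreadyReturned = new HashSet<>(frIds);
        List<String> result = new ArrayList<>();
        for (String id : enIds) {
            if (!alreadyReturned.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    // FR hits first, followed by the EN hits that have no FR counterpart
    // in the FR result.
    static List<String> merge(List<String> frIds, List<String> enIds) {
        List<String> merged = new ArrayList<>(frIds);
        merged.addAll(enOnly(frIds, enIds));
        return merged;
    }
}
```

With FR returning 'AZ123' and EN matching 'AZ123' plus another id, only
the other id survives on the EN side, so the two result sets share no
document id.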
1/ Does anyone have the same kind of functional requirements? Is using
multiple cores a bad idea for this need?
On the practical side, this led me to a handler that needs to restrict
the document set through an externally defined list of Solr unique ids
(we also need to deal with some upfront ACL management to top it all).
However, I'm missing a small method that would nicely complete the
SolrIndexSearcher.getDocList* family:

  public DocList getDocList(Query query, DocSet filter, Sort lsort,
                            int offset, int len, int flags) throws IOException {
    DocListAndSet answer = new DocListAndSet();
    getDocListC(answer, query, null, filter, lsort, offset, len, flags);
    return answer.docList;
  }

I intend to use this after intersecting any filter queries with the
restricted document set in the request handler; the version of the
method that takes a Query filter is already exposed, and this would be
its DocSet counterpart.
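
For illustration only, the intersection step mentioned above could look
roughly like the sketch below. It uses java.util.BitSet as a stand-in
for a Solr DocSet so that it runs on its own; the class and method names
are hypothetical, and the null handling is my assumption about how "no
filter queries" would be represented.

```java
import java.util.BitSet;

// Hypothetical stand-in: java.util.BitSet plays the role of a DocSet here.
class RestrictedSetSketch {

    // Intersects the documents matching the filter queries with the
    // externally restricted document set. A null filter means "no filter
    // queries", in which case only the restriction applies.
    static BitSet intersect(BitSet filterDocs, BitSet allowedDocs) {
        // Work on a copy so the caller's sets are left untouched.
        BitSet result = (BitSet) allowedDocs.clone();
        if (filterDocs != null) {
            result.and(filterDocs);
        }
        return result;
    }
}
```

The resulting set is what I would pass as the DocSet filter argument of
the proposed getDocList method.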
2/ Any reason not to do this? {Sh,C}ould this method be included, or
should I create an enhancement request?
My current idea for creating the DocSet from the document ids is the
following:

  DocSet keyFilter(org.apache.lucene.index.IndexReader reader,
                   String keyField,
                   java.util.Iterator<String> ikeys) throws java.io.IOException {
    org.apache.solr.util.OpenBitSet bits =
        new org.apache.solr.util.OpenBitSet(reader.maxDoc());
    if (ikeys.hasNext()) {
      // Position on the first key, then reuse the same TermDocs for the rest.
      org.apache.lucene.index.Term term =
          new org.apache.lucene.index.Term(keyField, ikeys.next());
      org.apache.lucene.index.TermDocs termDocs = reader.termDocs(term);
      try {
        // Only the first match is taken, since the key is assumed unique.
        if (termDocs.next())
          bits.fastSet(termDocs.doc());
        while (ikeys.hasNext()) {
          // createTerm keeps the field name and only swaps the term text.
          termDocs.seek(term.createTerm(ikeys.next()));
          if (termDocs.next())
            bits.fastSet(termDocs.doc());
        }
      } finally {
        termDocs.close();
      }
    }
    return new org.apache.solr.search.BitDocSet(bits);
  }

3/ Is there any better/faster way to create a DocSet from a list of
unique ids?
Comments & questions welcome.
Thanks