Thread: multiple analyzers

2006-11-15 00:39:50
On Nov 14, 2006, at 8:33 AM, Peter Sinnott wrote:

> The KinoSearch::Searcher docs say
>
> "analyzer - An object which subclasses KinoSearch::Analysis::Analyzer,
> such as a PolyAnalyzer. This must be identical to the Analyzer used at
> index-time, or the results won't match up."
>
> Does this mean if I use different analyzers for different fields
> when creating the index then I can not search it properly?

Depends. You definitely *can* search it properly, but you may need to
get sophisticated about how you build your queries.

I'll give an example that doesn't use multiple Analyzers (one field is
simply left unanalyzed), but it illustrates the principle.

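    use KinoSearch::InvIndexer;
    use KinoSearch::Analysis::PolyAnalyzer;

    # English stemming, tokenizing, lowercasing
    my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
        language => 'en' );

    my $invindexer = KinoSearch::InvIndexer->new(
        analyzer => $polyanalyzer,
        invindex => '/path/to/invindex',
    );

    # 'body' goes through the PolyAnalyzer; 'category' is stored as-is
    $invindexer->spec_field( name => 'body' );
    $invindexer->spec_field(
        name     => 'category',
        analyzed => 0,
    );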

Now, say we add a document with the category 'books'. Because the
category field doesn't get analyzed, the string 'books' makes it into
the index intact. However, if the word 'books' ever appears anywhere in
the body, it will get stemmed down to 'book' by the PolyAnalyzer.

Because the following search will make use of the English PolyAnalyzer,
it will only return matches on 'book', NOT 'books'...

    my $searcher = KinoSearch::Searcher->new(
        analyzer => $polyanalyzer,
        invindex => '/path/to/invindex',
    );
    my $hits = $searcher->search('books');

... so it will never match a document via a category of 'books'.
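A toy simulation in plain Perl makes the mismatch concrete. No KinoSearch code is involved: `toy_stem`, `%indexed_terms` and `field_has_term` are hypothetical names made up for this sketch, and the "stemmer" only lowercases a word and strips one trailing "s", which is just enough to mimic the 'books' to 'book' step the PolyAnalyzer performs.

```perl
use strict;
use warnings;

# Hypothetical toy stemmer standing in for the English stemming done by
# PolyAnalyzer: it lowercases and strips one trailing "s". The real
# stemmer is far more thorough; this is only for illustration.
sub toy_stem {
    my $term = lc shift;
    $term =~ s/s\z//;
    return $term;
}

# Terms indexed for one document whose category is 'books' and whose
# body contains "Books".
my %indexed_terms = (
    category => ['books'],                                   # analyzed => 0: stored verbatim
    body     => [ map { toy_stem($_) } qw(Books on cooking) ],  # stemmed: book, on, cooking
);

# True if the given field holds exactly the given term.
sub field_has_term {
    my ( $field, $term ) = @_;
    return scalar grep { $_ eq $term } @{ $indexed_terms{$field} };
}

# Query time: the search string goes through the same toy stemmer.
my $query_term = toy_stem('books');    # 'book'

my $category_match = field_has_term( 'category', $query_term );
my $body_match     = field_has_term( 'body',     $query_term );

print "category matches: ", ( $category_match ? "yes" : "no" ), "\n";
print "body matches:     ", ( $body_match     ? "yes" : "no" ), "\n";
```

It should print "no" for the category field and "yes" for the body, because the query term was stemmed but the stored category value was not.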

However, there are a number of ways to construct your query so that you
match the category 'books'. Here's one:

    use KinoSearch::Analysis::Analyzer;
    use KinoSearch::QueryParser::QueryParser;
    use KinoSearch::Search::BooleanQuery;

    my $category_query_parser = KinoSearch::QueryParser::QueryParser->new(
        analyzer => KinoSearch::Analysis::Analyzer->new,    # no-op
        fields   => [ 'category' ],
    );
    my $main_query_parser = KinoSearch::QueryParser::QueryParser->new(
        analyzer => $polyanalyzer,
        fields   => [ 'body' ],
    );

    my $bool_query = KinoSearch::Search::BooleanQuery->new;

    # search category field for the unstemmed 'books'
    my $cat_query = $category_query_parser->parse('books');
    $bool_query->add_clause( query => $cat_query, occur => 'SHOULD' );

    # search body field for the stemmed 'book'
    my $main_query = $main_query_parser->parse('books');
    $bool_query->add_clause( query => $main_query, occur => 'SHOULD' );

    # two SHOULD clauses: a document matching either one is a hit
    my $hits = $searcher->search( query => $bool_query );
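The same idea can be sketched without KinoSearch: analyze the query string once per field, with that field's own analyzer, and treat the two SHOULD clauses as an OR for matching purposes (scoring is ignored here). Everything below is hypothetical illustration: `%analyzer_for`, `index_doc` and `doc_matches` are invented names, and the body "analyzer" is a crude strip-a-trailing-s stand-in for the English PolyAnalyzer, not its real behavior.

```perl
use strict;
use warnings;

# Hypothetical per-field analyzers. The category one is a no-op (like the
# base Analyzer above); the body one is a toy stemmer that lowercases and
# strips one trailing "s", standing in for the English PolyAnalyzer.
my %analyzer_for = (
    category => sub { $_[0] },
    body     => sub { my $t = lc $_[0]; $t =~ s/s\z//; $t },
);

# Build a document the way indexing would: each field through its own analyzer.
sub index_doc {
    my (%fields) = @_;
    my %doc;
    for my $field ( keys %fields ) {
        $doc{$field} =
            [ map { $analyzer_for{$field}->($_) } split ' ', $fields{$field} ];
    }
    return \%doc;
}

# Analyze the query string separately for each field, then OR the per-field
# results together (how SHOULD-only clauses combine for matching).
sub doc_matches {
    my ( $doc, $query_string, @fields ) = @_;
    for my $field (@fields) {
        my $term = $analyzer_for{$field}->($query_string);
        return 1 if grep { $_ eq $term } @{ $doc->{$field} };
    }
    return 0;
}

my $doc_a = index_doc( category => 'books', body => 'Recipes for soup' );
my $doc_b = index_doc( category => 'music', body => 'Books about jazz' );

print "A, body only:       ", doc_matches( $doc_a, 'books', 'body' ), "\n";              # 0
print "A, category + body: ", doc_matches( $doc_a, 'books', 'category', 'body' ), "\n";  # 1
print "B, category + body: ", doc_matches( $doc_b, 'books', 'category', 'body' ), "\n";  # 1
```

Document A is only reachable through the unstemmed category clause, and document B only through the stemmed body clause, which is why the BooleanQuery needs both.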

Snoop the _prepare_simple_search() method in KinoSearch::Searcher to see
what KS is doing behind the scenes to build a query when you supply only
a query string.

HTH,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
