Thread: "Fuzzy" searching




"Fuzzy" searching
2007-03-28 14:32:02
Has anyone implemented Swish or any other method for "fuzzy"
searching?  Swish describes fuzzy searching this way:

Word stemming, soundex, metaphone, and double-metaphone indexing for
"fuzzy" searching

I'm mainly interested in dealing with the plural/singular problem.

- Grant
_______________________________________________
interchange-users mailing list
interchange-users@icdevgroup.org
http://www.icdevgroup.org/mailman/listinfo/interchange-users

Re: "Fuzzy" searching
2007-03-28 15:12:37
On Mar 28, 2007, at 3:32 PM, Grant wrote:

> Has anyone implemented Swish or any other method for "fuzzy"
> searching?  Swish describes fuzzy searching this way:
>
> Word stemming, soundex, metaphone, and double-metaphone indexing for
> "fuzzy" searching
>
> I'm mainly interested in dealing with the plural/singular problem.

I rolled my own based on Lingua::Stem and Unicode::Normalize (I've
got a lot of diacritic marks in the wine names).

The basic strategy I'm using is to come up with a method for
normalizing the data. I mainly use the perl modules above to do that.
I place the normalized data in a column in my products table named
search_blob with a FULLTEXT index. When a user conducts a search I
run their query input through my normalization method and see if it
matches my search_blob. I'd be happy to give more details as needed
if this looks like a strategy that could work for you.

Bill Carr
Bottlenose - Wine & Spirits eBusiness Specialists
(877) 857-6700
http://www.bottlenose-wine.com
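
A rough illustration of the normalization strategy Bill describes, written in Python rather than with the Perl modules he names. Everything here is an assumption for illustration: the function names are hypothetical, the diacritic stripping uses Python's standard unicodedata module in place of Unicode::Normalize, and the plural handling is a deliberately crude stand-in for a real stemmer such as Lingua::Stem. The idea is that the same function is applied both to the product text stored in search_blob and to the user's query, so that both sides end up in the same normalized form.

```python
import re
import unicodedata


def strip_diacritics(text):
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_singular(word):
    # Crude plural stripping; a real stemmer handles far more cases.
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(text):
    # Lowercase, remove diacritics, split into alphanumeric words,
    # and reduce each word to a rough singular form.
    words = re.findall(r"[a-z0-9]+", strip_diacritics(text).lower())
    return " ".join(naive_singular(w) for w in words)


if __name__ == "__main__":
    print(normalize("Côtes-du-Rhône Rosés"))
```

With this sketch, a query for "rose" and a product named "Rosés" normalize to the same string, which is what lets a plain index on the normalized column find the match.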


Re: "Fuzzy" searching
United Kingdom
2007-03-28 17:50:38
Grant <emailgrantgmail.com> wrote:
> Has anyone implemented Swish or any other method for "fuzzy"
> searching?  Swish describes fuzzy searching this way:
>
> Word stemming, soundex, metaphone, and double-metaphone indexing for
> "fuzzy" searching
>
> I'm mainly interested in dealing with the plural/singular problem.
>
I am using "FuzzyIndexingMode Stemming_en2" with Swish-e on my RTFM
website.  No problems at all.

-- 
   _/   _/  _/_/_/_/  _/    _/  _/_/_/  _/    _/
  _/_/_/   _/_/      _/    _/    _/    _/_/  _/   K e v i n  W a l s h
 _/ _/    _/          _/ _/     _/    _/  _/_/    kevincursor.biz
_/   _/  _/_/_/_/      _/    _/_/_/  _/    _/

Re: "Fuzzy" searching
2007-03-28 18:07:56
> > Has anyone implemented Swish or any other method for "fuzzy"
> > searching?  Swish describes fuzzy searching this way:
> >
> > Word stemming, soundex, metaphone, and double-metaphone indexing for
> > "fuzzy" searching
> >
> > I'm mainly interested in dealing with the plural/singular problem.
> >
> I am using "FuzzyIndexingMode Stemming_en2" with Swish-e on my RTFM
> website.  No problems at all.

I'm actually hoping to search the title and description fields of my
products table.  Would Swish work well for that or is it mainly
designed to crawl and index HTML pages?

Besides fuzzy searching, I'm eager to get something to match the
search "big widgets" to the description "big blue widgets".  I'm
currently using op=rm to search and it doesn't seem to make that
connection.

- Grant
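
To make the "big widgets" versus "big blue widgets" problem concrete, here is a small Python sketch contrasting two matching styles. It does not reproduce how Interchange implements op=rm or any other search operator; both function names are hypothetical. The first function treats the whole query as one pattern, which cannot match when another word sits between the terms. The second requires only that every query word appear somewhere in the text.

```python
import re


def phrase_match(query, text):
    # The query is searched for as one literal, case-insensitive pattern.
    return re.search(re.escape(query), text, re.IGNORECASE) is not None


def all_words_match(query, text):
    # Every word of the query must occur in the text, in any position.
    text_words = set(re.findall(r"\w+", text.lower()))
    query_words = re.findall(r"\w+", query.lower())
    return all(word in text_words for word in query_words)


if __name__ == "__main__":
    description = "big blue widgets"
    print(phrase_match("big widgets", description))
    print(all_words_match("big widgets", description))
```

A word-based match of this kind is closer to what Grant is asking for than a single-pattern match, at the cost of ignoring word order.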

Re: "Fuzzy" searching
United Kingdom
2007-03-28 18:23:17
Grant <emailgrantgmail.com> wrote:
> > > Has anyone implemented Swish or any other method for "fuzzy"
> > > searching?  Swish describes fuzzy searching this way:
> > >
> > > Word stemming, soundex, metaphone, and double-metaphone indexing for
> > > "fuzzy" searching
> > >
> > > I'm mainly interested in dealing with the plural/singular problem.
> > >
> > I am using "FuzzyIndexingMode Stemming_en2" with Swish-e on my RTFM
> > website.  No problems at all.
>
> I'm actually hoping to search the title and description fields of my
> products table.  Would Swish work well for that or is it mainly
> designed to crawl and index HTML pages?
>
There are no "pages" in the RTFM website.  All of the text is in
various database tables.  Crawling tables is just as easy as crawling
HTML pages.  In fact, it's quicker and easier to select from a table.

>
> Besides fuzzy searching, I'm eager to get something to match the
> search "big widgets" to the description "big blue widgets".  I'm
> currently using op=rm to search and it doesn't seem to make that
> connection.
>
I use "op=aq", and I'm happy with that.

-- 
   _/   _/  _/_/_/_/  _/    _/  _/_/_/  _/    _/
  _/_/_/   _/_/      _/    _/    _/    _/_/  _/   K e v i n  W a l s h
 _/ _/    _/          _/ _/     _/    _/  _/_/    kevincursor.biz
_/   _/  _/_/_/_/      _/    _/_/_/  _/    _/
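
One way to index database rows with Swish-e is its "prog" input method (swish-e -S prog), in which an external program prints each document as a few headers followed by the content. Kevin does not say which method he uses, so this is only a sketch of that approach in Python. The function name and the products/1 path are hypothetical, and the header names and Document-Type value should be checked against the Swish-e documentation for the installed version.

```python
import sys


def swish_prog_record(path, text):
    # Build one document record: headers, a blank line, then the content.
    # Content-Length is the size of the content in bytes.
    body = text.encode("utf-8")
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"
        "Document-Type: TXT*\n"
        "\n"
    ).encode("utf-8")
    return header + body


if __name__ == "__main__":
    # In practice the rows would come from a SELECT on the products table;
    # a literal row stands in for that here.
    rows = [("products/1", "big blue widgets")]
    for path, description in rows:
        sys.stdout.buffer.write(swish_prog_record(path, description))
```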

Re: "Fuzzy" searching
2007-03-28 21:56:38
> > > > Has anyone implemented Swish or any other method for "fuzzy"
> > > > searching?  Swish describes fuzzy searching this way:
> > > >
> > > > Word stemming, soundex, metaphone, and double-metaphone indexing for
> > > > "fuzzy" searching
> > > >
> > > > I'm mainly interested in dealing with the plural/singular problem.
> > > >
> > > I am using "FuzzyIndexingMode Stemming_en2" with Swish-e on my RTFM
> > > website.  No problems at all.
> >
> > I'm actually hoping to search the title and description fields of my
> > products table.  Would Swish work well for that or is it mainly
> > designed to crawl and index HTML pages?
> >
> There are no "pages" in the RTFM website.  All of the text is in
> various database tables.  Crawling tables is just as easy as crawling
> HTML pages.  In fact, it's quicker and easier to select from a table.

Nice.  Do you use it for speed and fuzzy searching?

> > Besides fuzzy searching, I'm eager to get something to match the
> > search "big widgets" to the description "big blue widgets".  I'm
> > currently using op=rm to search and it doesn't seem to make that
> > connection.
> >
> I use "op=aq", and I'm happy with that.

I installed Text::Query::Advanced but op=aq doesn't seem to be
working.  Is there any special configuration that needs to be done?

- Grant

Re: "Fuzzy" searching
United Kingdom
2007-03-29 01:02:07
Grant <emailgrantgmail.com> wrote:
> > > I'm actually hoping to search the title and description fields of my
> > > products table.  Would Swish work well for that or is it mainly
> > > designed to crawl and index HTML pages?
> > >
> > There are no "pages" in the RTFM website.  All of the text is in
> > various database tables.  Crawling tables is just as easy as crawling
> > HTML pages.  In fact, it's quicker and easier to select from a table.
> >
> Nice.  Do you use it for speed and fuzzy searching?
>
I use it for fuzzy searching and for the general accuracy of its
results.  It's very quick too, which is always a bonus.

> >
> > I use "op=aq", and I'm happy with that.
> >
> I installed Text::Query::Advanced but op=aq doesn't seem to be
> working.  Is there any special configuration that needs to be done?
>
I have "Require module Text::Query" in interchange.cfg.  I don't
think that is required; the directive is really there to give
me a kick if I move the website to another server and forget to
install the module.

Did you install Text::Query or just Text::Query::Advanced?  I think
Text::Query comes with both ::Advanced and ::Simple sub-modules.
You'll need the base module.

-- 
   _/   _/  _/_/_/_/  _/    _/  _/_/_/  _/    _/
  _/_/_/   _/_/      _/    _/    _/    _/_/  _/   K e v i n  W a l s h
 _/ _/    _/          _/ _/     _/    _/  _/_/    kevincursor.biz
_/   _/  _/_/_/_/      _/    _/_/_/  _/    _/

Re: "Fuzzy" searching
2007-03-29 10:02:55
> > > > I'm actually hoping to search the title and description fields of my
> > > > products table.  Would Swish work well for that or is it mainly
> > > > designed to crawl and index HTML pages?
> > > >
> > > There are no "pages" in the RTFM website.  All of the text is in
> > > various database tables.  Crawling tables is just as easy as crawling
> > > HTML pages.  In fact, it's quicker and easier to select from a table.
> > >
> > Nice.  Do you use it for speed and fuzzy searching?
> >
> I use it for fuzzy searching and for the general accuracy of its
> results.  It's very quick too, which is always a bonus.
>
> > >
> > > I use "op=aq", and I'm happy with that.
> > >
> > I installed Text::Query::Advanced but op=aq doesn't seem to be
> > working.  Is there any special configuration that needs to be done?
> >
> I have "Require module Text::Query" in interchange.cfg.  I don't
> think that is required; the directive is really there to give
> me a kick if I move the website to another server and forget to
> install the module.
>
> Did you install Text::Query or just Text::Query::Advanced?  I think
> Text::Query comes with both ::Advanced and ::Simple sub-modules.
> You'll need the base module.

I wasn't using Gentoo's g-cpan tool properly before.  Installing
Text::Query installed both Simple.pm and Advanced.pm.  op=tq is
working great and I'm very happy with the results.  op=aq isn't
working but that's ok.  Have you compared tq vs. aq?

The CPAN page for Text::Query::Simple describes it this way:

"Match text against simple query expression and return relevance value
for ranking"

Do you pull that relevancy data into IC for usage there?

- Grant

Re: "Fuzzy" searching
2007-03-29 10:43:12
> > > > I'm actually hoping to search the title and description fields of my
> > > > products table.  Would Swish work well for that or is it mainly
> > > > designed to crawl and index HTML pages?
> > > >
> > > There are no "pages" in the RTFM website.  All of the text is in
> > > various database tables.  Crawling tables is just as easy as crawling
> > > HTML pages.  In fact, it's quicker and easier to select from a table.
> > >
> > Nice.  Do you use it for speed and fuzzy searching?
> >
> I use it for fuzzy searching and for the general accuracy of its
> results.  It's very quick too, which is always a bonus.
>
> > >
> > > I use "op=aq", and I'm happy with that.
> > >
> > I installed Text::Query::Advanced but op=aq doesn't seem to be
> > working.  Is there any special configuration that needs to be done?
> >
> I have "Require module Text::Query" in interchange.cfg.  I don't
> think that is required; the directive is really there to give
> me a kick if I move the website to another server and forget to
> install the module.
>
> Did you install Text::Query or just Text::Query::Advanced?  I think
> Text::Query comes with both ::Advanced and ::Simple sub-modules.
> You'll need the base module.

I spoke a little too soon.  op=tq is failing intermittently and
leaving an error in error.log.  op=aq appears to leave the same error
each time.  Here it is:

search error: Limit subroutine creation: Bad code: syntax error at
(tag 'value') line 8, near "] tqsearch term "

If I use op=rm it never fails or produces that error in error.log.
Could this be because I'm on IC 5.2?  Is there a particular file or
set of files I should grab from CVS to try and fix this?

- Grant

Re: "Fuzzy" searching
2007-03-29 11:21:38
> > I spoke a little too soon.  op=tq is failing intermittently and
> > leaving an error in error.log.  op=aq appears to leave the same error
> > each time.  Here it is:
> >
> > search error: Limit subroutine creation: Bad code: syntax error at
> > (tag 'value') line 8, near "] tqsearch term "
> >
> > If I use op=rm it never fails or produces that error in error.log.
> > Could this be because I'm on IC 5.2?  Is there a particular file or
> > set of files I should grab from CVS to try and fix this?
> >
> > - Grant
>
> Hi Grant,
>
> Yes, this error was fixed previously, however I'm not sure which
> files it involved.  I looked through the archives and could not find
> the information.  Sorry I can't be of more help.
>
> Are you in RPC mode?

Thanks Ron.  I am in RPC mode, but I changed it a bit:

PreFork                 Yes
StartServers            5
MaxServers              100
MaxRequestsPerChild     100
ChildLife               3660
HouseKeeping            2
PIDcheck                120

- Grant