List Info

Thread: RE: a question for french analyzer




RE: a question for french analyzer
country flaguser name
United States
2007-07-30 16:25:20
Being a French speaker, I will mention the following special
cases:

- "plus ça change" -> "plus ca
change"
- "œuf" -> "oeuf"
- "lætitia" -> "laetitia"

But I just looked, and it looks like ISOLatin1AccentFilter
handles these.
Better test to be sure...

--Renaud
 

-----Original Message-----
From: Chris Lu [mailto:chris.lugmail.com] 
Sent: Monday, July 30, 2007 1:36 PM
To: java-userlucene.apache.org
Subject: Re: a question for french analyzer

Hi, Erick,

I added ISOLatin1AccentFilter to FrenchAnalyzer following
Samir's tip, and
it works great! And I think it's the right way to go.
Problems like "You
have to store the data raw for display purposes if you want
the accents to
show though" will go away since Analyzer already have
the original text and
analyzed token mechanism built-in. And it's pretty easy to
do!

However, is there any special case that you have? Not really
knowing French,
I only tested one word, "fenêtre", and it's
analyzed into "fenetre".

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any
Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?ti
tle=Create_Lucene_Database_Search_in_3_m
inutes


On 7/30/07, Erick Erickson <erickericksongmail.com> wrote:
> Gosh, I sure hope not, because that would mean that we
rolled our
> own for no good reason. We wound up just collapsing
> the input stream by substituting plain old 'e' for all
the accented
> variants before indexing and before searching. Be
*really* careful
> what character set you're using.
>
> Actually, we would have still had to roll our own
because the
> character mapping was...er...wonky <G>....
>
> You have to store the data raw for display purposes if
you want the
> accents to show though...
>
> Best
> Erick
>
> On 7/30/07, Chris Lu <chris.lugmail.com> wrote:
> >
> > Hi,
> >
> > I am not a French speaker, but here are some
questions regarding
> > French analyzer:
> >
> > Is there any analyzer that can do this? Analyze
accentuated letters to
> > non accentuated corresponding letters (é,è,ê,ë
-> e), so that
> >
> > search "fenêtre" (=window) found all
docs with "fenêtre" or "fenetre"
> > and
> > search "fenetre" found the same result,
all docs with "fenêtre" or
> > "fenetre"
> >
> > Current analyzers, Snowball-French and
FrenchAnalyzer don't have this
> > feature.
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Scalable Full-Text Search On Any
Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> >
> >
http://wiki.dbsight.com/index.php?ti
tle=Create_Lucene_Database_Search_in_3_m
inutes
> >
> >
------------------------------------------------------------
---------
> > To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
> > For additional commands, e-mail:
java-user-helplucene.apache.org
> >
> >
>

------------------------------------------------------------
---------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org




------------------------------------------------------------
---------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org


[1]

about | contact  Other archives ( Real Estate discussion Medical topics )