Isn't this what ISOLatin1AccentFilter does? Turn Björk into
Bjork? It should be much faster than
PatternReplaceFilterFactory.
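
For readers unfamiliar with accent folding, here is a minimal sketch of the idea in Python. It is not the Lucene filter mentioned above and may treat individual characters differently; it only illustrates the Björk to Bjork transformation using Unicode decomposition. The function name fold_accents is made up for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip diacritics: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Björk"))  # Bjork
```

Note that this approach leaves characters without a canonical decomposition, such as ß, unchanged.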
-----Original Message-----
From: Matthias Eireiner [mailto:matthias.eireiner abitero.com]
Sent: Wednesday, October 24, 2007 1:47 PM
To: solr-user lucene.apache.org
Subject: Re: Converting German special characters / umlaute
Dear list,
it has been some time, but here is what I did.
I had a look at Thomas Traeger's tip to use the
SnowballPorterFilterFactory, which does not actually do the
job.
Its purpose is to convert regular ASCII into special
characters.
And I want it the other way around, so that all special
characters are converted to regular ASCII.
J.J. Larrea's tip to use the
PatternReplaceFilterFactory solved the problem.
And as Chris Hostetter noted, stored fields always return
the initial value, which made the second part of my
question obsolete.
Thanks a lot for your help!
best
Matthias
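
The exact patterns used with PatternReplaceFilterFactory are not given in this thread. As a rough sketch of the replacement idea, written in Python rather than as Solr configuration, the following assumes the "ü to ue, ä to ae" mapping from the original question. The names UMLAUT_MAP and dissect_umlauts are hypothetical, and the ß to ss entry is an added assumption.

```python
import re
import unicodedata

# Assumed mapping, based on the "ü to ue, ä to ae" examples in the question.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",  # assumption: not mentioned in the thread
}
_UMLAUT_PATTERN = re.compile("|".join(map(re.escape, UMLAUT_MAP)))


def dissect_umlauts(text: str) -> str:
    """Replace German special characters with their ASCII digraphs."""
    # Normalize to NFC first so that "u" + combining diaeresis matches "ü".
    composed = unicodedata.normalize("NFC", text)
    return _UMLAUT_PATTERN.sub(lambda m: UMLAUT_MAP[m.group(0)], composed)


print(dissect_umlauts("Müller"))  # Mueller
```

A single compiled pattern with a lookup table keeps the replacement to one pass over the text, instead of one pass per character pair.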
-----Original Message-----
From: Thomas Traeger [mailto:t.traeger kabuco.de]
Sent: Wednesday, September 26, 2007 11:44 PM
To: solr-user lucene.apache.org
Subject: Re: Converting German special characters / umlaute
Try the SnowballPorterFilterFactory described here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
You should use the German2 variant that converts ä and ae to
a, ö and oe to o and so on. More details:
http://snowball.tartarus.org/algorithms/german2/stemmer.html
Every document in Solr can have any number of fields, which
might have the same source but have different field types and
are therefore handled differently (stored as is, analyzed in
different ways...). Use copyField in your schema.xml to feed
your data into multiple fields. At search time you decide
which fields to search on (usually the analyzed ones) and
which to retrieve when getting the document back.
Tom
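
The "search the analyzed copy, return the stored original" idea described above can be sketched in a few lines of Python. This is a toy model, not how Solr is implemented: TinyIndex and analyze are hypothetical names, and the analyzer is a simple stand-in that lowercases and strips accents.

```python
import unicodedata
from typing import List


def analyze(text: str) -> str:
    """Stand-in analyzer: lowercase, then strip combining accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class TinyIndex:
    """Keeps the original value next to an analyzed copy of it."""

    def __init__(self) -> None:
        self.stored: List[str] = []    # returned to the user unchanged
        self.analyzed: List[str] = []  # used only for matching

    def add(self, value: str) -> None:
        self.stored.append(value)
        self.analyzed.append(analyze(value))

    def search(self, query: str) -> List[str]:
        q = analyze(query)  # the query goes through the same analysis
        return [orig for orig, ana in zip(self.stored, self.analyzed) if q in ana]


index = TinyIndex()
index.add("Björk")
index.add("Müller")
print(index.search("bjork"))  # ['Björk']
```

The query is analyzed the same way as the indexed copy, so "bjork" and "Björk" both match, while the result still carries the original spelling.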
Matthias Eireiner wrote:
> Dear list,
>
> I have two questions regarding German special characters or umlauts.
>
> Is there an analyzer which automatically converts all German special
> characters to their specific dissected form, such as ü to ue and ä to
> ae, etc.?
>
> I would also like the search to always run against the dissected data.
> But when the results are returned, the initial, unmodified data
> should be returned.
>
> Does Lucene's GermanAnalyzer do this job? I ran across it, but I could
> not figure out from the documentation whether it does the job or not.
>
> thanks a lot in advance.
>
> Matthias
>