
Thread: Re: Converting German special characters / umlaute




Re: Converting German special characters / umlaute
Germany
2007-09-26 16:44:25
Try the SnowballPorterFilterFactory described here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

You should use the German2 variant that converts ä and ae to a, ö and oe
to o and so on. More details:
http://snowball.tartarus.org/algorithms/german2/stemmer.html
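
For illustration only, here is a minimal Python sketch of the umlaut folding described above. It is not the Snowball implementation and does no stemming at all; the function name fold_german2_style is made up for this example, and the "ue unless preceded by q" rule is how I read the German2 description.

```python
import re

# Accented vowels lose their umlaut at the end (hypothetical helper table).
_UMLAUTS = str.maketrans("äöü", "aou")

def fold_german2_style(token: str) -> str:
    """Fold both spellings of an umlaut (ä / ae) to the plain vowel."""
    t = token.lower().replace("ß", "ss")
    # Treat the two-letter spellings like the umlauts they stand for.
    t = t.replace("ae", "a").replace("oe", "o")
    # "ue" is left alone after "q" (as in "Quelle").
    t = re.sub(r"(?<!q)ue", "u", t)
    # Finally strip the umlaut from ä, ö, ü.
    return t.translate(_UMLAUTS)

print(fold_german2_style("Müller"), fold_german2_style("Mueller"))
```

With this folding, "Müller" and "Mueller" end up as the same term, so either spelling in a query would match. The naive rule also over-folds: it turns "poet" into "pot".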

Every document in Solr can have any number of fields, which may share the
same source but have different field types and are therefore handled
differently (stored as is, analyzed in different ways...). Use copyField
in your schema.xml to feed your data into multiple fields. When searching,
you decide which fields to search on (usually the analyzed ones) and which
to retrieve when getting the document back.
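
To make the stored-versus-analyzed idea concrete, here is a toy Python model. It is not Solr code: TinyIndex, analyze and the FOLD table are hypothetical names for this sketch. The original text is kept as is, and only the folded tokens are searched.

```python
from collections import defaultdict

# Hypothetical folding table for this sketch: umlauts to two-letter forms.
FOLD = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def analyze(text):
    """Lowercase, fold umlauts, split on whitespace."""
    return text.lower().translate(FOLD).split()

class TinyIndex:
    def __init__(self):
        self.stored = {}                  # doc id -> original, unmodified text
        self.postings = defaultdict(set)  # analyzed token -> doc ids

    def add(self, doc_id, title):
        self.stored[doc_id] = title       # the "stored" value
        for token in analyze(title):      # the "analyzed" copy
            self.postings[token].add(doc_id)

    def search(self, query):
        tokens = analyze(query)           # same analysis at query time
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.stored[d] for d in sorted(hits)]

idx = TinyIndex()
idx.add(1, "Müller GmbH")
print(idx.search("mueller"))
```

The search runs against the folded tokens, but what comes back is the stored original, which is the behaviour asked for in the question below.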

Tom

Matthias Eireiner wrote:
> Dear list,
>
> I have two questions regarding German special characters or umlauts.
>
> Is there an analyzer which automatically converts all German special
> characters to their specific dissected form, such as ü to ue and ä to
> ae, etc.?
>
> I would also like the search to always run against the dissected data.
> But when the results are returned, the initial, unmodified data should
> be returned.
>
> Does Lucene's GermanAnalyzer do this job? I ran across it, but I could not
> figure out from the documentation whether it does the job or not.
>
> Thanks a lot in advance.
>
> Matthias

Re: Converting German special characters / umlaute
2007-10-24 15:46:30
Dear list,

It has been some time, but here is what I did.
I had a look at Thomas Traeger's tip to use the
SnowballPorterFilterFactory, which does not actually do the job.
Its purpose is to convert regular ASCII into special characters.

And I want it the other way around, such that all special characters are
converted to regular ASCII.
J.J. Larrea's tip to use the PatternReplaceFilterFactory solved
the problem.
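
The patterns actually used are not given in this thread. As a rough Python sketch of the pattern-replace idea, assuming tokens are already lowercased and using a made-up rule list (RULES and apply_rules are names invented for this example), each rule replaces every match in the token, one rule after another:

```python
import re

# Hypothetical rules: (pattern, replacement), applied in order.
RULES = [
    (re.compile("ä"), "ae"),
    (re.compile("ö"), "oe"),
    (re.compile("ü"), "ue"),
    (re.compile("ß"), "ss"),
]

def apply_rules(token: str) -> str:
    """Run the token through every rule, replacing all matches."""
    for pattern, replacement in RULES:
        token = pattern.sub(replacement, token)
    return token

print(apply_rules("über"), apply_rules("größe"))
```

The same rules have to run at index time and at query time, otherwise a query for "über" would not find the indexed "ueber".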
 
And as Chris Hostetter noted, stored fields always return the initial
value, which made the second part of my question moot.

Thanks a lot for your help!

best 
Matthias






RE: Converting German special characters / umlaute
United States
2007-10-24 16:18:56
Isn't this what ISOLatin1AccentFilter does? Turn Björk into Bjork?
This should be much faster than PatternReplaceFilterFactory.
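
As a rough illustration of that kind of accent stripping, here is a Python sketch. It is an approximation based on Unicode decomposition, not the Lucene filter itself, and strip_accents is a name made up for this example.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (ö -> o)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Björk"))
```

Note that this kind of stripping gives "Muller" for "Müller", not "Mueller", so it is a different result from the ü to ue conversion asked for at the start of the thread.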



