
Thread: blank space before special characters




blank space before special characters
Switzerland
2007-11-05 06:02:02
Hello,

I have the following problem with my Lucene index.

When indexing fields containing special characters (like &), a blank space is inserted before the special character. For example, the content "L'article" is indexed as "L &apos;" (with a blank space between 'L' and '&').

Is there any way to avoid that?

The characteristics of my field are the following: Indexed, Tokenized, Stored and Term Vector.

Thanks in advance for your help,

Leire

RE : blank space before special characters
Switzerland
2007-11-05 06:09:46
Sorry, I made a mistake in my previous email.
The field "L'article" is indexed as "L &apos;article". The blank space is
inserted between 'L' and '&apos;article'.

Thanks,

Leire

-----Original Message-----
From: Leire Urcelay [mailto:Leire.Urcelayunil.ch]
Sent: Monday, 5 November 2007 13:02
To: java-userlucene.apache.org
Subject: blank space before special characters


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org


Re: RE : blank space before special characters
2007-11-05 12:34:59
There are several issues here...
1> How are you getting the entity reference? You must be encoding the
stream (or getting it encoded for you), so the first thing I'd do is
decode it.
2> After that, it's a question of which Filters/Analyzers you're using.
Take a look at ISOLatin1AccentFilter. I'm unclear whether it "closes up"
the case you're looking at, so be sure to check.
3> Since my peculiar situation can't use the Filter (the character set
I'm using isn't standard), I've pre-processed the input (both at index
and query time) to substitute the empty string for the apostrophe.
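Erick's items 1 and 3 can be combined into a single pre-processing step. The following is a minimal sketch, assuming the raw field text still contains entity references such as &apos; (for example "L&apos;article") before it reaches the analyzer. ApostropheNormalizer, decodeEntities, stripApostrophes and normalize are hypothetical names, not Lucene APIs. The decoder only knows the five predefined XML entities plus numeric references and decodes one level, so a double-encoded "&amp;apos;" comes out as "&apos;"; real HTML input would be better served by a full entity decoder. Treating U+2019 as an apostrophe is also an assumption.

```java
/**
 * Hypothetical pre-processing helper (not a Lucene class). It first decodes
 * entity references such as &apos; and then removes apostrophes, so that
 * "L&apos;article" and "L'article" both become "Larticle". Run the same
 * normalize() call on field text before indexing and on query text before
 * parsing, so that both sides produce the same terms.
 */
final class ApostropheNormalizer {

    /**
     * Decodes the five predefined XML entities (&apos; &quot; &lt; &gt; &amp;)
     * and decimal/hex numeric references in a single pass. Unknown or
     * malformed references are copied through unchanged.
     */
    static String decodeEntities(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '&') {
                int semi = s.indexOf(';', i);
                if (semi > i + 1) {
                    String decoded = decodeOne(s.substring(i + 1, semi));
                    if (decoded != null) {
                        out.append(decoded);
                        i = semi + 1; // skip past the whole reference
                        continue;
                    }
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** Returns the replacement text for one entity name, or null if unrecognized. */
    private static String decodeOne(String name) {
        switch (name) {
            case "apos": return "'";
            case "quot": return "\"";
            case "lt":   return "<";
            case "gt":   return ">";
            case "amp":  return "&";
            default:     break;
        }
        try {
            if (name.startsWith("#x") || name.startsWith("#X")) {
                return new String(Character.toChars(Integer.parseInt(name.substring(2), 16)));
            }
            if (name.startsWith("#")) {
                return new String(Character.toChars(Integer.parseInt(name.substring(1))));
            }
        } catch (IllegalArgumentException e) {
            // Not a valid number or code point: fall through and keep the text.
        }
        return null;
    }

    /** Removes ASCII apostrophes and (an assumption) the typographic U+2019. */
    static String stripApostrophes(String s) {
        return s.replace("'", "").replace("\u2019", "");
    }

    /** Decode first, then strip, so encoded and literal apostrophes are treated alike. */
    static String normalize(String s) {
        return stripApostrophes(decodeEntities(s));
    }

    public static void main(String[] args) {
        System.out.println(normalize("L&apos;article"));        // prints Larticle
        System.out.println(normalize("L'article"));             // prints Larticle
        System.out.println(decodeEntities("Tom &amp; Jerry"));  // prints Tom & Jerry
    }
}
```

The same normalize() call has to be applied to the query string as well as to the indexed text; otherwise a search for L'article would look for terms the index no longer contains. The trade-off of dropping the apostrophe is that "l'article" and "larticle" become indistinguishable in the index.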

Hope this helps
Erick

On 11/5/07, Leire Urcelay <Leire.Urcelayunil.ch> wrote: