
Thread: Re: index U.K. U.S. U.N. U.V.




Re: index U.K. U.S. U.N. U.V.
2007-07-16 19:12:15
Use KeywordAnalyzer to leave "U.S." as-is and index it as-is.

Otis
--
Lucene Consulting -- http://lucene-consulting.com/


----- Original Message ----
From: crspan <crspangmail.com>
To: java-userlucene.apache.org
Sent: Saturday, July 14, 2007 5:18:59 PM
Subject: index U.K. U.S. U.N. U.V.

Would you please advise on the best practice for indexing:

  U.S.

The standard analyzer transforms it to "us", which collides with
"us" (we).

Thanks,

Charlie




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribelucene.apache.org
For additional commands, e-mail: java-user-helplucene.apache.org







Re: index U.K. U.S. U.N. U.V.
2007-07-16 22:16:37
Are we sure about KeywordAnalyzer here? It is supposed to "tokenize"
the entire stream as a single token (useful for data like zip codes,
ids, and some product names).
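
To make the difference concrete, here is a rough sketch in plain Java.
It is not Lucene code and calls no Lucene API: keywordStyle and
standardStyle are hypothetical helpers that only mimic the behavior
described in this thread (whole input as one token versus word-level
tokens with dots dropped and text lowercased).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenizationSketch {

    // Keyword-style: the whole input becomes exactly one token, unchanged.
    static List<String> keywordStyle(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    // Simplified stand-in for standard-style analysis (an assumption, not
    // the real grammar): split on whitespace, drop dots, lowercase.
    static List<String> standardStyle(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String token = word.replace(".", "").toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The U.S. helps us";
        System.out.println(keywordStyle(text));  // [The U.S. helps us]
        System.out.println(standardStyle(text)); // [the, us, helps, us]
    }
}
```

With the keyword-style approach "U.S." survives, but only as part of one
big token, so a search for "U.S." alone would not match running text.
With the standard-style approach the words are searchable, but "U.S."
and "us" end up as the same term.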

In the scenario we are discussing, U.S. is just one token within the
text, and we would still like to take advantage of StandardAnalyzer
for all the other goodies. I am sorry for the incomplete setup in my
previous message.

More or less, I expect there is somewhere we can instruct
StandardTokenizer.jj that U.S. is a special token (even though it is
indeed an ACRONYM) and that we prefer to index it as U.S., as-is.
Can we do that?
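
A sketch of the behavior being asked for, again in plain Java rather
than as a change to StandardTokenizer.jj. The PROTECTED set and the
analyze helper are hypothetical names invented for illustration, and
the whitespace-splitting normalization is only a stand-in for what a
real analyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ProtectedAcronymSketch {

    // Hypothetical list of tokens that should be indexed exactly as written.
    static final Set<String> PROTECTED =
            new HashSet<>(Arrays.asList("U.S.", "U.K.", "U.N.", "U.V."));

    // Split on whitespace; keep protected tokens untouched, and apply the
    // simplified normalization (drop dots, lowercase) to everything else.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (PROTECTED.contains(word)) {
                tokens.add(word);
                continue;
            }
            String token = word.replace(".", "").toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The U.S. helps us")); // [the, U.S., helps, us]
    }
}
```

Two caveats with this sketch: it only matches a protected token when it
stands alone between spaces (so "U.S.," with a trailing comma would not
be protected), and the same treatment would have to be applied to query
text, otherwise a search for "U.S." would still be reduced to "us" and
miss the indexed term.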

Charlie



Otis Gospodnetic wrote:
> Use KeywordAnalyzer to leave "U.S." as-is and index it as-is.
>
> Otis
> --
> Lucene Consulting -- http://lucene-consulting.com/
>
>
> ----- Original Message ----
> From: crspan <crspangmail.com>
> To: java-userlucene.apache.org
> Sent: Saturday, July 14, 2007 5:18:59 PM
> Subject: index U.K. U.S. U.N. U.V.
>
> Would you please advise on the best practice for indexing:
>
>   U.S.
>
> The standard analyzer transforms it to "us", which collides with
> "us" (we).
>
> Thanks,
>
> Charlie



