Thread: Removing brackets before indexing




Removing brackets before indexing
user name
2006-05-31 15:47:12
Hello!

I am currently trying to index Latin documents in which missing letters
are added to words in square brackets, like this: "[divinit]atis".

Could you please tell me the best practice for removing the brackets
before adding the text to the Lucene index? (In the example, the word
stored should be "divinitatis".)

Thanks a lot,
Mile Rosu

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
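A minimal sketch of the simplest approach, assuming the document text is available as a Java String before it is indexed: strip the brackets first, then let the analyzer tokenize. `stripBrackets` and `letterTokens` are hypothetical helpers; `letterTokens` only imitates a lowercasing, letter-based tokenizer (a token is a run of letters, and any other character ends it) to show why the order of the two steps matters.

```java
import java.util.ArrayList;
import java.util.List;

public class BracketDemo {

    /** Hypothetical helper: removes every '[' and ']' from the text. */
    public static String stripBrackets(String text) {
        return text.replace("[", "").replace("]", "");
    }

    /**
     * Hypothetical helper imitating a letter-based, lowercasing tokenizer:
     * a token is a maximal run of letters; any other character ends it.
     */
    public static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter (such as a bracket) closes the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String raw = "[divinit]atis";
        System.out.println(letterTokens(raw));                // [divinit, atis]
        System.out.println(letterTokens(stripBrackets(raw))); // [divinitatis]
    }
}
```

Removing the brackets after tokenization would be too late, because by then the word has already been split in two.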

Removing brackets before indexing
user name
2006-05-31 16:36:20
Mile,

Any Analyzer that uses a Tokenizer that discards non-letter characters
will do. For example, take a look at SimpleAnalyzer. It uses
LowerCaseTokenizer. If you read the javadoc for LowerCaseTokenizer,
I think you will see that it suits you.

Otis
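One caveat: LowerCaseTokenizer extends LetterTokenizer, which treats every non-letter as a token boundary, so on its own it would index "[divinit]atis" as the two tokens "divinit" and "atis" rather than as "divinitatis". A minimal sketch of one way around this, assuming the text reaches the analyzer as a java.io.Reader (which is what Lucene analyzers consume), is to wrap that Reader so the brackets never reach the tokenizer. `BracketStrippingReader` is a hypothetical class, not part of Lucene.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical helper (not part of Lucene): wraps a Reader and silently drops
 * '[' and ']' so a letter-based tokenizer sees "[divinit]atis" as "divinitatis".
 */
public class BracketStrippingReader extends FilterReader {

    public BracketStrippingReader(Reader in) {
        super(in);
    }

    private static boolean isBracket(int c) {
        return c == '[' || c == ']';
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = in.read();
        } while (isBracket(c));
        return c; // -1 signals end of stream
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        while (true) {
            int n = in.read(cbuf, off, len);
            if (n <= 0) {
                return n; // -1 at end of stream, 0 only when len == 0
            }
            // Compact the chunk in place, keeping only non-bracket characters.
            int kept = off;
            for (int i = off; i < off + n; i++) {
                if (!isBracket(cbuf[i])) {
                    cbuf[kept++] = cbuf[i];
                }
            }
            if (kept > off) {
                return kept - off;
            }
            // The chunk held only brackets: read again instead of returning 0.
        }
    }

    @Override
    public long skip(long n) throws IOException {
        // Count filtered characters, so dropped brackets are not counted.
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // raw and filtered positions differ, so keep it simple
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        throw new IOException("mark() not supported");
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("reset() not supported");
    }

    public static void main(String[] args) throws IOException {
        Reader r = new BracketStrippingReader(new StringReader("[divinit]atis"));
        StringBuilder out = new StringBuilder();
        char[] buf = new char[4];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            out.append(buf, 0, n);
        }
        System.out.println(out); // divinitatis
    }
}
```

With a Lucene analyzer, the wrapped reader would be handed over in place of the original one, for example `analyzer.tokenStream("contents", new BracketStrippingReader(reader))`, where the field name "contents" is only illustrative. Later Lucene releases added CharFilter classes intended for this kind of pre-tokenization cleanup.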

----- Original Message ----
From: Mile Rosu <mile.rosu@level7.ro>
To: java-user@lucene.apache.org
Sent: Wednesday, May 31, 2006 11:47:12 AM
Subject: Removing brackets before indexing
