On Oct 24, 2007, at 3:07 AM, Liaqat Ali wrote:
> Hi All,
>
> I'm developing a search engine for the Urdu language and want to use
> Lucene for that purpose. The situation is as follows:
>
> ---I have a corpus of 2000 Urdu (variant of Persian and Arabic)
> documents in XML form. How do I index them using Lucene?
You will have to use some sort of XML parser (SAX or a pull parser)
to extract the content you want and create Lucene Documents. Have a
look at the tutorial on the Lucene home page for examples.
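A minimal sketch of the SAX route, using only the JDK's built-in parser.
The class name UrduXmlExtractor and the element names used below
("title", "body") are hypothetical; substitute whatever your corpus
actually uses. Each returned map would then be turned into one Lucene
Document, one field per entry, as shown in the tutorial (the Lucene
calls themselves are left out here because they vary between versions).

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Collects the text of selected (hypothetical) elements into a
 * field-name -> text map. Assumes the wanted elements are not nested
 * inside each other; text of unwanted child elements is kept.
 */
final class UrduXmlExtractor extends DefaultHandler {
    private final Set<String> wanted;
    private final Map<String, StringBuilder> fields = new HashMap<>();
    private String current; // wanted element we are currently inside, or null

    private UrduXmlExtractor(Set<String> wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (wanted.contains(qName)) {
            current = qName;
            StringBuilder sb = fields.computeIfAbsent(qName, k -> new StringBuilder());
            if (sb.length() > 0) {
                sb.append(' '); // separate repeated occurrences of the same element
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so always append.
        if (current != null) {
            fields.get(current).append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(current)) {
            current = null;
        }
    }

    /** Parses one XML document (encoding taken from its declaration, UTF-8 by default). */
    static Map<String, String> extract(InputStream in, Set<String> wanted) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        UrduXmlExtractor handler = new UrduXmlExtractor(wanted);
        parser.parse(in, handler);
        Map<String, String> result = new HashMap<>();
        handler.fields.forEach((name, text) -> result.put(name, text.toString().trim()));
        return result;
    }
}
```

In practice you would call extract once per file (for example with a
FileInputStream) while walking the corpus directory, and add the
resulting Document to your IndexWriter.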
> ---Some stemming will be needed while indexing, but there is no
> stemmer available for the Urdu language.
You will most likely have to write your own. There are some Arabic
analyzers out there; perhaps you could use one of them as a starting
point.
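As a rough illustration of where a home-grown stemmer might start, here
is a minimal "light stemming" sketch in plain Java that strips at most
one suffix from a token. The class name, the suffix list and the minimum
stem length are all hypothetical placeholders, not a validated Urdu
stemmer; a real one would need linguistic input and evaluation on your
own corpus.

```java
import java.util.List;

/**
 * A toy light stemmer: removes at most one suffix from a single token.
 * The suffix list is a hypothetical, illustrative sample of Urdu plural
 * endings, not a complete description of Urdu morphology.
 */
final class LightUrduStemmer {
    // Unicode escapes keep the exact code points unambiguous.
    // Checked in order; the first matching suffix wins.
    private static final List<String> SUFFIXES = List.of(
            "\u06CC\u0627\u06BA", // -iyan
            "\u0648\u06BA",       // -on
            "\u06CC\u06BA",       // -en, e.g. kitaben -> kitab
            "\u0627\u062A");      // -at, e.g. khayalat -> khayal

    // Hypothetical guard: never cut a token down to fewer than 2 characters.
    private static final int MIN_STEM_LENGTH = 2;

    private LightUrduStemmer() {}

    /** Returns the token with the first matching suffix removed, or unchanged. */
    static String stem(String token) {
        for (String suffix : SUFFIXES) {
            if (token.endsWith(suffix)
                    && token.length() - suffix.length() >= MIN_STEM_LENGTH) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }
}
```

In Lucene, logic like this would normally be wrapped in a custom
TokenFilter inside your own Analyzer, and that same Analyzer used both
for indexing and for parsing queries, so that query terms are reduced
the same way as the indexed terms.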
>
> ---I have developed a GUI in HTML and have Java Servlets for
> searching. How do I integrate Lucene with my own servlets?
This really is up to you, but essentially you need to set up an
IndexSearcher and create queries to do searches. Again, have a look
at the tutorial as a way of getting started.
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ