31 jan 2007 kl. 12.25 skrev Christoph Pächter:
>
> I was wondering, if there is anywhere a table (similar
to Table 1.2
> An overview
> of different field types, their characteristics, and
their usage in
> Lucene in
> Action), listing the possible methods and their usage.
Implementations will differ, for example:
>
> Store |TermVector |Index
|reasonable |Usage
> YES |NO |NO |1
|URLs
>
|
> telephone number
You never have to store anything in the index, perhaps that
information is persistent somewhere else?
If you use a term vector or not depends very little on what
kind of
information you store in there, it is up to what analysis
you plan to
include the documents in. Highlighting? More like this?
Neural networks?
Some are more than happy with one large token. Other people
might
want to tokenize the exact same information.
An URL in [protocol://host:port/path], a phone number in
country-,
area, and district parts.
It really up to each and every implementer to decide what
settings is
best for them.
Also, a Lucene index is not made up of static rows and
columns the
way a relational database is. The spoon does not exists. You
can bend
it any way you want. Documents in a corpus can share field
that share
names but not settings. Perhaps you only want to index phone
numbers
in a specific area code.
--
karl
------------------------------------------------------------
---------
To unsubscribe, e-mail: java-user-unsubscribe lucene.apache.org
For additional commands, e-mail: java-user-help lucene.apache.org
|