On 3/24/08, Andreas Hartmann <andreas apache.org> wrote:
> solprovider apache.org schrieb:
> > On 3/24/08, Andreas Hartmann <andreas apache.org> wrote:
> >> atm the index fields are configured for each publication:
> >> <index id="default-live" analyzer="stopword_en"
> >>     directory="lenya/pubs/default/work/lucene/index/live/index">
> >>   <structure>
> >>     <field id="url" type="keyword" />
> >>     <field id="title" type="text" storetext="true"/>
> >>     <field id="description" type="text" storetext="true"/>
> >>     <field id="subject" type="keyword" storetext="true" />
> >>     <field id="body" type="text" storetext="true"/>
> >>   </structure>
> >> </index>
> >> IMO this is an inappropriate place for this configuration. Furthermore,
> >> it has to match the index XSLTs of all resource types.
> >>
> >> Wouldn't it be better to
> >> - index all meta data fields
> >> - configure the indexable fields for each resource type (have to
> >>   conform to the corresponding index XSLTs)
> >>
> >> The index structure would be automatically derived from this
> >> configuration (basically the union of all fields). Changing the meta
> >> data or resource type configuration would certainly require re-indexing
> >> the whole content of the web application, but IMO this is not a big issue.
> >>
> >> WDYT?
> >> -- Andreas
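
The "union of all fields" idea quoted above can be illustrated with a
minimal sketch. It is not Lenya code: the resource type names, the
dictionary layout, and the function derive_index_structure are
hypothetical, and only the field names and types ("url", "title",
"body", "keyword", "text") come from the quoted configuration. The
sketch also assumes that two resource types declaring the same field
with different types should be treated as an error.

```python
# Hypothetical per-resource-type declarations (field name -> field type).
# "xhtml" and "person" are illustrative resource type names only.
RESOURCE_TYPE_FIELDS = {
    "xhtml": {"url": "keyword", "title": "text", "body": "text"},
    "person": {"url": "keyword", "personName": "text"},
}


def derive_index_structure(resource_type_fields):
    """Return the union of all declared fields, rejecting type clashes."""
    structure = {}
    for resource_type, fields in resource_type_fields.items():
        for name, field_type in fields.items():
            existing = structure.get(name)
            if existing is not None and existing != field_type:
                # Same field name with two different types: the derived
                # index structure would be ambiguous, so fail loudly.
                raise ValueError(
                    f"field {name!r} declared as {existing!r} and {field_type!r}"
                )
            structure[name] = field_type
    return structure
```

A clash check of this kind is one reason namespaced field names would
help: two resource types could then reuse a short name without
colliding in the shared index.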
> >
> > I agree one configuration for all publications is a worthy goal. My
> > version was an add-on to a Lenya 1.2.2 Publication and so was not
> > concerned with integration into core Lenya. The current
> > implementation may be derivative.
> >
> > Extracting all data from any document is great for the search terms.
> > All text should be included, or do we have field-level security?
> No, we only have document-level security.
The question was meant to provoke thought about future enhancements. (I
have not planned to include field-level security in Lenya-1.3.0, but
my planning includes not adding obstacles to recognized possible
improvements.) Security requires three functions:
1. Hide unauthorized information, handled by the display system.
2. Hide unauthorized pages from menus, handled by the navigation system.
3. Prevent search from using unauthorized information. This must be
   handled by the search system (our current topic).
The most difficult aspect of developing field-level (or any) security is
preventing search from creating security holes, so mentioning possible
enhancements seemed useful to this discussion.
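
As a rough illustration of the third function with the document-level
security we have today, search hits can be filtered against an access
list before anything is shown. This is a minimal sketch, not Lenya's
access control API: filter_hits, the hit dictionaries, and the
url-to-roles mapping are all hypothetical names chosen for the example.

```python
def filter_hits(hits, user_roles, acl):
    """Return only the hits whose document the user may read.

    hits       -- list of dicts, each with at least a "url" key
    user_roles -- set of role names held by the searching user
    acl        -- dict mapping url -> set of roles allowed to read it
    """
    visible = []
    for hit in hits:
        # Documents missing from the ACL are hidden (deny by default).
        allowed = acl.get(hit["url"], set())
        if allowed & user_roles:
            visible.append(hit)
    return visible
```

Any hit count or excerpt shown to the user would have to be computed
from the filtered list; otherwise the search page itself reveals that a
hidden document matches the query.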
> > Should all properties be included? Should the properties be
> > associated with the field (element) name?
> ATM this is up to the resource type (done using a 2index.xsl
> stylesheet), and IMO we can leave it like this, e.g. map
>   <person>
>     <name>Henry Hamster</name>
>   </person>
> to field
>   <lucene:document>
>     <lucene:field name="personName">Henry Hamster</lucene:field>
>   </lucene:document>
>
> It would be nice to have namespaced field names, though, to avoid
> clashes (see my other mail).
> -- Andreas
No special work is needed since search indexes all text and "Henry
Hamster" is text. Search indexing issues arise when the data is stored
as an attribute:
  <author name="Gabby Gerbil"/>
rather than as element text:
  <author>Gabby Gerbil</author>
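
The difference is easy to see with a small sketch that collects the
text of a document the way a naive "index all text" step might. It uses
Python's standard xml.etree.ElementTree rather than anything from Lenya
or Lucene; the function indexable_text and its include_attributes flag
are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET


def indexable_text(xml_string, include_attributes=False):
    """Collect the searchable text of a document as one string."""
    root = ET.fromstring(xml_string)
    # Element text: everything between tags, in document order.
    parts = [text.strip() for text in root.itertext() if text.strip()]
    if include_attributes:
        # Attribute values are not element text, so they are only
        # collected when explicitly requested.
        for element in root.iter():
            parts.extend(v for v in element.attrib.values() if v.strip())
    return " ".join(parts)
```

With the defaults, the element form yields "Gabby Gerbil" while the
attribute form yields an empty string, so the author would not be
searchable unless attribute values are handled explicitly.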
Or are you concerned with searches based on particular fields, such as
searching by author? Many search systems have stopped providing those
options, apparently from lack of use; people dislike organizing search
terms into multiple fields.
solprovider