
Thread: Commented: (NUTCH-445) Domain İndexing / Query Filter




2007-02-28 12:34:57
    [ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476665 ] 

Doug Cutting commented on NUTCH-445:
------------------------------------

Setting the boost to non-zero permits a "site:" query with no
other terms, but at the cost of inhibiting the conversion of the
clause to a cached Lucene filter, which can be a substantial
optimization. I think it's better to leave the boost as zero, and
then (separately) fix the conversion-to-filter code to not perform
this optimization when no other query terms are present.
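A minimal sketch of the rule proposed above, assuming a simplified model in which each query clause carries a field, a term and a boost. `Clause` and `FilterPlanner` are hypothetical names for illustration only, not Nutch's or Lucene's actual query-optimizer classes. Zero-boost clauses such as `site:` are still collected as filter candidates, except when no other clause has a non-zero boost, in which case everything stays in the query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query clause: field, term and boost (not a Nutch or Lucene class). */
final class Clause {
    final String field;
    final String term;
    final float boost;

    Clause(String field, String term, float boost) {
        this.field = field;
        this.term = term;
        this.boost = boost;
    }
}

/** Hypothetical planner deciding which clauses may become cached filters. */
final class FilterPlanner {
    /** True when at least one clause contributes to the score (non-zero boost). */
    static boolean hasScoringClause(List<Clause> clauses) {
        for (Clause c : clauses) {
            if (c.boost != 0.0f) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the zero-boost clauses (e.g. site:) to convert into filters, but only
     * when some other clause scores; otherwise no query clauses would remain.
     */
    static List<Clause> clausesToFilter(List<Clause> clauses) {
        List<Clause> filters = new ArrayList<>();
        if (!hasScoringClause(clauses)) {
            return filters; // bare "site:" query: keep every clause in the query itself
        }
        for (Clause c : clauses) {
            if (c.boost == 0.0f) {
                filters.add(c);
            }
        }
        return filters;
    }
}
```

Keeping the boost at zero preserves the filter optimization for the common case (a `site:` clause alongside search terms); only the bare `site:` query skips it.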

> Domain İndexing / Query Filter
> ------------------------------
>
>                 Key: NUTCH-445
>                 URL: https://issues.apache.org/jira/browse/NUTCH-445
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer, searcher
>    Affects Versions: 0.9.0
>            Reporter: Enis Soztutar
>         Attachments: index_query_domain_v1.0.patch, index_query_domain_v1.1.patch, index_query_domain_v1.2.patch, TranslatingRawFieldQueryFilter_v1.0.patch
>
>
> Hostnames contain information about the domain of the host and all of its
> parent domains. Indexing and searching the domains is important for intuitive behavior.
> From the DomainIndexingFilter javadoc:
> Adds the domain (hostname) and all super domains to the index.
>  * <br> For http://lucene.apache.org/nutch/ the
>  * following will be added to the index : <br>
>  * <ul>
>  * <li>lucene.apache.org </li>
>  * <li>apache</li>
>  * <li>org </li>
>  * </ul>
>  * All hostnames are domain names, but not all the domain names are
>  * hostnames. In the above example hostname lucene is a
>  * subdomain of apache.org, which is itself a subdomain of
>  * org <br>
>  * 
>  
> Currently BasicIndexingFilter indexes the hostname in the site field, and the query-site plugin
> allows searching in the site field. However, site:apache.org will not return http://lucene.apache.org.
> By indexing the domain, we are able to search domains. Unlike a search on
> the site field (indexed by BasicIndexingFilter), searching the
> domain field allows us to retrieve lucene.apache.org for the query
> apache.org.
>  
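A minimal sketch contrasting the two fields described in the issue, using a toy in-memory index rather than Lucene. `DomainIndexSketch`, `hostAndSuperDomains`, `index` and `search` are hypothetical names, not the patch's DomainIndexingFilter. One assumption to flag: the quoted javadoc lists "apache" as a super domain, while this sketch stores each parent domain as a full suffix ("apache.org"), which is what lets the query apache.org match lucene.apache.org. Lowercasing the host is also an assumption.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the "site" and "domain" fields; not Nutch's DomainIndexingFilter. */
final class DomainIndexSketch {
    // field -> term -> URLs containing that term (hypothetical stand-in for a Lucene index)
    private final Map<String, Map<String, Set<String>>> postings = new TreeMap<>();

    /** Host followed by each parent domain: lucene.apache.org, apache.org, org. */
    static List<String> hostAndSuperDomains(String host) {
        List<String> terms = new ArrayList<>();
        String h = host.toLowerCase(Locale.ROOT);
        while (!h.isEmpty()) {
            terms.add(h);
            int dot = h.indexOf('.');
            if (dot < 0) {
                break;
            }
            h = h.substring(dot + 1); // drop the leftmost label
        }
        return terms;
    }

    private void add(String field, String term, String url) {
        postings.computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(url);
    }

    /** Indexes the hostname in "site" and the hostname plus super domains in "domain". */
    void index(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return; // relative or host-less URL: nothing to index
        }
        add("site", host.toLowerCase(Locale.ROOT), url);
        for (String term : hostAndSuperDomains(host)) {
            add("domain", term, url);
        }
    }

    /** Exact single-term lookup in one field. */
    Set<String> search(String field, String term) {
        return postings.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }
}
```

After indexing http://lucene.apache.org/nutch/ and http://www.apache.org/, `search("site", "apache.org")` is empty while `search("domain", "apache.org")` returns both pages, which is the behavior difference the issue describes.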

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

