
Thread: Re: doc2html - indexed but no hits




Re: doc2html - indexed but no hits
United Kingdom
2007-05-10 10:21:17
In this case I can be fairly sure they were not called! Note the line that
says 'not changed'. I'm not sure how extensive your indexes are, or whether
you are in production yet, but you may want to add the -i flag to build the
index from scratch. From memory, the -s flag turns on a set of summary
statistics, which may include useful information.

During a normal run at the correct verbosity level, you should see a line
like

++++---++-

for each file that you index. www.htdig.org can tell you what these symbols
mean (I can't remember offhand), but they help to indicate what is actually
found inside a document. Also check that htmerge is running at a similar
verbosity setting.
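
As a rough way to apply Mike's point about the 'not changed' line across a whole run, the Python sketch below lists every .pdf or .doc URL that a verbose htdig run reported as 'not changed', i.e. documents that were probably not re-parsed (and so probably never reached doc2html) on that run. It assumes the verbose output was redirected to a file and that entries follow the shape of the excerpt quoted further down this thread ("<n>:<n>:<n>:<URL>: not changed", with the status possibly wrapped onto the next line); the extension filter and the log file name are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed shape of a verbose htdig log entry, based on the excerpt in this
# thread: "<n>:<n>:<n>:<URL>: not changed". \s* lets the match continue
# across a line break if the status was wrapped onto the next line.
NOT_CHANGED = re.compile(r"^\d+:\d+:\d+:(\S+?):\s*not changed", re.MULTILINE)

# Hypothetical filter: only the document types handed to external converters.
CONVERTED_EXTENSIONS = (".pdf", ".doc")


def skipped_documents(log_text: str) -> list[str]:
    """Return URLs of .pdf/.doc files that htdig reported as 'not changed'."""
    return [
        match.group(1)
        for match in NOT_CHANGED.finditer(log_text)
        if match.group(1).lower().endswith(CONVERTED_EXTENSIONS)
    ]


if __name__ == "__main__":
    # Each command-line argument is a saved htdig log (names are hypothetical).
    for path in sys.argv[1:]:
        text = Path(path).read_text(errors="replace")
        for url in skipped_documents(text):
            print("not changed, probably not re-parsed:", url)
```

Usage would be something like "python skipped_docs.py htdig.log" (both names hypothetical). A from-scratch run with -i should, in principle, produce no such entries, since there is no old database to compare against.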

On my system, doc2html etc. are called from an intermediate DOS batch file,
which is an easy place to add an extra bit of logging. Alternatively, you
may be brave enough to put a debug line into doc2html itself; it is just a
bit of Perl, if I remember correctly.
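
Below is one way the wrapper idea could look, sketched in Python rather than as a DOS batch file; it is not Mike's actual setup. It appends a timestamp and the arguments htdig passed to a log file, then runs the real converter with the same arguments, leaving its standard output untouched so htdig still receives the converted text. The converter command and log path are hypothetical placeholders, and htdig would have to be pointed at this script instead of doc2html (presumably by editing the external_parsers line in htdig.conf). On Unix-like systems the script needs to be executable; on Windows a small batch file calling python would presumably still be required.

```python
#!/usr/bin/env python3
"""Hypothetical logging wrapper: record each converter call, then delegate."""
import subprocess
import sys
from datetime import datetime

# Hypothetical locations: adjust to where doc2html and the log really live.
REAL_CONVERTER = ["perl", "/usr/local/bin/doc2html.pl"]
LOG_FILE = "/tmp/doc2html-calls.log"


def format_log_entry(args: list[str], when: datetime) -> str:
    """One log line: timestamp followed by the arguments htdig passed in."""
    return f"{when.isoformat(timespec='seconds')} called with: {' '.join(args)}\n"


def main() -> int:
    args = sys.argv[1:]
    with open(LOG_FILE, "a", encoding="utf-8") as log:
        log.write(format_log_entry(args, datetime.now()))
    # Run the real converter with the same arguments. Its stdout is inherited,
    # so htdig reads the converted output exactly as it would without us.
    result = subprocess.run(REAL_CONVERTER + args)
    with open(LOG_FILE, "a", encoding="utf-8") as log:
        log.write(f"  exit status: {result.returncode}\n")
    return result.returncode


if __name__ == "__main__":
    # htdig always supplies arguments; with none there is nothing to convert.
    if len(sys.argv) > 1:
        sys.exit(main())
```

If the log stays empty after a dig that should have fetched .pdf or .doc files, htdig is most likely not invoking the converter at all.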

Mike
NB: I have copied this back to the list. I'm not sure whether you meant to
send this to me directly; I get that wrong all the time!

> -----Original Message-----
> From: CHUN KI SHIN [mailto:ckshin0121hotmail.com] 
> Sent: Thursday, May 10, 2007 4:12 PM
> To: Brockington,MJ,Michael,JPGA4X R
> Subject: Re: [htdig] doc2html - indexed but no hits
> 
> Thanks for the quick response, Mike.
> 
> Ok, I ran the script with -vv, and I don't know what I'm looking for in
> the index log. The only thing I can see is the following:
> 
> pick: devserverxxx.com, # servers = 1
> 234:31:2:https://devserverxxx.com/library/ADJA/docs/portlet-1_0-fr-spec.pdf:
>   not changed
> 
> and the same for .doc.
> 
> Could you tell me how to make sure doc2html is being called?
> 
> Also, what do you mean by 'statistics' in htdig?
> 
> Thanks for your time and help!
> 
> >From: <michael.brockingtonbt.com>
> >To: <htdig-generallists.sourceforge.net>
> >Subject: Re: [htdig] doc2html - indexed but no hits
> >Date: Thu, 10 May 2007 14:14:59 +0100
> >
> >Can you tell if doc2html is actually being called by htdig? Just
> >because htdig is downloading the document does not guarantee that it
> >is being passed on for conversion to an indexable format.
> >It might be worth decreasing the number of v's you are using by one or
> >two so that you can see what is being found in each document. Not sure
> >if you have the 'statistics' turned on?
> >
> >Regards,
> >Mike
> >
> > > -----Original Message-----
> > > From: htdig-general-bounceslists.sourceforge.net
> > > [mailto:htdig-general-bounceslists.sourceforge.net] On
> > > Behalf Of CHUN KI SHIN
> > > Sent: Thursday, May 10, 2007 1:43 PM
> > > To: htdig-generallists.sourceforge.net
> > > Subject: [htdig] doc2html - indexed but no hits
> > >
> > > I've been trying to index .pdf and .doc documents in v. 3.2.0b with
> > > doc2html/catdoc/pdf2html.
> > > I can see both types indexed fine (though I'm not sure why the log
> > > doesn't tell me which words and tags have been indexed). See below:

_______________________________________________
ht://Dig general mailing list: <htdig-generallists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

Re: doc2html - indexed but no hits
United States
2007-05-10 13:31:43
Mike,

It looks like you are right. I reindexed the docs with the -i -s -v
options and got the following:



