Thread: do not index but follow links

do not index but follow links
user name
2007-12-18 17:56:43
Hi,

In my pages I have a menu: "product1 product2 product3".
These menu items lead to pages describing each product.

Of course, if htdig indexes the pages normally, then searching for
"product1" returns all the product pages, because they all contain
this menu, while the only interesting page is product1.

So I tried to use <!--htdig_noindex-->, but then the menu is not
scanned at all, so the product pages are never queued and therefore
not indexed at all.

Is there any way to follow links in a section without indexing the
words in that section?

I've searched the docs and the mailing list extensively but could not
find any clue.

Thanks

-- 
Kind regards,

Riccardo Cohen
-------------------------------------------
Articque
http://www.articque.com
149 av Général de Gaulle
37230 Fondettes - France
tel : 02-47-49-90-49
fax : 02-47-49-91-49


-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services
for just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
ht://Dig general mailing list: <htdig-generallists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

Re: do not index but follow links
user name
2007-12-19 15:22:56
On Wed, 19 Dec 2007, Riccardo Cohen wrote:

> In my pages I have a menu : "product1 product2 product3"
> These menu items lead to pages describing each product.
>
> Of course, if htdig indexes the pages normally, then searching for
> "product1" gives all product pages because they all have this menu in
> it..., while the only interesting page is product1
>
> So I tried to use <!--htdig_noindex-->, but then the menu is not scanned
> at all, and the products pages are not pushed and then not indexed at all.
>
> Is there any way to follow links in a section without indexing the words
> in this section ?

If you don't need to validate the pages in question, I believe you can
enclose the relevant section in <noindex follow></noindex> tags. Ugly in
terms of standards compliance, but a quick fix if you need it. A cleaner
solution, but one requiring a lot more work, would be to write a parser
that removes the text that you don't want indexed and then apply the
parser as a preprocessor using ht://Dig's external_parsers attribute.

  http://www.htdig.org/dev/htdig-3.2/attrs.html#external_parsers
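
For illustration only, the preprocessor idea might be sketched in Python
roughly as follows. This is not ht://Dig code. The marker comments
<!--menu_start--> and <!--menu_end--> are hypothetical names invented for
this example, and the command-line handling assumes the filter is handed
the path of the fetched document as its first argument and writes the
filtered HTML to standard output; check the external_parsers
documentation for the actual calling convention before relying on it.

```python
import sys
from html.parser import HTMLParser

# Hypothetical marker comments delimiting the menu; not ht://Dig keywords.
START_MARK = "menu_start"
END_MARK = "menu_end"


class MenuTextStripper(HTMLParser):
    """Copy HTML through unchanged, except that text between the marker
    comments is dropped. Tags (including <a href=...>) are kept, so the
    links remain available to whatever reads the output."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_menu = False

    def handle_comment(self, data):
        name = data.strip()
        if name == START_MARK:
            self.in_menu = True
        elif name == END_MARK:
            self.in_menu = False
        else:
            self.out.append("<!--" + data + "-->")

    def handle_starttag(self, tag, attrs):
        # Re-emit the tag exactly as it appeared in the source.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</" + tag + ">")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")

    def handle_data(self, data):
        if not self.in_menu:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_menu:
            self.out.append("&" + name + ";")

    def handle_charref(self, name):
        if not self.in_menu:
            self.out.append("&#" + name + ";")


def strip_menu_text(html):
    """Return html with the words inside the marked menu removed."""
    parser = MenuTextStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed calling convention: first argument is the input file path.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        sys.stdout.write(strip_menu_text(fh.read()))
```

With the menu wrapped in the two marker comments, the anchors survive
with empty link text, so the word "product1" no longer appears on every
page while the href values are still present in the output.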


Jim


Re: do not index but follow links
user name
2007-12-20 02:23:08
That seems to fit my need perfectly, and I can now see the relevant
section in the FAQ!

Thanks a lot

PS: I don't mind if the tag is not in the XHTML DTD, as long as
browsers and htdig can parse it.

-- 
Kind regards,

Riccardo Cohen
-------------------------------------------
Articque
http://www.articque.com
149 av Général de Gaulle
37230 Fondettes - France
tel : 02-47-49-90-49
fax : 02-47-49-91-49

