Thread: Following <form action> tags

Following <form action> tags
2006-05-18 11:43:43
Chris Schneider wrote:
> Gang,
>
> I had a webmaster complain that our crawler was following his
> <form action> links. Although he admits that his use of the GET
> method is a bit unorthodox, he feels strongly that form submissions
> with input fields shouldn't be followed by crawlers. Would it make
> sense to modify the HTML parser so that it checked to see whether
> such input fields exist before following <form action> links?

I read through your email exchange, and setting aside all emotional
content I think this is a valid request - indeed, as far as I can tell
other major crawlers don't follow these links. We could either remove
this, or make it optional (default not to use them).

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com 
Contact: info at sigram dot com
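
A minimal sketch of the two options raised above: skip forms that
contain input fields, and make following <form action> links optional
with a default of off. This is not the code in DOMContentUtils.java.
The class FormLinkFilter, its methods, and the followForms flag are
hypothetical names, and the sketch assumes well-formed, lowercase XHTML
parsed with the JDK's DOM parser rather than the html-parser plugin.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper for illustration; not part of Nutch. */
class FormLinkFilter {

    /** True if the form has any input, select, or textarea descendant. */
    public static boolean hasInputFields(Element form) {
        for (String tag : new String[] {"input", "select", "textarea"}) {
            if (form.getElementsByTagName(tag).getLength() > 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * Collects form action targets a crawler would follow.
     * followForms is an assumed config switch; when false, no form
     * actions are returned at all (the proposed default).
     */
    public static List<String> collectFormActions(Document doc, boolean followForms) {
        List<String> targets = new ArrayList<>();
        if (!followForms) {
            return targets;
        }
        NodeList forms = doc.getElementsByTagName("form");
        for (int i = 0; i < forms.getLength(); i++) {
            Element form = (Element) forms.item(i);
            String action = form.getAttribute("action");
            // Follow only forms that carry no user-supplied fields.
            if (!action.isEmpty() && !hasInputFields(form)) {
                targets.add(action);
            }
        }
        return targets;
    }

    /** Parses a well-formed XHTML string into a DOM document. */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
            + "<form action=\"/search\"><input name=\"q\"/></form>"
            + "<form action=\"/next\"></form>"
            + "</body></html>";
        Document doc = parse(page);
        System.out.println(collectFormActions(doc, false));
        System.out.println(collectFormActions(doc, true));
    }
}
```

With followForms set to false nothing is followed, which matches the
"default not to use them" option. With it set to true, only the form
without input fields contributes a link.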


Following <form action> tags
2006-05-19 18:17:51
Andrzej Bialecki wrote:
> I read through your email exchange, and setting aside all emotional
> content I think this is a valid request - indeed, as far as I can tell
> other major crawlers don't follow these links. We could either remove
> this, or make it optional (default not to use them).

Is this as simple as deleting line 60 from DOMContentUtils.java (in the
html-parser plugin)?

Doug


Following <form action> tags
2006-05-19 18:24:52
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> I read through your email exchange, and setting aside all emotional
>> content I think this is a valid request - indeed, as far as I can
>> tell other major crawlers don't follow these links. We could either
>> remove this, or make it optional (default not to use them).
>
> Is this as simple as deleting line 60 from DOMContentUtils.java (in
> the html-parser plugin)?

Yes.

-- 
Best regards,
Andrzej Bialecki     <><

