Thread: Commented: (NUTCH-247) robot parser to restrict.




2007-02-19 13:14:05
    [ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474257 ]

Sami Siren commented on NUTCH-247:
----------------------------------

> Setting even a bogus agent name is an insignificant effort compared to the
> further complication of the code and configuration options
I don't see how it complicates the code if the check for data
needed for http is done in a place that only affects http.
What is wrong with the check in its original place?

> robot parser to restrict.
> -------------------------
>
>                 Key: NUTCH-247
>                 URL: https://issues.apache.org/jira/browse/NUTCH-247
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8
>            Reporter: Stefan Groschupf
>         Assigned To: Dennis Kubes
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: agent-names.patch, agent-names3.patch.txt
>
>
> If the agent name and the robots agents are not properly configured, the
> robot rules parser logs the problem with LOG.severe but also fixes it.
> Later on, the fetcher thread checks for severe errors and stops if there
> is one.
> RobotRulesParser:
>     if (agents.size() == 0) {
>       agents.add(agentName);
>       LOG.severe("No agents listed in 'http.robots.agents' property!");
>     } else if (!((String)agents.get(0)).equalsIgnoreCase(agentName)) {
>       agents.add(0, agentName);
>       LOG.severe("Agent we advertise (" + agentName
>                  + ") not listed first in 'http.robots.agents' property!");
>     }
> Fetcher.FetcherThread:
>     if (LogFormatter.hasLoggedSevere())     // something bad happened
>       break;
> I suggest using warn or something similar instead of severe to log this
> problem.
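
For illustration only, the change suggested in the issue description might look roughly like the sketch below. It is not taken from either attached patch: the class name AgentListSketch and the method normalizeAgents are hypothetical, and plain java.util.logging is assumed in place of Nutch's own logging helpers. The parser still repairs the agent list, but reports the misconfiguration at warning level, so a check for logged severe errors would not stop the fetcher thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch; not the actual Nutch RobotRulesParser.
class AgentListSketch {
  private static final Logger LOG =
      Logger.getLogger(AgentListSketch.class.getName());

  // Returns a copy of the configured agents with agentName guaranteed
  // to be the first entry. Misconfiguration is logged as a warning
  // rather than as a severe error.
  static List<String> normalizeAgents(List<String> configured,
                                      String agentName) {
    List<String> agents = new ArrayList<>(configured);
    if (agents.isEmpty()) {
      agents.add(agentName);
      LOG.warning("No agents listed in 'http.robots.agents' property!");
    } else if (!agents.get(0).equalsIgnoreCase(agentName)) {
      agents.add(0, agentName);
      LOG.warning("Agent we advertise (" + agentName
          + ") not listed first in 'http.robots.agents' property!");
    }
    return agents;
  }
}
```

With this shape, an empty or misordered 'http.robots.agents' list is still corrected in memory, and the operator sees a warning in the log instead of the fetch being aborted.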

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.

