[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474269 ]
Sami Siren commented on NUTCH-247:
----------------------------------
I am OK with efforts to make things more user friendly,
but doing checks and specifically blocking fetching based
on a property that isn't relevant to the task at hand does
not sound so user friendly to me. I also believe that by the
time you run your task on a massive cluster, where
allocating resources has any significance, you couldn't have
missed the log lines.
Also, wouldn't it be better to put that functionality
somewhere other than Fetcher (a utility class, if a class
relevant to http is not appropriate?), unless you would like
to start maintaining multiple sets of those checking rules?
In Fetcher you would then use delegation?
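
To make the delegation idea concrete, here is a minimal sketch, assuming a hypothetical utility class named AgentNameRules (the class, its Result type, and the normalize method are illustrative names, not part of Nutch). It keeps the agent-name rules quoted below in one place and returns a warning message instead of logging, so each caller decides what to do with it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical utility: one home for the agent-name checking rules. */
final class AgentNameRules {
  private AgentNameRules() {}

  /** Corrected agent list plus an optional warning (null when nothing was wrong). */
  static final class Result {
    final List<String> agents;
    final String warning;

    Result(List<String> agents, String warning) {
      this.agents = agents;
      this.warning = warning;
    }
  }

  /** Applies the same two rules as the quoted RobotRulesParser snippet, without logging. */
  static Result normalize(String agentName, List<String> configured) {
    List<String> agents = new ArrayList<>(configured); // leave the caller's list untouched
    String warning = null;
    if (agents.isEmpty()) {
      agents.add(agentName);
      warning = "No agents listed in 'http.robots.agents' property!";
    } else if (!agents.get(0).equalsIgnoreCase(agentName)) {
      agents.add(0, agentName);
      warning = "Agent we advertise (" + agentName
          + ") not listed first in 'http.robots.agents' property!";
    }
    return new Result(agents, warning);
  }
}
```

Both the robots parser and Fetcher could call AgentNameRules.normalize(...) and log or ignore the returned warning as they see fit, so there is only one set of rules to maintain.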
> robot parser to restrict.
> -------------------------
>
> Key: NUTCH-247
> URL: https://issues.apache.org/jira/browse/NUTCH-247
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.8
> Reporter: Stefan Groschupf
> Assigned To: Dennis Kubes
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: agent-names.patch, agent-names3.patch.txt
>
>
> If the agent name and the robots agents are not properly configured, the Robot rule parser uses LOG.severe to log the problem but also fixes it.
> Later on, the fetcher thread checks for severe errors and stops if there is one.
> RobotRulesParser:
>     if (agents.size() == 0) {
>       agents.add(agentName);
>       LOG.severe("No agents listed in 'http.robots.agents' property!");
>     } else if (!((String)agents.get(0)).equalsIgnoreCase(agentName)) {
>       agents.add(0, agentName);
>       LOG.severe("Agent we advertise (" + agentName
>                 + ") not listed first in 'http.robots.agents' property!");
>     }
> Fetcher.FetcherThread:
>     if (LogFormatter.hasLoggedSevere()) // something bad happened
>       break;
> I suggest using warn or something similar instead of severe to log this problem.
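
The effect of the suggested change can be illustrated with plain java.util.logging. This is only a sketch: SevereTracker is a hypothetical stand-in for whatever bookkeeping sits behind LogFormatter.hasLoggedSevere(), and the logger name is made up. It shows that a message logged at WARNING does not trip a "has logged severe" flag, so a fetcher loop guarded by that flag would keep running.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical handler that remembers whether any SEVERE record was seen. */
final class SevereTracker extends Handler {
  private volatile boolean loggedSevere = false;

  @Override
  public void publish(LogRecord record) {
    if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
      loggedSevere = true;
    }
  }

  @Override
  public void flush() {}

  @Override
  public void close() {}

  boolean hasLoggedSevere() {
    return loggedSevere;
  }
}

final class WarnInsteadOfSevereDemo {
  /** Logs one message at the given level and reports whether the flag was set. */
  static boolean flagSetAfterLogging(Level level) {
    Logger log = Logger.getLogger("demo.robots"); // assumed logger name
    log.setUseParentHandlers(false);              // keep the demo quiet on the console
    SevereTracker tracker = new SevereTracker();
    log.addHandler(tracker);
    try {
      log.log(level, "No agents listed in 'http.robots.agents' property!");
      return tracker.hasLoggedSevere();
    } finally {
      log.removeHandler(tracker);
    }
  }
}
```

Calling flagSetAfterLogging(Level.WARNING) leaves the flag unset, while Level.SEVERE sets it, which is the behavior that currently stops the fetcher thread.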