I was able to fix the problem from my last email and run through the
whole-web crawl commands in the nutch-0.8.x tutorial.
However, the resulting crawl folder is still very small, and when I
searched for "apache" at the end, I got a "hit 0" message.
Something is still wrong.
Please give me some feedback.
Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com
-----Original Message-----
From: Tsengtan A Shuy [mailto:ttashuy sbcglobal.net]
Sent: Saturday, July 14, 2007 12:11 PM
To: nutch-dev lucene.apache.org
Subject: inject command fail on whole-web run
I am running the "bin/nutch inject crawl/crawldb dmoz" command on
Ubuntu, following the nutch-0.8.x tutorial, but I get the following
error message:
2007-07-14 11:38:35,238 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(120)) - job_ij0atx
java.lang.NoClassDefFoundError: dk/brics/automaton/RunAutomaton
    at org.apache.nutch.urlfilter.automaton.AutomatonURLFilter$Rule.<init>(AutomatonURLFilter.java:89)
    at org.apache.nutch.urlfilter.automaton.AutomatonURLFilter.createRule(AutomatonURLFilter.java:70)
    at org.apache.nutch.urlfilter.api.RegexURLFilterBase.readRulesFile(RegexURLFilterBase.java:191)
    at org.apache.nutch.urlfilter.api.RegexURLFilterBase.setConf(RegexURLFilterBase.java:140)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:153)
    at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:53)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:56)
    at org.apache.hadoop.mapred.JobConf.newInstance(JobConf.java:443)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:33)
    at org.apache.hadoop.mapred.JobConf.newInstance(JobConf.java:443)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:125)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
    at org.apache.nutch.crawl.Injector.main(Injector.java:164)
adamshuy adamshuy-desktop:~/nutch-0.8.1$
What is wrong with my Ubuntu environment?
Please help!