I have a problem with partitioning. After the selecting phase of
generating, which completed successfully, I got the following exception:
Generator: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /tmp/hadoop-nutch/mapred/temp/generate-temp-1194364974836/_task_200711051139_0323_r_000007_0
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
    at org.apache.hadoop.ipc.Client.call(Client.java:482)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
    at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:848)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:840)
    at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:285)
    at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:114)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1356)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:87)
    at org.apache.nutch.crawl.Generator.generate(Generator.java:429)
    at org.apache.nutch.crawl.Generator.run(Generator.java:563)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
    at org.apache.nutch.crawl.Generator.main(Generator.java:526)
On my client node, in the tasktracker log, I found the following
exception:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find task_200711051139_0322_m_000064_0/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:327)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1923)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
The weird thing is that the second time I ran the batch it worked
fine. I'm generating 500,000 URLs from a db of about 550,000.
Could it have something to do with the open files limit?
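
For anyone who wants to rule the open files limit in or out, here is a
rough sketch of how the limit could be inspected on a node. It is only
an assumption that this is relevant to the exceptions above. The helper
names open_file_limits and count_open_fds are made up for illustration,
and the script reports the limit of the process it runs in, so it would
need to be run as the same user (and from the same kind of shell) that
starts the tasktracker to say anything about the daemon.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds():
    """Count descriptors currently open in this process, or return None.

    Relies on /proc/self/fd, which exists on Linux but not on every Unix.
    """
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("descriptors open right now:", count_open_fds())
```

A low soft limit here would only be a hint, not proof that the limit
caused the failed generate run.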
--
Karol Rybak
Programista / Programmer
Sekcja aplikacji / Applications section
Wyższa Szkoła Informatyki i Zarządzania /
University of Internet Technology and Management
+48(17)8661277