
Thread: Commented: (HADOOP-1159) Reducers hang when map output file has a checksum error

2007-03-29 16:19:25
    [ https://issues.apache.org/jira/browse/HADOOP-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485353 ]

Tom White commented on HADOOP-1159:
-----------------------------------

I understand that the NPE is logged fully, but NPEs
typically indicate a programming problem (which is why APIs
don't document them as being thrown, except to indicate a
programming problem), so we should strive to fix the
underlying problem.

Put another way: which line in TaskTracker throws the NPE?
If we know this, can we check for the condition that causes
it to be thrown and then declare the task lost?
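
The kind of check being asked about might look like the following minimal Java sketch. It is not TaskTracker code: `MapOutputGuard`, `LostTaskReporter`, and `MapOutputLostException` are hypothetical names introduced for illustration, and the thread does not say which reference is actually null, so the sketch simply assumes the stream for the map output can be missing after the bad file is moved. A plain `IOException` stands in for the checksum failure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

class MapOutputGuard {

    /** Hypothetical callback standing in for however the tracker marks a task lost. */
    interface LostTaskReporter {
        void mapOutputLost(String taskId, String reason);
    }

    /** Hypothetical exception: this node can no longer serve the map output. */
    static class MapOutputLostException extends IOException {
        MapOutputLostException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Copies the map output to the response and returns the number of bytes copied. */
    static long serve(String taskId, InputStream mapOutput, OutputStream response,
                      LostTaskReporter reporter) throws IOException {
        // Assumption: a missing stream is the condition that would otherwise
        // surface as a NullPointerException. Check it explicitly instead.
        if (mapOutput == null) {
            String reason = "map output is no longer available";
            reporter.mapOutputLost(taskId, reason);
            throw new MapOutputLostException(taskId + ": " + reason, null);
        }
        byte[] buffer = new byte[4096];
        long total = 0;
        while (true) {
            int n;
            try {
                n = mapOutput.read(buffer);
            } catch (IOException e) {
                // A read failure (for example a checksum error) also means the
                // output cannot be served, so report it rather than only logging it.
                reporter.mapOutputLost(taskId, "read failed: " + e.getMessage());
                throw new MapOutputLostException(taskId + ": read failed", e);
            }
            if (n < 0) {
                break; // end of stream
            }
            response.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        List<String> lost = new ArrayList<>();
        LostTaskReporter reporter = (id, reason) -> lost.add(id);

        long copied = serve("task_a", new ByteArrayInputStream(new byte[] {1, 2, 3}),
                new ByteArrayOutputStream(), reporter);
        System.out.println(copied + " bytes copied, lost tasks: " + lost);

        try {
            serve("task_b", null, new ByteArrayOutputStream(), reporter);
        } catch (MapOutputLostException e) {
            System.out.println("declared lost: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that both the missing output and the failed read lead to an explicit "lost" report, so the fetching side has a chance to stop retrying against the same node.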


> Reducers hang when map output file has a checksum error
> -------------------------------------------------------
>
>                 Key: HADOOP-1159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1159
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Nigel Daley
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.3
>
>         Attachments: 1159-merge.patch, 1159.patch, h1159-2.patch, h1159.patch
>
>
> Two reduces hung in our sort benchmark. They always fail to get
> map outputs from node X due to checksum error when the map outputs
> are read at that node resulting in a NullPointerException on node X.
> This leads to constant failures on the two fetching reduces.
> 2007-03-26 00:02:57,082 WARN org.apache.hadoop.fs.FileSystem: Moving bad file /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out to /e/c/bad_files/file.out.542279301
> 2007-03-26 00:02:57,083 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out at 106484224
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
> 	at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> 	at java.io.DataInputStream.read(DataInputStream.java:132)
> 	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1659)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> 	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
> 	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
> 	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
> 	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
> 	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
> 	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
> 	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
> 	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
> 	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
> 	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> 2007-03-26 00:02:57,083 WARN /: /mapOutput?map=task_0002_m_022488_0&reduce=1542:
> java.lang.NullPointerException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

