Created: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-17 17:25:50
StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
-------------------------------------------------------------------------------------
Key: HADOOP-2071
URL: https://issues.apache.org/jira/browse/HADOOP-2071
Project: Hadoop
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 0.14.3
Reporter: lohit vijayarenu
In Hadoop 0.14, streaming jobs that use -inputreader StreamXmlRecordReader throw java.io.IOException: Mark/reset exception.
This looks to be related to HADOOP-2067 (https://issues.apache.org/jira/browse/HADOOP-2067).
<stack trace>
Caused by: java.io.IOException: Mark/reset not supported
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353)
    at java.io.FilterInputStream.reset(FilterInputStream.java:200)
    at org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamXmlRecordReader.java:289)
    at org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(StreamXmlRecordReader.java:118)
    at org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(StreamXmlRecordReader.java:111)
    at org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader.java:73)
    at org.apache.hadoop.streaming.StreamXmlRecordReader.<init>(StreamXmlRecordReader.java:63)
</stack trace>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-17 18:09:50
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535778 ]
Raghu Angadi commented on HADOOP-2071:
--------------------------------------
Mark/reset is not supported anymore. If streaming must use mark/reset, it should use a BufferedInputStream over DFSInputStream.
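A minimal sketch of this suggestion using only java.io, with no Hadoop classes. `NoMarkStream` is a hypothetical stand-in for DFSInputStream: like the stream in the stack trace, it ignores mark() and throws from reset(). Wrapped in a BufferedInputStream, the same mark/read/reset sequence succeeds, because the rewind is served from BufferedInputStream's own buffer and never reaches the underlying stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BufferedMarkDemo {
  // Hypothetical stand-in for DFSInputStream: mark() is a no-op and
  // reset() always fails, as in the stack trace above.
  static class NoMarkStream extends FilterInputStream {
    NoMarkStream(InputStream in) { super(in); }

    @Override public boolean markSupported() { return false; }

    @Override public synchronized void mark(int readlimit) { }

    @Override public synchronized void reset() throws IOException {
      throw new IOException("Mark/reset not supported");
    }
  }

  // Reads up to n bytes, then rewinds to where the read started.
  static String peek(InputStream in, int n) throws IOException {
    in.mark(n);
    byte[] buf = new byte[n];
    int got = in.readNBytes(buf, 0, n);
    in.reset();
    return new String(buf, 0, got, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "<record>a</record>".getBytes(StandardCharsets.UTF_8);

    // Calling reset() on the bare stream fails.
    try {
      peek(new NoMarkStream(new ByteArrayInputStream(data)), 8);
    } catch (IOException e) {
      System.out.println("bare stream: " + e.getMessage());
    }

    // The same calls succeed once a BufferedInputStream sits on top.
    InputStream buffered =
        new BufferedInputStream(new NoMarkStream(new ByteArrayInputStream(data)));
    System.out.println("peeked: " + peek(buffered, 8));
    System.out.println("next byte after reset: " + (char) buffered.read());
  }
}
```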

Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-17 23:27:50
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lohit vijayarenu updated HADOOP-2071:
-------------------------------------
Assignee: lohit vijayarenu

Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-17 23:27:50
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lohit vijayarenu updated HADOOP-2071:
-------------------------------------
Attachment: HADOOP-2071-1.patch
Attached is a patch that eliminates mark/reset.
In one place, seek() was called right after reset(), which made the reset() redundant.
Could someone please review this?
Thanks
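Roughly, eliminating mark/reset means swapping mark() and reset() for position bookkeeping on a seekable stream. FSDataInputStream's getPos() and seek() need a Hadoop classpath, so this sketch assumes java.io.RandomAccessFile (getFilePointer() and seek()) as a stand-in; the method name `findAhead` is hypothetical, and the byte-at-a-time scan is kept simple rather than fast.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

public class SeekInsteadOfReset {
  // Scans forward from the current file pointer for `pattern` (non-empty)
  // and returns the offset where it starts, or -1 if EOF comes first.
  // The pointer is then restored with seek(), which takes the place that
  // mark()/reset() used to have.
  static long findAhead(RandomAccessFile f, byte[] pattern) throws IOException {
    long start = f.getFilePointer();          // instead of mark()
    byte[] window = new byte[pattern.length]; // last pattern.length bytes seen
    int filled = 0;
    long found = -1;
    int b;
    while ((b = f.read()) != -1) {
      System.arraycopy(window, 1, window, 0, window.length - 1);
      window[window.length - 1] = (byte) b;
      if (filled < window.length) filled++;
      if (filled == window.length && Arrays.equals(window, pattern)) {
        found = f.getFilePointer() - pattern.length;
        break;
      }
    }
    f.seek(start);                            // instead of reset()
    return found;
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("records", ".xml");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), "<a>1</a><b>2</b>".getBytes(StandardCharsets.UTF_8));
    try (RandomAccessFile f = new RandomAccessFile(tmp, "r")) {
      long offset = findAhead(f, "<b>".getBytes(StandardCharsets.UTF_8));
      System.out.println("found <b> at " + offset + ", pointer back at " + f.getFilePointer());
    }
  }
}
```

This is correct, but on DFS each backward seek may discard buffers and open a new connection, so running it once per record can be costly.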

Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-18 12:16:51
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535977 ]
Milind Bhandarkar commented on HADOOP-2071:
-------------------------------------------
Code reviewed:
-1.
The readlimit argument to mark() is not honored in these changes. If reset() is called after more than readlimit bytes have been read since mark(), that reset() is supposed to throw an IOException.

Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-18 12:18:51
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535980 ]
Milind Bhandarkar commented on HADOOP-2071:
-------------------------------------------
I think we should wrap InputStream in_ in java.io.BufferedInputStream, as Raghu suggested, and keep the mark/reset-based implementation of StreamXmlRecordReader.

Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-18 13:43:50
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536008 ]
Raghu Angadi commented on HADOOP-2071:
--------------------------------------
bq. the readlimit argument for mark is not honored in these changes. If one calls reset after more than readlimit bytes have been read after mark, that reset is supposed to throw IOException.
We can just keep track of how many bytes we have read and, if that exceeds readlimit, throw an IOException, if we want to keep that behavior. Actually, we could simply throw an exception if no record is found within readlimit bytes (instead of reading until there is a match or EOF).
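One way to keep the readlimit behavior while still emulating mark/reset with seeks is to compare the bytes consumed since the mark against readlimit before seeking back. `SeekMark` is a hypothetical helper over java.io.RandomAccessFile (standing in for FSDataInputStream). Note that java.io.InputStream's contract only says reset() *might* throw in this case; this sketch chooses to always throw.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ReadLimitMark {
  // Hypothetical helper: mark/reset emulated with seeks on a RandomAccessFile
  // (standing in for FSDataInputStream), with readlimit enforced.
  static class SeekMark {
    private final RandomAccessFile file;
    private long markPos = -1;
    private int readLimit;

    SeekMark(RandomAccessFile file) { this.file = file; }

    void mark(int readlimit) throws IOException {
      markPos = file.getFilePointer();
      readLimit = readlimit;
    }

    // Seeks back to the mark, unless more than readlimit bytes have been
    // consumed since mark() was called.
    void reset() throws IOException {
      if (markPos < 0) {
        throw new IOException("reset() called without mark()");
      }
      long readSinceMark = file.getFilePointer() - markPos;
      if (readSinceMark > readLimit) {
        throw new IOException("read " + readSinceMark
            + " bytes since mark(), readlimit was " + readLimit);
      }
      file.seek(markPos);
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("readlimit", ".txt");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), "0123456789".getBytes(StandardCharsets.UTF_8));
    try (RandomAccessFile f = new RandomAccessFile(tmp, "r")) {
      SeekMark m = new SeekMark(f);
      m.mark(4);
      f.skipBytes(3);   // within the limit
      m.reset();
      System.out.println("back at " + f.getFilePointer());
      m.mark(4);
      f.skipBytes(5);   // past the limit
      try {
        m.reset();
      } catch (IOException e) {
        System.out.println("reset refused: " + e.getMessage());
      }
    }
  }
}
```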
Lohit and I looked around the code, and it seems to seek back heavily (pretty much for every record). Seeking back is inefficient in DFS: in most cases it throws away the current buffers (both application and TCP) and opens a new connection. The current patch does not make this situation any worse. I wonder what the typical size of these records is.
One problem with using BufferedInputStream is that the current code calls getPos() and seek() in many places, and those methods are specific to FSDataInputStream, so it will need more changes to manage that.
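The getPos() half of this problem can be sketched as a FilterInputStream that keeps its own byte count on top of a BufferedInputStream. `PositionTrackingStream` and its `startPos` parameter (standing in for the split's start offset) are hypothetical. The count follows read(), skip(), and mark()/reset(), so a getPos()-style offset stays meaningful without FSDataInputStream; seek() is deliberately absent.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PositionTracking {
  // Hypothetical wrapper: buffers the underlying stream and counts consumed
  // bytes so callers can still ask for a getPos()-style offset.
  static class PositionTrackingStream extends FilterInputStream {
    private long pos;            // offset of the next byte to be returned
    private long markedPos = -1; // value of pos when mark() was last called

    PositionTrackingStream(InputStream in, long startPos) {
      super(new BufferedInputStream(in));
      this.pos = startPos;
    }

    long getPos() { return pos; }

    @Override public int read() throws IOException {
      int b = super.read();
      if (b != -1) pos++;
      return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
      int n = super.read(buf, off, len);
      if (n > 0) pos += n;
      return n;
    }

    @Override public long skip(long n) throws IOException {
      long skipped = super.skip(n);
      pos += skipped;
      return skipped;
    }

    @Override public synchronized void mark(int readlimit) {
      super.mark(readlimit);
      markedPos = pos;
    }

    @Override public synchronized void reset() throws IOException {
      super.reset();   // throws if the mark is missing or invalidated
      pos = markedPos;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "abcdef".getBytes(StandardCharsets.UTF_8);
    PositionTrackingStream in =
        new PositionTrackingStream(new ByteArrayInputStream(data), 100);
    in.read();
    in.read();
    System.out.println("after two reads: " + in.getPos());
    in.mark(10);
    in.readNBytes(new byte[3], 0, 3);
    in.reset();
    in.skip(1);
    System.out.println("after reset + skip(1): " + in.getPos()
        + ", next byte " + (char) in.read());
  }
}
```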

Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-18 13:49:50
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536013 ]
Raghu Angadi commented on HADOOP-2071:
--------------------------------------
After a bit more discussion, it looks like using BufferedInputStream can get rid of the seek-back problem as well, because we always seek within what we have recently read. So we would replace seek() with {{ reset(); skip(); }}.
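A sketch of the {{ reset(); skip(); }} replacement, assuming the caller remembers the stream offset at which it last called mark() (here the hypothetical `markPos` argument) and only ever seeks back to an offset between that mark and the current position. skip() may skip fewer bytes than requested, so the helper (`seekBack`, also hypothetical) loops.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ResetThenSkip {
  // Hypothetical replacement for seek(target) when target lies between the
  // last mark (at stream offset markPos) and the current position.
  static void seekBack(BufferedInputStream in, long markPos, long target) throws IOException {
    if (target < markPos) {
      throw new IOException("target " + target + " is before the mark at " + markPos);
    }
    in.reset();                    // back to markPos, served from the buffer
    long remaining = target - markPos;
    while (remaining > 0) {        // skip() may skip fewer bytes than asked
      long skipped = in.skip(remaining);
      if (skipped <= 0) {
        throw new EOFException("could not reach offset " + target);
      }
      remaining -= skipped;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "<r>hello</r><r>x</r>".getBytes(StandardCharsets.UTF_8);
    BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
    long markPos = 0;                    // mark at the very start of the stream
    in.mark(1024);                       // readlimit comfortably above one record
    in.readNBytes(new byte[12], 0, 12);  // read past the end of the first record
    seekBack(in, markPos, 3);            // "seek" to just after the first <r>
    System.out.println("next byte: " + (char) in.read());
  }
}
```

The readlimit passed to mark() has to cover the largest distance the reader may need to travel back, since BufferedInputStream is free to drop the mark once more bytes than that have been read.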

Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-10-20 04:32:50
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lohit vijayarenu updated HADOOP-2071:
-------------------------------------
Attachment: HADOOP-2071-2.patch
With input from Raghu and Milind, here is an updated patch. It wraps the FSDataInputStream in a BufferedInputStream and eliminates seek(). The patch also includes a simple test case for StreamXmlRecordReader.
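For a test-style check, here is a simplified, forward-only record scan over a BufferedInputStream: it collects each begin...end record and never calls seek(). It is a hypothetical illustration of how begin and end markers delimit records, not the code in HADOOP-2071-2.patch; `records` and `readUntilMatch` are illustrative names (the latter only echoes the method names in the stack trace).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class XmlRecordScan {
  // Reads forward until `pattern` (non-empty) has been consumed, copying
  // every byte read into `out` when it is non-null. Returns false at EOF.
  static boolean readUntilMatch(InputStream in, byte[] pattern,
                                ByteArrayOutputStream out) throws IOException {
    byte[] window = new byte[pattern.length]; // last pattern.length bytes
    int filled = 0;
    int b;
    while ((b = in.read()) != -1) {
      if (out != null) out.write(b);
      System.arraycopy(window, 1, window, 0, window.length - 1);
      window[window.length - 1] = (byte) b;
      if (filled < window.length) filled++;
      if (filled == window.length && Arrays.equals(window, pattern)) return true;
    }
    return false;
  }

  // Returns every begin...end record in order; an unterminated record at EOF
  // is dropped. Only forward reads are used: no seek(), no reset().
  static List<String> records(InputStream raw, String begin, String end) throws IOException {
    InputStream in = new BufferedInputStream(raw);
    byte[] beginBytes = begin.getBytes(StandardCharsets.UTF_8);
    byte[] endBytes = end.getBytes(StandardCharsets.UTF_8);
    List<String> result = new ArrayList<>();
    while (readUntilMatch(in, beginBytes, null)) {
      ByteArrayOutputStream rec = new ByteArrayOutputStream();
      rec.write(beginBytes, 0, beginBytes.length);
      if (!readUntilMatch(in, endBytes, rec)) break;
      result.add(new String(rec.toByteArray(), StandardCharsets.UTF_8));
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    String xml = "junk<r>a</r> x <r>b</r><r>c";
    List<String> recs = records(
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), "<r>", "</r>");
    System.out.println(recs);
  }
}
```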

Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception
United States | 2007-11-07 06:08:50
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-2071:
----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I just committed this. Thanks, Lohit!
> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
>
> Key: HADOOP-2071
> URL: https://issues.apache.org/jira/browse/HADOOP-2071
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.14.3
> Reporter: lohit vijayarenu
> Assignee: lohit vijayarenu
> Fix For: 0.16.0
>
> Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, HADOOP-2071-3.patch, HADOOP-2071-4.patch, HADOOP-2071-5.patch