Hi all,
I am trying to write multiple files to HDFS from each mapper. Each mapper
parses its text input and writes its outputs to 16 different files (rather
than the single file a typical OutputFormat produces). However, this leads
to a java.io.EOFException from DataInputStream, or a java.io.IOException
about a "bad connection".
Currently, each slave node runs at most 3 mappers, and each mapper writes
16 files simultaneously (about 3 minutes per map task). Replication is 3,
and the Hadoop version is 0.20.2. The writer I use is SequenceFile.Writer.
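For reference, here is a simplified sketch of the kind of RecordWriter my
OutputFormat returns. The class name, the bucketing by key hash, and the
"out-" file naming are illustrative, not my exact code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Illustrative only: one SequenceFile.Writer per output file, opened in
// the task's own work directory so concurrent tasks do not collide.
public class MultiFileRecordWriter extends RecordWriter<Text, Text> {

  private static final int NUM_FILES = 16;
  private final SequenceFile.Writer[] writers =
      new SequenceFile.Writer[NUM_FILES];

  public MultiFileRecordWriter(TaskAttemptContext context, Path workDir)
      throws IOException {
    Configuration conf = context.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    for (int i = 0; i < NUM_FILES; i++) {
      // "out-" + i is a made-up naming scheme for this sketch
      writers[i] = SequenceFile.createWriter(fs, conf,
          new Path(workDir, "out-" + i), Text.class, Text.class);
    }
  }

  @Override
  public void write(Text key, Text value) throws IOException {
    // Route each record to one of the 16 files by key hash (illustrative).
    int bucket = (key.hashCode() & Integer.MAX_VALUE) % NUM_FILES;
    writers[bucket].append(key, value);
  }

  @Override
  public void close(TaskAttemptContext context) throws IOException {
    for (SequenceFile.Writer w : writers) {
      w.close();
    }
  }
}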
These exceptions seem to be reduced or eliminated if I lower the map-task
capacity or reduce dfs.replication to 2.
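In case it matters, this is roughly how I lower the replication on the job
side; dfs.replication is a client-side setting picked up when the files are
created, while the map-slot capacity is mapred.tasktracker.map.tasks.maximum
in each node's mapred-site.xml and cannot be changed per job:

// Hypothetical job setup: lower replication only for the files this job
// creates, without touching the cluster-wide default.
Configuration conf = new Configuration();
conf.setInt("dfs.replication", 2);
// Per-node map slots come from mapred.tasktracker.map.tasks.maximum in
// mapred-site.xml, so reducing them means editing the TaskTracker config.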
How can I avoid these exceptions while still writing multiple files from my
OutputFormat? Any suggestions would be much appreciated!
Here are the exceptions:
java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.io.Text.readString(Text.java:400)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2913)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2838)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2114)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2300)
java.io.IOException: Bad connect ack with firstBadLink 192.168.22.1:55610
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2915)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2838)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2114)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2300)
Thanks!
-
Regards,
Yuting