Hi,

We're running into issues where we are seeing timeouts when
writing/reading a lot of HDFS data. (Hadoop is version CDH3B4 and HDFS
appending is enabled.) The types of exceptions vary a lot, but most of
the time it happens when a DFSClient writes data into the datanode
pipeline.

For example, one datanode logs "Exception in receiveBlock for block
blk_5476601577216704980_62953994 java.io.EOFException: while trying to
read 65557 bytes" and the other side logs "writeBlock
blk_5476601577216704980_62953994 received exception
java.net.SocketTimeoutException: Read timed out". That's it.

We cannot seem to determine the exact problem. The read timeout is at
its default (60 seconds). The open files limit and the number of
xceivers have both been raised substantially. A full GC never takes
longer than a second.
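
For reference, the read timeout mentioned above is governed by the
0.20-era key dfs.socket.timeout, and the write side of the datanode
pipeline has its own knob, dfs.datanode.socket.write.timeout. Below is a
minimal sketch of raising both on the client Configuration; the class
name and the specific values (in milliseconds) are illustrative only,
and raising timeouts only hides the symptom if the transport itself is
unhealthy:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class DfsTimeouts {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Read timeout on DFS sockets; the 60s default is what
            // surfaces as "SocketTimeoutException: Read timed out".
            conf.setInt("dfs.socket.timeout", 120000);
            // Write timeout on the datanode pipeline.
            conf.setInt("dfs.datanode.socket.write.timeout", 600000);
            FileSystem fs = FileSystem.get(conf);
            System.out.println("connected to " + fs.getUri());
        }
    }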

However, we are seeing a lot of dropped packets on the network
interface. Could these problems be related?
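
One way to tell is to watch whether the kernel's per-interface drop
counters climb while HDFS traffic is flowing. A minimal sketch, assuming
a Linux /sys filesystem; the interface name eth0 is a placeholder:

    import java.io.BufferedReader;
    import java.io.FileReader;

    public class DropWatch {
        public static void main(String[] args) throws Exception {
            // Pass the real interface name as an argument; eth0 is a placeholder.
            String iface = args.length > 0 ? args[0] : "eth0";
            String base = "/sys/class/net/" + iface + "/statistics/";
            while (true) {
                System.out.println(iface
                        + " rx_dropped=" + read(base + "rx_dropped")
                        + " tx_dropped=" + read(base + "tx_dropped"));
                Thread.sleep(10000); // sample every 10 seconds
            }
        }

        private static long read(String path) throws Exception {
            BufferedReader r = new BufferedReader(new FileReader(path));
            try {
                return Long.parseLong(r.readLine().trim());
            } finally {
                r.close();
            }
        }
    }

If the drop counters rise in step with the write timeouts, the network
layer is the place to look.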

Any advice will be helpful.

Ferdy.


  • Allen Wittenauer at Apr 13, 2011 at 4:59 pm

    On Apr 12, 2011, at 10:37 AM, Ferdy Galema wrote:
    We're running into issues where we are seeing timeouts when writing/reading a lot of HDFS data. (Hadoop is version CDH3B4 and HDFS appending is enabled.)

    ....
    Any advice will be helpful.
    You should ask Cloudera since you are running their fork of Apache Hadoop.
  • Eli Collins at Apr 13, 2011 at 7:22 pm
    Hey Ferdy,

    If you're seeing this after bumping dfs.datanode.max.xcievers and the
    nfiles ulimit, and you're also seeing dropped packets, it sounds like
    you're having networking issues.

    See the following as well:
    https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/1d3a377bd605e1bd/d3d8ec0d14c065bb?#d3d8ec0d14c065bb
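
    Concretely, a quick way to confirm what xceiver limit the datanode
    actually picked up; a minimal sketch (the misspelling "xcievers" in
    the key itself is historical, and 256 was the 0.20-era default):

        import org.apache.hadoop.conf.Configuration;

        public class XceiverCheck {
            public static void main(String[] args) {
                Configuration conf = new Configuration();
                // Pick up the datanode settings if hdfs-site.xml is on the classpath.
                conf.addResource("hdfs-site.xml");
                System.out.println("dfs.datanode.max.xcievers = "
                        + conf.getInt("dfs.datanode.max.xcievers", 256));
                // Whatever this is raised to, the datanode's open-files
                // ulimit (ulimit -n) must be raised alongside it.
            }
        }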

    Thanks,
    Eli
    On Tue, Apr 12, 2011 at 10:37 AM, Ferdy Galema wrote:
    ....
  • Arun C Murthy at Apr 13, 2011 at 7:27 pm
    Please keep Cloudera issues off this list.
    On Apr 13, 2011, at 12:22 PM, Eli Collins wrote:

    ....
  • Ferdy Galema at Apr 13, 2011 at 8:07 pm
    Hey,

    Thanks for replying. By now we are also pretty sure it's an issue in the
    hardware layer. We have updated the system (kernel/NIC drivers), thereby
    eliminating possible bugs there, but we are still encountering timeouts
    and dropped packets.

    @Allen/Arun
    My bad, I was not aware that Cloudera releases could not be discussed
    here at all. I was thinking that even though Cloudera releases are
    somewhat different, issues that are probably generic could still be
    discussed here. (Surely I would use the Cloudera lists when I'm pretty
    sure it's absolutely specific to Cloudera.)

    Anyway, I will update the list when we have figured the problem out. The
    right list, cdh-user ;)

    Ferdy.
    On 04/13/2011 09:22 PM, Eli Collins wrote:
    ....
  • Allen Wittenauer at Apr 14, 2011 at 5:32 pm

    On Apr 13, 2011, at 1:06 PM, Ferdy Galema wrote:

    @Allen/Arun
    My bad, I was not aware that Cloudera releases could not be discussed here at all. I was thinking that even though Cloudera releases are somewhat different, issues that are probably generic could still be discussed here. (Surely I would use the Cloudera lists when I'm pretty sure it's absolutely specific to Cloudera.)
    Unfortunately, all of the various forks of the Apache releases (regardless of where they come from) have diverged enough that the issues are rarely generic anymore, outside of those answered on the FAQ. :(

Discussion Overview
group: hdfs-user
categories: hadoop
posted: Apr 12, '11 at 5:38p
active: Apr 14, '11 at 5:32p
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
