HDFS losing blocks or connection error
Might there be a reason why this routinely happens to me when using Hadoop
0.19.0 on Amazon EC2?

09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node: java.io.IOException: No live
nodes contain current block
09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node: java.io.IOException: No live
nodes contain current block
09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node: java.io.IOException: No live
nodes contain current block
09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could
not obtain block: blk_-1757733438820764312_6736 file=/stats.txt

It seems HDFS isn't as robust or reliable as the website says, and/or I have
a configuration issue.


Richard J. Zak


  • Jean-Daniel Cryans at Jan 23, 2009 at 5:34 pm
    Richard,

    This happens when the datanodes are too slow and eventually all replicas for
    a single block are tagged as "bad". What kind of instances are you using?
    How many of them?

    J-D
  • Zak, Richard [USA] at Jan 23, 2009 at 6:12 pm
    4 slaves, 1 master, all are the m1.xlarge instance type.


    Richard J. Zak

    -----Original Message-----
    From: jdcryans@gmail.com On Behalf Of
    Jean-Daniel Cryans
    Sent: Friday, January 23, 2009 12:34
    To: core-user@hadoop.apache.org
    Subject: Re: HDFS losing blocks or connection error

  • Jean-Daniel Cryans at Jan 23, 2009 at 6:24 pm
    xlarge is good. Is it normally happening during an MR job? If so, how many
    tasks do you have running at the same moment overall? Also, is your data
    stored on EBS?

    Thx,

    J-D
  • Zak, Richard [USA] at Jan 23, 2009 at 6:34 pm
    It happens right after the MR job (though once or twice it's happened
    during). I am not using EBS, just HDFS between the machines. As for tasks,
    there are 4 mappers and 0 reducers.


    Richard J. Zak

  • Jean-Daniel Cryans at Jan 23, 2009 at 6:46 pm
    Yes, you may be overloading your machines that way, given the small number
    of nodes. One thing to do would be to look in the logs for any signs of
    IOExceptions and report them back here. Another thing you can do is change
    some configs: increase dfs.datanode.max.xcievers to 512 and set
    dfs.datanode.socket.write.timeout to 0 (this is supposed to be fixed in
    0.19, but I've had some problems). An HDFS restart is required.

    J-D
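[Editor's note: the two settings above go into hdfs-site.xml. This is a minimal sketch of that fragment; note that the misspelled key dfs.datanode.max.xcievers is the actual property name in this era of Hadoop.]

```xml
<!-- hdfs-site.xml: restart HDFS after changing these -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>512</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value> <!-- 0 disables the datanode write timeout -->
</property>
```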
  • Konstantin Shvachko at Jan 23, 2009 at 7:18 pm
    Yes, we have observed such problems. They are common to 0.18.2 and 0.19.0,
    exactly as you described, when datanodes become unstable.

    There were several related issues; please take a look:
    HADOOP-4997 workaround for tmp file handling on DataNodes
    HADOOP-4663 (links to other related issues)
    HADOOP-4810 Data lost at cluster startup
    HADOOP-4702 Failed block replication leaves an incomplete block
    ....

    We run 0.18.3 now and it does not have these problems.
    0.19.1 should be the same.

    Thanks,
    --Konstantin

  • Raghu Angadi at Jan 23, 2009 at 7:42 pm

    > It seems hdfs isn't so robust or reliable as the website says and/or I
    > have a configuration issue.

    Quite possible. How robust does the website say it is?

    I agree that debugging failures like this is pretty hard for casual
    users. You need to look at the logs for the block, or run 'bin/hadoop fsck
    /stats.txt', etc. The reason could be as simple as no live datanodes, or as
    complex as strange network behavior triggering a bug in DFSClient.

    You can start by looking at, or attaching, the client log around the lines
    that contain the block id. Also, note the version you are running.

    Raghu.
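[Editor's note: the checklist above can be sketched as a few shell commands. The logs/ directory layout and the fsck flags are assumptions about a stock 0.19-style install, not something stated in the thread; the block id is taken from the error messages above.]

```shell
# Triage sketch for a "Could not obtain block" error.
# Block id taken from the DFSClient messages above.
BLOCK_ID="blk_-1757733438820764312_6736"

# 1. Check the file's block health (requires a running cluster, so
#    commented out here):
# bin/hadoop fsck /stats.txt -files -blocks -locations

# 2. Search the local Hadoop logs for any mention of the failing block:
grep -l "$BLOCK_ID" logs/*.log 2>/dev/null \
  || echo "no local log mentions $BLOCK_ID"

# 3. Record the exact version when reporting the problem:
# bin/hadoop version
```

fsck reports which datanodes, if any, still hold replicas of each block of the file, which distinguishes "no live datanodes" from a client-side bug.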


Discussion Overview
group: common-user
categories: hadoop
posted: Jan 23, '09 at 5:20p
active: Jan 23, '09 at 7:42p
posts: 8
users: 4
website: hadoop.apache.org...
irc: #hadoop
