Hi.

I'm testing Hadoop in our lab, and I've started getting the following message
when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1

I have the following setup:

* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5TB
* Two clients are copying files all the time (one of them is the 1.5TB machine)
* Replication is set to 2 (see the config sketch below the log)
* I let the space on the 2 smaller machines run out, to test the behavior

Now, one of the clients (the one located on the 1.5TB machine) works fine, while
the other one - the external client - is unable to copy and displays the error
plus the exception below.

Any idea whether this is expected in my scenario, or how it can be solved?

Thanks in advance.



09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1

09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

    at org.apache.hadoop.ipc.Client.call(Client.java:716)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)

09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)
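
For reference, a minimal sketch of how a replication factor of 2 is typically configured in this generation of Hadoop (an illustration only, not the poster's actual file; the same value can also be requested per file through the FileSystem API):

    <!-- hadoop-site.xml (0.18-era): dfs.replication is the default number
         of replicas requested for each block of a newly written file -->
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>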


  • Ashish Pareek at May 21, 2009 at 11:30 am
    Hi,

    I have two suggestions:

    i) Choose the right version (Hadoop 0.18 is good).
    ii) Replication should be 3, as you have 3 nodes. (Indirectly, check that
    your configuration is correct!)

    I'm just suggesting this, as I am also new to Hadoop.

    Ashish Pareek

  • Jason hadoop at May 21, 2009 at 11:36 am
    It does not appear that any datanodes have connected to your namenode.
    On the datanode machines, look in the Hadoop logs directory at the datanode
    log files. There should be some information there that helps you diagnose
    the problem.

    Chapter 4 of my book provides some detail on working with this problem.

    --
    Alpha Chapters of my book on Hadoop are available
    http://www.apress.com/book/view/9781430219422
    www.prohadoopbook.com a community for Hadoop Professionals
  • Stas Oskin at May 21, 2009 at 3:21 pm
    Hi.

    2009/5/21 jason hadoop <jason.hadoop@gmail.com>

    It does not appear that any datanodes have connected to your namenode.
    On the datanode machines, look in the Hadoop logs directory at the datanode
    log files. There should be some information there that helps you diagnose
    the problem.

    Chapter 4 of my book provides some detail on working with this problem.

    The NameNode web panel shows that all DataNodes are connected.

    Also, as I said above, one client (the one co-located with the 1.5TB
    DataNode) is working OK.

    Anything else that I can check?

    Regards.
  • Stas Oskin at May 21, 2009 at 3:20 pm
    Hi.

    i) Choose the right version (Hadoop 0.18 is good).

    I'm using 0.18.3.

    ii) Replication should be 3, as you have 3 nodes. (Indirectly, check that
    your configuration is correct!)

    Actually, I'm testing 2x replication on any number of DNs, to see how
    reliable it is.

  • Raghu Angadi at May 21, 2009 at 7:02 pm
    I think you should file a JIRA on this. Most likely this is what is
    happening:

    * Two out of the 3 DNs cannot take any more blocks.
    * While picking nodes for a new block, the NN mostly skips the third DN as
    well, since the '# active writes' on it is larger than '2 * avg' (sketched
    below).
    * Even if there is just one other block being written on the 3rd, that is
    still greater than (2 * 1/3).
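
    A rough sketch of the load check being described (a paraphrase of the
    0.18-era target-selection heuristic, with illustrative names; not the
    actual Hadoop source):

        // A candidate datanode is skipped as a target if its number of
        // active transfers is more than twice the cluster-wide average.
        static boolean acceptableLoad(int nodeActiveWrites,
                                      int totalActiveWrites,
                                      int liveDatanodes) {
            double avgLoad = (double) totalActiveWrites / liveDatanodes;
            return nodeActiveWrites <= 2 * avgLoad;
        }

        // With 3 datanodes and a single active write in the cluster (on the
        // 3rd node): avgLoad = 1/3, and 1 > 2 * (1/3), so even the node that
        // still has free space is rejected.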

    To test this: if you write just one block to an idle cluster, it should
    succeed.
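
    For example, a minimal version of that test (a sketch, not from the thread;
    the file name and size are illustrative, and the cluster configuration is
    assumed to be on the classpath) could be:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class SingleBlockWriteTest {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);

                // Write one small file with replication 2 while nothing else
                // is writing to the cluster ("idle").
                Path p = new Path("/test/oneblock.bin");
                FSDataOutputStream out = fs.create(p, (short) 2);
                out.write(new byte[1024 * 1024]);   // well under one block
                out.close();

                System.out.println("wrote " + fs.getFileStatus(p).getLen() + " bytes");
            }
        }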

    Writing from the client on the 3rd DN succeeds since the local node is
    always favored.

    This particular problem is not that severe on a large cluster, but HDFS
    should do the sensible thing.

    Raghu.

  • Brian Bockelman at May 21, 2009 at 7:07 pm

    On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:
    I think you should file a JIRA on this. Most likely this is what is
    happening:

    * Two out of the 3 DNs cannot take any more blocks.
    * While picking nodes for a new block, the NN mostly skips the third DN as
    well, since the '# active writes' on it is larger than '2 * avg'.
    * Even if there is just one other block being written on the 3rd, that is
    still greater than (2 * 1/3).

    To test this: if you write just one block to an idle cluster, it should
    succeed.

    Writing from the client on the 3rd DN succeeds since the local node is
    always favored.

    This particular problem is not that severe on a large cluster, but HDFS
    should do the sensible thing.

    Hey Raghu,

    If this analysis is right, I would add that it can happen even on large
    clusters! I've seen this error on our cluster when we're very full (>97%)
    and very few nodes have any empty space. This usually happens because we
    have two very large nodes (10x bigger than the rest of the cluster), and
    HDFS tends to distribute writes randomly -- meaning the smaller nodes fill
    up quickly, until the balancer can catch up.

    Brian
    Raghu.

    Stas Oskin wrote:
    Hi.
    I'm testing Hadoop in our lab, and started getting the following
    message
    when trying to copy a file:
    Could only be replicated to 0 nodes, instead of 1
    I have the following setup:
    * 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
    * Two clients are copying files all the time (one of them is the
    1.5GB
    machine)
    * The replication is set on 2
    * I let the space on 2 smaller machines to end, to test the behavior
    Now, one of the clients (the one located on 1.5GB) works fine, and
    the other
    one - the external, unable to copy and displays the error + the
    exception
    below
    Any idea if this expected on my scenario? Or how it can be solved?
    Thanks in advance.
    09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException
    sleeping
    /test/test.bin retries left 1
    09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception:
    org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
    /test/test.bin could only be replicated to 0 nodes, instead of 1
    at
    org
    .apache
    .hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123
    )
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:
    330)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown
    Source)
    at
    sun
    .reflect
    .DelegatingMethodAccessorImpl
    .invoke(DelegatingMethodAccessorImpl.java:25
    )
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:
    890)
    at org.apache.hadoop.ipc.Client.call(Client.java:716)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
    Method)
    at
    sun
    .reflect
    .NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39
    )
    at
    sun
    .reflect
    .DelegatingMethodAccessorImpl
    .invoke(DelegatingMethodAccessorImpl.java:25
    )
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
    org
    .apache
    .hadoop
    .io
    .retry
    .RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82
    )
    at
    org
    .apache
    .hadoop
    .io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:
    59
    )
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at
    org.apache.hadoop.dfs.DFSClient
    $DFSOutputStream.locateFollowingBlock(DFSClient.java:2450
    )
    at
    org.apache.hadoop.dfs.DFSClient
    $DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333
    )
    at
    org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access
    $1800(DFSClient.java:1745
    )
    at
    org.apache.hadoop.dfs.DFSClient$DFSOutputStream
    $DataStreamer.run(DFSClient.java:1922
    )
    09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null
    bad
    datanode[0]
    java.io.IOException: Could not get block locations. Aborting...
    at
    org.apache.hadoop.dfs.DFSClient
    $DFSOutputStream.processDatanodeError(DFSClient.java:2153
    )
    at
    org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access
    $1400(DFSClient.java:1745
    )
    at
    org.apache.hadoop.dfs.DFSClient$DFSOutputStream
    $DataStreamer.run(DFSClient.java:1899
    )
  • Raghu Angadi at May 21, 2009 at 7:24 pm

    Brian Bockelman wrote:

    Hey Raghu,

    If this analysis is right, I would add that it can happen even on large
    clusters! I've seen this error on our cluster when we're very full (>97%)
    and very few nodes have any empty space. This usually happens because we
    have two very large nodes (10x bigger than the rest of the cluster), and
    HDFS tends to distribute writes randomly -- meaning the smaller nodes fill
    up quickly, until the balancer can catch up.

    Yes. This would bite whenever a large portion of the nodes cannot accept
    blocks. In general, it can happen whenever less than half the nodes have
    any space left.

    Raghu.
  • Stas Oskin at May 21, 2009 at 8:11 pm
    Hi.

    If this analysis is right, I would add that it can happen even on large
    clusters! I've seen this error on our cluster when we're very full (>97%)
    and very few nodes have any empty space. This usually happens because we
    have two very large nodes (10x bigger than the rest of the cluster), and
    HDFS tends to distribute writes randomly -- meaning the smaller nodes fill
    up quickly, until the balancer can catch up.

    A bit off topic: do you run the balancer manually, or do you have some
    scheduler that does it?
  • Brian Bockelman at May 21, 2009 at 8:32 pm

    On May 21, 2009, at 3:10 PM, Stas Oskin wrote:

    Hi.


    A bit off topic: do you run the balancer manually, or do you have some
    scheduler that does it?
    crontab does it for us, once an hour. We're always importing data, so the
    cluster is always out of balance.

    If the previous balancer didn't exit, the new one will simply exit.

    The real trick has been to make sure the balancer doesn't get stuck -- a
    Nagios plugin makes sure that stdout has been written to in the last hour
    or so; otherwise it kills the running balancer. Stuck balancers have been
    an issue in the past.
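
    For illustration, a minimal version of that kind of setup (the install path
    and threshold are assumptions, not Brian's actual configuration) might be a
    crontab entry like:

        # Run the HDFS balancer once an hour. If the previous run is still
        # active, the new invocation notices the existing balancer and exits.
        0 * * * *  /opt/hadoop/bin/start-balancer.sh -threshold 10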

    Brian
  • Stas Oskin at May 21, 2009 at 8:36 pm

    The real trick has been to make sure the balancer doesn't get stuck -- a
    Nagios plugin makes sure that stdout has been written to in the last hour
    or so; otherwise it kills the running balancer. Stuck balancers have been
    an issue in the past.

    Thanks for the advice.
  • Stas Oskin at May 21, 2009 at 8:09 pm
    Hi.

    I think you should file a JIRA on this. Most likely this is what is
    happening:

    Will do - this goes to the DFS section, correct?

    * Two out of the 3 DNs cannot take any more blocks.
    * While picking nodes for a new block, the NN mostly skips the third DN as
    well, since the '# active writes' on it is larger than '2 * avg'.
    * Even if there is just one other block being written on the 3rd, that is
    still greater than (2 * 1/3).

    Frankly, I'm not familiar enough with Hadoop's inner workings to understand
    this completely, but from what I gather, the NN doesn't like the 3rd DN
    because there are too many blocks on it compared to the other servers?

    To test this: if you write just one block to an idle cluster, it should
    succeed.

    What exactly is an "idle cluster"? One to which nothing is being written
    (including the 3rd DN)?

    Writing from the client on the 3rd DN succeeds since the local node is
    always favored.

    Makes sense.

    This particular problem is not that severe on a large cluster, but HDFS
    should do the sensible thing.

    Yes, I agree that this is a non-standard situation, but IMHO the best course
    of action would be to write anyway but throw a warning. One already appears
    when there is not enough space for replication, and it explains the matter
    quite well, so a similar warning would be great.
  • Stas Oskin at May 21, 2009 at 8:41 pm

    I think you should file a JIRA on this. Most likely this is what is
    happening:

    Here it is - hope it's OK:

    https://issues.apache.org/jira/browse/HADOOP-5886
  • Raghu Angadi at May 21, 2009 at 8:51 pm

    Stas Oskin wrote:
    I think you should file a JIRA on this. Most likely this is what is
    happening:

    Here it is - hope it's OK:

    https://issues.apache.org/jira/browse/HADOOP-5886

    Looks good. I will add my earlier post as a comment. You could update the
    JIRA with any further tests.

    Next time, it would be better to include large stack traces, logs, etc. in
    subsequent comments rather than in the description.

    Thanks,
    Raghu.
  • Stas Oskin at May 21, 2009 at 10:22 pm


    Next time, it would be better to include large stack traces, logs, etc. in
    subsequent comments rather than in the description.

    Will do, thanks for the tip.
  • Stas Oskin at Oct 13, 2009 at 1:38 pm
    Hi.

    I wonder if there has been any progress on this issue?

    Regards.
