I have a problem with the datanodes. I shut down DFS and Mapred on Friday
for my cluster, and when I started them up on Monday it remained in
safe mode, listing two of the datanodes with no blocks. When I
checked the logs on the datanodes, they said that the data directory
was not formatted. The datanodes proceeded to format it and, I suppose,
erased all blocks stored there. My replication factor was not high enough
to survive losing both of these datanodes, so my DFS was ruined. Is this
because the datanodes are storing data in the tmp directory? If so, how
can I change that directory?



2007-12-03 09:24:08,299 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=DataNode, sessionId=null

2007-12-03 09:24:08,395 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 1
time(s).

2007-12-03 09:24:09,453 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 2
time(s).

2007-12-03 09:24:10,476 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 3
time(s).

2007-12-03 09:24:11,478 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 4
time(s).

2007-12-03 09:24:12,683 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 5
time(s).

2007-12-03 09:24:13,716 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 6
time(s).

2007-12-03 09:24:14,806 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 7
time(s).

2007-12-03 09:24:15,855 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 8
time(s).

2007-12-03 09:24:16,916 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 9
time(s).

2007-12-03 09:24:18,295 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 10
time(s).

2007-12-03 09:24:19,298 INFO org.apache.hadoop.ipc.RPC: Server at
mh0.telespree.com/172.18.1.80:54310 not available yet, Zzzzz...

2007-12-03 09:24:20,391 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 1
time(s).

2007-12-03 09:24:21,403 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 2
time(s).

2007-12-03 09:24:22,431 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 3
time(s).

2007-12-03 09:24:23,515 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 4
time(s).

2007-12-03 09:24:24,544 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 5
time(s).

2007-12-03 09:24:26,065 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 6
time(s).

2007-12-03 09:24:27,068 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 7
time(s).

2007-12-03 09:24:28,230 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 8
time(s).

2007-12-03 09:24:29,411 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 9
time(s).

2007-12-03 09:24:30,431 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 10
time(s).

2007-12-03 09:24:31,504 INFO org.apache.hadoop.ipc.RPC: Server at
mh0.telespree.com/172.18.1.80:54310 not available yet, Zzzzz...

2007-12-03 09:24:32,508 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 1
time(s).

2007-12-03 09:24:49,604 INFO org.apache.hadoop.dfs.Storage: Storage
directory /tmp/hadoop-hadoop/dfs/data is not formatted.

2007-12-03 09:24:49,604 INFO org.apache.hadoop.dfs.Storage: Formatting
...

2007-12-03 09:24:52,741 INFO org.apache.hadoop.dfs.DataNode: Opened
server at 50010

2007-12-03 09:24:52,794 INFO org.mortbay.util.Credential: Checking
Resource aliases

2007-12-03 09:24:52,827 INFO org.mortbay.http.HttpServer: Version
Jetty/5.1.4

2007-12-03 09:24:53,086 INFO org.mortbay.util.Container: Started
[email protected]

2007-12-03 09:24:53,116 INFO org.mortbay.util.Container: Started
WebApplicationContext[/,/]

2007-12-03 09:24:53,117 INFO org.mortbay.util.Container: Started
HttpContext[/logs,/logs]

2007-12-03 09:24:53,117 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]

2007-12-03 09:24:53,118 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:50075

2007-12-03 09:24:53,118 INFO org.mortbay.util.Container: Started
[email protected]

2007-12-03 09:24:53,148 INFO org.apache.hadoop.dfs.DataNode: New storage
id DS-1588572895-172.18.2.23-50010-1196702693143 is assigned to
data-node 172.18.2.23:50010

2007-12-03 09:24:53,149 INFO org.apache.hadoop.dfs.DataNode: In
DataNode.run, data =
FSDataset{dirpath='/tmp/hadoop-hadoop/dfs/data/current'}

2007-12-03 09:24:53,149 INFO org.apache.hadoop.dfs.DataNode: using
BLOCKREPORT_INTERVAL of 3463518msec

2007-12-03 09:31:23,420 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 1
time(s).

2007-12-03 09:31:23,447 INFO org.apache.hadoop.dfs.DataNode:
SHUTDOWN_MSG:



Thanks,

Michael


  • Michael Harris at Dec 6, 2007 at 6:32 pm
    No one has responded to my question... this incident has really eroded
    my trust in the stability of Hadoop and the safety of the data stored in
    the DFS. Am I just doing something really obvious/stupid, or do people
    need more information before they can look into the problem? Is this
    related to the fact that the Namenode took a very long time to respond
    to the Datanode's requests (the Namenode is temporarily running in a VM,
    so it's quite slow)?

    -Michael

    -----Original Message-----
    From: Michael Harris
    Sent: Monday, December 03, 2007 10:06 AM
    To: [email protected]
    Subject: DFS Datanodes are suddenly "not formatted"

  • Michael Bieniosek at Dec 6, 2007 at 7:45 pm
    In your hadoop-site.xml, you can set

    <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop</value>
    </property>

    This will put all of Hadoop's data under /hadoop. By default, this directory is /tmp/hadoop-$USER, which is probably worth a bug report.

    -Michael
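
    A slightly more explicit variant of the setting above can be sketched as
    follows. It assumes the release in use also reads dfs.name.dir and
    dfs.data.dir, whose defaults are derived from hadoop.tmp.dir (the
    datanode log above shows /tmp/hadoop-hadoop/dfs/data, which matches that
    pattern). The /hadoop/dfs/name and /hadoop/dfs/data paths are
    illustrative only; any persistent directory outside /tmp that the
    Hadoop user can write to should do.

    ```xml
    <!-- hadoop-site.xml: illustrative sketch; the /hadoop/... paths are assumed examples -->
    <configuration>
      <!-- Base directory for Hadoop's working files; keeps them out of /tmp -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop</value>
      </property>
      <!-- Where the namenode keeps its filesystem image -->
      <property>
        <name>dfs.name.dir</name>
        <value>/hadoop/dfs/name</value>
      </property>
      <!-- Where each datanode stores its blocks -->
      <property>
        <name>dfs.data.dir</name>
        <value>/hadoop/dfs/data</value>
      </property>
    </configuration>
    ```

    Setting the two dfs.*.dir properties explicitly means the storage
    locations no longer depend on hadoop.tmp.dir. If a cluster already holds
    data, the existing directory contents would presumably need to be moved
    to the new locations while the daemons are stopped, since a datanode
    that starts on an empty directory formats it, as the log above shows.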

