Job hangs, strange error messages...help sought...

• C G at Oct 30, 2007 at 6:06 pm
Hi All:

Environment: 4 node grid running hadoop-0.14.1.

With the system shut down, I wiped out the old HDFS directory structure and created an empty directory. I did a namenode format, then brought the system up with start-all.sh.
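
For reference, roughly the sequence I ran (a sketch from memory; the actual dfs.name.dir/dfs.data.dir locations are whatever hadoop-site.xml points at, so treat these paths as placeholders):

bin/stop-all.sh
rm -rf /path/to/dfs/name /path/to/dfs/data    # hypothetical dfs.name.dir / dfs.data.dir
mkdir -p /path/to/dfs/name /path/to/dfs/data
bin/hadoop namenode -format
bin/start-all.sh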

I then set up a directory structure and ran the first job. The job runs 100% of the map tasks, completes ~87% of the reduce tasks, and then hangs. There are no user-level error messages, and all nodes go idle.

I started looking at the Hadoop logs; the first strange message is from the namenode log:

2007-10-30 13:48:01,991 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
2007-10-30 13:48:01,992 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310, call create(/import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001, DFSClient_task_200710301345_0001_r_000001_0, true, 3, 67108864) from 10.2.11.4:34016: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:788)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:725)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:307)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)

The second strange message comes from the jobtracker log:
2007-10-30 13:59:26,190 INFO org.apache.hadoop.mapred.JobTracker: Ignoring 'duplicate' heartbeat from 'tracker_localhost.localdomain:50050'

I'm not sure how to proceed. I suspect my code is OK, as I've run it numerous times in both single-node and multi-node environments, and I've never seen these error messages before.

Any help much appreciated....

Thanks,
C G



  • Ted Dunning at Oct 30, 2007 at 6:34 pm
    Is it possible you have some nodes running multiple instances of datanodes?
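
    One quick way to check, assuming the JDK's jps tool is on each node's PATH (plain ps works too):

    # run on every node; more than one DataNode line means duplicate instances
    jps | grep DataNode
    ps ax | grep org.apache.hadoop.dfs.DataNode | grep -v grep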

  • C G at Oct 30, 2007 at 6:45 pm
    Each node runs 1 datanode instance. I did notice this on "node4" trying to send to "node2":

    2007-10-30 14:35:16,876 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_7852052048957161201 to 10.2.11.2:50010 got java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.dfs.DataNode.sendBlock(DataNode.java:1231)
        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1280)
        at java.lang.Thread.run(Thread.java:619)

    and then on node2:
    2007-10-30 14:35:16,918 INFO org.apache.hadoop.dfs.DataNode: Received block blk_7852052048957161201 from /10.2.11.3
    2007-10-30 14:35:16,919 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: Block blk_7852052048957161201 is valid, and cannot be written to.
        at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:515)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:822)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
        at java.lang.Thread.run(Thread.java:619)

  • C G at Oct 30, 2007 at 8:24 pm
    Still more weirdness. When I run my code on a single-node Hadoop installation it runs fine, albeit with terrible performance. When I move to 2 nodes, I see the various error messages I've previously described. The system is also writing directories of stuff to my output when I am only expecting to see reducer output, and logging those writes in the jobtracker log like this:

    2007-10-30 16:14:33,264 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200710301552_0002_r_000007_0' to tip tip_200710301552_0002_r_000007, for tracker 'tracker_localhost.localdomain:50050'
    2007-10-30 16:14:33,628 INFO org.apache.hadoop.mapred.JobTracker: Ignoring 'duplicate' heartbeat from 'tracker_localhost.localdomain:50050'
    2007-10-30 16:14:41,359 INFO org.apache.hadoop.mapred.JobTracker: Ignoring 'duplicate' heartbeat from 'tracker_localhost.localdomain:50050'
    2007-10-30 16:14:41,722 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200710301552_0002_r_000007_0' has completed tip_200710301552_0002_r_000007 successfully.
    2007-10-30 16:14:41,729 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200710301552_0002_r_000007_0' to hdfs://10.2.11.1:54310/import/raw_logs/20070929

    So instead of reducer output, there is a combination of reducer output and task directories:
    /import/raw_logs/20070929/_task_200710301552_0002_r_000001_0 <dir> 2007-10-30 16:15
    /import/raw_logs/20070929/_task_200710301552_0002_r_000002_0 <dir> 2007-10-30 16:14
    /import/raw_logs/20070929/_task_200710301552_0002_r_000003_0 <dir> 2007-10-30 16:14
    /import/raw_logs/20070929/_task_200710301552_0002_r_000004_0 <dir> 2007-10-30 16:14
    /import/raw_logs/20070929/_task_200710301552_0002_r_000006_0 <dir> 2007-10-30 16:14
    /import/raw_logs/20070929/part-00000 <r 3> 2323852 2007-10-30 16:13
    /import/raw_logs/20070929/part-00001 <r 3> 2271699 2007-10-30 16:13
    /import/raw_logs/20070929/part-00003 <r 3> 2358158 2007-10-30 16:14
    /import/raw_logs/20070929/part-00004 <r 3> 2341299 2007-10-30 16:14
    /import/raw_logs/20070929/part-00005 <r 3> 2265270 2007-10-30 16:14
    /import/raw_logs/20070929/part-00006 <r 3> 2305368 2007-10-30 16:14
    /import/raw_logs/20070929/part-00007 <r 3> 2292513 2007-10-30 16:14

    Those _task directories contain output that appears, at least in file size, to be identical to the reducer output:

    bin/hadoop dfs -ls /import/raw_logs/20070929/_task_200710301552_0002_r_000001_0
    Found 1 items
    /import/raw_logs/20070929/_task_200710301552_0002_r_000001_0/part-00001 <r 3> 2271699 2007-10-30 16:15
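
    To confirm the bytes match and not just the sizes, one option is to checksum both copies locally (same paths as above):

    bin/hadoop dfs -cat /import/raw_logs/20070929/_task_200710301552_0002_r_000001_0/part-00001 | md5sum
    bin/hadoop dfs -cat /import/raw_logs/20070929/part-00001 | md5sum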

    Has anybody seen this behavior before? This is happening for any map/reduce program I run - we have several. All the code is basic stuff - nothing exotic happening at all.

    Any advice much appreciated...

    Thanks,
    C G

  • Stu Hood at Oct 30, 2007 at 8:36 pm
    The files in the _task* subfolders are expected. Any task executed by Hadoop gets its own working directory, and writes its output there. When one attempt of the task completes successfully, Hadoop copies its successful output from its _task* directory to the final destination.

    That way, multiple attempts of the same task don't have colliding output files.
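
    In effect, when an attempt succeeds the framework promotes its files into the job output directory; the end result is the same as a move like this (an illustration of the pattern using paths from your listing, not Hadoop's actual code path):

    # the attempt writes here while running:
    #   /import/raw_logs/20070929/_task_200710301552_0002_r_000001_0/part-00001
    # on success the framework promotes it, with the effect of:
    bin/hadoop dfs -mv /import/raw_logs/20070929/_task_200710301552_0002_r_000001_0/part-00001 /import/raw_logs/20070929/part-00001

    Leftover _task* directories are typically attempts whose output was never promoted or cleaned up, e.g. if the job hung before completing.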

    Thanks,
    Stu


