Hello all,

I was trying to upgrade Hadoop 0.13.1 to 0.14.1, but after following the
instructions at http://wiki.apache.org/lucene-hadoop/Hadoop_0.14_Upgrade
and running "./start-dfs.sh -upgrade", the upgrade made no progress.
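
For reference, the sequence on that wiki page amounts to roughly the
following (a sketch only; the metadata path and backup location are
assumptions, not values from the page):

    # Rough shape of a 0.13.x -> 0.14.x DFS upgrade (paths are placeholders).
    bin/stop-dfs.sh                               # stop the 0.13.1 cluster
    cp -r /path/to/dfs/name /path/to/name-backup  # back up namenode metadata first
    # ...install the 0.14.1 release and point its config at the same data dirs...
    bin/start-dfs.sh -upgrade                     # start DFS in upgrade mode
    bin/hadoop dfsadmin -upgradeProgress status   # poll until it reports completion
    bin/hadoop dfsadmin -finalizeUpgrade          # only once everything looks good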

I checked the status with "./hadoop dfsadmin -upgradeProgress status" and got:

Distributed upgrade for version -6 is in progress. Status = 0%

Last Block Level Stats updated at : Thu Sep 13 02:04:43 GMT+08:00 2007
Last Block Level Stats : Total Blocks : 1661833
Fully Upgragraded : 0.00%
Minimally Upgraded : 0.00%
Under Upgraded : 100.00% (includes Un-upgraded blocks)
Un-upgraded : 100.00%
Errors : 0
Brief Datanode Status : Avg completion of all Datanodes: 0.00% with 0 errors.

Then I ran "./hadoop dfsadmin -upgradeProgress details" and got:
Last Block Level Stats updated at : Thu Sep 13 02:09:47 GMT+08:00 2007
Last Block Level Stats : Total Blocks : 1661833
Fully Upgragraded : 0.00%
Minimally Upgraded : 0.00%
Under Upgraded : 100.00% (includes Un-upgraded blocks)
Un-upgraded : 100.00%
Errors : 0
Brief Datanode Status : Avg completion of all Datanodes: 0.00% with 0 errors.

Datanode Stats (total: 0): pct Completion(%) blocks upgraded (u) blocks remaining (r) errors (e)

There are no known Datanodes
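
Since the report claims zero known datanodes, a quick cross-check is to ask
the namenode directly for its datanode list (a sketch; the exact output
format may differ between versions):

    # List the datanodes the namenode currently knows about.
    bin/hadoop dfsadmin -report
    # Zero datanodes here would point at a registration problem rather
    # than at the block upgrade logic itself.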

I also checked the namenode log and found one exception, as follows:

2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 9000: starting
2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 9000: starting
2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 9000: starting
2007-09-13 02:17:25,325 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 9000: starting
2007-09-13 02:17:25,400 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: Block CRC Upgrade is still running.
Avg completion of all Datanodes: 0.00% with 0 errors.
2007-09-13 02:17:25,406 WARN org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 14) from 192.168.2.1:53211: output error
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
    at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
    at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:585)
2007-09-13 02:18:24,921 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: Block CRC Upgrade is still running.
Avg completion of all Datanodes: 0.00% with 0 errors.

It seems something was going wrong on the datanode side. However, the log of
one of the datanodes shows it started, and it is still running (it appears
in the process list), but it somehow lost its connection to the namenode.
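
One quick way to separate a plain network problem from a Hadoop-level
problem is to probe the namenode's IPC port from the datanode host (a
sketch, assuming standard tools are installed):

    # From the datanode: can we open a TCP connection to the namenode IPC port?
    nc -vz 192.168.2.1 9000     # or: telnet 192.168.2.1 9000
    # If this fails, the issue is below Hadoop (routing, firewall, binding).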

2007-09-12 22:23:35,319 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = TE-DN-002/192.168.2.102
STARTUP_MSG: args = []
************************************************************/
2007-09-12 22:23:35,533 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2007-09-12 22:24:35,619 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: /192.168.2.1:9000
2007-09-12 22:25:34,878 INFO org.apache.hadoop.dfs.Storage: Recovering storage directory /home/textd/data/fs/data from previous upgrade.
2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.DataNode: Distributed upgrade for DataNode version -6 to current LV -7 is initialized.
2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.Storage: Upgrading storage directory /home/textd/data/fs/data.
old LV = -4; old CTime = 0.
new LV = -7; new CTime = 1189616555276

The hardware configuration:
Namenode: P4D, 3 GB RAM
3 Datanodes: AMD 64 4000 x2, 1 GB RAM each
They worked with Hadoop 0.13.1.

Any ideas or suggestions?


  • Torsten Curdt at Sep 12, 2007 at 6:49 pm

    I was trying to upgrade Hadoop 0.13.1 to 0.14.1, but after following the
    instructions at http://wiki.apache.org/lucene-hadoop/Hadoop_0.14_Upgrade
    and running "./start-dfs.sh -upgrade", the upgrade made no progress.
    This takes a while before you see progress. Give it some time.
    2007-09-13 02:17:25,406 WARN org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 14) from 192.168.2.1:53211: output error
    Wondering... how many network interfaces do you have on the boxes?

    cheers
    --
    Torsten
  • Open Study at Sep 12, 2007 at 6:55 pm
    Hi Torsten

    The namenode does have two network interfaces; one of them, eth0, is
    integrated on the board and should have been disabled.

    eth0 Link encap:Ethernet HWaddr 00:13:D4:CB:1C:16
    UP BROADCAST MULTICAST MTU:1500 Metric:1
    RX packets:14388 errors:0 dropped:0 overruns:0 frame:0
    TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:1600574 (1.5 Mb) TX bytes:3482 (3.4 Kb)
    Interrupt:177

    eth1 Link encap:Ethernet HWaddr 00:40:05:42:1B:44
    inet addr:192.168.2.1 Bcast:255.255.255.255 Mask:255.255.240.0
    inet6 addr: fe80::240:5ff:fe42:1b44/64 Scope:Link
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
    RX packets:3344715 errors:0 dropped:0 overruns:0 frame:0
    TX packets:1708146 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:414278686 (395.0 Mb) TX bytes:134601855 (128.3 Mb)
    Interrupt:177 Base address:0xb800
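
    Given the two interfaces, it may also be worth confirming which local
    address the namenode actually bound its ports to (a sketch; -p needs
    root, and 50070 is the usual namenode web port in this generation):

        # On the namenode: which address is each listening socket bound to?
        netstat -tlnp | grep -E ':9000|:50070'
        # 0.0.0.0:9000 means all interfaces; 192.168.2.1:9000 means eth1 only.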

    On 9/13/07, Torsten Curdt wrote:

    I was trying to upgrade Hadoop 0.13.1 to 0.14.1, but after following the
    instructions at http://wiki.apache.org/lucene-hadoop/Hadoop_0.14_Upgrade
    and running "./start-dfs.sh -upgrade", the upgrade made no progress.
    This takes a while before you see progress. Give it some time.
    2007-09-13 02:17:25,406 WARN org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 14) from 192.168.2.1:53211: output error
    Wondering... how many network interfaces do you have on the boxes?

    cheers
    --
    Torsten
  • Torsten Curdt at Sep 12, 2007 at 7:50 pm
    Well, as long as only one has an IP assigned, you are good. There are
    still multi-homing problems (even with 0.14.1), which is why I was
    asking. (A JIRA issue is still to be opened.)

    cheers
    --
    Torsten
  • Raghu Angadi at Sep 12, 2007 at 6:57 pm
    Hi,

    The datanode must be able to connect to the namenode for the upgrade to
    make any progress. Do you see any other errors reported in the datanode
    log? You need to fix the connection problem first.

    Are you comfortable taking a tcpdump of the namenode port on the client?
    I think the client should be trying to reconnect.
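
    A minimal capture along those lines, run on the datanode (a sketch; the
    interface name and output file are assumptions):

        # Record only traffic to/from the namenode IPC port for later analysis.
        tcpdump -i eth1 -s 0 -w nn-9000.pcap host 192.168.2.1 and port 9000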

    Note that it is safe to restart the cluster or just the datanodes before
    the upgrade completes.

    Raghu.
    Open Study wrote:
    I also checked the namenode log and found one exception, as follows:

    2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
    handler 6 on 9000: starting
    2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
    handler 7 on 9000: starting
    2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
    handler 8 on 9000: starting
    2007-09-13 02:17:25,325 INFO org.apache.hadoop.ipc.Server: IPC Server
    handler 9 on 9000: starting
    2007-09-13 02:17:25,400 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: Block CRC Upgrade is still running.
    Avg completion of all Datanodes: 0.00% with 0 errors.
    2007-09-13 02:17:25,406 WARN org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 14) from 192.168.2.1:53211: output error
    java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
        at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
        at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:585)
    2007-09-13 02:18:24,921 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: Block CRC Upgrade is still running.
    Avg completion of all Datanodes: 0.00% with 0 errors.

    It seems something was going wrong on the datanode side. However, the log of
    one of the datanodes shows it started, and it is still running (it appears
    in the process list), but it somehow lost its connection to the namenode.

    2007-09-12 22:23:35,319 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting DataNode
    STARTUP_MSG: host = TE-DN-002/192.168.2.102
    STARTUP_MSG: args = []
    ************************************************************/
    2007-09-12 22:23:35,533 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
    2007-09-12 22:24:35,619 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: /192.168.2.1:9000
    2007-09-12 22:25:34,878 INFO org.apache.hadoop.dfs.Storage: Recovering storage directory /home/textd/data/fs/data from previous upgrade.
    2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.DataNode: Distributed upgrade for DataNode version -6 to current LV -7 is initialized.
    2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.Storage: Upgrading storage directory /home/textd/data/fs/data.
    old LV = -4; old CTime = 0.
    new LV = -7; new CTime = 1189616555276

    The hardware configuration:
    Namenode: P4D, 3 GB RAM
    3 Datanodes: AMD 64 4000 x2, 1 GB RAM each
    They worked with Hadoop 0.13.1.

    Any ideas or suggestions?
  • Open Study at Sep 12, 2007 at 7:39 pm
    Hi all,

    I noticed the status report from "./hadoop dfsadmin -upgradeProgress
    details" shows a wrong timestamp, even though the time setting on the
    server is correct. Will this matter?

    <-------------------------------------------------------------------------------------------------------------------------
    Distributed upgrade for version -6 is in progress. Status = 0%

    Last Block Level Stats updated at : Thu Jan 01 08:00:00 GMT+08:00 1970
    Last Block Level Stats : Total Blocks : 0
    Fully Upgragraded : 0.00%
    Minimally Upgraded : 0.00%
    Under Upgraded : 0.00% (includes Un-upgraded blocks)
    Un-upgraded : 0.00%
    Errors : 0
    Brief Datanode Status : Avg completion of all Datanodes: 0.00% with 0 errors.

    Datanode Stats (total: 0): pct Completion(%) blocks upgraded (u) blocks remaining (r) errors (e)

    There are no known Datanodes
    ------------------------------------------------------------------------------------------------------------------------->

    Here is the tcpdump I captured using "tcpdump host 192.168.2.101 and 192.168.2.1"
    on one of the datanodes, from the start of the cluster to the loss of
    the connection; 192.168.2.101 is the datanode and 192.168.2.1 is the
    namenode.

    <-------------------------------------------------------------------------------------------------------------------------
    03:21:01.082055 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: S
    3124566345:3124566345(0) win 5840 <mss 1460,sackOK,timestamp 12085778
    0,nop,wscale 7>
    03:21:01.084143 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: S
    635998938:635998938(0) ack 3124566346 win 5792 <mss 1460,sackOK,timestamp
    211599828 12085778,nop,wscale 7>
    03:21:01.082120 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 1 win
    46 <nop,nop,timestamp 12085778 211599828>
    03:21:01.090313 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P 1:22(21)
    ack 1 win 46 <nop,nop,timestamp 211599830 12085778>
    03:21:01.095758 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 22
    win 46 <nop,nop,timestamp 12085781 211599830>
    03:21:01.095876 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P 1:21(20)
    ack 22 win 46 <nop,nop,timestamp 12085781 211599830>
    03:21:01.095903 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 21
    win 46 <nop,nop,timestamp 211599832 12085781>
    03:21:01.096282 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    21:773(752) ack 22 win 46 <nop,nop,timestamp 12085782 211599832>
    03:21:01.096304 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 773
    win 57 <nop,nop,timestamp 211599832 12085782>
    03:21:01.097154 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    22:766(744) ack 773 win 57 <nop,nop,timestamp 211599832 12085782>
    03:21:01.097795 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    773:797(24) ack 766 win 58 <nop,nop,timestamp 12085782 211599832>
    03:21:01.100199 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    766:918(152) ack 797 win 57 <nop,nop,timestamp 211599833 12085782>
    03:21:01.106536 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    797:941(144) ack 918 win 69 <nop,nop,timestamp 12085784 211599833>
    03:21:01.108781 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    918:1382(464) ack 941 win 69 <nop,nop,timestamp 211599835 12085784>
    03:21:01.113305 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    941:957(16) ack 1382 win 81 <nop,nop,timestamp 12085786 211599835>
    03:21:01.155108 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 957
    win 69 <nop,nop,timestamp 211599846 12085786>
    03:21:01.155199 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    957:1005(48) ack 1382 win 81 <nop,nop,timestamp 12085796 211599846>
    03:21:01.155217 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1005
    win 69 <nop,nop,timestamp 211599846 12085796>
    03:21:01.155273 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    1382:1430(48) ack 1005 win 69 <nop,nop,timestamp 211599847 12085796>
    03:21:01.155453 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    1005:1069(64) ack 1430 win 81 <nop,nop,timestamp 12085796 211599847>
    03:21:01.199106 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1069
    win 69 <nop,nop,timestamp 211599857 12085796>
    03:21:01.214178 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    1430:1494(64) ack 1069 win 69 <nop,nop,timestamp 211599861 12085796>
    03:21:01.214481 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    1069:1597(528) ack 1494 win 81 <nop,nop,timestamp 12085811 211599861>
    03:21:01.214518 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1597
    win 81 <nop,nop,timestamp 211599861 12085811>
    03:21:01.218638 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    1494:1974(480) ack 1597 win 81 <nop,nop,timestamp 211599862 12085811>
    03:21:01.222363 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    1597:2173(576) ack 1974 win 93 <nop,nop,timestamp 12085813 211599862>
    03:21:01.224255 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    1974:2006(32) ack 2173 win 93 <nop,nop,timestamp 211599864 12085813>
    03:21:01.224521 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    2173:2237(64) ack 2006 win 93 <nop,nop,timestamp 12085814 211599864>
    03:21:01.227368 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    2006:2054(48) ack 2237 win 93 <nop,nop,timestamp 211599865 12085814>
    03:21:01.227689 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    2237:2493(256) ack 2054 win 93 <nop,nop,timestamp 12085814 211599865>
    03:21:01.228913 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    2054:2102(48) ack 2493 win 104 <nop,nop,timestamp 211599865 12085814>
    03:21:01.268981 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2102
    win 93 <nop,nop,timestamp 12085825 211599865>
    03:21:01.344551 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    2102:2246(144) ack 2493 win 104 <nop,nop,timestamp 211599894 12085825>
    03:21:01.344689 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2246
    win 104 <nop,nop,timestamp 12085844 211599894>
    03:21:02.037296 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: S
    638840154:638840154(0) win 5840 <mss 1460,sackOK,timestamp 211600067
    0,nop,wscale 7>
    03:21:02.037414 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: S
    3130232567:3130232567(0) ack 638840155 win 5792 <mss 1460,sackOK,timestamp
    12086017 211600067,nop,wscale 7>
    03:21:02.037473 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
    ack 1 win 46 <nop,nop,timestamp 211600067 12086017>
    03:21:02.049490 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
    1:110(109) ack 1 win 46 <nop,nop,timestamp 211600070 12086017>
    03:21:02.049626 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
    ack 110 win 46 <nop,nop,timestamp 12086020 211600070>
    03:21:02.357928 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    2246:2278(32) ack 2493 win 104 <nop,nop,timestamp 211600147 12085844>
    03:21:02.358048 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2278
    win 104 <nop,nop,timestamp 12086097 211600147>
    03:21:02.358089 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
    2278:2374(96) ack 2493 win 104 <nop,nop,timestamp 211600147 12086097>
    03:21:02.358178 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2374
    win 104 <nop,nop,timestamp 12086097 211600147>
    03:21:02.358316 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
    2493:2525(32) ack 2374 win 104 <nop,nop,timestamp 12086097 211600147>
    03:21:02.358356 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: F
    2525:2525(0) ack 2374 win 104 <nop,nop,timestamp 12086097 211600147>
    03:21:02.359169 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: F
    2374:2374(0) ack 2526 win 104 <nop,nop,timestamp 211600147 12086097>
    03:21:02.359254 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2375
    win 104 <nop,nop,timestamp 12086097 211600147>
    03:22:03.064540 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
    110:214(104) ack 1 win 46 <nop,nop,timestamp 211615323 12086020>
    03:22:03.064664 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
    ack 214 win 46 <nop,nop,timestamp 12101272 211615323>
    03:22:08.065775 arp who-has TE-DN-001.local.TEST tell 192.168.2.1
    03:22:08.065791 arp reply TE-DN-001.local.TEST is-at 00:18:37:02:74:76 (oui
    Unknown)
    03:22:54.349567 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
    1:20(19) ack 214 win 46 <nop,nop,timestamp 12114091 211615323>
    03:22:54.349624 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
    ack 20 win 46 <nop,nop,timestamp 211628143 12114091>
    03:22:54.349708 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
    20:39(19) ack 214 win 46 <nop,nop,timestamp 12114091 211628143>
    03:22:54.349718 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
    ack 39 win 46 <nop,nop,timestamp 211628143 12114091>
    03:22:54.385237 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
    214:242(28) ack 39 win 46 <nop,nop,timestamp 211628152 12114091>
    03:22:54.385342 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
    ack 242 win 46 <nop,nop,timestamp 12114100 211628152>
    03:22:54.391417 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
    39:146(107) ack 242 win 46 <nop,nop,timestamp 12114101 211628152>
    03:22:54.433048 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
    ack 146 win 46 <nop,nop,timestamp 211628164 12114101>
    03:22:55.525390 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: F
    242:242(0) ack 146 win 46 <nop,nop,timestamp 211628437 12114101>
    03:22:55.525719 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: F
    146:146(0) ack 243 win 46 <nop,nop,timestamp 12114385 211628437>
    03:22:55.525746 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
    ack 147 win 46 <nop,nop,timestamp 211628437 12114385>
    ------------------------------------------------------------------------------------------------------------------------->


    On 9/13/07, Raghu Angadi wrote:

    Hi,

    The datanode must be able to connect to the namenode for the upgrade to
    make any progress. Do you see any other errors reported in the datanode
    log? You need to fix the connection problem first.

    Are you comfortable taking a tcpdump of the namenode port on the client?
    I think the client should be trying to reconnect.

    Note that it is safe to restart the cluster or just the datanodes before
    the upgrade completes.

    Raghu.
    Open Study wrote:
    I also checked the namenode log and found one exception, as follows:
    2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
    handler 6 on 9000: starting
    2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
    handler 7 on 9000: starting
    2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
    handler 8 on 9000: starting
    2007-09-13 02:17:25,325 INFO org.apache.hadoop.ipc.Server: IPC Server
    handler 9 on 9000: starting
    2007-09-13 02:17:25,400 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: Block CRC Upgrade is still running.
    Avg completion of all Datanodes: 0.00% with 0 errors.
    2007-09-13 02:17:25,406 WARN org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 14) from 192.168.2.1:53211: output error
    java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
        at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
        at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:585)
    2007-09-13 02:18:24,921 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode: Block CRC Upgrade is still running.
    Avg completion of all Datanodes: 0.00% with 0 errors.

    It seems something was going wrong on the datanode side. However, the log of
    one of the datanodes shows it started, and it is still running (it appears
    in the process list), but it somehow lost its connection to the namenode.

    2007-09-12 22:23:35,319 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting DataNode
    STARTUP_MSG: host = TE-DN-002/192.168.2.102
    STARTUP_MSG: args = []
    ************************************************************/
    2007-09-12 22:23:35,533 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
    2007-09-12 22:24:35,619 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: /192.168.2.1:9000
    2007-09-12 22:25:34,878 INFO org.apache.hadoop.dfs.Storage: Recovering storage directory /home/textd/data/fs/data from previous upgrade.
    2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.DataNode: Distributed upgrade for DataNode version -6 to current LV -7 is initialized.
    2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.Storage: Upgrading storage directory /home/textd/data/fs/data.
    old LV = -4; old CTime = 0.
    new LV = -7; new CTime = 1189616555276

    The hardware configuration:
    Namenode: P4D, 3 GB RAM
    3 Datanodes: AMD 64 4000 x2, 1 GB RAM each
    They worked with Hadoop 0.13.1.

    Any ideas or suggestions?
  • Raghu Angadi at Sep 12, 2007 at 8:00 pm

    Open Study wrote:
    Hi all,

    I noticed the status report from "./hadoop dfsadmin -upgradeProgress
    details" shows a wrong timestamp, even though the time setting on the
    server is correct. Will this matter?
    No. It just means the stats were not updated yet (yeah, it probably
    should say "never" instead of some 1970 date).
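
    That 1970 value is just the Unix epoch (timestamp 0) rendered in the
    local GMT+08:00 zone, which GNU date can confirm:

        TZ='Asia/Shanghai' date -d @0   # prints Thu Jan  1 08:00:00 CST 1970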

    For some reason the datanode closes the connection; I don't know why. I
    assume the datanode does not have any errors in its log.

    Could you try restarting one of the datanodes? Do they register properly
    (i.e., do they appear on the namenode web page)?
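
    A quick way to check that without a browser (a sketch; 50070 is the
    default namenode web port in this generation, and the page name is from
    memory):

        # Fetch the namenode status page; it lists the live datanodes.
        curl http://192.168.2.1:50070/dfshealth.jsp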

    thanks.
    Raghu.
    <-------------------------------------------------------------------------------------------------------------------------
    Distributed upgrade for version -6 is in progress. Status = 0%

    Last Block Level Stats updated at : Thu Jan 01 08:00:00 GMT+08:00 1970
    Last Block Level Stats : Total Blocks : 0
    Fully Upgragraded : 0.00%
    Minimally Upgraded : 0.00%
    Under Upgraded : 0.00% (includes Un-upgraded blocks)
    Un-upgraded : 0.00%
    Errors : 0
    Brief Datanode Status : Avg completion of all Datanodes: 0.00% with 0 errors.

    Datanode Stats (total: 0): pct Completion(%) blocks upgraded (u) blocks remaining (r) errors (e)

    There are no known Datanodes
    ------------------------------------------------------------------------------------------------------------------------->
