Hi, all

I noticed a wrong timestamp in the status report from "./hadoop
dfsadmin -upgradeProgress details", although the time setting on the server
is correct. Does this matter?

<-------------------------------------------------------------------------------------------------------------------------
Distributed upgrade for version -6 is in progress. Status = 0%

Last Block Level Stats updated at : Thu Jan 01 08:00:00 GMT+08:00 1970
Last Block Level Stats : Total Blocks : 0
Fully Upgragraded : 0.00%
Minimally Upgraded : 0.00%
Under Upgraded : 0.00% (includes Un-upgraded blocks)
Un-upgraded : 0.00%
Errors : 0
Brief Datanode Status : Avg completion of all Datanodes: 0.00% with 0 errors.

Datanode Stats (total: 0): pct Completion(%) blocks upgraded (u) blocks remaining (r) errors (e)

There are no known Datanodes
------------------------------------------------------------------------------------------------------------------------->
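
A note on that timestamp: Thu Jan 01 08:00:00 GMT+08:00 1970 is exactly what epoch 0 looks like in a GMT+08:00 zone, so it probably means the block level stats have never been updated, not that a clock is wrong. Here is a minimal Python sketch of the arithmetic. It assumes the field is a Java-style epoch-millisecond value that starts at 0, which is an assumption and has not been checked against the Hadoop source; render_epoch_millis is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware value.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def render_epoch_millis(millis, utc_offset_hours):
    """Render an epoch-millisecond value in a fixed-offset time zone.

    Assumption: the status report prints a millisecond timestamp that is
    still 0 when no stats have been collected yet.
    """
    zone = timezone(timedelta(hours=utc_offset_hours))
    return (EPOCH + timedelta(milliseconds=millis)).astimezone(zone)

stamp = render_epoch_millis(0, 8)
print(stamp.isoformat())  # 1970-01-01T08:00:00+08:00
```

With "Total Blocks : 0" and "There are no known Datanodes" in the same report, an unset timestamp would fit the picture.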

Here is the tcpdump I took with "tcpdump host 192.168.2.101 and 192.168.2.1"
on one of the datanodes, from the start of the cluster until the connection
was lost. 192.168.2.101 is the datanode and 192.168.2.1 is the namenode.

<-------------------------------------------------------------------------------------------------------------------------
03:21:01.082055 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: S
3124566345:3124566345(0) win 5840 <mss 1460,sackOK,timestamp 12085778
0,nop,wscale 7>
03:21:01.084143 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: S
635998938:635998938(0) ack 3124566346 win 5792 <mss 1460,sackOK,timestamp
211599828 12085778,nop,wscale 7>
03:21:01.082120 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 1 win
46 <nop,nop,timestamp 12085778 211599828>
03:21:01.090313 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P 1:22(21)
ack 1 win 46 <nop,nop,timestamp 211599830 12085778>
03:21:01.095758 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 22
win 46 <nop,nop,timestamp 12085781 211599830>
03:21:01.095876 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P 1:21(20)
ack 22 win 46 <nop,nop,timestamp 12085781 211599830>
03:21:01.095903 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 21
win 46 <nop,nop,timestamp 211599832 12085781>
03:21:01.096282 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
21:773(752) ack 22 win 46 <nop,nop,timestamp 12085782 211599832>
03:21:01.096304 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 773
win 57 <nop,nop,timestamp 211599832 12085782>
03:21:01.097154 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
22:766(744) ack 773 win 57 <nop,nop,timestamp 211599832 12085782>
03:21:01.097795 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
773:797(24) ack 766 win 58 <nop,nop,timestamp 12085782 211599832>
03:21:01.100199 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
766:918(152) ack 797 win 57 <nop,nop,timestamp 211599833 12085782>
03:21:01.106536 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
797:941(144) ack 918 win 69 <nop,nop,timestamp 12085784 211599833>
03:21:01.108781 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
918:1382(464) ack 941 win 69 <nop,nop,timestamp 211599835 12085784>
03:21:01.113305 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
941:957(16) ack 1382 win 81 <nop,nop,timestamp 12085786 211599835>
03:21:01.155108 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 957
win 69 <nop,nop,timestamp 211599846 12085786>
03:21:01.155199 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
957:1005(48) ack 1382 win 81 <nop,nop,timestamp 12085796 211599846>
03:21:01.155217 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1005
win 69 <nop,nop,timestamp 211599846 12085796>
03:21:01.155273 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
1382:1430(48) ack 1005 win 69 <nop,nop,timestamp 211599847 12085796>
03:21:01.155453 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
1005:1069(64) ack 1430 win 81 <nop,nop,timestamp 12085796 211599847>
03:21:01.199106 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1069
win 69 <nop,nop,timestamp 211599857 12085796>
03:21:01.214178 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
1430:1494(64) ack 1069 win 69 <nop,nop,timestamp 211599861 12085796>
03:21:01.214481 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
1069:1597(528) ack 1494 win 81 <nop,nop,timestamp 12085811 211599861>
03:21:01.214518 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1597
win 81 <nop,nop,timestamp 211599861 12085811>
03:21:01.218638 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
1494:1974(480) ack 1597 win 81 <nop,nop,timestamp 211599862 12085811>
03:21:01.222363 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
1597:2173(576) ack 1974 win 93 <nop,nop,timestamp 12085813 211599862>
03:21:01.224255 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
1974:2006(32) ack 2173 win 93 <nop,nop,timestamp 211599864 12085813>
03:21:01.224521 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
2173:2237(64) ack 2006 win 93 <nop,nop,timestamp 12085814 211599864>
03:21:01.227368 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2006:2054(48) ack 2237 win 93 <nop,nop,timestamp 211599865 12085814>
03:21:01.227689 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
2237:2493(256) ack 2054 win 93 <nop,nop,timestamp 12085814 211599865>
03:21:01.228913 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2054:2102(48) ack 2493 win 104 <nop,nop,timestamp 211599865 12085814>
03:21:01.268981 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2102
win 93 <nop,nop,timestamp 12085825 211599865>
03:21:01.344551 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2102:2246(144) ack 2493 win 104 <nop,nop,timestamp 211599894 12085825>
03:21:01.344689 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2246
win 104 <nop,nop,timestamp 12085844 211599894>
03:21:02.037296 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: S
638840154:638840154(0) win 5840 <mss 1460,sackOK,timestamp 211600067
0,nop,wscale 7>
03:21:02.037414 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: S
3130232567:3130232567(0) ack 638840155 win 5792 <mss 1460,sackOK,timestamp
12086017 211600067,nop,wscale 7>
03:21:02.037473 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 1 win 46 <nop,nop,timestamp 211600067 12086017>
03:21:02.049490 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
1:110(109) ack 1 win 46 <nop,nop,timestamp 211600070 12086017>
03:21:02.049626 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
ack 110 win 46 <nop,nop,timestamp 12086020 211600070>
03:21:02.357928 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2246:2278(32) ack 2493 win 104 <nop,nop,timestamp 211600147 12085844>
03:21:02.358048 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2278
win 104 <nop,nop,timestamp 12086097 211600147>
03:21:02.358089 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2278:2374(96) ack 2493 win 104 <nop,nop,timestamp 211600147 12086097>
03:21:02.358178 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2374
win 104 <nop,nop,timestamp 12086097 211600147>
03:21:02.358316 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
2493:2525(32) ack 2374 win 104 <nop,nop,timestamp 12086097 211600147>
03:21:02.358356 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: F
2525:2525(0) ack 2374 win 104 <nop,nop,timestamp 12086097 211600147>
03:21:02.359169 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: F
2374:2374(0) ack 2526 win 104 <nop,nop,timestamp 211600147 12086097>
03:21:02.359254 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2375
win 104 <nop,nop,timestamp 12086097 211600147>
03:22:03.064540 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
110:214(104) ack 1 win 46 <nop,nop,timestamp 211615323 12086020>
03:22:03.064664 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
ack 214 win 46 <nop,nop,timestamp 12101272 211615323>
03:22:08.065775 arp who-has TE-DN-001.local.TEST tell 192.168.2.1
03:22:08.065791 arp reply TE-DN-001.local.TEST is-at 00:18:37:02:74:76 (oui
Unknown)
03:22:54.349567 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
1:20(19) ack 214 win 46 <nop,nop,timestamp 12114091 211615323>
03:22:54.349624 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 20 win 46 <nop,nop,timestamp 211628143 12114091>
03:22:54.349708 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
20:39(19) ack 214 win 46 <nop,nop,timestamp 12114091 211628143>
03:22:54.349718 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 39 win 46 <nop,nop,timestamp 211628143 12114091>
03:22:54.385237 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
214:242(28) ack 39 win 46 <nop,nop,timestamp 211628152 12114091>
03:22:54.385342 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
ack 242 win 46 <nop,nop,timestamp 12114100 211628152>
03:22:54.391417 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
39:146(107) ack 242 win 46 <nop,nop,timestamp 12114101 211628152>
03:22:54.433048 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 146 win 46 <nop,nop,timestamp 211628164 12114101>
03:22:55.525390 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: F
242:242(0) ack 146 win 46 <nop,nop,timestamp 211628437 12114101>
03:22:55.525719 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: F
146:146(0) ack 243 win 46 <nop,nop,timestamp 12114385 211628437>
03:22:55.525746 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 147 win 46 <nop,nop,timestamp 211628437 12114385>
------------------------------------------------------------------------------------------------------------------------->
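
Reading the trace: cslistener is the /etc/services name for TCP port 9000, so the 28260 > cslistener connection is the datanode talking to the namenode IPC port. One thing worth checking is which side sends the first FIN on that connection. Below is a rough Python sketch. It assumes the wrapped lines have been rejoined into one record per line; first_fin is a hypothetical helper, and the sample lines are trimmed copies of records from the trace above.

```python
import re

# timestamp, source, destination, and the TCP flags token (S, ., P, F)
RECORD = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) IP (\S+) > (\S+): (\S+)")

def first_fin(lines, service="cslistener"):
    """Return (timestamp, sender) of the first FIN on the given service port.

    Assumption: one tcpdump record per line, with the port shown as a
    service-name suffix on the host (as in the trace above).
    """
    suffix = "." + service
    for line in lines:
        m = RECORD.match(line)
        if not m:
            continue  # skip non-IP records such as the arp lines
        when, src, dst, flags = m.groups()
        on_service = src.endswith(suffix) or dst.endswith(suffix)
        if on_service and "F" in flags:
            return when, src
    return None

sample = [
    "03:21:02.037296 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: S 638840154:638840154(0) win 5840",
    "03:21:02.358356 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: F 2525:2525(0) ack 2374 win 104",
    "03:22:08.065775 arp who-has TE-DN-001.local.TEST tell 192.168.2.1",
    "03:22:55.525390 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: F 242:242(0) ack 146 win 46",
    "03:22:55.525719 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: F 146:146(0) ack 243 win 46",
]
result = first_fin(sample)
print(result)
```

On these records it reports the datanode side (TE-DN-001.local.TEST.28260) sending the first FIN, at 03:22:55.525390.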


On 9/13/07, Raghu Angadi wrote:

Hi,

The Datanode must be able to connect to the Namenode for the upgrade to
make any progress. Do you see any other errors reported in the datanode log?
You need to fix the connection problem first.
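
A quick way to rule out basic reachability from a datanode host is a plain TCP connect to the namenode port. The sketch below uses only the Python standard library; can_connect is a hypothetical helper, and the address 192.168.2.1:9000 in the comment is taken from the logs in this thread. A successful connect only shows that the port accepts TCP connections, not that the Hadoop IPC handshake works.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Example, run on a datanode: can_connect("192.168.2.1", 9000)
```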

Are you comfortable taking a tcpdump of the Namenode port on the client? I
think the client should be trying to reconnect.

Note that it is safe to restart the cluster or just the datanodes before
the upgrade completes.

Raghu.

Open Study wrote:
I also checked the namenode log and found one exception, as follows:
2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 9000: starting
2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 9000: starting
2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 9000: starting
2007-09-13 02:17:25,325 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 9000: starting
2007-09-13 02:17:25,400 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode:
Block CRC Upgrade is still running.
Avg completion of all Datanodes: 0.00% with 0 errors.
2007-09-13 02:17:25,406 WARN org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 14) from 192.168.2.1:53211: output error
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:585)
2007-09-13 02:18:24,921 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode:
Block CRC Upgrade is still running.
Avg completion of all Datanodes: 0.00% with 0 errors.

It seems something was going wrong on the datanode side. However, the log of
one of the datanodes shows that it started, and it was still running
according to the process list, but it somehow lost its connection to the
namenode.

************************************************************/
2007-09-12 22:23:35,319 INFO org.apache.hadoop.dfs.DataNode:
STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = TE-DN-002/192.168.2.102
STARTUP_MSG: args = []
************************************************************/
2007-09-12 22:23:35,533 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2007-09-12 22:24:35,619 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: /192.168.2.1:9000
2007-09-12 22:25:34,878 INFO org.apache.hadoop.dfs.Storage: Recovering storage directory /home/textd/data/fs/data from previous upgrade.
2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.DataNode: Distributed upgrade for DataNode version -6 to current LV -7 is initialized.
2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.Storage: Upgrading storage directory /home/textd/data/fs/data.
old LV = -4; old CTime = 0.
new LV = -7; new CTime = 1189616555276
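
One detail in the datanode log that may help: the spacing between entries. Here is a small Python sketch that computes the gap between consecutive log lines. It assumes each line starts with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp as above; parse_ts and gaps are hypothetical helper names, and the sample entries are copied from the log with the wrapped lines rejoined.

```python
from datetime import datetime

def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")

def gaps(lines):
    """Return the seconds elapsed between each pair of consecutive lines."""
    stamps = [parse_ts(line) for line in lines]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

log = [
    "2007-09-12 22:23:35,533 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null",
    "2007-09-12 22:24:35,619 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: /192.168.2.1:9000",
    "2007-09-12 22:25:34,878 INFO org.apache.hadoop.dfs.Storage: Recovering storage directory /home/textd/data/fs/data from previous upgrade.",
]
deltas = gaps(log)
print(deltas)
```

Both gaps come out at roughly 60 seconds (about 60.1 and 59.3). That may point to a timeout on the first connection attempt followed by a successful retry, though that is only a guess from the timing.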

The hardware configuration is:
Namenode: P4D, 3G RAM
3 Datanodes: AMD 64 4000x2, 1G RAM
This setup worked with Hadoop 0.13.1.

Any ideas or suggestions?

Discussion Overview
group: common-user
category: hadoop
posted: Sep 12, '07 at 6:29p
active: Sep 12, '07 at 8:00p
posts: 7
users: 3
website: hadoop.apache.org...
irc: #hadoop