A map/reduce job running on 3 TB of input data has been stalled for an hour at map
57% reduce 19% without making any progress.

The same error occurs millions of times in the huge syslog file, and I also got a
huge stderr file, where the log entries look like this:

Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201106031106_0001/attempt_201106031106_0001_m_000015_0/output/spill1176.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:930)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:401)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:886)
java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:573)
        at edu.uchicago.naivetagger.fivetokens.FiveTokens_step2$Map.map(FiveTokens_step2.java:68)
        at edu.uchicago.naivetagger.fivetokens.FiveTokens_step2$Map.map(FiveTokens_step2.java:36)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
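
If I read the stack trace correctly, the spill files from sortAndSpill are written to
the TaskTracker's local directories (whatever mapred.local.dir points to), not to HDFS,
and the exception says the allocator could not find room in any of them. As a rough
check I was going to look at those directories on each node, something like the
following (the path is just a placeholder, since I have not confirmed where
mapred.local.dir points on our cluster):

# see which local directories the TaskTrackers spill into
grep -A 1 "mapred.local.dir" conf/mapred-site.xml

# check free space on the filesystem(s) holding those directories
df -h /data/mapred/local

Does that seem like the right thing to check?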

The userlogs directory has ballooned to 125 GB. I was wondering whether I had run out
of disk space on any node, so I ran "bin/hadoop dfsadmin -report", which returns:

Configured Capacity: 17001199828992 (15.46 TB)
Present Capacity: 5039523250176 (4.58 TB)
DFS Remaining: 3148881514496 (2.86 TB)
DFS Used: 1890641735680 (1.72 TB)
DFS Used%: 37.52%

Name: 192.168.136.15:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 106267807744 (98.97 GB)
Non DFS Used: 791830847488 (737.45 GB)
DFS Remaining: 101971922944(94.97 GB)
DFS Used%: 10.63%
DFS Remaining%: 10.2%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.33:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 100771901440 (93.85 GB)
Non DFS Used: 899197136896 (837.44 GB)
DFS Remaining: 101539840(96.84 MB)
DFS Used%: 10.08%
DFS Remaining%: 0.01%
Last contact: Fri Jun 03 16:09:15 GMT-06:00 2011


Name: 192.168.136.35:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 116891889664 (108.86 GB)
Non DFS Used: 426219978752 (396.95 GB)
DFS Remaining: 456958709760(425.58 GB)
DFS Used%: 11.69%
DFS Remaining%: 45.69%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.22:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 117867053056 (109.77 GB)
Non DFS Used: 662408974336 (616.92 GB)
DFS Remaining: 219794550784(204.7 GB)
DFS Used%: 11.79%
DFS Remaining%: 21.98%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.34:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 102359646208 (95.33 GB)
Non DFS Used: 568364900352 (529.33 GB)
DFS Remaining: 329346031616(306.73 GB)
DFS Used%: 10.24%
DFS Remaining%: 32.93%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.20:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 120821141504 (112.52 GB)
Non DFS Used: 650393731072 (605.73 GB)
DFS Remaining: 228855705600(213.14 GB)
DFS Used%: 12.08%
DFS Remaining%: 22.88%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.19:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 107614384128 (100.22 GB)
Non DFS Used: 754391699456 (702.58 GB)
DFS Remaining: 138064494592(128.58 GB)
DFS Used%: 10.76%
DFS Remaining%: 13.81%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.31:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 76825395200 (71.55 GB)
Non DFS Used: 923142823936 (859.74 GB)
DFS Remaining: 102359040(97.62 MB)
DFS Used%: 7.68%
DFS Remaining%: 0.01%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.17:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 106580185088 (99.26 GB)
Non DFS Used: 806645080064 (751.25 GB)
DFS Remaining: 86845313024(80.88 GB)
DFS Used%: 10.66%
DFS Remaining%: 8.68%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.36:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 119845072896 (111.61 GB)
Non DFS Used: 784374362112 (730.51 GB)
DFS Remaining: 95851143168(89.27 GB)
DFS Used%: 11.98%
DFS Remaining%: 9.58%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.14:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 123510685696 (115.03 GB)
Non DFS Used: 807602118656 (752.14 GB)
DFS Remaining: 68957773824(64.22 GB)
DFS Used%: 12.35%
DFS Remaining%: 6.9%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.23:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 110193819648 (102.63 GB)
Non DFS Used: 644950650880 (600.66 GB)
DFS Remaining: 244926107648(228.11 GB)
DFS Used%: 11.02%
DFS Remaining%: 24.49%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.12:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 105652305920 (98.4 GB)
Non DFS Used: 464804843520 (432.88 GB)
DFS Remaining: 429613428736(400.11 GB)
DFS Used%: 10.56%
DFS Remaining%: 42.96%
Last contact: Fri Jun 03 16:09:14 GMT-06:00 2011


Name: 192.168.136.11:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 148825817088 (138.6 GB)
Non DFS Used: 485232361472 (451.91 GB)
DFS Remaining: 366012399616(340.88 GB)
DFS Used%: 14.88%
DFS Remaining%: 36.6%
Last contact: Fri Jun 03 16:09:15 GMT-06:00 2011


Name: 192.168.136.24:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 109966073856 (102.41 GB)
Non DFS Used: 641134641152 (597.1 GB)
DFS Remaining: 248969863168(231.87 GB)
DFS Used%: 11%
DFS Remaining%: 24.9%
Last contact: Fri Jun 03 16:09:16 GMT-06:00 2011


Name: 192.168.136.30:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 109333385216 (101.82 GB)
Non DFS Used: 841253269504 (783.48 GB)
DFS Remaining: 49483923456(46.09 GB)
DFS Used%: 10.93%
DFS Remaining%: 4.95%
Last contact: Fri Jun 03 16:09:15 GMT-06:00 2011


Name: 192.168.136.29:50010
Decommission Status : Normal
Configured Capacity: 1000070578176 (931.39 GB)
DFS Used: 107315171328 (99.95 GB)
Non DFS Used: 809729159168 (754.12 GB)
DFS Remaining: 83026247680(77.32 GB)
DFS Used%: 10.73%
DFS Remaining%: 8.3%
Last contact: Fri Jun 03 16:09:16 GMT-06:00 2011
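
Also, if I am reading the report correctly, most of the disk space is already taken by
non-DFS files:

Configured Capacity - Present Capacity
= 17001199828992 - 5039523250176
= 11961676578816 bytes (roughly 10.9 TB of non-DFS data across the cluster)

On 192.168.136.33, for instance, 837.44 GB of the 931.39 GB disk is "Non DFS Used",
which is presumably why only 96.84 MB of DFS space is left there; the 125 GB of
userlogs and the spill files would all count toward that non-DFS total.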

Two of the nodes have only 0.01% DFS remaining; is that the exact reason the job is
halting? If so, is there any balancing option to avoid this: when one node runs out of
storage, is it possible to automatically switch the output to other nodes that have
more free storage?
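
On the HDFS side, I did find that Hadoop ships a balancer (bin/start-balancer.sh, or
bin/hadoop balancer), and I was thinking of running it roughly like this; the threshold
value is just my guess:

# (as I understand it) move blocks until each datanode's usage is
# within 10 percentage points of the cluster average
bin/start-balancer.sh -threshold 10

# the balancer can be stopped at any time with
bin/stop-balancer.sh

Though if the spill failure is really about the task-local directories
(mapred.local.dir) filling up, I suppose the balancer, which only moves HDFS blocks,
would not fix that by itself. Is that the right way to think about it?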

Thanks!

Shi
