Hi,
It seems the error happens not at the reduce output but at the map output.
Most likely the local file system of a node running a map task doesn't
have enough space for the map output (the spill files). If many records
share the same key, you can use a combiner in the map phase to shrink
the intermediate result before it is written out.
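[Editor's note] A minimal plain-Java sketch of what a combiner does — this is not the Hadoop Reducer API itself (in Hadoop you would implement a Reducer and register it with JobConf.setCombinerClass), just an illustration of the map-side pre-aggregation that shrinks spill size when one key dominates:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Collapse map-side (key, value) pairs that share a key, e.g. by
    // summing counts, before they would be spilled to local disk.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> mapOutput) {
        Map<String, Long> combined = new HashMap<>();
        for (Map.Entry<String, Long> kv : mapOutput) {
            combined.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs = List.of(
                Map.entry("x", 1L), Map.entry("x", 1L), Map.entry("y", 1L));
        Map<String, Long> out = combine(pairs);
        // Three records shrink to two: one per distinct key.
        System.out.println(out.size());
    }
}
```

Because the combiner runs on each map task's local output, a key that appears millions of times is reduced to a single partial aggregate per spill, which is exactly what relieves the local-disk pressure described above.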


-Gang



----- Original Message ----
From: himanshu chandola <himanshu_coolguy@yahoo.com>
To: common-user@hadoop.apache.org
Sent: 2009/12/31 5:10:10
Subject: large reducer output with same key

Hi Everyone,
My reducer output results in most of the data having the same key. The
reducer output is close to 16 GB, and though my cluster has a terabyte
of space in HDFS in total, I get errors like the following:
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:719)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException:
Could not find any valid local directory for
task_200808021906_0002_m_000014_2/spill4.out
After such failures, Hadoop tries to restart the same task a couple of
times on other nodes before the job fails. From the exception, it looks
to me like this is probably a disk-space error (some machines have less
than 16 GB of free space).

So my question is: does Hadoop put all values that share the same key
as a single block on one node? Or could something else be happening
here?

Thanks

H
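
[Editor's note] On the question of whether values sharing a key end up on one node: HDFS does not group them, but the default HashPartitioner does route every record with a given key to the same reduce task, so one hot key is concentrated on one node. A sketch of that routing logic (the formula below matches Hadoop's HashPartitioner; the class here is illustrative, not the Hadoop class itself):

```java
public class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner: mask off the
    // sign bit, then take the hash modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int a = partitionFor("hotKey", 10);
        int b = partitionFor("hotKey", 10);
        // The same key always maps to the same partition, so every
        // record with a skewed key lands on a single reducer's node.
        System.out.println(a == b);
    }
}
```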
