> redoing each other's work and stomping on each other's output files.

I am assuming your tasks (reducers) are generating these files themselves, and that these are not the usual job output files like part-00000.

It looks like you have speculative execution turned on.
Hadoop launches parallel attempts of a map/reduce task if it finds that one of them is falling behind. Each attempt is suffixed with a number, which is the _0 and _1 you are seeing.
If you have tasks that write to shared files, you will hit this problem.
There are two ways out of this:
1. Turn off speculative execution by setting mapred.speculative.execution to false.
2. If your tasks generate their own output files, include the task attempt ID in each file name so every attempt writes to a unique file.
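Option 2 can be sketched as follows. This is a minimal illustration, not Hadoop API code: it assumes the attempt ID string (e.g. `task_200808062237_0031_r_000000_0`, which a task in this Hadoop generation can read from the `mapred.task.id` job property) is passed in, and simply embeds it in the output file name so that concurrent speculative attempts never target the same file:

```java
// Sketch: give each task attempt its own output file by embedding the
// attempt ID in the name. In a real job the ID would come from the
// job configuration rather than a hard-coded string.
public class UniqueOutput {

    // Append the attempt ID to a base name, e.g.
    // "wordcounts" -> "wordcounts-task_200808062237_0031_r_000000_0"
    static String uniqueName(String base, String attemptId) {
        return base + "-" + attemptId;
    }

    public static void main(String[] args) {
        // The two speculative attempts from the thread now write to
        // different files instead of stomping on each other:
        System.out.println(uniqueName("wordcounts",
                "task_200808062237_0031_r_000000_0"));
        System.out.println(uniqueName("wordcounts",
                "task_200808062237_0031_r_000000_1"));
    }
}
```

Whichever attempt finishes first wins; the loser's file can be ignored or cleaned up afterwards, since only its name differs.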
> I've attached the JSP output that indicates this; let me know if you
> need any other details.

No attachment came through.

Thanks,
Lohit



----- Original Message ----
From: Anthony Urso <anthony.urso@gmail.com>
To: core-user@hadoop.apache.org
Sent: Monday, August 11, 2008 7:03:45 PM
Subject: Stopping two reducer tasks on two machines from working on the same keys?

I have a Hadoop 0.16.4 cluster that effectively has no HDFS. It's
running a job analyzing data stored on a NAS type system mounted on
each tasktracker.

Unfortunately, the reducers task_200808062237_0031_r_000000_0 and
task_200808062237_0031_r_000000_1 are running simultaneously on the
same keys, redoing each other's work and stomping on each other's
output files.

I've attached the JSP output that indicates this; let me know if you
need any other details.

Is this a configuration error, or is it a bug in Hadoop?

Cheers,
Anthony

Discussion Overview
group: common-user @ hadoop
posted: Aug 12, '08 at 2:04a
active: Aug 12, '08 at 5:37a
posts: 3
users: 2 (Anthony Urso: 2 posts, Lohit: 1 post)
website: hadoop.apache.org…
irc: #hadoop
