I saw some puzzling behavior tonight when running a MapReduce program.

It performed the mapping just fine and began to shuffle. It got to 33%
reduce completion (end of shuffle), and then the task failed, claiming
that <output_dir>/_temporary had been deleted.

I didn't touch HDFS while this was going on.

I reran the job several more times, and the failure repeated twice more.
While this was happening, I was running bin/hadoop fs -ls <output_dir>
periodically in another window. Puzzlingly, the _temporary directory got
created just fine, but at some point after shuffling began, it was removed.
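
The periodic check described above could be scripted along these lines. This is only a sketch, not what was actually run for this report: the function names, the HADOOP_CMD variable, and the polling defaults are all hypothetical, and it assumes that `fs -ls` exits with a non-zero status when the path does not exist. A transient failure of the command (for example, the NameNode being unreachable) would also be reported as a disappearance.

```shell
#!/bin/sh
# Poll an HDFS output directory and report when _temporary vanishes.
# HADOOP_CMD is an assumption: point it at your own hadoop launcher.
HADOOP_CMD="${HADOOP_CMD:-bin/hadoop}"

# Returns 0 if the path can be listed, non-zero otherwise.
# Assumes `fs -ls` fails with a non-zero exit status on a missing path.
dir_exists() {
    $HADOOP_CMD fs -ls "$1" >/dev/null 2>&1
}

# watch_temp_dir <output_dir> [interval_seconds] [max_polls]
# Once <output_dir>/_temporary has been seen at least once, prints a
# timestamped message and returns 1 as soon as a later check fails.
# Returns 0 if no disappearance was observed within max_polls checks.
watch_temp_dir() {
    target="$1/_temporary"
    interval="${2:-1}"
    max_polls="${3:-600}"
    seen=0
    i=0
    while [ "$i" -lt "$max_polls" ]; do
        if dir_exists "$target"; then
            seen=1
        elif [ "$seen" -eq 1 ]; then
            echo "$(date '+%H:%M:%S') $target disappeared"
            return 1
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 0
}
```

Calling `watch_temp_dir <output_dir> 1` in a second window while the job runs would give a rough timestamp for the removal, which can then be compared against the JobTracker and TaskTracker logs.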

I tried to see if I could manually race this, so I did a mkdir _temporary,
and the job proceeded just fine. Even more bizarre, the removal of the
_temporary directory did not occur on any subsequent MR jobs (executions of
the same, unmodified program). So I can't reproduce the bug.

This is on 0.18.2.

It went away, so I'm not *too* concerned, but I'd rather not deal with
heisenbugs if at all possible.

So: has anyone seen this behavior? Have you figured out how to reproduce it,
or even better, prevent it?

- Aaron

Group: common-user @
Posted: Jan 23, '09 at 9:30a
Author: Aaron Kimball


