Hello all,

Occasionally when running jobs, Hadoop fails to clean up the
"_temporary" directories it has left behind. This only appears to
happen when a task is killed (i.e., a speculative-execution attempt
that lost the race), and the data that task has output so far is not
cleaned up. Is this a known issue in Hadoop? Is the data from that
task guaranteed to be a duplicate of what was output by another task?
Is it safe to delete this directory without worrying about losing data?

Thanks,
Nathan Marz
Rapleaf

  • Amareshwari Sriramadasu at Nov 5, 2008 at 4:14 am

    Nathan Marz wrote:
    > Occasionally when running jobs, Hadoop fails to clean up the
    > "_temporary" directories it has left behind. This only appears to
    > happen when a task is killed (i.e., a speculative-execution attempt
    > that lost the race), and the data that task has output so far is not
    > cleaned up. Is this a known issue in Hadoop?

    Yes. In some corner cases a speculative task can recreate _temporary
    after the cleanup has already run.

    > Is the data from that task guaranteed to be a duplicate of what was
    > output by another task? Is it safe to delete this directory without
    > worrying about losing data?

    Yes, you are right. It is duplicate data created by the speculative
    task. You can go ahead and delete it.

    -Amareshwari
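
    For reference, a minimal sketch (not from the thread) of removing a
    leftover _temporary directory with the Hadoop FileSystem API; the
    output path used below is only a placeholder, and the class name is
    hypothetical:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class CleanupTemporary {
            public static void main(String[] args) throws Exception {
                // Hypothetical job output directory; substitute your own.
                Path output = new Path("/user/example/job-output");
                Path temp = new Path(output, "_temporary");

                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);

                // Per the reply above, the leftover data is a duplicate
                // written by a killed speculative task, so a recursive
                // delete of the directory is safe.
                if (fs.exists(temp)) {
                    fs.delete(temp, true);
                }
            }
        }

    The same cleanup can also be done from the command line with
    hadoop fs -rmr <output>/_temporary.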
