I'd like to understand how the task scheduler relies on RecordReader.getProgress() in Hadoop 0.20.2.

Sometimes I don't have an accurate count of the total number of records to be processed, and I'm wondering how returning an inaccurate progress percentage affects task scheduling. I found that returning 0 while not done and 1 when done makes the job hang.

Any advice is greatly appreciated.

Thanks,
Jane


  • Harsh J at Apr 9, 2011 at 9:12 am
    Hello Jane,
    On Tue, Mar 29, 2011 at 4:40 AM, Jane Chen wrote:
    > There are times when I don't have an accurate count of the total records to be processed, and I wonder the impact on task scheduling when returning an inaccurate progress percentage. I found that when I return either 0 when not done and 1 when done will make the job hang.

    What do you mean when you say the job 'hangs' when you always set
    the progress statically to 0 or 1? Do you mean the task gets killed
    and restarted?

    When the progress value or status message changes, a task status
    report is sent back through the Reporter to the TaskInProgress (TIP)
    object held by the parent TaskTracker. If a TIP has not received a
    status report within the configured timeout (mapred.task.timeout,
    600 s by default; set it to 0 to disable purging), it purges the task
    as hung or unresponsive, and the task is rescheduled.

    If you're not sure what your progress is while processing records in
    the RecordReader (RR), you can set progress to any value, even a
    random one; the framework shouldn't mind if the progress value
    decreases.

    --
    Harsh J
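
The timeout rule described in the reply (no status report within mapred.task.timeout, so the task is purged and rescheduled) can be modeled roughly as below. This is a minimal sketch assuming the check is a plain "time since last report exceeds the timeout" comparison; TaskTimeoutModel and shouldPurge are hypothetical names, not Hadoop APIs, and the real TaskTracker logic differs in detail.

```java
/**
 * Hypothetical model of the timeout rule described in the reply: a task that
 * has not sent a status report within the timeout is treated as hung and is
 * purged so it can be rescheduled. Illustration only, not TaskTracker code;
 * the class and method names are made up.
 */
final class TaskTimeoutModel {
    /** Default of mapred.task.timeout: 600 s, expressed in milliseconds. */
    static final long DEFAULT_TIMEOUT_MS = 600_000L;

    private TaskTimeoutModel() {}

    /**
     * Returns true if a task whose last status report arrived at lastReportMs
     * would be purged at time nowMs. In this model a timeout of 0 (or less)
     * disables purging, matching "set it to 0 to disable purging". The strict
     * ">" comparison is an assumption of this sketch.
     */
    static boolean shouldPurge(long lastReportMs, long nowMs, long timeoutMs) {
        if (timeoutMs <= 0) {
            return false; // purging disabled
        }
        return nowMs - lastReportMs > timeoutMs;
    }
}
```

For example, a task that last reported at time 0 and is checked 11 minutes later (660,000 ms) would be purged under the 600 s default, while the same task is never purged when the timeout is 0. In a real job the value is set through the mapred.task.timeout property, which is given in milliseconds.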

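Harsh's point is that the reported progress need not be accurate. If a steadily increasing estimate is preferable to an arbitrary one, one option is to measure bytes consumed from the input split instead of records, which is roughly the approach Hadoop's LineRecordReader takes; another is a curve that rises toward 1 as records are read. The sketch below is plain Java with no Hadoop dependency, assuming the reader can track its current byte offset and record count; ProgressEstimates, byteProgress, asymptoticProgress, and halfPoint are hypothetical names.

```java
/**
 * Two hypothetical ways to estimate RecordReader progress without knowing
 * the total record count. Plain Java with no Hadoop dependency; the class
 * and method names are illustrative only.
 */
final class ProgressEstimates {
    private ProgressEstimates() {}

    /**
     * Fraction of the input split consumed so far, clamped to [0, 1].
     * start and end are the split's byte offsets; pos is the reader's
     * current byte offset. Similar in spirit to LineRecordReader's
     * byte-offset progress, but not that class's code.
     */
    static float byteProgress(long start, long end, long pos) {
        if (end <= start) {
            return 0.0f; // empty split: no meaningful fraction
        }
        float fraction = (pos - start) / (float) (end - start);
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }

    /**
     * Fallback when even the byte length is unknown: a curve that rises
     * toward 1 as records are read and equals 0.5 after halfPoint records.
     * halfPoint is a caller-supplied guess, not a Hadoop setting. The value
     * is capped at 0.99 so that only a finished reader reports 1.
     */
    static float asymptoticProgress(long recordsRead, long halfPoint, boolean done) {
        if (halfPoint <= 0) {
            throw new IllegalArgumentException("halfPoint must be positive");
        }
        if (done) {
            return 1.0f;
        }
        if (recordsRead <= 0) {
            return 0.0f;
        }
        float estimate = recordsRead / (float) (recordsRead + halfPoint);
        return Math.min(estimate, 0.99f);
    }
}
```

Inside a custom RecordReader, getProgress() could return byteProgress(start, end, pos) from the reader's own fields, falling back to asymptoticProgress(...) with a rough guess of the typical number of records per split when no byte length is available. Unlike a value that stays at 0 until the end, either estimate changes as records are read.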
Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Mar 28, '11 at 11:11p
active: Apr 9, '11 at 9:12a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion: Harsh J (1 post), Jane Chen (1 post)
