Hi all,

Is it just me, or has there been something strange with Hadoop since ~0.10
or thereabouts? With older versions of Hadoop I would get a nice, frequently
updated progress status for each map task. What I'm seeing now is that
map tasks stay at 0.0% and then finally jump to 100.0% and finish.
Consequently, for jobs with a small number of long-running map tasks, the
progress update is very coarse.

As I understand it, this progress meter (in the absence of map tasks
explicitly setting the progress) was based on the RecordReader reporting
how much of the current split has been read. Is this something that got
broken along the way? If not, what's the reason for the change, and how
do I fix it?
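
For illustration only (a sketch with made-up names, not the actual Hadoop
source): a record reader can report that fraction by comparing its current
position against the byte range of its split, roughly like this:

// Sketch: track a split's byte range and report the fraction consumed so far.
class SplitProgress {
  private final long start;  // first byte of the split
  private final long end;    // first byte past the end of the split
  private long pos;          // current read position

  SplitProgress(long start, long length) {
    this.start = start;
    this.end = start + length;
    this.pos = start;
  }

  void advance(long bytesRead) { pos += bytesRead; }

  float getProgress() {
    if (end == start) {
      return 0.0f;                                  // empty split
    }
    return (pos - start) / (float) (end - start);   // fraction read so far
  }
}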

--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


  • Andrzej Bialecki at Mar 15, 2007 at 9:36 pm

    Andrzej Bialecki wrote:
    Hi all,

    Is it just me, or has there been something strange with Hadoop since ~0.10
    or thereabouts? With older versions of Hadoop I would get a nice,
    frequently updated progress status for each map task. What I'm seeing now
    is that map tasks stay at 0.0% and then finally jump to 100.0% and finish.
    Consequently, for jobs with a small number of long-running map tasks, the
    progress update is very coarse.

    As I understand it, this progress meter (in the absence of map tasks
    explicitly setting the progress) was based on the RecordReader reporting
    how much of the current split has been read. Is this something that got
    broken along the way? If not, what's the reason for the change, and how
    do I fix it?

    Does anyone have a suggestion about this problem? It's rather irritating
    - long-running tasks seem to be stuck at 0%, and only jump to 100% at
    the end of the task. This happens with 0.11.2 as well.

    --
    Best regards,
    Andrzej Bialecki <><
     ___. ___ ___ ___ _ _   __________________________________
    [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
    ___|||__||  \|  ||  |  Embedded Unix, System Integration
    http://www.sigram.com  Contact: info at sigram dot com
  • Owen O'Malley at Mar 15, 2007 at 10:09 pm

    On Mar 15, 2007, at 2:36 PM, Andrzej Bialecki wrote:

    Does anyone have a suggestion about this problem? It's rather
    irritating - long-running tasks seem to be stuck at 0%, and only
    jump to 100% at the end of the task. This happens with 0.11.2 as well.
    Most of my maps happen so fast that it isn't that easy to watch
    individual ones. The "progress" is based on the getPos() of the
    RecordReader feeding the maps. How long do your maps run?

    -- Owen
  • Andrzej Bialecki at Mar 16, 2007 at 12:23 am

    Owen O'Malley wrote:
    On Mar 15, 2007, at 2:36 PM, Andrzej Bialecki wrote:

    Does anyone have a suggestion about this problem? It's rather
    irritating - long-running tasks seem to be stuck at 0%, and only jump
    to 100% at the end of the task. This happens with 0.11.2 as well.
    Most of my maps happen so fast that it isn't that easy to watch
    individual ones. The "progress" is based on the getPos() of the
    RecordReader feeding the maps. How long do your maps run?
    Several hours up to 1-2 days (Nutch fetcher).
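
    Not from this thread, but since the first message mentions the case of map
    tasks "explicitly setting the progress": with maps that run for hours, the
    map itself can keep its status fresh through the Reporter it is handed. A
    rough sketch against the pre-generics org.apache.hadoop.mapred API of that
    era (exact signatures varied between releases, so treat this as
    illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch: a long-running map that reports its own status per record.
    public class LongRunningMapper extends MapReduceBase implements Mapper {
      public void map(WritableComparable key, Writable value,
                      OutputCollector output, Reporter reporter)
          throws IOException {
        // ... a long piece of work per record (e.g. fetching a URL) ...
        reporter.setStatus("processed " + key);  // shows in the task list and
                                                 // also acts as a keep-alive
        output.collect(key, value);
      }
    }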

    --
    Best regards,
    Andrzej Bialecki <><
     ___. ___ ___ ___ _ _   __________________________________
    [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
    ___|||__||  \|  ||  |  Embedded Unix, System Integration
    http://www.sigram.com  Contact: info at sigram dot com
  • Espen Amble Kolstad at Mar 15, 2007 at 10:13 pm
    It's a tiny bug in SequenceFileRecordReader. A cast to float is needed here:
    return (in.getPosition() - start) / (end - start);
    which should be:
    return (in.getPosition() - start) / (float) (end - start);

    start also needs to be assigned in the constructor:
    this.start = split.getStart();

    - Espen

    (Sorry about this not being a patch ... windoze ... arg)
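
    Spelled out a little more (a reconstruction around the lines quoted above;
    the surrounding field names are assumed rather than copied from the Hadoop
    source):

    // Reconstructed sketch of the two changes described above; 'start'/'end'
    // are the split's byte range and 'in' is the underlying reader.
    //
    // 1) In the constructor, remember where the split begins:
    //      this.start = split.getStart();
    //
    // 2) In getProgress(), divide in floating point:
    public float getProgress() throws IOException {
      if (end == start) {
        return 0.0f;                   // empty split, avoid dividing by zero
      }
      // With plain long division the quotient truncates to 0 until
      // getPosition() reaches 'end', which is exactly the 0% -> 100% jump.
      return (in.getPosition() - start) / (float) (end - start);
    }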

    Andrzej Bialecki wrote:
    Andrzej Bialecki wrote:
    Hi all,

    Is it just me, or has there been something strange with Hadoop since ~0.10
    or thereabouts? With older versions of Hadoop I would get a nice,
    frequently updated progress status for each map task. What I'm seeing now
    is that map tasks stay at 0.0% and then finally jump to 100.0% and finish.
    Consequently, for jobs with a small number of long-running map tasks, the
    progress update is very coarse.

    As I understand it, this progress meter (in the absence of map tasks
    explicitly setting the progress) was based on the RecordReader reporting
    how much of the current split has been read. Is this something that got
    broken along the way? If not, what's the reason for the change, and how
    do I fix it?

    Does anyone have a suggestion about this problem? It's rather
    irritating - long-running tasks seem to be stuck at 0%, and only jump
    to 100% at the end of the task. This happens with 0.11.2 as well.
  • Andrzej Bialecki at Mar 16, 2007 at 1:59 pm

    Espen Amble Kolstad wrote:
    It's a tiny bug in SequenceFileRecordReader. A cast to float is needed here:
    return (in.getPosition() - start) / (end - start);
    which should be:
    return (in.getPosition() - start) / (float) (end - start);

    start also needs to be assigned in the constructor:
    this.start = split.getStart();

    Thanks Espen, that's exactly the issue! I discovered that this bug is
    also replicated in LineRecordReader (which is used by TextInputFormat).
    I'll create a patch and submit it.
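
    For the LineRecordReader case, the same pattern applies; a sketch of the
    analogous fix (field names assumed, not copied from the Hadoop source):

    // Sketch: same integer-division problem, same fix. 'start'/'end' delimit
    // the split and 'pos' tracks how far the line reader has advanced.
    public float getProgress() {
      if (end == start) {
        return 0.0f;                                  // empty split
      }
      return (pos - start) / (float) (end - start);   // fraction of split read
    }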

    --
    Best regards,
    Andrzej Bialecki <><
     ___. ___ ___ ___ _ _   __________________________________
    [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
    ___|||__||  \|  ||  |  Embedded Unix, System Integration
    http://www.sigram.com  Contact: info at sigram dot com

Discussion Overview
group: common-dev @
categories: hadoop
posted: Mar 8, '07 at 10:37p
active: Mar 16, '07 at 1:59p
posts: 6
users: 3
website: hadoop.apache.org...
irc: #hadoop
