FAQ
I have a loop that runs over a large number of iterations (order of 100,000)
very quickly. It is nice to do context.setStatus() with an indication of
where I am in the loop. Currently I'm only calling setStatus() every 10,000
iterations because I don't want to overwhelm the task trackers with lots of
status messages. Is this something I should be worried about, or is Hadoop
designed to handle a high volume of status messages? If so, I'll just call
setStatus() every iteration.
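
A minimal sketch of the throttling described in the question, for anyone who wants to reuse the pattern. It does not depend on Hadoop: the `Consumer<String>` sink stands in for `context::setStatus`, and the `ThrottledStatus`, `Reporter`, and `countUpdates` names are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ThrottledStatus {

    /** Forwards a status message to the sink only once every `interval` calls. */
    static final class Reporter {
        private final Consumer<String> sink; // assumption: stands in for context::setStatus
        private final long interval;
        private long calls = 0;

        Reporter(Consumer<String> sink, long interval) {
            this.sink = sink;
            this.interval = interval;
        }

        void report(long iteration, long total) {
            calls++;
            if (calls % interval == 0) {
                sink.accept("iteration " + iteration + " of " + total);
            }
        }
    }

    /** Runs a loop of `total` iterations and returns how many updates reached the sink. */
    static int countUpdates(long total, long interval) {
        List<String> sent = new ArrayList<>();
        Reporter reporter = new Reporter(sent::add, interval);
        for (long i = 1; i <= total; i++) {
            reporter.report(i, total);
        }
        return sent.size();
    }

    public static void main(String[] args) {
        // 100,000 iterations, one update every 10,000 calls
        System.out.println(countUpdates(100_000, 10_000) + " status updates sent");
    }
}
```

In a real mapper or reducer the sink would presumably be `context::setStatus`, so changing the reporting rate means changing one constructor argument.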

  • Ted Dunning at Dec 23, 2010 at 7:42 pm
    It is reasonable to update counters often, but I think you are right to
    limit the number of status updates.
  • Ken at Dec 23, 2010 at 8:50 pm
    If I remember correctly, status is only sent on heartbeat. That means that if you set it inside a fast-running loop, you won't see every status message, only the one that was current when the heartbeat was sent to the jobtracker.

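
The behavior Ken describes can be pictured with a toy model. This is not Hadoop's code, only a sketch under the assumption that `setStatus()` overwrites a locally held string and that a periodic heartbeat reads whatever is current. The `HeartbeatModel`, `TaskStatus`, `snapshot`, and `run` names are hypothetical, and the heartbeat is simulated by iteration count here, whereas a real one would be time-based.

```java
import java.util.ArrayList;
import java.util.List;

public class HeartbeatModel {

    /** Holds only the most recent status; each call overwrites the previous one. */
    static final class TaskStatus {
        private volatile String current = "";

        void setStatus(String s) { current = s; }  // local overwrite, nothing is sent here
        String snapshot() { return current; }      // what a heartbeat would pick up
    }

    /** Runs the loop and returns the statuses a periodic heartbeat would have observed. */
    static List<String> run(int iterations, int heartbeatEvery) {
        TaskStatus status = new TaskStatus();
        List<String> observed = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            status.setStatus("iteration " + i);    // set on every iteration
            if (i % heartbeatEvery == 0) {         // simulated heartbeat
                observed.add(status.snapshot());
            }
        }
        return observed;
    }

    public static void main(String[] args) {
        // prints [iteration 25000, iteration 50000, iteration 75000, iteration 100000]
        System.out.println(run(100_000, 25_000));
    }
}
```

Under this model, 100,000 calls to `setStatus()` produce only four visible messages, which is why the intermediate values are never seen.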
  • W.P. McNeill at Dec 23, 2010 at 9:14 pm
    I figured that was the case and it's okay if I don't see every status
    message, as long as it doesn't hurt anything to send them.

Discussion Overview
group: common-user @
categories: hadoop
posted: Dec 23, '10 at 7:15p
active: Dec 23, '10 at 9:14p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
