I maintain several counters to keep statistics on critical operations. My code
increments the counters in an inner loop, partly to make sure my job is not
killed for failing to make progress. It would be very easy to keep an internal
counter and update the Hadoop counter less frequently. Given that I am
currently incrementing a counter several million times in a reduce task, is
this costing me performance, and would I be better off incrementing less
frequently?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
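
A minimal sketch of the local-batching idea described in the question, assuming
the new org.apache.hadoop.mapreduce API; the BatchingReducer name, the counter
group/name, and the flush interval are made up for illustration:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer illustrating "batch locally, flush occasionally": a plain
// long is incremented in the inner loop and pushed to the Hadoop counter every
// FLUSH_INTERVAL updates, with a final flush in cleanup().
public class BatchingReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private static final long FLUSH_INTERVAL = 10000L; // arbitrary batch size
    private long pendingIncrements = 0L;

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0L;
        for (LongWritable value : values) {
            sum += value.get();
            pendingIncrements++;                        // cheap local counter
            if (pendingIncrements >= FLUSH_INTERVAL) {
                // push the batched count to the framework counter
                context.getCounter("Stats", "ValuesProcessed").increment(pendingIncrements);
                pendingIncrements = 0L;
            }
        }
        context.write(key, new LongWritable(sum));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        if (pendingIncrements > 0) {
            // flush whatever is left so the final totals are accurate
            context.getCounter("Stats", "ValuesProcessed").increment(pendingIncrements);
        }
    }
}

As the replies below point out, this batching is probably unnecessary, but it
shows what "updating the Hadoop counter less frequently" would look like.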


  • Alex Kozlov at Dec 8, 2011 at 7:22 pm
    Hi Steve, One thing to keep in mind is that counters in Hadoop are passed
    via heartbeats, so you will see updates only every two seconds or so. I have
    seen implementations with thousands of counters without a noticeable
    performance impact (since the counters are pre-aggregated in the reducer and
    only passed via heartbeats, the increment frequency does not matter).

    --
    Alex Kozlov PhD
    Senior Solutions Architect
    Cloudera, Inc

    Cloudera in Open Source http://www.cloudera.com/company/open-source/

  • Robert Evans at Dec 8, 2011 at 7:23 pm
    Looking at the code, all of the Counter implementations just do a value += incr in the increment method. They are not going to be much more expensive than keeping the counter yourself internally. What happens is that at periodic intervals the counters are sent to the JobTracker (pre-YARN) or the ApplicationMaster (in YARN), which keeps track of them. How often you increment them should really have no impact.

    --Bobby Evans

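A minimal sketch of the direct approach Bobby describes, again assuming the new
org.apache.hadoop.mapreduce API; the RecordMapper name and the Stats enum are
hypothetical. Since each increment is just an in-memory addition and the
accumulated values only travel to the JobTracker or ApplicationMaster with the
periodic heartbeat, incrementing once per record in the inner loop is fine:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that increments framework counters on every record.
// Each increment is only an in-memory add; the totals are shipped to the
// JobTracker/ApplicationMaster with the periodic heartbeat.
public class RecordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Counter names used for illustration only
    public enum Stats { RECORDS_SEEN, EMPTY_RECORDS }

    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.getCounter(Stats.RECORDS_SEEN).increment(1); // per-record increment
        String line = value.toString().trim();
        if (line.isEmpty()) {
            context.getCounter(Stats.EMPTY_RECORDS).increment(1);
            return;
        }
        for (String token : line.split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}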
