Harsh,

Sorry for the confusion.
The question is: suppose I have a single-node setup, I add System.out.println
(Sysout) statements to MapTask.java and ReduceTask.java, and then:
{HADOOP_HOME}$ ant build
{HADOOP_HOME}$ start all daemons
{HADOOP_HOME}$ run wordcount example

Yes, I am able to see the output in the tasktrackers' *.out files.

Q>Is the map/reduce task run time displayed in the web GUI decent/accurate
enough?
Q>If I want to find the I/O rate of a task, will the total number of FILE
and HDFS bytes read/written, divided by the task run time, give it
approximately?
Q>Does the FILE bytes read counter for the reduce task include the map output
record bytes fetched non-locally over the network, or the bytes read locally
from the map output records after they have been copied locally?

Thanks,
Arun
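The I/O-rate estimate in the second question can be sketched in plain Java. This is a minimal sketch, not a Hadoop API: the class and method names (`TaskIoRate`, `bytesPerSecond`) are hypothetical, and the counter values and timestamps are assumed to be copied by hand from the task's page in the web GUI (FILE and HDFS bytes read/written, start and finish time). It divides the summed byte counts by the wall-clock run time, so it yields an average over the whole task rather than a peak rate.

```java
// Hypothetical helper, not part of Hadoop: estimates a task's average I/O rate
// from counter values and timestamps copied from the job's web GUI.
public final class TaskIoRate {

    private TaskIoRate() {}

    /**
     * Average bytes per second over the task's wall-clock run time
     * (task start -> task end), summing the FILE and HDFS read/write counters.
     */
    public static double bytesPerSecond(long fileBytesRead, long fileBytesWritten,
                                        long hdfsBytesRead, long hdfsBytesWritten,
                                        long startMillis, long finishMillis) {
        long elapsedMillis = finishMillis - startMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("finish time must be after start time");
        }
        long totalBytes = fileBytesRead + fileBytesWritten + hdfsBytesRead + hdfsBytesWritten;
        return totalBytes / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Illustrative values only; substitute the task's real counters and times.
        double rate = bytesPerSecond(
                10_000_000L,  // FILE bytes read
                5_000_000L,   // FILE bytes written
                64_000_000L,  // HDFS bytes read
                0L,           // HDFS bytes written
                0L, 8_000L);  // start and finish, in milliseconds
        System.out.printf("average I/O rate: %.2f MB/s%n", rate / 1_000_000.0);
    }
}
```

Summing reads and writes into one number treats local disk and HDFS traffic alike; keeping the four terms separate is an equally reasonable choice if you want to see which one dominates.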


  • Harsh J at Dec 3, 2011 at 9:14 am
    Arun,

    Inline again.
    On 03-Dec-2011, at 12:39 PM, arun k wrote:

    Q>Is the map/reduce task run time displayed in the web GUI decent/accurate enough?
    I don't see why not. We only display what has genuinely been collected. What you get from the API or the CLI is exactly the same thing. Or perhaps I do not understand your question completely here; what led you to ask this?
    Q>If I want to find the I/O rate of a task, will the total number of FILE and HDFS bytes read/written, divided by the task run time, give it approximately?
    Yes, that should give you a stopwatch measure: task start to task end, together with the counters the task reports for itself.
    Q>Does the FILE bytes read counter for the reduce task include the map output record bytes fetched non-locally over the network, or the bytes read locally from the map output records after they have been copied locally?
    FILE counters count whatever is read off a local filesystem (file:///), so they mean the latter. If you look again, you will notice another counter named "Reduce shuffle bytes" that gives you the former count separately.
  • Arun k at Dec 3, 2011 at 2:30 pm
    Harsh,

    I wanted to confirm this because, if it isn't accurate, I want to write
    code to capture it.

    Does it make sense to classify a map/reduce task as I/O-bound or CPU-bound
    based on its I/O rate?

    Arun
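Following the answer to the third question, a reduce task's network traffic and its local disk reads come from two different counters ("Reduce shuffle bytes" and FILE bytes read) and are best reported apart. A minimal sketch, assuming the values are copied by hand from the task's counters; the class `ReduceReadRates` and the numbers are hypothetical illustrations, not a Hadoop API.

```java
// Hypothetical helper, not part of Hadoop: reports a reduce task's network
// (shuffle) and local-filesystem read rates as two separate numbers.
public final class ReduceReadRates {

    private ReduceReadRates() {}

    /** Average bytes per second over an interval given in milliseconds. */
    public static double perSecond(long bytes, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return bytes / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from a real job.
        long reduceShuffleBytes = 40_000_000L; // "Reduce shuffle bytes": map output fetched over the network
        long fileBytesRead = 30_000_000L;      // FILE bytes read: read from file:/// on the reducer's node
        long elapsedMillis = 10_000L;          // task start -> task end

        System.out.printf("shuffle (network): %.2f MB/s%n",
                perSecond(reduceShuffleBytes, elapsedMillis) / 1_000_000.0);
        System.out.printf("FILE reads (local): %.2f MB/s%n",
                perSecond(fileBytesRead, elapsedMillis) / 1_000_000.0);
    }
}
```

Both rates here are averaged over the whole task run time. Since the shuffle occupies only part of a reduce task, the network figure likely understates the rate achieved while fetching.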

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Dec 3, '11 at 7:10a
active: Dec 3, '11 at 2:30p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Arun k: 2 posts, Harsh J: 1 post
