Hi everybody,
as part of my project work at school I'm running some Hadoop jobs on a
cluster. I'd like to measure exactly how long each phase of the process
takes: mapping, shuffling (ideally divided into copying and sorting) and
reducing. The tasktracker logs do not seem to supply the start/end times for
each phase, at least not all of them, even when the log level is set to
DEBUG.
Do you have any ideas on how I could work this out?
Thanks
Antonio

  • Simone Leo at Mar 17, 2010 at 3:45 pm
    At the default log level, Hadoop job logs (the ones you also get in the
    job's output directory under _logs/history) contain entries like the
    following:

    ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002"
    TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0"
    START_TIME="1220331166789"
    HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"

    ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002"
    TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0"
    TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1220332036001"
    SORT_FINISHED="1220332036014" FINISH_TIME="1220332063254"
    HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"

    You get start time, shuffle finish time, sort finish time and overall
    finish time. Similarly, you get start and finish time for MapAttempt
    entries.
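A minimal sketch of how the per-phase times could be derived from entries like the two above. It assumes each entry is available as a single string of KEY="value" pairs and that the timestamps are milliseconds since the epoch; the function names `parse_entry` and `reduce_phase_durations` are made up for illustration and are not part of Hadoop.

```python
import re
from collections import defaultdict

# Matches KEY="value" pairs as they appear in the sample entries above.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_entry(text):
    """Split one history entry into (record_type, {key: value})."""
    parts = text.split(None, 1)
    if len(parts) < 2:
        return "", {}
    return parts[0], dict(PAIR_RE.findall(parts[1]))

def reduce_phase_durations(entries):
    """Merge the start and finish records of each reduce attempt.

    Returns {attempt_id: {"shuffle": ms, "sort": ms, "reduce": ms}},
    assuming the timestamps are milliseconds.
    """
    merged = defaultdict(dict)
    for text in entries:
        record_type, fields = parse_entry(text)
        if record_type == "ReduceAttempt" and "TASK_ATTEMPT_ID" in fields:
            merged[fields["TASK_ATTEMPT_ID"]].update(fields)

    needed = ("START_TIME", "SHUFFLE_FINISHED", "SORT_FINISHED", "FINISH_TIME")
    durations = {}
    for attempt_id, fields in merged.items():
        if not all(key in fields for key in needed):
            continue  # attempt is incomplete (e.g. failed or still running)
        start, shuffle_end, sort_end, finish = (int(fields[k]) for k in needed)
        durations[attempt_id] = {
            "shuffle": shuffle_end - start,   # copy phase
            "sort": sort_end - shuffle_end,   # sort/merge phase
            "reduce": finish - sort_end,      # reduce function itself
        }
    return durations

# The two sample entries quoted above, joined onto one line each.
SAMPLE = [
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002" '
    'TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0" '
    'START_TIME="1220331166789" '
    'HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"',
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002" '
    'TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0" '
    'TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1220332036001" '
    'SORT_FINISHED="1220332036014" FINISH_TIME="1220332063254" '
    'HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"',
]

if __name__ == "__main__":
    for attempt, phases in reduce_phase_durations(SAMPLE).items():
        print(attempt, phases)
```

For the sample attempt this gives 869212 ms of shuffling, 13 ms of sorting and 27240 ms of reducing. MapAttempt entries could be handled the same way using only START_TIME and FINISH_TIME.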

    Hope this helps,

    Simone

    --
    Simone Leo
    Distributed Computing group
    Advanced Computing and Communications program
    CRS4
    POLARIS - Building #1
    Piscina Manna
    I-09010 Pula (CA) - Italy
    e-mail: simleo@crs4.it
    http://www.crs4.it
  • Owen O'Malley at Mar 17, 2010 at 3:46 pm

    Look at the job history logs. They break down the times for each task.
    You need to run a script to aggregate them. You can see an example of
    the aggregation on my petabyte sort description:

    http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
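The aggregation step mentioned above might look roughly like the sketch below. This is not the script behind the linked write-up; it only assumes that per-attempt phase durations (in milliseconds) have already been extracted from the job history log into a dictionary. The function name `summarize_phases`, the attempt IDs and the numbers are all invented for illustration.

```python
from statistics import mean

def summarize_phases(durations):
    """Aggregate per-attempt phase durations across a job.

    durations: {attempt_id: {phase_name: milliseconds}}
    Returns {phase_name: {"count", "min", "mean", "max"}}.
    """
    by_phase = {}
    for phases in durations.values():
        for phase, ms in phases.items():
            by_phase.setdefault(phase, []).append(ms)
    return {
        phase: {
            "count": len(values),
            "min": min(values),
            "mean": mean(values),
            "max": max(values),
        }
        for phase, values in by_phase.items()
    }

# Hypothetical input: made-up attempt IDs and durations, not real measurements.
EXAMPLE = {
    "attempt_a": {"shuffle": 1000, "sort": 10, "reduce": 300},
    "attempt_b": {"shuffle": 3000, "sort": 30, "reduce": 500},
}

if __name__ == "__main__":
    for phase, stats in summarize_phases(EXAMPLE).items():
        print(phase, stats)
```

Reporting the minimum and maximum alongside the mean makes stragglers visible, since a single slow attempt can dominate the wall-clock time of a phase.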

    -- Owen
  • Antonio D'Ettole at Mar 17, 2010 at 10:16 pm

    At the default log level, Hadoop job logs (the ones you also get in the
    job's output directory under _logs/history)

    Thanks Simone, that's exactly what I was looking for.

    Look at the job history logs. They break down the times for each task


    I take it you guys are talking about the same thing? I'm using the file
    in /outputDir/_logs/history. Interestingly, before you told me, I was
    convinced that it was actually a .jar archive, so it took me a little
    while to figure out where these history logs were :)

    Thanks again folks!
    Antonio

Discussion Overview
group: common-user @
categories: hadoop
posted: Mar 17, '10 at 11:47a
active: Mar 17, '10 at 10:16p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop