Hi,
I have been using Hadoop MapReduce for only the past few months, for data mining
purposes. I run a very small cluster of 4 nodes: 1 NameNode and 3 DataNodes, with
the JobTracker running on the NameNode itself.
My requirement is to process ~10-12 GB of compressed data, which amounts to
40-45 GB uncompressed. I need to run 3 jobs on the MapReduce cluster and have
them complete within an hour, i.e., I want one hour's worth of data processed
within an hour.

To give more detail, I have a job scheduler application that triggers these 3
jobs. Once the job scheduler app is started, it triggers the jobs on a periodic
basis. When the jobs are triggered, Hadoop's job scheduler internally holds them
in a FIFO queue and schedules them one after the other. When the job scheduler
app starts afresh, the 3 jobs complete well within an hour; each of them takes
< 20 mins. As time goes by, the MapReduce jobs start to take a lot more time,
and I have sometimes seen them take close to 2.5 hours. I am unable to work out
what could be causing such huge delays.

I checked the TaskTracker and DataNode logs and the JobTracker web UI for
anything abnormal, and everything looks fine. Most of the map tasks that are
launched are data-local, so I do not see a problem there either. The DataNodes
are fine too: a dfsadmin report shows that all of them are healthy and that
there are no under-replicated blocks.
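
To quantify the slowdown, I am thinking of wrapping the job launches with
something like the rough sketch below, which just logs each job's wall-clock
time (the jar and driver class names are placeholders, not my real ones, and it
assumes the jobs are launched via "hadoop jar"):

    import datetime
    import subprocess
    import time

    # Placeholder commands for the 3 jobs; the jar and driver classes
    # here are made up and stand in for the real ones.
    JOBS = [
        ["hadoop", "jar", "analytics.jar", "com.example.JobOne"],
        ["hadoop", "jar", "analytics.jar", "com.example.JobTwo"],
        ["hadoop", "jar", "analytics.jar", "com.example.JobThree"],
    ]

    with open("job-durations.log", "a") as log:
        for cmd in JOBS:
            start = time.time()
            subprocess.call(cmd)  # blocks until the MapReduce job finishes
            minutes = (time.time() - start) / 60.0
            log.write("%s %s %.1f min\n" % (
                datetime.datetime.now().isoformat(), cmd[-1], minutes))
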
I also noticed that during the initial phase the map/reduce tasks on the
DataNodes were using a considerable amount of CPU; each task would consume
~85-90% CPU. As time progresses, that percentage drops. I initially suspected I
was hitting a cluster limit, but the very low CPU usage does not support that.
The degradation from < 20 mins to ~2.5 hours usually takes about a day and a
half.
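
I can also sample the per-task CPU on a DataNode over time to capture the
drop-off; a rough sketch of what I have in mind is below (it assumes the child
task JVMs show org.apache.hadoop.mapred.Child on their command line, as on
0.20-era Hadoop; the pattern may differ on other versions):

    import datetime
    import subprocess
    import time

    # Command-line pattern of the map/reduce child JVMs (0.20-era default).
    PATTERN = "org.apache.hadoop.mapred.Child"

    while True:
        out = subprocess.Popen(["ps", "-eo", "pid,pcpu,args"],
                               stdout=subprocess.PIPE).communicate()[0]
        stamp = datetime.datetime.now().isoformat()
        for line in out.decode("utf-8", "replace").splitlines():
            if PATTERN in line:
                pid, pcpu = line.split(None, 2)[:2]
                print("%s pid=%s cpu=%s%%" % (stamp, pid, pcpu))
        time.sleep(60)  # one sample per minute
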
I would like to know if anyone else has faced this issue before and what might
be causing it.
Thanks for your patience in reading thus far!

-Sriram

  • Chase Bradford at Dec 22, 2010 at 2:43 am
    Check the userlog output directories for your tasks. If you are using NFS for logs, or are spawning many tasks on an ext3 filesystem, then you could be running into long directory creation times for the attempt logs.
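
    A quick way to check is to count the attempt directories and time a single mkdir on the same filesystem, e.g. with a rough sketch like this (the path below is just the usual default of ${hadoop.log.dir}/userlogs; adjust it to your install):

        import os
        import tempfile
        import time

        # Assumed location of the TaskTracker attempt logs; adjust as needed.
        USERLOGS = "/var/log/hadoop/userlogs"

        print("attempt log dirs: %d" % len(os.listdir(USERLOGS)))

        start = time.time()
        probe = tempfile.mkdtemp(prefix="mkdir-probe-", dir=USERLOGS)
        print("single mkdir took %.3f seconds" % (time.time() - start))
        os.rmdir(probe)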

    Sent from phone
