I have been using Hadoop's MapReduce for only the past few months, for data mining purposes. I run a very small cluster of 4 nodes: 1 NameNode and 3 DataNodes, with the JobTracker running on the NameNode.
My requirement is to process ~10-12G of compressed data, which amounts to
40-45G uncompressed. I need to run 3 jobs on the MapReduce cluster and have
them complete within an hour; that is, I want one hour's worth of data
processed within an hour. To give more detail: I have a job scheduler
application that triggers these 3 jobs. Once the job scheduler app is
started, it triggers the jobs on a periodic basis. When the jobs are
triggered, Hadoop's own job scheduler internally maintains them in a FIFO
queue and runs them one after the other. When starting afresh (restarting
the job scheduler app), the 3 jobs complete well within an hour; each takes
< 20 mins. As time goes by, however, the MapReduce jobs start taking much
longer; I have sometimes seen jobs take close to 2.5 hours. I am unable to
work out what could be causing such huge delays. I checked the TaskTracker
logs, the DataNode logs and the JobTracker web UI for anything abnormal, and
everything looks fine. Most of the map tasks launched are data-local, so I
do not see a problem there either. The DataNodes are fine too: the dfsadmin
report shows that all DataNodes are healthy and that there are no
under-replicated blocks.
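The health checks mentioned above can be run from the command line like this (a sketch for the Hadoop 1.x-era CLI that matches a JobTracker/TaskTracker setup; exact subcommands vary by version):

```shell
# Cluster-wide DFS health: live/dead DataNodes, capacity, under-replicated blocks
hadoop dfsadmin -report

# Filesystem consistency check over the whole namespace,
# printing block and replica locations
hadoop fsck / -blocks -locations
```

Both reported a healthy cluster in my case, which is why I ruled out HDFS itself.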
I also noticed that during the initial phase, the MapReduce tasks on the
DataNodes were using a considerable amount of CPU; each map/reduce task
would consume ~85-90% CPU. As time progresses, though, that percentage
drops. I initially suspected I was hitting a cluster limit, but the very
low CPU usage argues against that. The degradation from <20 mins to
~2.5 hours usually takes about a day, and I would like to know if someone
else has faced this issue before and what may be causing it.
Thanks for your patience in reading thus far!