I am using the 0.3 Cloudera scripts to start a Hadoop cluster on EC2 of
11 c1.xlarge instances (1 master, 10 slaves); that is the biggest
instance type available, with 20 EC2 compute units and 4x 400 GB disks.

I wrote some scripts to test many (hundreds of) configurations by
running a particular Hive query, trying to make it as fast as possible,
but no matter what I do I can't seem to get above roughly 45% CPU
utilization on the slaves, or more than about a 1.5% wait state. I have
also measured network traffic, and there doesn't seem to be a bottleneck
there at all.

Here are some typical CPU utilization lines from top on a slave when
running a query:
Cpu(s): 33.9%us, 7.4%sy, 0.0%ni, 56.8%id, 0.6%wa, 0.0%hi, 0.5%si, 0.7%st
Cpu(s): 33.6%us, 5.9%sy, 0.0%ni, 58.7%id, 0.9%wa, 0.0%hi, 0.4%si, 0.5%st
Cpu(s): 33.9%us, 7.2%sy, 0.0%ni, 56.8%id, 0.5%wa, 0.0%hi, 0.6%si, 1.0%st
Cpu(s): 38.6%us, 8.7%sy, 0.0%ni, 50.8%id, 0.5%wa, 0.0%hi, 0.7%si, 0.7%st
Cpu(s): 36.8%us, 7.4%sy, 0.0%ni, 53.6%id, 0.4%wa, 0.0%hi, 0.5%si, 1.3%st

It seems like, if tuned properly, I should be able to max out my CPU (or
my disk) and get roughly twice the performance I am seeing now. None of
the parameters I am tuning seems able to achieve this. Adjusting
mapred.map.tasks and mapred.reduce.tasks does help somewhat, and setting
io.file.buffer.size to 4096 does better than the default, but the rest
of the values I am testing seem to have little positive effect.

These are the parameters I am testing, and the values tried:

io.sort.factor=2,3,4,5,10,15,20,25,30,50,100
mapred.job.shuffle.merge.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
io.bytes.per.checksum=256,512,1024,2048,4192
mapred.output.compress=true,false
hive.exec.compress.intermediate=true,false
hive.map.aggr.hash.min.reduction=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
mapred.map.tasks=1,2,3,4,5,6,8,10,12,15,20,25,30,40,50,60,75,100,150,200
mapred.child.java.opts=-Xmx400m,-Xmx500m,-Xmx600m,-Xmx700m,-Xmx800m,-Xmx900m,-Xmx1000m,-Xmx1200m,-Xmx1400m,-Xmx1600m,-Xmx2000m
mapred.reduce.tasks=5,10,15,20,25,30,35,40,50,60,70,80,100,125,150,200
mapred.merge.recordsBeforeProgress=5000,10000,20000,30000
mapred.job.shuffle.input.buffer.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.99
io.sort.spill.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.99
mapred.job.tracker.handler.count=3,4,5,7,10,15,25
hive.merge.size.per.task=64000000,128000000,168000000,256000000,300000000,400000000
hive.optimize.ppd=true,false
hive.merge.mapredfiles=false,true
io.sort.record.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
hive.map.aggr.hash.percentmemory=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
mapred.tasktracker.reduce.tasks.maximum=1,2,3,4,5,6,8,10,12,15,20,30
mapred.reduce.parallel.copies=1,2,4,6,8,10,13,16,20,25,30,50
io.seqfile.lazydecompress=true,false
io.sort.mb=20,50,75,100,150,200,250,350,500
mapred.compress.map.output=true,false
io.file.buffer.size=1024,2048,4096,8192,16384,32768,65536,131072,262144
hive.exec.reducers.bytes.per.reducer=1000000000
dfs.datanode.handler.count=1,2,3,4,5,6,8,10,15
mapred.tasktracker.map.tasks.maximum=5,8,12,20
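
A sweep like this can be driven by a small wrapper that passes each candidate
value to Hive with -hiveconf and times the query. The sketch below is purely
illustrative -- the query file name, the parameter grid, and the assumption
that the hive CLI is on the PATH are all hypothetical, and it is not the
actual script used in this thread.

import itertools
import subprocess
import time

# Hypothetical parameter grid; the real sweep in this thread covered far more
# parameters and values than shown here.
PARAM_GRID = {
    "mapred.map.tasks": [10, 20, 40, 80],
    "io.file.buffer.size": [4096, 65536],
    "mapred.compress.map.output": ["true", "false"],
}

def run_query(overrides, query_file="benchmark.hql"):
    """Run one Hive query with the given config overrides, return elapsed seconds."""
    cmd = ["hive"]
    for key, value in overrides.items():
        cmd += ["-hiveconf", f"{key}={value}"]
    cmd += ["-f", query_file]
    start = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - start

def sweep():
    keys = sorted(PARAM_GRID)
    results = []
    for values in itertools.product(*(PARAM_GRID[k] for k in keys)):
        overrides = dict(zip(keys, values))
        results.append((run_query(overrides), overrides))
    # Print the fastest configurations first.
    for elapsed, overrides in sorted(results, key=lambda r: r[0]):
        print(f"{elapsed:8.1f}s  {overrides}")

if __name__ == "__main__":
    sweep()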

Anyone have any thoughts for other parameters I might try? Am I going
about this the wrong way? Am I missing some other bottleneck?

thanks

Chris Seline


  • Jason Venner at Oct 14, 2009 at 6:03 am
    Are your network interfaces or the namenode/jobtracker/datanodes saturated?

    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
  • Chris Seline at Oct 14, 2009 at 2:32 pm
    No, there doesn't seem to be all that much network traffic. Most of the
    time, traffic (measured with nethogs) is about 15-30 KB/s on the master
    and slaves during the map phase; sometimes it bursts up to 5-10 MB/s on a
    slave for maybe 5-10 seconds on a query that takes 10 minutes, but that is
    still less than what I see for scp transfers on EC2, which are typically
    about 30 MB/s.

    thanks

    Chris

  • Jason Venner at Oct 14, 2009 at 2:50 pm
    I remember having a problem like this at one point; it was related to the
    mean run time of my tasks and the rate at which the jobtracker could start
    new tasks.

    By increasing the split size until the mean run time of my tasks was in
    the minutes, I was able to drive up the utilization.
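
    To make the effect concrete, here is a back-of-the-envelope sketch with
    assumed numbers (input size, scan rate, per-task overhead, and slot count
    are made up rather than measured on this cluster) showing how larger
    splits mean fewer map tasks and a smaller share of time lost to per-task
    startup and scheduling:

# Back-of-the-envelope illustration; all constants are assumptions, not
# measurements from the cluster discussed in this thread.
INPUT_BYTES = 100 * 1024**3        # assume ~100 GB of input for the query
SCAN_RATE = 10 * 1024**2           # assume a map task processes ~10 MB/s
TASK_OVERHEAD_SEC = 10.0           # assumed startup + scheduling cost per task
SLOTS = 10 * 8                     # assume 10 slaves x 8 map slots

for split_mb in (64, 128, 256, 512, 1024):
    split_bytes = split_mb * 1024**2
    tasks = -(-INPUT_BYTES // split_bytes)      # ceiling division
    work_sec = split_bytes / SCAN_RATE          # useful work per task
    overhead_share = TASK_OVERHEAD_SEC / (work_sec + TASK_OVERHEAD_SEC)
    ideal_wall_min = (tasks / SLOTS) * (work_sec + TASK_OVERHEAD_SEC) / 60
    print(f"{split_mb:5d} MB splits: {tasks:5d} tasks, ~{work_sec:6.0f}s work/task, "
          f"{overhead_share:5.1%} overhead, ideal wall time ~{ideal_wall_min:4.1f} min")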

  • Chris Seline at Oct 14, 2009 at 6:27 pm
    That definitely helps a lot! I saw a few people talking about it on the
    web, and they say to set the value to Long.MAX_VALUE, but that is not
    what I have found to be best. I see about a 25% improvement at 300 MB
    (300000000); CPU utilization is up to about 50-70%+, but I am still
    fine-tuning.

    thanks!

    Chris

  • Jason Venner at Oct 15, 2009 at 4:21 am
    The value really varies by job and by cluster. The larger the split, the
    greater the chance that a small number of splits will take much longer to
    complete than the rest, resulting in a long job tail where very little of
    your cluster is utilized while they finish.

    The flip side is that with very small tasks, the overhead, startup time,
    and coordination latency (which has been improved) can cause inefficient
    utilization of your cluster resources.

    If you really want to drive up your CPU utilization, reduce your per-task
    memory size to the bare minimum, and your JVMs will consume massive
    amounts of CPU doing garbage collection :) It happened at a place I worked
    where ~60% of the job CPU was garbage collection.
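
    One rough way to check whether GC is eating the CPU is to capture a GC log
    for a task JVM (for example by adding -verbose:gc -Xloggc:<file> to
    mapred.child.java.opts) and add up the reported pause times. A minimal
    sketch, assuming the classic HotSpot lines that end in "..., 0.0090 secs]";
    the log path and the task's wall-clock time are supplied by the user:

import re
import sys

# Matches the pause duration at the end of classic -verbose:gc lines, e.g.
# "[GC 32768K->4352K(126720K), 0.0090 secs]". Adjust for other log formats.
PAUSE_RE = re.compile(r"([0-9]+\.[0-9]+) secs\]")

def gc_fraction(gc_log_path, task_wall_seconds):
    """Return (total GC seconds, GC share of the task's wall-clock time)."""
    gc_seconds = 0.0
    with open(gc_log_path) as log:
        for line in log:
            match = PAUSE_RE.search(line)
            if match:
                gc_seconds += float(match.group(1))
    return gc_seconds, gc_seconds / task_wall_seconds

if __name__ == "__main__":
    path, wall = sys.argv[1], float(sys.argv[2])
    total, share = gc_fraction(path, wall)
    print(f"GC time: {total:.1f}s ({share:.1%} of {wall:.0f}s task wall time)")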

  • Chris Seline at Oct 15, 2009 at 3:25 pm
    Thanks for your help, Jason. I actually did reduce the heap size to 400 MB
    and it sped things up a few percent. In my experience with JVMs, if you
    can get by with a smaller heap your app will run faster, because GC is
    more efficient for smaller collections (which is also why incremental
    garbage collection is often desirable in web apps). The flip side is that
    if you give it too little heap, it can choke your app by running garbage
    collection too often, as if it were drowning and gasping for breath.


  • Scott Carey at Oct 15, 2009 at 7:58 pm
    What Hadoop version?

    On a cluster this size there are two things to check right away:

    1. In the Hadoop UI, during the job, are the reduce and map slots close to
    being filled up most of the time, or are tasks completing faster than the
    scheduler can keep up, so that there are often many empty slots?

    For 0.19.x and 0.20.x on a small cluster like this, use the Fair Scheduler
    and make sure the configuration parameter that allows it to schedule more
    than one task per heartbeat is on (at least one map and one reduce per
    heartbeat, which is supported in 0.19.x). This alone will cut times down
    if the number of map and reduce tasks is at least 2x the number of nodes.


    2. Since your CPU, disk, and network aren't saturated, take a look at the
    logs of the reduce tasks and look for long delays in the shuffle.
    Utilization is throttled by a bug in the reducer shuffle phase that is not
    fixed until 0.21. Simply put, a single reduce task won't fetch more than
    one map output from another node every 2 seconds (though it can fetch from
    multiple nodes at once). You can fix this by commenting out one line in
    0.18.x, 0.19.x, or 0.20.x -- see my comment from June 10, 2009 here:
    http://issues.apache.org/jira/browse/MAPREDUCE-318
    I saw shuffle times on small clusters with a large map-task-count-per-node
    ratio decrease by a factor of 30 from that one-line fix. It was the only
    way to get the network anywhere close to saturation on any node.
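
    One hedged way to look for those long shuffle delays is to scan a reduce
    task's log for large gaps between consecutive log lines. The sketch below
    assumes only the default log4j timestamp prefix (e.g.
    "2009-10-14 18:02:13,123") and nothing about the specific messages:

import re
import sys
from datetime import datetime

# Default log4j timestamp prefix used by Hadoop task logs (assumption; adjust
# the pattern if your log4j layout differs).
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

def find_gaps(log_path, min_gap_seconds=5.0):
    """Return (gap_seconds, end_timestamp) pairs for silent stretches in a log."""
    previous = None
    gaps = []
    with open(log_path, errors="replace") as log:
        for line in log:
            match = TS_RE.match(line)
            if not match:
                continue
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            if previous is not None:
                gap = (ts - previous).total_seconds()
                if gap >= min_gap_seconds:
                    gaps.append((gap, ts))
            previous = ts
    return sorted(gaps, reverse=True)

if __name__ == "__main__":
    # Print the ten longest silent stretches in the given task log.
    for gap, ts in find_gaps(sys.argv[1])[:10]:
        print(f"{gap:7.1f}s of silence ending at {ts}")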

    The delays for low-latency jobs on smaller clusters are predominantly
    artificial, due to the nature of most RPC being ping-response and most
    design and testing being done for large clusters of machines that only
    run a couple of maps or reduces per TaskTracker.


    Do the above, and you won't be nearly as sensitive to the size of data per
    task for low-latency jobs as with out-of-the-box Hadoop. Your overall
    utilization will go up quite a bit.

    -Scott
