FAQ
Hi everyone

I have installed CDH4.2 with CM 4.5 on a 3 node cluster.
Playing around with HIve and Map reduce I have a weird behaviour I can't
seem to fix.
One of my nodes takes an unusual long time t0 finish most of it's map tasks.
A normal running time for a map task is about 35 seconds but my node number
2 takes more along the line of 3 minutes.

For another reason I needed to reboot my machines yesterday and the first
job after reboot went on fine and all map tasks around 35 seconds and then
tje jobs after restarted with this wierd behaviour.
This node degrading my cluster performance as whole mkaing my jobs wait a
couple of minutes for the end.

All three nodes are on ubuntu 12.04 with 24GB ram 8cores and a gigaethernet
connection.
I have changed some settings:
- 4GB Mapreduce child java maximum heap size
- jvm reuse -1
- jobtracker heap size 2GB
- tasktaracker heap size 2GB

I have monitored the same hive request to try to get a feel of what is
wrong.
My cpu and ram and all machines are ok used but not full
The map logs show no sign of errors or wait.

In the map logs I see long pauses between operations: I see that node 2
processes data from the hdfs on node1 so there must be network time,
however node 3 also retrives data from node 1 to process and never spends a
full minute on it.

2013-04-10 09:42:18,884 INFO
org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100000
rows
2013-04-10 09:42:18,884 INFO ExecMapper: ExecMapper: processing 100000
rows: used memory = 31260136
2013-04-10 09:43:17,820 INFO
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing
file hdfs://node1:8020/user/hive/warehouse/oscaro_stack/year=2012/month=10/000024_0
2013-04-10 09:44:15,441 INFO
org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing...
2013-04-10 09:44:15,441 INFO
org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarded 722344 rows


If anyone has insight on this issue or test I could run to understand
how to solve this I would appreciate it.

Thanks in advance

Search Discussions

  • Serega Sheypak at Apr 15, 2013 at 1:48 pm
    Can you specify:
    1. the block size for HDFS
    2. the list of input files for HDFS with their sizes and replication factor?
    3. the number of mappers, reducers used by HIVE to process data?
    4. Just put the full HIVE log here?


    2013/4/15 Nicolas Maillard <nicolas.maillard@fifty-five.com>
    Hi everyone

    I have installed CDH4.2 with CM 4.5 on a 3 node cluster.
    Playing around with HIve and Map reduce I have a weird behaviour I can't
    seem to fix.
    One of my nodes takes an unusual long time t0 finish most of it's map
    tasks.
    A normal running time for a map task is about 35 seconds but my node
    number 2 takes more along the line of 3 minutes.

    For another reason I needed to reboot my machines yesterday and the first
    job after reboot went on fine and all map tasks around 35 seconds and then
    tje jobs after restarted with this wierd behaviour.
    This node degrading my cluster performance as whole mkaing my jobs wait a
    couple of minutes for the end.

    All three nodes are on ubuntu 12.04 with 24GB ram 8cores and a
    gigaethernet connection.
    I have changed some settings:
    - 4GB Mapreduce child java maximum heap size
    - jvm reuse -1
    - jobtracker heap size 2GB
    - tasktaracker heap size 2GB

    I have monitored the same hive request to try to get a feel of what is
    wrong.
    My cpu and ram and all machines are ok used but not full
    The map logs show no sign of errors or wait.

    In the map logs I see long pauses between operations: I see that node 2
    processes data from the hdfs on node1 so there must be network time,
    however node 3 also retrives data from node 1 to process and never spends a
    full minute on it.

    2013-04-10 09:42:18,884 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100000 rows
    2013-04-10 09:42:18,884 INFO ExecMapper: ExecMapper: processing 100000 rows: used memory = 31260136
    2013-04-10 09:43:17,820 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://node1:8020/user/hive/warehouse/oscaro_stack/year=2012/month=10/000024_0
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing...
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarded 722344 rows



    If anyone has insight on this issue or test I could run to understand how to solve this I would appreciate it.


    Thanks in advance
  • Serega Sheypak at Apr 15, 2013 at 2:51 pm
    Unfortunately log file is not full. It ends on:
    Persistence of Field :
    org.apache.hadoop.hive.metastore.model.MPartition.values [Table :
    "PARTITION_KEY_VALS"]
    2013-04-15 16:20:44,310 INFO DataNucleus.MetaData: Listener found
    initialisation for persistable class
    org.apache.hadoop.hive.metastore.model.MPartition



    2013/4/15 Serega Sheypak <serega.sheypak@gmail.com>
    Can you specify:
    1. the block size for HDFS
    2. the list of input files for HDFS with their sizes and replication
    factor?
    3. the number of mappers, reducers used by HIVE to process data?
    4. Just put the full HIVE log here?


    2013/4/15 Nicolas Maillard <nicolas.maillard@fifty-five.com>
    Hi everyone

    I have installed CDH4.2 with CM 4.5 on a 3 node cluster.
    Playing around with HIve and Map reduce I have a weird behaviour I can't
    seem to fix.
    One of my nodes takes an unusual long time t0 finish most of it's map
    tasks.
    A normal running time for a map task is about 35 seconds but my node
    number 2 takes more along the line of 3 minutes.

    For another reason I needed to reboot my machines yesterday and the first
    job after reboot went on fine and all map tasks around 35 seconds and then
    tje jobs after restarted with this wierd behaviour.
    This node degrading my cluster performance as whole mkaing my jobs wait a
    couple of minutes for the end.

    All three nodes are on ubuntu 12.04 with 24GB ram 8cores and a
    gigaethernet connection.
    I have changed some settings:
    - 4GB Mapreduce child java maximum heap size
    - jvm reuse -1
    - jobtracker heap size 2GB
    - tasktaracker heap size 2GB

    I have monitored the same hive request to try to get a feel of what is
    wrong.
    My cpu and ram and all machines are ok used but not full
    The map logs show no sign of errors or wait.

    In the map logs I see long pauses between operations: I see that node 2
    processes data from the hdfs on node1 so there must be network time,
    however node 3 also retrives data from node 1 to process and never spends a
    full minute on it.

    2013-04-10 09:42:18,884 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100000 rows
    2013-04-10 09:42:18,884 INFO ExecMapper: ExecMapper: processing 100000 rows: used memory = 31260136
    2013-04-10 09:43:17,820 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://node1:8020/user/hive/warehouse/oscaro_stack/year=2012/month=10/000024_0
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing...
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarded 722344 rows




    If anyone has insight on this issue or test I could run to understand how to solve this I would appreciate it.



    Thanks in advance
  • Serega Sheypak at Apr 16, 2013 at 7:29 am
    Ok, I've read these logs. I didn't find anything critical.
    My theory is that one of your mapper gets too big split. And it causes
    skew. That is why I've sked you to provide:
    1. hadoop fs -ls -R /your/hive/input/path
    2. provide replication factor
    3. specify quantity of TaskTrackers, NameNodes? Are they co-located, I mean
    are they installed together on the each node?

    The other theory that you have disk problem.


    2013/4/15 Nicolas Maillard <nicolas.maillard@fifty-five.com>
    Hi everyone

    I have installed CDH4.2 with CM 4.5 on a 3 node cluster.
    Playing around with HIve and Map reduce I have a weird behaviour I can't
    seem to fix.
    One of my nodes takes an unusual long time t0 finish most of it's map
    tasks.
    A normal running time for a map task is about 35 seconds but my node
    number 2 takes more along the line of 3 minutes.

    For another reason I needed to reboot my machines yesterday and the first
    job after reboot went on fine and all map tasks around 35 seconds and then
    tje jobs after restarted with this wierd behaviour.
    This node degrading my cluster performance as whole mkaing my jobs wait a
    couple of minutes for the end.

    All three nodes are on ubuntu 12.04 with 24GB ram 8cores and a
    gigaethernet connection.
    I have changed some settings:
    - 4GB Mapreduce child java maximum heap size
    - jvm reuse -1
    - jobtracker heap size 2GB
    - tasktaracker heap size 2GB

    I have monitored the same hive request to try to get a feel of what is
    wrong.
    My cpu and ram and all machines are ok used but not full
    The map logs show no sign of errors or wait.

    In the map logs I see long pauses between operations: I see that node 2
    processes data from the hdfs on node1 so there must be network time,
    however node 3 also retrives data from node 1 to process and never spends a
    full minute on it.

    2013-04-10 09:42:18,884 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100000 rows
    2013-04-10 09:42:18,884 INFO ExecMapper: ExecMapper: processing 100000 rows: used memory = 31260136
    2013-04-10 09:43:17,820 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://node1:8020/user/hive/warehouse/oscaro_stack/year=2012/month=10/000024_0
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing...
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarded 722344 rows



    If anyone has insight on this issue or test I could run to understand how to solve this I would appreciate it.


    Thanks in advance
  • Serega Sheypak at Apr 16, 2013 at 1:21 pm
    Can I see stats from jobtracker for the job with id *job_201304151717_0001*:

    2013-04-15 17:18:24,628 INFO org.apache.hadoop.mapred.JobInProgress: Input
    size for job job_201304151717_0001 = 20363146982. Number of splits = 71


    2013/4/16 Serega Sheypak <serega.sheypak@gmail.com>
    Ok, I've read these logs. I didn't find anything critical.
    My theory is that one of your mapper gets too big split. And it causes
    skew. That is why I've sked you to provide:
    1. hadoop fs -ls -R /your/hive/input/path
    2. provide replication factor
    3. specify quantity of TaskTrackers, NameNodes? Are they co-located, I
    mean are they installed together on the each node?

    The other theory that you have disk problem.


    2013/4/15 Nicolas Maillard <nicolas.maillard@fifty-five.com>
    Hi everyone

    I have installed CDH4.2 with CM 4.5 on a 3 node cluster.
    Playing around with HIve and Map reduce I have a weird behaviour I can't
    seem to fix.
    One of my nodes takes an unusual long time t0 finish most of it's map
    tasks.
    A normal running time for a map task is about 35 seconds but my node
    number 2 takes more along the line of 3 minutes.

    For another reason I needed to reboot my machines yesterday and the first
    job after reboot went on fine and all map tasks around 35 seconds and then
    tje jobs after restarted with this wierd behaviour.
    This node degrading my cluster performance as whole mkaing my jobs wait a
    couple of minutes for the end.

    All three nodes are on ubuntu 12.04 with 24GB ram 8cores and a
    gigaethernet connection.
    I have changed some settings:
    - 4GB Mapreduce child java maximum heap size
    - jvm reuse -1
    - jobtracker heap size 2GB
    - tasktaracker heap size 2GB

    I have monitored the same hive request to try to get a feel of what is
    wrong.
    My cpu and ram and all machines are ok used but not full
    The map logs show no sign of errors or wait.

    In the map logs I see long pauses between operations: I see that node 2
    processes data from the hdfs on node1 so there must be network time,
    however node 3 also retrives data from node 1 to process and never spends a
    full minute on it.

    2013-04-10 09:42:18,884 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100000 rows
    2013-04-10 09:42:18,884 INFO ExecMapper: ExecMapper: processing 100000 rows: used memory = 31260136
    2013-04-10 09:43:17,820 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://node1:8020/user/hive/warehouse/oscaro_stack/year=2012/month=10/000024_0
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing...
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarded 722344 rows




    If anyone has insight on this issue or test I could run to understand how to solve this I would appreciate it.



    Thanks in advance
  • Serega Sheypak at Apr 16, 2013 at 1:33 pm
    Your files are rather big the common size is about 292519093 bytes =>
    292Mbytes.
    Your replication factor is 1, you have 4 TT and DN, it can lead to poor
    data locality. Your mapper runs and it has no local split (if your files
    are splittable???). TT will ask for a file split/full file (if not
    splittable) from remote DN.

    ***so most of my data is actually really on my master node, being node1
    I don't understand what you meant to say. Your data should be evenly
    distributed (HDFS splits file by blocks).

    My guess is poor locality. I would like if some more experienced group
    member will validate this suggestion.



    2013/4/16 Nicolas Maillard <nicolas.maillard@fifty-five.com>
    Hi Serega

    Thanks for the help

    I have a replication factor of 1 currently so most of my data is actually
    really on my master node, being node1.
    I Have 3 tasktrackers, one per node.
    I have 3 datanodes , one per node
    So datanodes and tastrackers are colocated.
    The main node, node1 the master is in addition also the jobtracker, the
    namenode and the secondary name node. As this was my first cluster I have
    read that this is not ideal. My Master node however performs well.

    I have included a print of the hdfs ls of my hive input path.

    As per your thought I have tried decomissioning node2 that was slowing
    down to see if it was a disk issue. This had the effect of bringing the
    slowdown on node3. I guess this probably points more to a too big a split
    being sent to one to a mapper.

    I am note sure why a mapper would get too big a split. How could I
    investigate this issue?
    could it be the hdfs files that are note evenly distributed?
    Are there ways to make sure the a mapper does not get too big a split.

    thanks again for your help




    On Tue, Apr 16, 2013 at 9:29 AM, Serega Sheypak wrote:

    Ok, I've read these logs. I didn't find anything critical.
    My theory is that one of your mapper gets too big split. And it causes
    skew. That is why I've sked you to provide:
    1. hadoop fs -ls -R /your/hive/input/path
    2. provide replication factor
    3. specify quantity of TaskTrackers, NameNodes? Are they co-located, I
    mean are they installed together on the each node?

    The other theory that you have disk problem.


    2013/4/15 Nicolas Maillard <nicolas.maillard@fifty-five.com>
    Hi everyone

    I have installed CDH4.2 with CM 4.5 on a 3 node cluster.
    Playing around with HIve and Map reduce I have a weird behaviour I can't
    seem to fix.
    One of my nodes takes an unusual long time t0 finish most of it's map
    tasks.
    A normal running time for a map task is about 35 seconds but my node
    number 2 takes more along the line of 3 minutes.

    For another reason I needed to reboot my machines yesterday and the
    first job after reboot went on fine and all map tasks around 35 seconds and
    then tje jobs after restarted with this wierd behaviour.
    This node degrading my cluster performance as whole mkaing my jobs wait
    a couple of minutes for the end.

    All three nodes are on ubuntu 12.04 with 24GB ram 8cores and a
    gigaethernet connection.
    I have changed some settings:
    - 4GB Mapreduce child java maximum heap size
    - jvm reuse -1
    - jobtracker heap size 2GB
    - tasktaracker heap size 2GB

    I have monitored the same hive request to try to get a feel of what is
    wrong.
    My cpu and ram and all machines are ok used but not full
    The map logs show no sign of errors or wait.

    In the map logs I see long pauses between operations: I see that node 2
    processes data from the hdfs on node1 so there must be network time,
    however node 3 also retrives data from node 1 to process and never spends a
    full minute on it.

    2013-04-10 09:42:18,884 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100000 rows
    2013-04-10 09:42:18,884 INFO ExecMapper: ExecMapper: processing 100000 rows: used memory = 31260136
    2013-04-10 09:43:17,820 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://node1:8020/user/hive/warehouse/oscaro_stack/year=2012/month=10/000024_0
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing...
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarded 722344 rows





    If anyone has insight on this issue or test I could run to understand how to solve this I would appreciate it.




    Thanks in advance
  • Nicolas Maillard at May 3, 2013 at 2:38 pm
    Hello everyone

    Thank you for the replies.

    The hive query I used for a new test is simple select count(1) from table.

    I have tried a couple of things to check what swas going wrong.
    - Took out the slow node only left a two node cluster, the result was the
    next slave node slowing down.
    - I have upped the replication factor to 2 and checked that is was true,
    and the same happens most of the maps (not all ) of the 3rd node take up to
    a minute of process where all the oher maps only take abouit 30 seconds.
    Not all the maps of the slow node are slow though this is where it is
    puzzling.
    - I have changed table to see a new dataset and the same occurs.

    thanks for any help

    On Tue, Apr 16, 2013 at 3:32 PM, Serega Sheypak wrote:

    Your files are rather big the common size is about 292519093 bytes =>
    292Mbytes.
    Your replication factor is 1, you have 4 TT and DN, it can lead to poor
    data locality. Your mapper runs and it has no local split (if your files
    are splittable???). TT will ask for a file split/full file (if not
    splittable) from remote DN.

    ***so most of my data is actually really on my master node, being node1
    I don't understand what you meant to say. Your data should be evenly
    distributed (HDFS splits file by blocks).

    My guess is poor locality. I would like if some more experienced group
    member will validate this suggestion.



    2013/4/16 Nicolas Maillard <nicolas.maillard@fifty-five.com>
    Hi Serega

    Thanks for the help

    I have a replication factor of 1 currently so most of my data is actually
    really on my master node, being node1.
    I Have 3 tasktrackers, one per node.
    I have 3 datanodes , one per node
    So datanodes and tastrackers are colocated.
    The main node, node1 the master is in addition also the jobtracker, the
    namenode and the secondary name node. As this was my first cluster I have
    read that this is not ideal. My Master node however performs well.

    I have included a print of the hdfs ls of my hive input path.

    As per your thought I have tried decomissioning node2 that was slowing
    down to see if it was a disk issue. This had the effect of bringing the
    slowdown on node3. I guess this probably points more to a too big a split
    being sent to one to a mapper.

    I am note sure why a mapper would get too big a split. How could I
    investigate this issue?
    could it be the hdfs files that are note evenly distributed?
    Are there ways to make sure the a mapper does not get too big a split.

    thanks again for your help





    On Tue, Apr 16, 2013 at 9:29 AM, Serega Sheypak <serega.sheypak@gmail.com
    wrote:
    Ok, I've read these logs. I didn't find anything critical.
    My theory is that one of your mapper gets too big split. And it causes
    skew. That is why I've sked you to provide:
    1. hadoop fs -ls -R /your/hive/input/path
    2. provide replication factor
    3. specify quantity of TaskTrackers, NameNodes? Are they co-located, I
    mean are they installed together on the each node?

    The other theory that you have disk problem.


    2013/4/15 Nicolas Maillard <nicolas.maillard@fifty-five.com>
    Hi everyone

    I have installed CDH4.2 with CM 4.5 on a 3 node cluster.
    Playing around with HIve and Map reduce I have a weird behaviour I
    can't seem to fix.
    One of my nodes takes an unusual long time t0 finish most of it's map
    tasks.
    A normal running time for a map task is about 35 seconds but my node
    number 2 takes more along the line of 3 minutes.

    For another reason I needed to reboot my machines yesterday and the
    first job after reboot went on fine and all map tasks around 35 seconds and
    then tje jobs after restarted with this wierd behaviour.
    This node degrading my cluster performance as whole mkaing my jobs wait
    a couple of minutes for the end.

    All three nodes are on ubuntu 12.04 with 24GB ram 8cores and a
    gigaethernet connection.
    I have changed some settings:
    - 4GB Mapreduce child java maximum heap size
    - jvm reuse -1
    - jobtracker heap size 2GB
    - tasktaracker heap size 2GB

    I have monitored the same hive request to try to get a feel of what is
    wrong.
    My cpu and ram and all machines are ok used but not full
    The map logs show no sign of errors or wait.

    In the map logs I see long pauses between operations: I see that node 2
    processes data from the hdfs on node1 so there must be network time,
    however node 3 also retrives data from node 1 to process and never spends a
    full minute on it.

    2013-04-10 09:42:18,884 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100000 rows
    2013-04-10 09:42:18,884 INFO ExecMapper: ExecMapper: processing 100000 rows: used memory = 31260136
    2013-04-10 09:43:17,820 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://node1:8020/user/hive/warehouse/oscaro_stack/year=2012/month=10/000024_0
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing...
    2013-04-10 09:44:15,441 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarded 722344 rows






    If anyone has insight on this issue or test I could run to understand how to solve this I would appreciate it.





    Thanks in advance

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupscm-users @
categorieshadoop
postedApr 15, '13 at 1:41p
activeMay 3, '13 at 2:38p
posts7
users2
websitecloudera.com
irc#hadoop

People

Translate

site design / logo © 2022 Grokbase