Hey,

I need some help regarding Hive. I am trying to benchmark Hive with the
TPC-H SF 100 dataset. For a simple SPJ query (select count(*) from
supplier, customer where s_nationkey = c_nationkey), 12 of my 13 reduce
tasks completed in less than 2 hours, but 1 ran for 6 hours. Following
are my cluster details:

10 nodes (1 master + 9 TT/DN nodes), 3.5GB RAM per TT, max 2 maps and
2 reducers per TT, 600MB per task, io.sort.mb = 200MB.
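A quick capacity check on these numbers, as a minimal Python sketch. It assumes that "600MB per task" means each task JVM's heap (for example -Xmx600m) and that 3.5GB means 3.5 × 1024 MB. Both are readings of the description above rather than confirmed settings, and a JVM process uses somewhat more memory than its heap.

```python
# Capacity check for the cluster described above.
# Assumptions: "600MB per task" is the task JVM heap, 3.5GB is read as GiB.
TASKTRACKERS = 9            # 10 nodes minus the master
MAP_SLOTS_PER_TT = 2
REDUCE_SLOTS_PER_TT = 2
HEAP_PER_TASK_MB = 600
RAM_PER_TT_MB = 3.5 * 1024
REDUCE_TASKS = 13


def capacity_summary():
    reduce_slots = TASKTRACKERS * REDUCE_SLOTS_PER_TT
    waves = -(-REDUCE_TASKS // reduce_slots)  # ceiling division
    # Worst case on one TT: all map and reduce slots busy at the same time.
    peak_heap_mb = (MAP_SLOTS_PER_TT + REDUCE_SLOTS_PER_TT) * HEAP_PER_TASK_MB
    return {
        "reduce_slots": reduce_slots,
        "reduce_waves": waves,
        "peak_task_heap_mb": peak_heap_mb,
        "heap_fraction_of_ram": peak_heap_mb / RAM_PER_TT_MB,
    }


if __name__ == "__main__":
    for name, value in capacity_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions all 13 reducers fit in a single wave on the 18 reduce slots, and four concurrent task heaps add up to 2400 MB, about two thirds of a TT's RAM, which is consistent with the observation that no swapping occurred.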

I saw that no swapping occurred while the reduce task was running.
Following is the tail of the log on the machine where the reduce ran
for 6 hours:

2011-09-26 22:48:48,285 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 47881000000 rows
2011-09-26 22:48:48,607 INFO ExecReducer: ExecReducer: processed 1280835 rows: used memory = 4840896
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 47881693522 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 47881693522 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 forwarded 0 rows
2011-09-26 22:48:48,608 WARN org.apache.hadoop.hive.ql.exec.GroupByOperator: Begin Hash Table flush at close: size = 1
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 forwarding 1 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/000004_0
2011-09-26 22:48:48,609 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/_tmp.000004_0
2011-09-26 22:48:48,609 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/000004_0
2011-09-26 22:48:48,739 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 finished. closing...
2011-09-26 22:48:48,740 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 forwarded 0 rows
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 Close done
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
2011-09-26 22:48:48,851 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201109261629_0001_r_000004_0 is done. And is in the process of commiting
2011-09-26 22:48:48,854 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201109261629_0001_r_000004_0' done.


One thing I noticed is that the row-forwarding stats are almost the
same across all the tasks; however, this task ran for 6 hours, whereas
all the others ran for only 1 to 2 hours.
Any help?
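To compare these counters across tasks, a small script can pull them out of each reducer's log. The sketch below is hypothetical: the helper name and regular expressions are assumptions written against the log lines quoted above, and they tolerate lines wrapped by a mail client. For this task it yields 1,280,835 rows processed and 47,881,693,522 rows forwarded by the JoinOperator, roughly 37,000 joined rows per processed row.

```python
import re
import sys

# Hypothetical patterns written against the Hive reducer log lines quoted above.
# \s+ tolerates line breaks introduced by mail clients.
PROCESSED_RE = re.compile(r"ExecReducer: processed\s+(\d+)\s+rows")
JOIN_FORWARDED_RE = re.compile(r"JoinOperator: \d+ forwarded\s+(\d+)\s+rows")


def reducer_row_counts(log_text):
    """Return (rows processed by the reducer, rows forwarded by the join,
    joined rows per processed row) for one reducer log."""
    processed = PROCESSED_RE.search(log_text)
    forwarded = JOIN_FORWARDED_RE.search(log_text)
    if processed is None or forwarded is None:
        raise ValueError("row counters not found in log text")
    rows_in = int(processed.group(1))
    rows_out = int(forwarded.group(1))
    return rows_in, rows_out, rows_out / rows_in


if __name__ == "__main__":
    # Usage: python reducer_counts.py task1_syslog task2_syslog ...
    for path in sys.argv[1:]:
        with open(path) as f:
            rows_in, rows_out, fan_out = reducer_row_counts(f.read())
        print(f"{path}: in={rows_in} out={rows_out} fan-out={fan_out:.0f}")
```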

Thanks


--
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v


  • Aggarwal, Vaibhav at Sep 27, 2011 at 5:51 pm
    You can turn speculative execution ON, which might help with the few slowly progressing tasks.
    The job conf options are mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution.
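For scripted benchmark runs, the two properties can be set per job. A minimal sketch that only builds the hive command line, assuming the Hive CLI's `-hiveconf property=value` and `-e query` options (`SET property=value;` inside a Hive session is the interactive equivalent); the function name is hypothetical.

```python
import shlex

# Job conf options named in the reply above.
SPECULATIVE_KEYS = (
    "mapred.map.tasks.speculative.execution",
    "mapred.reduce.tasks.speculative.execution",
)


def hive_command(query, speculative):
    """Build (but do not run) a hive CLI argument list that turns map and
    reduce speculative execution on or off for a single query."""
    value = "true" if speculative else "false"
    args = ["hive"]
    for key in SPECULATIVE_KEYS:
        args += ["-hiveconf", f"{key}={value}"]
    args += ["-e", query]
    return args


if __name__ == "__main__":
    cmd = hive_command("select count(*) from supplier", speculative=True)
    print(shlex.join(cmd))  # shell-quoted form, useful for logging the run
    # To launch it: subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` avoids shell-quoting problems with the query text.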


    -----Original Message-----
    From: bharath vissapragada
    Sent: Tuesday, September 27, 2011 1:22 AM
    To: hive-user@hadoop.apache.org
    Subject: Benchmarking problems

  • Bharath vissapragada at Sep 28, 2011 at 4:09 am
    I turned it off because it was trying to launch 2 copies of every
    task, and they were hogging my TTs.

    I am just curious about one thing: are the reducers in a JOIN
    CPU-intensive, or do they consume a lot of memory? From monitoring
    the TT during the reduce phase, it was pretty clear that there was
    no swapping; however, I was not sure about the CPU usage.

    Has anyone had the same experience, or found a workaround for this problem?
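One way to frame the CPU-versus-memory question is a simplified model of a reduce-side equi-join. The sketch below is an assumption about the general technique, not Hive's JoinOperator code, and its names are hypothetical. Rows of one table are buffered per join key, and each row of the other table is paired with every buffered row sharing its key. Memory therefore grows with the buffered side, while the work grows with the number of joined rows, |S_k| × |C_k| summed over keys k. In the log above, about 1.28 million processed rows became about 47.9 billion joined rows that were passed through the Select and GroupBy operators only to be counted. That suggests, without proving it, that this reducer spent most of its time on CPU work rather than memory.

```python
from collections import Counter


def reducer_join_count(buffered_side, streamed_side):
    """Hypothetical, simplified model of one join reducer (not Hive's code).

    Rows of one table are buffered per join key; each row of the other table
    is then paired with every buffered row that has the same key. Memory grows
    with the buffered side, work grows with the number of joined rows.
    """
    buffered = {}
    for key, row in buffered_side:
        buffered.setdefault(key, []).append(row)
    emitted = 0
    for key, _row in streamed_side:
        for _match in buffered.get(key, ()):
            emitted += 1  # one joined row produced (and later counted)
    return emitted


def join_count_closed_form(left_keys, right_keys):
    """Same count without pairing rows: sum over keys of |L_k| * |R_k|."""
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(n * right[k] for k, n in left.items())


if __name__ == "__main__":
    # Toy rows as (nationkey, payload); not TPC-H data.
    suppliers = [(0, "s1"), (0, "s2"), (1, "s3")]
    customers = [(0, "c1"), (0, "c2"), (0, "c3"), (1, "c4")]
    print(reducer_join_count(suppliers, customers))  # 7 = 2*3 + 1*1
    print(join_count_closed_form([k for k, _ in suppliers],
                                 [k for k, _ in customers]))  # 7
```

The closed form gives the same count(*) without pairing any rows, so it can serve as a sanity check on how much join output each reducer should produce.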


    --
    Regards,
    Bharath .V
    w:http://researchweb.iiit.ac.in/~bharath.v

Discussion Overview
group: user @
categories: hive, hadoop
posted: Sep 27, '11 at 8:22a
active: Sep 28, '11 at 4:09a
posts: 3
users: 2
website: hive.apache.org

site design / logo © 2022 Grokbase