I need some help with Hive. I am trying to benchmark Hive with the TPC-H
SF 100 dataset. For a simple SPJ query (SELECT count(*) FROM supplier,
customer WHERE s_nationkey = c_nationkey), 12 of my 13 reduce tasks
completed in less than 2 hours, but 1 ran for 6 hours. Here are my
cluster details:
10 nodes (1 master + 9 TaskTracker/DataNode nodes), 3.5 GB RAM per
TaskTracker, at most 2 map and 2 reduce tasks per TaskTracker,
600 MB per task, io.sort.mb = 200 MB.
I saw that no swapping occurred while the reduce task was running.
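As a quick sanity check on the "no swapping" observation, here is the per-node memory budget implied by the settings above. This sketch assumes that "600 MB per task" means the child JVM heap and that 3.5 GB means 3.5 x 1024 MB. The TaskTracker and DataNode daemon heaps are not stated, so they are not subtracted.

```python
# Rough per-node memory budget from the cluster settings in this post.
# Assumptions: "600MB per task" is the child JVM heap (actual JVM footprint is
# somewhat larger), and daemon heaps (TaskTracker, DataNode) are unknown/ignored.
NODE_RAM_MB = 3.5 * 1024   # 3.5 GB per TaskTracker node
MAP_SLOTS = 2              # max concurrent map tasks per TaskTracker
REDUCE_SLOTS = 2           # max concurrent reduce tasks per TaskTracker
TASK_HEAP_MB = 600         # heap per task JVM
IO_SORT_MB = 200           # map-side sort buffer, allocated inside the task heap

task_heaps_mb = (MAP_SLOTS + REDUCE_SLOTS) * TASK_HEAP_MB
headroom_mb = NODE_RAM_MB - task_heaps_mb

print(f"all task heaps: {task_heaps_mb} MB")
print(f"left for daemons, OS and page cache: {headroom_mb:.0f} MB")
print(f"io.sort.mb as a share of one task heap: {IO_SORT_MB / TASK_HEAP_MB:.0%}")
```

With all four slots busy, the task heaps total 2400 MB, which leaves roughly 1.2 GB for everything else on the node.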
Following is the tail of the log on the machine where the reduce task
ran for 6 hours:
2011-09-26 22:48:48,285 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 47881000000 rows
2011-09-26 22:48:48,607 INFO ExecReducer: ExecReducer: processed 1280835 rows: used memory = 4840896
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 47881693522 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 47881693522 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 forwarded 0 rows
2011-09-26 22:48:48,608 WARN org.apache.hadoop.hive.ql.exec.GroupByOperator: Begin Hash Table flush at close: size = 1
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 forwarding 1 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/000004_0
2011-09-26 22:48:48,609 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/_tmp.000004_0
2011-09-26 22:48:48,609 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/000004_0
2011-09-26 22:48:48,739 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 finished. closing...
2011-09-26 22:48:48,740 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 forwarded 0 rows
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 Close done
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
2011-09-26 22:48:48,851 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201109261629_0001_r_000004_0 is done. And is in the process of commiting
2011-09-26 22:48:48,854 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201109261629_0001_r_000004_0' done.
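Comparing these counters across all 13 reduce attempts is easier with a script than by eye. The sketch below pulls the reducer input count and each operator's "forwarded N rows" count out of one task log. It assumes each attempt's log has been saved as a separate local file, and the file names in the usage comment are hypothetical. The regexes only target the message formats visible in the log above.

```python
import re
import sys

# Operator close-time counters, e.g.
# "...hive.ql.exec.JoinOperator: 4 forwarded 47881693522 rows".
# Progress lines ("5 forwarding ... rows") are deliberately not matched.
FORWARDED = re.compile(r"exec\.(\w+): (\d+) forwarded (\d+) rows")
# Reducer input progress, e.g. "ExecReducer: processed 1280835 rows".
# Later matches overwrite earlier ones, so the last count in the log wins.
PROCESSED = re.compile(r"ExecReducer: processed (\d+) rows")


def row_stats(log_text):
    """Collect reducer input rows and per-operator forwarded rows from one task log."""
    stats = {}
    for line in log_text.splitlines():
        m = PROCESSED.search(line)
        if m:
            stats["ExecReducer input"] = int(m.group(1))
        m = FORWARDED.search(line)
        if m:
            # Key is "<OperatorClass>_<operator id>", e.g. "JoinOperator_4".
            stats[f"{m.group(1)}_{m.group(2)}"] = int(m.group(3))
    return stats


if __name__ == "__main__":
    # Usage (hypothetical file names): python row_stats.py r_000000.log r_000004.log ...
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, row_stats(f.read()))
```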
One thing I noticed is that the row-forwarding statistics are almost
the same across all the tasks; however, this task ran for 6 hours,
whereas all the others ran for only 1 to 2 hours.
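A back-of-the-envelope model helps explain why the row counts should be nearly equal. It shows how the 25 TPC-H nation keys could spread over 13 reducers, and how many join rows each reducer would then produce at SF 100. Several points are assumptions and were not confirmed from this job:

- The join key's hash is the int value itself.
- Partitioning follows Hadoop's HashPartitioner formula, (hash & Integer.MAX_VALUE) % numReducers.
- Reduce task r_00000N handles partition N.
- Nation keys are spread uniformly, per the TPC-H cardinalities (SUPPLIER = 10,000 x SF, CUSTOMER = 150,000 x SF).

```python
from collections import defaultdict

# Assumed model, not read from the job: int key hash == value, HashPartitioner-style
# partitioning, uniform nation keys per the TPC-H spec.
SF = 100
NUM_REDUCERS = 13
NATIONS = range(25)            # nation keys 0..24
SUPPLIERS = 10_000 * SF        # TPC-H SUPPLIER row count
CUSTOMERS = 150_000 * SF       # TPC-H CUSTOMER row count


def partition(key, num_reducers):
    """HashPartitioner-style partition for an int key whose hash is the value."""
    return (key & 0x7FFFFFFF) % num_reducers


keys_per_reducer = defaultdict(list)
for k in NATIONS:
    keys_per_reducer[partition(k, NUM_REDUCERS)].append(k)

s_per_nation = SUPPLIERS / len(NATIONS)   # expected suppliers per nation
c_per_nation = CUSTOMERS / len(NATIONS)   # expected customers per nation

for r in range(NUM_REDUCERS):
    keys = keys_per_reducer[r]
    input_rows = len(keys) * (s_per_nation + c_per_nation)  # rows shuffled to reducer r
    join_rows = len(keys) * s_per_nation * c_per_nation     # equi-join output per reducer
    print(f"reducer {r:2d}: keys {keys}, ~{input_rows:,.0f} input rows, ~{join_rows:.2e} join rows")
```

Under these assumptions, reducers 0 to 11 each receive two nation keys (about 1,280,000 input rows and 4.8e10 join rows), and reducer 12 receives one. For r_000004, the two keys would be 4 and 17. That lines up with the 1280835 input rows and 47881693522 forwarded rows in the log, and is consistent with the row counts being nearly equal across tasks.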
Any help?
Thanks
--
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v