Hi All,

I am using Hive 0.7 with Hadoop 0.20.2. The cluster has 9 data nodes / task
trackers. I am running a query on a table containing approximately 900 GB of
data in 4000 partitions.

The query is structured like this:

from table
insert overwrite table table1 select distinct col1
insert overwrite table table2 select distinct col2
insert overwrite table table3 select distinct col3
insert overwrite table table4 select distinct col4
insert overwrite table table5 select distinct col5

It generates 5 MapReduce jobs.

The first job has 18000 map attempts and 999 (the maximum limit) reduce attempts.

At this rate, the query will not finish in any reasonable time.
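
For a sense of scale, here is a rough back-of-the-envelope check of the figures above. It is only a sketch: it assumes 1 GB = 1024 MB, that the data is spread evenly across partitions, and that the attempt counts are close to the actual task counts (no retries or speculative attempts). None of this is stated in the post.

```python
# Figures quoted in the post.
DATA_GB = 900           # approximate table size
PARTITIONS = 4000       # number of partitions in the table
MAP_ATTEMPTS = 18000    # map attempts in the first job

# Assumption: 1 GB = 1024 MB, and data is evenly spread across partitions.
total_mb = DATA_GB * 1024

# Assumption: attempts ~= tasks (no retries or speculative execution).
mb_per_map = total_mb / MAP_ATTEMPTS
maps_per_partition = MAP_ATTEMPTS / PARTITIONS
mb_per_partition = total_mb / PARTITIONS

print(f"input per map task:      {mb_per_map:.1f} MB")        # 51.2 MB
print(f"map tasks per partition: {maps_per_partition:.1f}")   # 4.5
print(f"data per partition:      {mb_per_partition:.1f} MB")  # 230.4 MB
```

Under these assumptions each map task reads only about 51 MB, and each partition is split across four or five map tasks, so the map count is driven by how the 900 GB is split rather than by the query text itself.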

To speed up the query:

Can the query be constructed in some other, optimized way?
Can the total number of MapReduce jobs be decreased?
Can the number of map / reduce tasks be reduced?

Kindly suggest.

Thanks and Best Regards
Vaibhav Negi

Posted Apr 21, '11 at 9:31a