I'm about to investigate the following situation, but I'd appreciate any
insight that can be given.

We have an external table which is made up of 3 HDFS files.
We then run an INSERT OVERWRITE which is just a SELECT * from the external table.
The table being overwritten has N buckets.
The issue is that the INSERT OVERWRITE job has only one map task per input file.

I would have thought that there would be one map task per HDFS block.
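
For concreteness, the setup described above could be sketched roughly as follows. The table names, columns, delimiter, location and bucket count are all placeholders made up for illustration, not the actual schema; 32 simply stands in for N.

```sql
-- Hypothetical names and values throughout; adjust to the real schema.

-- External table over a directory that holds the 3 HDFS files.
CREATE EXTERNAL TABLE staging_events (id BIGINT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/staging/events';

-- Target table with N buckets (32 is a stand-in for N).
CREATE TABLE events_bucketed (id BIGINT, payload STRING)
CLUSTERED BY (id) INTO 32 BUCKETS;

-- The load in question: a plain SELECT * from the external table.
INSERT OVERWRITE TABLE events_bucketed
SELECT * FROM staging_events;
```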

The (slightly more general) question is:
Is there a way to utilize more of the hardware in the cluster when importing
data from flat files to a bucketized table?
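
As a starting point for that investigation, here is a minimal sketch of two settings that might be worth trying. This is an assumption about what could help, not a confirmed fix: hive.enforce.bucketing and mapred.max.split.size are existing Hive/Hadoop properties, but whether the split size changes the map count depends on the input format in use and on whether the files are splittable at all (gzip-compressed files, for example, are not splittable and get one map task each). The table names are hypothetical.

```sql
-- With this on, Hive runs one reducer per bucket when populating a
-- bucketed table, so the write side is spread over N reduce tasks.
SET hive.enforce.bucketing = true;

-- Upper bound on split size, in bytes (64 MB here, an arbitrary example).
-- Assumption: only honoured by input formats that read this property,
-- and only for splittable files.
SET mapred.max.split.size = 67108864;

-- Hypothetical table names; same shape as the load described above.
INSERT OVERWRITE TABLE events_bucketed
SELECT * FROM staging_events;
```

Comparing the number of map and reduce tasks reported for the job with and without these settings should show whether either one has any effect on this data.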

Thanks for any help you might be able to provide.

And congratulations on Hive 0.6!


Search Discussions

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 1 of 1 | next ›
Discussion Overview
groupuser @
Categories: hive, hadoop
Posted: Oct 29, '10 at 11:20p

Phil young: 1 post