I'm about to investigate the following situation, but I'd appreciate any
insight that can be given.
We have an external table made up of 3 HDFS files.
We then run an INSERT OVERWRITE that is just a SELECT * from the external
table. The table being overwritten has N buckets.
The issue is that the INSERT OVERWRITE job launches only one map task per
input file. I would have thought there would be one map task per HDFS block.
The (slightly more general) question is:
Is there a way to utilize more of the cluster's hardware when importing
data from flat files into a bucketed table?
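For concreteness, here is a minimal sketch of the setup described above;
the table names, column names, bucket count, and HDFS path are all
hypothetical placeholders, not our actual schema:

```sql
-- External table backed by the 3 HDFS files (hypothetical path/schema)
CREATE EXTERNAL TABLE src_events (id INT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/events/';

-- Bucketed target table; 32 stands in for N here
CREATE TABLE events_bucketed (id INT, payload STRING)
CLUSTERED BY (id) INTO 32 BUCKETS;

-- This job is the one that runs only one map task per input file
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE events_bucketed
SELECT * FROM src_events;
```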
Thanks for any help you might be able to provide.
And congratulations on Hive 0.6!