I have about 5,000 input files, so running a Hive job creates just as many
(small) output files. Small-file merging seems to be enabled by default
(hive.merge.mapfiles=true), but it only seems to take effect when output
compression is disabled (hive.exec.compress.output=false). With compression
off, I get only 30 (uncompressed) output files, which is much more manageable.
Is there a way to enable both compression and small-file merge?
If not, I am thinking about writing to an uncompressed temp table first
(so that the merge runs), then enabling compression and copying that table
into the output table. Is there an easier way?
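
For concreteness, the two-step workaround I have in mind would look roughly
like the sketch below. The table names (source_table, tmp_uncompressed,
final_output) are placeholders, and I'm assuming final_output already exists.
I haven't verified that the second step keeps the file count low.

```sql
-- Step 1: write merged, uncompressed output into a temp table.
SET hive.exec.compress.output=false;
SET hive.merge.mapfiles=true;
CREATE TABLE tmp_uncompressed AS
SELECT * FROM source_table;          -- placeholder for the real query

-- Step 2: turn compression back on and copy into the final table.
SET hive.exec.compress.output=true;
INSERT OVERWRITE TABLE final_output  -- placeholder; assumed to exist
SELECT * FROM tmp_uncompressed;

-- Clean up the intermediate table.
DROP TABLE tmp_uncompressed;
```

The second step reads the roughly 30 merged files rather than the original
5K, so I would expect about one compressed output file per mapper.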
Thanks.