Hi all,
I have a large number of small files in some partitions, because Hive does
not support merging small files when using dynamic partitioning.
To process these small files, I tried setting
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
with Hadoop 0.20.
I get the error below. Has anyone seen this before?
java.io.IOException: cannot find dir = hdfs://namenodeurl/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile in partToPartitionInfo: [/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1]
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getPartitionDescFromPath(CombineHiveInputFormat.java:373)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:298)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:832)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:803)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:674)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:273)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://namenodeurl/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile in partToPartitionInfo: [/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1])'
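One thing I notice in the message: the directory being looked up carries the hdfs://namenodeurl prefix, while the only key listed in partToPartitionInfo is the bare path without scheme or host. Below is a minimal sketch of how that kind of mismatch could produce "cannot find dir". It is only a guess at the cause. I am assuming the lookup is a plain string match on the directory and its parents, which I have not confirmed in the Hive source. PathLookupSketch, findKey and stripScheme are hypothetical names of mine, not Hive APIs.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PathLookupSketch {
    // Hypothetical helper, not Hive code: walk up from dir through its
    // parents and return the first one present in keys, or null if none is.
    public static String findKey(Set<String> keys, String dir) {
        String p = dir;
        while (p != null && !p.isEmpty()) {
            if (keys.contains(p)) {
                return p;
            }
            int slash = p.lastIndexOf('/');
            p = slash > 0 ? p.substring(0, slash) : null;
        }
        return null;
    }

    // Drop scheme and authority, keeping only the path component of the URI.
    public static String stripScheme(String dir) {
        return URI.create(dir).getPath();
    }

    public static void main(String[] args) {
        // Key and directory copied from the error message above.
        String key = "/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1";
        Set<String> keys = new HashSet<>(Arrays.asList(key));
        String dir = "hdfs://namenodeurl" + key + "/emptyFile";

        // Fully qualified URI: no parent ever equals the bare-path key.
        System.out.println(findKey(keys, dir));              // prints null
        // Path component only: the parent directory matches the key.
        System.out.println(findKey(keys, stripScheme(dir))); // prints the key
    }
}
```

If the real lookup behaves like the first call, comparing on the path component alone (as in the second call) would find the entry, but I would appreciate confirmation from someone who knows this code.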
Thanks,
Viraj