Hi all,

I have a large number of small files in some partitions, because Hive does
not support merging small files when using dynamic partitioning.

To process these small files, I am trying to use the following setting
with Hadoop 0.20:

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

I get the error below. Has anyone seen this error before?

java.io.IOException: cannot find dir = hdfs://namenodeurl/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile in partToPartitionInfo: [/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1]
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getPartitionDescFromPath(CombineHiveInputFormat.java:373)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:298)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:832)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:803)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:674)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:273)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://namenodeurl/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile in partToPartitionInfo: [/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1])'
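
Two things are visible in the message itself: the directory being looked up is a full hdfs:// URI that ends in a file name (emptyFile), while the only key listed in partToPartitionInfo is the scheme-less parent directory. Whether that mismatch is the actual root cause is an assumption. The sketch below only illustrates, in Python, a lookup that would tolerate both differences; the function name find_partition_key and the dict standing in for partToPartitionInfo are hypothetical and are not Hive's implementation.

```python
from posixpath import dirname
from urllib.parse import urlparse


def find_partition_key(path, part_to_partition_info):
    """Return the key in part_to_partition_info that covers `path`, or None.

    Hypothetical helper, not Hive's code. It drops the scheme and authority
    (e.g. hdfs://namenodeurl) and then walks up the parent directories until
    one of them is a key in the mapping.
    """
    candidate = urlparse(path).path  # keep only the path component
    while candidate:
        if candidate in part_to_partition_info:
            return candidate
        parent = dirname(candidate)
        if parent == candidate:  # reached "/" without finding a match
            return None
        candidate = parent
    return None


# Paths copied from the error message; the mapped value is a placeholder.
partition_dir = (
    "/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031"
    "/1318857972/1"
)
part_to_partition_info = {partition_dir: "partition descriptor placeholder"}
failing_path = "hdfs://namenodeurl" + partition_dir + "/emptyFile"

matched_key = find_partition_key(failing_path, part_to_partition_info)
```

With the paths from the message, the walk stops at the parent directory of emptyFile, which is the one key present in the mapping.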

Thanks,
Viraj


  • Viraj Bhat at Jul 2, 2010 at 6:52 pm
    Hi all,

    We are hitting this Hadoop issue:

    https://issues.apache.org/jira/browse/HADOOP-5759

    So is there a way to use MultipleInputFormat in Hive?

    Viraj



    ________________________________

    From: Viraj Bhat
    Sent: Friday, July 02, 2010 1:00 AM
    To: [email protected]
    Subject: CombineInput Format does not seem to work correctly when
    accessing Dynamic partitions



