Grokbase Groups Hive user April 2011
hi,

I tried to load gzip files into Hive to save disk space, but the load failed.

hive> load data local inpath 'tmp_b.20110426.gz' into table raw_logs partition ( dt=20110426 );
Copying data from file:/home/wd/t/tmp_b.20110426.gz
Copying file: file:/home/wd/t/tmp_b.20110426.gz
Loading data to table default.raw_logs partition (dt=20110426)
Failed with exception Wrong file format. Please check the file's format.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

The raw_logs table is created by:
create table raw_logs ( ............) partitioned by ( dt int ) STORED AS SEQUENCEFILE;

Is there something wrong? The error is the same in both Hive 0.5 and 0.7.


  • Loren Siebert at Apr 28, 2011 at 4:53 am
    You have the file type as sequence file, but you are trying to load a GZip file. Won’t that only work if the table is defined as a text file?

    Hive isn’t doing anything on your behalf when you do LOAD DATA. It’s syntactic sugar for copying a file into an HDFS location. From there, if you want an RCFile table or a sequence file table or whatever, you can select from the table you loaded into a new table (e.g., raw_logs_rcfile) that you have defined in the different format.

  • Wd at Apr 28, 2011 at 10:35 am
    Thanks for your help

    2011/4/28 Loren Siebert <loren@siebert.org>
    You have the file type as sequence file, but you are trying to load a GZip
    file. Won’t that only work if the table is defined as a text file?
    I had thought a sequence file was the same thing as a gzip file, and now I realize it is not.
    It works when the table is defined as a text file.

    Hive isn’t doing anything on your behalf when you do LOAD DATA. It’s
    syntactic sugar for copying a file into a HDFS location. From there, if you
    want a RCFile table or a sequence file table or whatever, you can select
    from the raw_logs table into the new table (e.g., raw_logs_rcfile) that you
    have defined in the different format.

    So is this the only way I can put data into a table defined as a sequence
    file? Can I generate the RCFile using a Unix command or some other tool?

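
Loren's suggestion above (load the gzip file into a text-format table, then select from it into the table defined in the other format) could look roughly like the HiveQL below. This is only a sketch under stated assumptions: `raw_logs_text` is a hypothetical staging table name, and the single `line` column is a placeholder because the original post elides the real column list. The `hive.exec.compress.output` setting is optional and is shown only because the goal was to save disk space.

```sql
-- Hypothetical staging table. The `line` column is a placeholder:
-- the original post does not show the real column list.
CREATE TABLE raw_logs_text (line STRING)
PARTITIONED BY (dt INT)
STORED AS TEXTFILE;

-- LOAD DATA only copies the file into the table's location.
-- A gzip-compressed text file is decompressed when the table is read.
LOAD DATA LOCAL INPATH 'tmp_b.20110426.gz'
INTO TABLE raw_logs_text PARTITION (dt=20110426);

-- Optional: ask Hive to compress the files written by the next query.
SET hive.exec.compress.output=true;

-- Rewrite the rows into the SequenceFile table from the original post.
-- Assumes raw_logs has the same columns as the staging table.
INSERT OVERWRITE TABLE raw_logs PARTITION (dt=20110426)
SELECT line FROM raw_logs_text WHERE dt=20110426;
```

Because the INSERT runs as a Hive job, Hive itself writes the files in the target table's format, which is why this route works where copying the `.gz` file straight into the SequenceFile table does not.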

Discussion Overview
group: user @
categories: hive, hadoop
posted: Apr 28, '11 at 4:34a
active: Apr 28, '11 at 10:35a
posts: 3
users: 2 (Wd: 2 posts, Loren Siebert: 1 post)
website: hive.apache.org