  • Gaurav jain at Oct 6, 2010 at 4:26 pm
    Hi,

    insert overwrite directory "$dir" select * from xxx;

    creates files of type attempt_201008201925_165088_r_000000_0.gz

    insert overwrite table "$table" select * from xxx;

    creates files of type attempt_201008201925_165088_r_000000_0

    How can I configure "insert overwrite directory" to produce sequence files
    (not .gz)?

    Regards,
    Gaurav Jain
  • Gaurav jain at Oct 6, 2010 at 8:18 pm
    How can I produce a sequence file from a query of the form

    insert overwrite directory ....

    I have set:

    SET io.seqfile.compression.type=BLOCK;
    SET hive.exec.compress.output=true;
    SET mapred.output.compression.type=BLOCK;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

    It still seems to produce gzipped text (.gz) files.

    Regards,
    Gaurav Jain
  • Gaurav jain at Oct 6, 2010 at 8:35 pm
    I do have that.

    However, I am not writing directly to the table partition. Instead, I first write
    my data to a tmp directory (eventually moved to the HDFS table partition) and
    then publish that partition in the metastore using an alter table statement.

    Something like this:

    -- create table x ... stored as SeqFile
    -- insert overwrite directory 'd' select * from table y
    -- distcp 'd' x/dateint=.../hour=...
    -- alter table x add partition ....

    It is the second step above that needs to produce a SeqFile.

    Thanks for the prompt reply.
    Gaurav Jain


    ----- Original Message ----
    From: Yang <teddyyyy123@gmail.com>
    To: jainy_gaurav@yahoo.com
    Sent: Wed, October 6, 2010 1:28:42 PM
    Subject: Re: How to output SeqFile

    Gaurav:

    Not sure I understand your question correctly, but when you create the
    output table, there is an option to set the output table's SerDe.

    Regards
    Yang
  • Jacob R Rideout at Oct 6, 2010 at 8:43 pm

    If you insert into a directory rather than a table, Hive won't know to
    look at the table's metadata description, so the table's storage format
    is not applied to the output.

    You need something like:
    insert overwrite table x select * from y;
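
    A minimal sketch of the table-based approach suggested above, assuming a
    hypothetical schema: the columns (id, payload) and the partition values are
    placeholders that do not appear in the thread. The partition columns
    dateint and hour are inferred from the path pattern quoted earlier.

    ```sql
    -- Hypothetical schema; only the STORED AS clause matters for this question.
    CREATE TABLE x (id BIGINT, payload STRING)
    PARTITIONED BY (dateint INT, hour INT)
    STORED AS SEQUENCEFILE;

    -- Compression settings quoted earlier in the thread.
    SET hive.exec.compress.output=true;
    SET mapred.output.compression.type=BLOCK;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

    -- Writing through the table lets Hive apply the table's declared file format.
    -- The partition values below are placeholders.
    INSERT OVERWRITE TABLE x PARTITION (dateint=20101006, hour=13)
    SELECT id, payload FROM y;
    ```

    Because the format comes from the table definition, the output files should
    be sequence files rather than gzipped text.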
  • Gaurav jain at Oct 6, 2010 at 8:47 pm
    I was hoping there would be a configuration setting that lets me choose the
    output format for my query.

    Regards,
    Gaurav Jain
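
    For reference, later Hive releases document an optional STORED AS clause on
    INSERT OVERWRITE DIRECTORY (the language manual lists it as available from
    Hive 0.11.0, which postdates this thread). A sketch under that assumption;
    the directory path is a placeholder, and support for non-local directories
    may depend on the Hive version in use:

    ```sql
    -- Assumes a Hive release that accepts STORED AS on directory inserts.
    -- '/tmp/seq_out' is a hypothetical output path.
    INSERT OVERWRITE DIRECTORY '/tmp/seq_out'
    STORED AS SEQUENCEFILE
    SELECT * FROM y;
    ```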



  • Yang at Oct 6, 2010 at 8:52 pm
    If this feature is indeed still missing, I have a hack:

    Create a temp table in SeqFile format and dump your query output into it.
    Since you know the table's location, just copy the part files from there,
    then delete that partition from the temp table manually. Of course, you may
    run into issues such as "partition already exists" the next time you insert
    into the temp table, so you may need to do an explicit delete from the temp
    table too.

    Y
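
    The hack above could be sketched roughly as follows. All names here are
    hypothetical: tmp_seq, its columns, the /tmp/tmp_seq location, and the
    partition values are placeholders, not taken from the thread.

    ```sql
    -- Temp table in SequenceFile format at a known location (hypothetical path).
    CREATE TABLE tmp_seq (id BIGINT, payload STRING)
    STORED AS SEQUENCEFILE
    LOCATION '/tmp/tmp_seq';

    -- INSERT OVERWRITE replaces whatever a previous run left in the table.
    INSERT OVERWRITE TABLE tmp_seq
    SELECT id, payload FROM y;

    -- Outside Hive: copy the files under /tmp/tmp_seq into the target partition
    -- directory (the thread uses distcp for this step), then register the
    -- partition on the real table x. Partition values are placeholders.
    ALTER TABLE x ADD PARTITION (dateint=20101006, hour=13);
    ```

    This sketch leaves the temp table unpartitioned, which is one way to avoid
    the "partition already exists" problem mentioned above; a partitioned temp
    table would need the explicit cleanup step instead.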


