FAQ
Hi,

I have a table created with

CREATE TABLE raw(partition1 string, partition2 string, data string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;

I want to further process "data" and put it in a partition (partition1, partition2) defined by the values in the relevant row.

I'm however stuck at trying to use dynamic partitions in a query. With predefined partition values it's straightforward:

FROM (
FROM raw
SELECT TRANSFORM(raw.data)
USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING,STRING>)
CLUSTER BY date
) tmap
INSERT OVERWRITE TABLE polished PARTITION (partition1='p1', partition2='p2') SELECT foo, date, bar;

What would be the best way to define the partition using raw.partition1 and raw.partition2 as values?

Thanks much,

Adriaan

Search Discussions

  • Sumanth V at Sep 18, 2011 at 2:56 am
    Hi Adriaan,

    To use dynamic partition, follow the following steps inside hive shell -

    #Set the following values -

    set hive.exec.dynamic.partition.mode=nonstrict;

    set hive.exec.dynamic.partition=true

    #Create another table -

    create table raw_2
    (
    data string
    )
    partitioned by (partition1 string, partition2 string);

    #Now insert the values stored in table raw into table raw_2 using the
    following query -

    from raw
    insert overwrite table raw_2 partition (partition1, partition2)
    select data, partition1, partition2;

    This will dynamically create the 2 partitions based on the values of
    partition1 and partition2 and insert the values of 'data' in the appropriate
    partition.

    Regards,
    Sumanth



    On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling
    wrote:
    Hi,

    I have a table created with

    CREATE TABLE raw(partition1 string, partition2 string, data string) ROW
    FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;

    I want to further process "data" and put it in a partition (partition1,
    partition2) defined by the values in the relevant row.

    I'm however stuck at trying to use dynamic partitions in a query. With
    predefined partition values it's straightforward:

    FROM (
    FROM raw
    SELECT TRANSFORM(raw.data)
    USING 'python parser.py' AS (foo STRING, date STRING, bar
    MAP<STRING,STRING>)
    CLUSTER BY date
    ) tmap
    INSERT OVERWRITE TABLE polished PARTITION (partition1='p1',
    partition2='p2') SELECT foo, date, bar;

    What would be the best way to define the partition using raw.partition1 and
    raw.partition2 as values?

    Thanks much,

    Adriaan
  • Adriaan Tijsseling at Sep 18, 2011 at 7:45 am
    Thanks Sumanth, so it seems it cannot be done without an intermediary step using an extra table.
    On 2011/09/18, at 04:56, Sumanth V wrote:

    Hi Adriaan,

    To use dynamic partition, follow the following steps inside hive shell -

    #Set the following values -

    set hive.exec.dynamic.partition.mode=nonstrict;

    set hive.exec.dynamic.partition=true

    #Create another table -

    create table raw_2
    (
    data string
    )
    partitioned by (partition1 string, partition2 string);

    #Now insert the values stored in table raw into table raw_2 using the
    following query -

    from raw
    insert overwrite table raw_2 partition (partition1, partition2)
    select data, partition1, partition2;

    This will dynamically create the 2 partitions based on the values of
    partition1 and partition2 and insert the values of 'data' in the appropriate
    partition.

    Regards,
    Sumanth



    On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling
    wrote:
    Hi,

    I have a table created with

    CREATE TABLE raw(partition1 string, partition2 string, data string) ROW
    FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;

    I want to further process "data" and put it in a partition (partition1,
    partition2) defined by the values in the relevant row.

    I'm however stuck at trying to use dynamic partitions in a query. With
    predefined partition values it's straightforward:

    FROM (
    FROM raw
    SELECT TRANSFORM(raw.data)
    USING 'python parser.py' AS (foo STRING, date STRING, bar
    MAP<STRING,STRING>)
    CLUSTER BY date
    ) tmap
    INSERT OVERWRITE TABLE polished PARTITION (partition1='p1',
    partition2='p2') SELECT foo, date, bar;

    What would be the best way to define the partition using raw.partition1 and
    raw.partition2 as values?

    Thanks much,

    Adriaan
  • Adriaan Tijsseling at Sep 18, 2011 at 7:29 pm
    I looked at your solution, but the problem is still that the "data" column needs to be processed still. What I want is to process "data" and put the results into a table with partitioned defined by the other columns. With your solution, I get partitions but still with the same unprocessed data.

    Adriaan
    On 2011/09/18, at 04:56, Sumanth V wrote:

    Hi Adriaan,

    To use dynamic partition, follow the following steps inside hive shell -

    #Set the following values -

    set hive.exec.dynamic.partition.mode=nonstrict;

    set hive.exec.dynamic.partition=true

    #Create another table -

    create table raw_2
    (
    data string
    )
    partitioned by (partition1 string, partition2 string);

    #Now insert the values stored in table raw into table raw_2 using the
    following query -

    from raw
    insert overwrite table raw_2 partition (partition1, partition2)
    select data, partition1, partition2;

    This will dynamically create the 2 partitions based on the values of
    partition1 and partition2 and insert the values of 'data' in the appropriate
    partition.

    Regards,
    Sumanth



    On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling
    wrote:
    Hi,

    I have a table created with

    CREATE TABLE raw(partition1 string, partition2 string, data string) ROW
    FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;

    I want to further process "data" and put it in a partition (partition1,
    partition2) defined by the values in the relevant row.

    I'm however stuck at trying to use dynamic partitions in a query. With
    predefined partition values it's straightforward:

    FROM (
    FROM raw
    SELECT TRANSFORM(raw.data)
    USING 'python parser.py' AS (foo STRING, date STRING, bar
    MAP<STRING,STRING>)
    CLUSTER BY date
    ) tmap
    INSERT OVERWRITE TABLE polished PARTITION (partition1='p1',
    partition2='p2') SELECT foo, date, bar;

    What would be the best way to define the partition using raw.partition1 and
    raw.partition2 as values?

    Thanks much,

    Adriaan
  • Sumanth V at Sep 18, 2011 at 9:24 pm
    The earlier query was just to show the dynamic partitions concept.
    For your case, you will have to use a query something like the one below.
    Although I have not given it a try, in theory this should work -

    FROM raw
    insert overwrite table polished partition (partition1, partition2)
    select TRANSFORM(raw.data)
    USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING,
    STRING>)
    CLUSTER BY date


    Sumanth


    On Sun, Sep 18, 2011 at 12:28 PM, Adriaan Tijsseling wrote:

    I looked at your solution, but the problem is still that the "data" column
    needs to be processed still. What I want is to process "data" and put the
    results into a table with partitioned defined by the other columns. With
    your solution, I get partitions but still with the same unprocessed data.

    Adriaan
    On 2011/09/18, at 04:56, Sumanth V wrote:

    Hi Adriaan,

    To use dynamic partition, follow the following steps inside hive shell -

    #Set the following values -

    set hive.exec.dynamic.partition.mode=nonstrict;

    set hive.exec.dynamic.partition=true

    #Create another table -

    create table raw_2
    (
    data string
    )
    partitioned by (partition1 string, partition2 string);

    #Now insert the values stored in table raw into table raw_2 using the
    following query -

    from raw
    insert overwrite table raw_2 partition (partition1, partition2)
    select data, partition1, partition2;

    This will dynamically create the 2 partitions based on the values of
    partition1 and partition2 and insert the values of 'data' in the
    appropriate
    partition.

    Regards,
    Sumanth



    On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling
    wrote:
    Hi,

    I have a table created with

    CREATE TABLE raw(partition1 string, partition2 string, data string) ROW
    FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;

    I want to further process "data" and put it in a partition (partition1,
    partition2) defined by the values in the relevant row.

    I'm however stuck at trying to use dynamic partitions in a query. With
    predefined partition values it's straightforward:

    FROM (
    FROM raw
    SELECT TRANSFORM(raw.data)
    USING 'python parser.py' AS (foo STRING, date STRING, bar
    MAP<STRING,STRING>)
    CLUSTER BY date
    ) tmap
    INSERT OVERWRITE TABLE polished PARTITION (partition1='p1',
    partition2='p2') SELECT foo, date, bar;

    What would be the best way to define the partition using raw.partition1
    and
    raw.partition2 as values?

    Thanks much,

    Adriaan
  • Adriaan Tijsseling at Sep 18, 2011 at 10:04 pm
    Thanks much for your help. Your earlier query worked well, but the latest query produces an error about a mismatch:
    Error in semantic analysis: Line 2:23 Cannot insert into target table because column number/types are different brand: Table insclause-0 has 5 columns, but query has 3 columns.

    I tried the redundant version:

    FROM raw
    insert overwrite table polished partition (partition1, partition2)
    select TRANSFORM(raw.partition1, raw.partition2, raw.data)
    USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING,
    STRING>, partition1 STRING, partition2 STRING)
    CLUSTER BY date

    But that produced a table with partitons set to __HIVE_DEFAULT_PARTITION__

    This is turning to be a little bit harder than I expected ;)

    Adriaan
    On 2011/09/18, at 23:23, Sumanth V wrote:

    The earlier query was just to show the dynamic partitions concept.
    For your case, you will have to use a query something like the one below.
    Although I have not given it a try, in theory this should work -

    FROM raw
    insert overwrite table polished partition (partition1, partition2)
    select TRANSFORM(raw.data)
    USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING,
    STRING>)
    CLUSTER BY date


    Sumanth



    On Sun, Sep 18, 2011 at 12:28 PM, Adriaan Tijsseling <adriaan@tijsseling.com
    wrote:
    I looked at your solution, but the problem is still that the "data" column
    needs to be processed still. What I want is to process "data" and put the
    results into a table with partitioned defined by the other columns. With
    your solution, I get partitions but still with the same unprocessed data.

    Adriaan
    On 2011/09/18, at 04:56, Sumanth V wrote:

    Hi Adriaan,

    To use dynamic partition, follow the following steps inside hive shell -

    #Set the following values -

    set hive.exec.dynamic.partition.mode=nonstrict;

    set hive.exec.dynamic.partition=true

    #Create another table -

    create table raw_2
    (
    data string
    )
    partitioned by (partition1 string, partition2 string);

    #Now insert the values stored in table raw into table raw_2 using the
    following query -

    from raw
    insert overwrite table raw_2 partition (partition1, partition2)
    select data, partition1, partition2;

    This will dynamically create the 2 partitions based on the values of
    partition1 and partition2 and insert the values of 'data' in the
    appropriate
    partition.

    Regards,
    Sumanth



    On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling
    wrote:
    Hi,

    I have a table created with

    CREATE TABLE raw(partition1 string, partition2 string, data string) ROW
    FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;

    I want to further process "data" and put it in a partition (partition1,
    partition2) defined by the values in the relevant row.

    I'm however stuck at trying to use dynamic partitions in a query. With
    predefined partition values it's straightforward:

    FROM (
    FROM raw
    SELECT TRANSFORM(raw.data)
    USING 'python parser.py' AS (foo STRING, date STRING, bar
    MAP<STRING,STRING>)
    CLUSTER BY date
    ) tmap
    INSERT OVERWRITE TABLE polished PARTITION (partition1='p1',
    partition2='p2') SELECT foo, date, bar;

    What would be the best way to define the partition using raw.partition1
    and
    raw.partition2 as values?

    Thanks much,

    Adriaan

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedSep 17, '11 at 9:18p
activeSep 18, '11 at 10:04p
posts6
users2
websitehive.apache.org

2 users in discussion

Adriaan Tijsseling: 4 posts Sumanth V: 2 posts

People

Translate

site design / logo © 2022 Grokbase