Creating partitions causes Error in semantic analysis
Hive user mailing list, June 2010
Hi all,

I have a table known as "oldtable" which is partitioned by datestamp.

The schema of the "oldtable" is:

name string
age bigint
property string
region string
datestamp string

I now need to create a new table which is based on this old table and
partitioned by (datestamp, region, property).

The DDL for the new table looks like:

CREATE EXTERNAL TABLE newtable
(
newname string,
newage bigint
)
PARTITIONED BY (newdatestamp STRING, myregion STRING, myproperty STRING)
STORED AS RCFILE
LOCATION '/user/viraj/rcfile';


When I try to populate this new table from my old table, I try to use
partitioning that takes its values from the old columns.

INSERT OVERWRITE TABLE newtable
PARTITION (newdatestamp='20100525', region, property)
SELECT
name,
age
from oldtable where datestamp='20100525';


The above statement causes an error and expects hardcoded values for
region and property:

FAILED: Error in semantic analysis: Partition column in the partition
specification does not exist.

How do I specify the partition information so that the new table takes
the values of "property" and "region" from the old table and uses them
as partitions?

Is there a better way to achieve this instead of hard-coding values for
each and every partition?


=======================================================

Addendum: If the above is possible, how can I define a condition that
says: if region is not "us" or "asia", put the row in another partition
known as misc?

=======================================================





Thanks Viraj


  • Yongqiang he at Jun 17, 2010 at 12:00 pm
    Hive supports dynamic partitions (I think you need to use trunk code
    for this feature).

    here is an example:

    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.exec.dynamic.partition=true;
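    -- the two settings above enable dynamic-partition inserts; 'nonstrict'
    -- mode allows every partition column in the insert to be dynamic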

    create table if not exists nzhang_part1 like srcpart;
    create table if not exists nzhang_part2 like srcpart;
    describe extended nzhang_part1;

    from srcpart
    insert overwrite table nzhang_part1 partition (ds, hr) select key,
    value, ds, hr where ds <= '2008-04-08'
    insert overwrite table nzhang_part2 partition(ds='2008-12-31', hr)
    select key, value, hr where ds > '2008-04-08';
  • Viraj Bhat at Jun 17, 2010 at 1:25 pm
    Hi Yongqiang,
    I am using the trunk code. I figured out what the problem was:
    INSERT OVERWRITE TABLE newtable
    PARTITION (newdatestamp, myregion, myproperty)
    SELECT
    name,
    age,
    datestamp as newdatestamp,
    region as myregion,
    property as myproperty
    from oldtable where datestamp='20100525';

    I needed to specify the last three columns in the same order as the partition columns, which I had not done.

    Meanwhile the dynamic partitioning produced a partition which was named "__HIVE_DEFAULT_PARTITION__". Is this created by default?

    Thanks again for your help.
    Viraj

  • Ning Zhang at Jun 17, 2010 at 12:15 pm
    The __HIVE_DEFAULT_PARTITION__ partition is created by default if the partitioning column value (newdatestamp etc.) is NULL or an empty string. Below is the wiki page that describes the syntax and semantics of dynamic-partition inserts, including some best practices.

    http://wiki.apache.org/hadoop/Hive/Tutorial#Dynamic-partition_Insert

    Ning
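
    As an aside, here is a minimal sketch of that point (not from the original thread; it reuses the oldtable/newtable schema from the question, assumes the dynamic-partition settings shown earlier are enabled, and the 'unknown' label is purely illustrative). NULL or empty region values can be remapped in the SELECT so those rows do not land in __HIVE_DEFAULT_PARTITION__:

    INSERT OVERWRITE TABLE newtable
    PARTITION (newdatestamp, myregion, myproperty)
    SELECT
    name,
    age,
    datestamp as newdatestamp,
    -- route NULL/empty regions to an explicit partition instead of the default one
    CASE WHEN region IS NULL OR region = '' THEN 'unknown' ELSE region END as myregion,
    property as myproperty
    from oldtable where datestamp='20100525';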
  • John Sichi at Jun 17, 2010 at 12:05 pm
    I think you should be using myregion and myproperty in the PARTITION clause (not region and property).

    Also, regarding your question in the addendum, you could use a CASE expression in the source SELECT statement to transform the region.

    JVS
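
    A sketch of that CASE suggestion (not from the original thread; it reuses the column names and partition spec from the question, assumes dynamic partitioning is enabled as in the earlier example, and uses the 'misc' partition name from the addendum):

    INSERT OVERWRITE TABLE newtable
    PARTITION (newdatestamp, myregion, myproperty)
    SELECT
    name,
    age,
    datestamp as newdatestamp,
    -- collapse every region other than us/asia into a single 'misc' partition
    CASE WHEN region IN ('us', 'asia') THEN region ELSE 'misc' END as myregion,
    property as myproperty
    from oldtable where datestamp='20100525';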

  • Viraj Bhat at Jun 17, 2010 at 1:25 pm
    Hi John,
    Yes, the problem was that in the SELECT clause I needed to list the
    partition columns in exactly the same order.
    INSERT OVERWRITE TABLE newtable
    PARTITION (newdatestamp, myregion, myproperty)
    SELECT
    name,
    age,
    datestamp as newdatestamp,
    region as myregion,
    property as myproperty
    from oldtable where datestamp='20100525';

    I will put in a CASE expression to obtain the necessary partition
    information.

    Is __HIVE_DEFAULT_PARTITION__ created by default?

    Thanks again for your help.
    Viraj


Discussion Overview
group: user
categories: hive, hadoop
posted: Jun 16, 2010 at 11:10 PM
active: Jun 17, 2010 at 1:25 PM
posts: 6
users: 4
website: hive.apache.org
