Grokbase Groups Hive user August 2011
FAQ
suppose the table is partitioned by period_key, and the csv file also has a column named as period_key. The csv file contains multiple days of data, how can we load it in the the table?

I think of an workaround by first load the data into a non-partition table, and then insert the data from non-partition table to the partition table.

hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.


However it doesn't work also. please help.

Search Discussions

  • Wd at Aug 12, 2011 at 6:34 am
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

    2011/8/12 Daniel,Wu <hadoop_wu@163.com>
    suppose the table is partitioned by period_key, and the csv file also has
    a column named as period_key. The csv file contains multiple days of data,
    how can we load it in the the table?

    I think of an workaround by first load the data into a non-partition table,
    and then insert the data from non-partition table to the partition table.

    hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
    FAILED: Error in semantic analysis: need to specify partition columns
    because the destination table is partitioned.


    However it doesn't work also. please help.

  • Vikas Srivastava at Aug 12, 2011 at 12:02 pm
    Hey ,

    Simpley you have run query like this

    FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) SELECT *


    Regards
    Vikas Srivastava


    2011/8/12 Daniel,Wu <hadoop_wu@163.com>
    suppose the table is partitioned by period_key, and the csv file also has
    a column named as period_key. The csv file contains multiple days of data,
    how can we load it in the the table?

    I think of an workaround by first load the data into a non-partition table,
    and then insert the data from non-partition table to the partition table.

    hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
    FAILED: Error in semantic analysis: need to specify partition columns
    because the destination table is partitioned.


    However it doesn't work also. please help.


    --
    With Regards
    Vikas Srivastava

    DWH & Analytics Team
    Mob:+91 9560885900
    One97 | Let's get talking !
  • Bejoy_ks at Aug 12, 2011 at 1:58 pm
    Hi Daniel
    Just having a look at your requirement , to load data into a partition based hive table from any input file the most hassle free approach would be.
    1. Load the data into a non partitioned table that shares similar structure as the target table.
    2. Populate the target table with the data from non partitioned one using hive dynamic partition
    approach.
    With Dynamic partitions you don't need to manually identify the data partitions and distribute data accordingly.

    A similar implementation is described in the blog post
    www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html

    Hope it helps

    Regards
    Bejoy K S

    -----Original Message-----
    From: Vikas Srivastava <vikas.srivastava@one97.net>
    Date: Fri, 12 Aug 2011 17:31:28
    To: <user@hive.apache.org>
    Reply-To: user@hive.apache.org
    Subject: Re: how to load data to partitioned table

    Hey ,

    Simpley you have run query like this

    FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) SELECT *


    Regards
    Vikas Srivastava


    2011/8/12 Daniel,Wu <hadoop_wu@163.com>
    suppose the table is partitioned by period_key, and the csv file also has
    a column named as period_key. The csv file contains multiple days of data,
    how can we load it in the the table?

    I think of an workaround by first load the data into a non-partition table,
    and then insert the data from non-partition table to the partition table.

    hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
    FAILED: Error in semantic analysis: need to specify partition columns
    because the destination table is partitioned.


    However it doesn't work also. please help.


    --
    With Regards
    Vikas Srivastava

    DWH & Analytics Team
    Mob:+91 9560885900
    One97 | Let's get talking !
  • Hadoopman at Aug 14, 2011 at 2:57 pm
    Something else I've noticed is when loading LOTS of historical data, if
    you can try to say load a month of data at a time, try to just load THAT
    month of data and only that month. I've been able to load several years
    of data (depending on the data) at a single load however there have been
    times when loading a large dataset that I would run into memory issues
    during the reduce phase (usually during shuffle/sort). Things from out
    of memory to stack overflow messages (I've compiled a list of the more
    fun ones).

    Then I noticed that only loading data from say a single month loaded
    quickly and without the memory headaches during the reduce.

    Something to keep in mind and it works great!


    On 08/12/2011 07:58 AM, bejoy_ks@yahoo.com wrote:
    Hi Daniel
    Just having a look at your requirement , to load data into a partition
    based hive table from any input file the most hassle free approach
    would be.
    1. Load the data into a non partitioned table that shares similar
    structure as the target table.
    2. Populate the target table with the data from non partitioned one
    using hive dynamic partition
    approach.
    With Dynamic partitions you don't need to manually identify the data
    partitions and distribute data accordingly.

    A similar implementation is described in the blog post
    www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html

    Hope it helps

    Regards
    Bejoy K S

    ------------------------------------------------------------------------
    *From: * Vikas Srivastava <vikas.srivastava@one97.net>
    *Date: *Fri, 12 Aug 2011 17:31:28 +0530
    *To: *<user@hive.apache.org>
    *ReplyTo: * user@hive.apache.org
    *Subject: *Re: how to load data to partitioned table

    Hey ,

    Simpley you have run query like this

    FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key)
    SELECT *


    Regards
    Vikas Srivastava


    2011/8/12 Daniel,Wu <hadoop_wu@163.com
    suppose the table is partitioned by period_key, and the csv file
    also has a column named as period_key. The csv file contains
    multiple days of data, how can we load it in the the table?

    I think of an workaround by first load the data into a
    non-partition table, and then insert the data from non-partition
    table to the partition table.

    hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
    FAILED: Error in semantic analysis: need to specify partition
    columns because the destination table is partitioned.


    However it doesn't work also. please help.





    --
    With Regards
    Vikas Srivastava

    DWH & Analytics Team
    Mob:+91 9560885900
    One97 | Let's get talking !
  • Bejoy_ks at Aug 14, 2011 at 4:15 pm
    Ya I very much agree with you on those lines. Using the basic stuff would literally run into memory issues with large datasets. I had some of those resolved by using the DISTRIBUTE BY clause and so. In short a little work around over your hive queries could help you out in some cases.
    Regards
    Bejoy K S

    -----Original Message-----
    From: hadoopman <hadoopman@gmail.com>
    Date: Sun, 14 Aug 2011 08:57:12
    To: <user@hive.apache.org>
    Reply-To: user@hive.apache.org
    Subject: Re: how to load data to partitioned table

    Something else I've noticed is when loading LOTS of historical data, if
    you can try to say load a month of data at a time, try to just load THAT
    month of data and only that month. I've been able to load several years
    of data (depending on the data) at a single load however there have been
    times when loading a large dataset that I would run into memory issues
    during the reduce phase (usually during shuffle/sort). Things from out
    of memory to stack overflow messages (I've compiled a list of the more
    fun ones).

    Then I noticed that only loading data from say a single month loaded
    quickly and without the memory headaches during the reduce.

    Something to keep in mind and it works great!


    On 08/12/2011 07:58 AM, bejoy_ks@yahoo.com wrote:
    Hi Daniel
    Just having a look at your requirement , to load data into a partition
    based hive table from any input file the most hassle free approach
    would be.
    1. Load the data into a non partitioned table that shares similar
    structure as the target table.
    2. Populate the target table with the data from non partitioned one
    using hive dynamic partition
    approach.
    With Dynamic partitions you don't need to manually identify the data
    partitions and distribute data accordingly.

    A similar implementation is described in the blog post
    www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html

    Hope it helps

    Regards
    Bejoy K S

    ------------------------------------------------------------------------
    *From: * Vikas Srivastava <vikas.srivastava@one97.net>
    *Date: *Fri, 12 Aug 2011 17:31:28 +0530
    *To: *<user@hive.apache.org>
    *ReplyTo: * user@hive.apache.org
    *Subject: *Re: how to load data to partitioned table

    Hey ,

    Simpley you have run query like this

    FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key)
    SELECT *


    Regards
    Vikas Srivastava


    2011/8/12 Daniel,Wu <hadoop_wu@163.com
    suppose the table is partitioned by period_key, and the csv file
    also has a column named as period_key. The csv file contains
    multiple days of data, how can we load it in the the table?

    I think of an workaround by first load the data into a
    non-partition table, and then insert the data from non-partition
    table to the partition table.

    hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
    FAILED: Error in semantic analysis: need to specify partition
    columns because the destination table is partitioned.


    However it doesn't work also. please help.





    --
    With Regards
    Vikas Srivastava

    DWH & Analytics Team
    Mob:+91 9560885900
    One97 | Let's get talking !
  • Hadoopman at Aug 14, 2011 at 11:22 pm
    DISTRIBUTE BY and CLUSTER BY didn't resolve all the issues I've seen
    with very large data sets. I mean I'm loading a couple terabytes in a
    dataset and running into some rather interesting problems. I noticed
    however loading a couple months or two at a time (and making sure they
    were from the same time period) seem to resolve the problems I kept
    hitting over and over again.

    I have to keep reminding myself that hive / hadoop isn't a database and
    not to treat it as such. :-)

    On 08/14/2011 10:15 AM, bejoy_ks@yahoo.com wrote:
    Ya I very much agree with you on those lines. Using the basic stuff
    would literally run into memory issues with large datasets. I had some
    of those resolved by using the DISTRIBUTE BY clause and so. In short a
    little work around over your hive queries could help you out in some
    cases.

    Regards
    Bejoy K S

    ------------------------------------------------------------------------
    *From: * hadoopman <hadoopman@gmail.com>
    *Date: *Sun, 14 Aug 2011 08:57:12 -0600
    *To: *<user@hive.apache.org>
    *ReplyTo: * user@hive.apache.org
    *Subject: *Re: how to load data to partitioned table

    Something else I've noticed is when loading LOTS of historical data,
    if you can try to say load a month of data at a time, try to just load
    THAT month of data and only that month. I've been able to load
    several years of data (depending on the data) at a single load however
    there have been times when loading a large dataset that I would run
    into memory issues during the reduce phase (usually during
    shuffle/sort). Things from out of memory to stack overflow messages
    (I've compiled a list of the more fun ones).

    Then I noticed that only loading data from say a single month loaded
    quickly and without the memory headaches during the reduce.

    Something to keep in mind and it works great!


    On 08/12/2011 07:58 AM, bejoy_ks@yahoo.com wrote:
    Hi Daniel
    Just having a look at your requirement , to load data into a
    partition based hive table from any input file the most hassle free
    approach would be.
    1. Load the data into a non partitioned table that shares similar
    structure as the target table.
    2. Populate the target table with the data from non partitioned one
    using hive dynamic partition
    approach.
    With Dynamic partitions you don't need to manually identify the data
    partitions and distribute data accordingly.

    A similar implementation is described in the blog post
    www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html

    Hope it helps

    Regards
    Bejoy K S

    ------------------------------------------------------------------------
    *From: * Vikas Srivastava <vikas.srivastava@one97.net>
    *Date: *Fri, 12 Aug 2011 17:31:28 +0530
    *To: *<user@hive.apache.org>
    *ReplyTo: * user@hive.apache.org
    *Subject: *Re: how to load data to partitioned table

    Hey ,

    Simpley you have run query like this

    FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key)
    SELECT *


    Regards
    Vikas Srivastava


    2011/8/12 Daniel,Wu <hadoop_wu@163.com >
    suppose the table is partitioned by period_key, and the csv
    file also has a column named as period_key. The csv file contains
    multiple days of data, how can we load it in the the table?

    I think of an workaround by first load the data into a
    non-partition table, and then insert the data from non-partition
    table to the partition table.

    hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
    FAILED: Error in semantic analysis: need to specify partition
    columns because the destination table is partitioned.


    However it doesn't work also. please help.





    --
    With Regards
    Vikas Srivastava

    DWH & Analytics Team
    Mob:+91 9560885900
    One97 | Let's get talking !
  • Aggarwal, Vaibhav at Aug 12, 2011 at 5:19 pm
    If you want to insert data into a partitioned table without specifying the partition value, you need to enable dynamic partitioning.
    You can use the following switches:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    Thanks
    Vaibhav

    From: Daniel,Wu
    Sent: Thursday, August 11, 2011 11:30 PM
    To: hive
    Subject: how to load data to partitioned table

    suppose the table is partitioned by period_key, and the csv file also has a column named as period_key. The csv file contains multiple days of data, how can we load it in the the table?

    I think of an workaround by first load the data into a non-partition table, and then insert the data from non-partition table to the partition table.

    hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
    FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.


    However it doesn't work also. please help.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedAug 12, '11 at 6:30a
activeAug 14, '11 at 11:22p
posts8
users6
websitehive.apache.org

People

Translate

site design / logo © 2022 Grokbase