Grokbase Groups Hive user April 2011
FAQ
hi,all
The dynamic partition function is amazing ,but only works in insert
clause. Can I use it while loading data into table?

For example: load data LOAD DATA LOCAL INPATH
`/tmp/pv_2008-06-08_us.txt` INTO TABLE page_view
PARTITION(date='2008-06-08', country='US', type);
type is the dynamic partition key in the raw data?

This will be very cool! If is supported, I will not have to category the raw
data according to the type column.



--
haitao.yao@Beijing

Search Discussions

  • Ning Zhang at Apr 15, 2011 at 7:18 am
    The LOAD DATA command only copy the files to the destination directory. It doesn't read the records of the input file, so it cannot do partitioning based on record values.

    On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

    hi,all
    The dynamic partition function is amazing ,but only works in insert clause. Can I use it while loading data into table?

    For example: load data LOAD DATA LOCAL INPATH `/tmp/pv_2008-06-08_us.txt` INTO TABLE page_view PARTITION(date='2008-06-08', country='US', type);
    type is the dynamic partition key in the raw data?

    This will be very cool! If is supported, I will not have to category the raw data according to the type column.



    --
    haitao.yao@Beijing
  • Erix Yao at Apr 15, 2011 at 7:31 am
    Does this mean if I want the type field as the partition key , I will have
    to split the raw data by myself and load the files into the target table?

    I see there's an example in tutorial:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
    SELECT pvs.viewTime, pvs.userid, pvs.page_url,
    pvs.referrer_url, null, null, pvs.ip, pvs.country

    but insert must overwrite the table whole table partition.
    Can I insert without the overwrite key word?


    2011/4/15 Ning Zhang <nzhang@fb.com>
    The LOAD DATA command only copy the files to the destination directory. It
    doesn't read the records of the input file, so it cannot do partitioning
    based on record values.

    On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

    hi,all
    The dynamic partition function is amazing ,but only works in insert
    clause. Can I use it while loading data into table?

    For example: load data LOAD DATA LOCAL INPATH
    `/tmp/pv_2008-06-08_us.txt` INTO TABLE page_view
    PARTITION(date='2008-06-08', country='US', type);
    type is the dynamic partition key in the raw data?

    This will be very cool! If is supported, I will not have to category the
    raw data according to the type column.



    --
    haitao.yao@Beijing





    --
    haitao.yao@Beijing
  • Ning Zhang at Apr 15, 2011 at 7:40 am
    The INSERT OVERWRITE command will not overwrite the whole table. If you specify a partition in that table, it will only overwrite that partition. If you specify dynamic partitions, it will only create/overwrite partitions that will be seen from the input query (pvs.country in the example).


    On Apr 15, 2011, at 12:31 AM, Erix Yao wrote:

    Does this mean if I want the type field as the partition key , I will have to split the raw data by myself and load the files into the target table?

    I see there's an example in tutorial:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country

    but insert must overwrite the table whole table partition.
    Can I insert without the overwrite key word?


    2011/4/15 Ning Zhang <nzhang@fb.com
    The LOAD DATA command only copy the files to the destination directory. It doesn't read the records of the input file, so it cannot do partitioning based on record values.

    On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

    hi,all
    The dynamic partition function is amazing ,but only works in insert clause. Can I use it while loading data into table?

    For example: load data LOAD DATA LOCAL INPATH `/tmp/pv_2008-06-08_us.txt` INTO TABLE page_view PARTITION(date='2008-06-08', country='US', type);
    type is the dynamic partition key in the raw data?

    This will be very cool! If is supported, I will not have to category the raw data according to the type column.



    --
    haitao.yao@Beijing








    --
    haitao.yao@Beijing
  • Erix Yao at Apr 15, 2011 at 7:50 am
    Oh, I see.

    just as the example we have:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
    SELECT pvs.viewTime, pvs.userid, pvs.page_url,
    pvs.referrer_url, null, null, pvs.ip, pvs.country

    The dynamic partition we have is on country, and the other partition is dt.

    In this implementation, what if I want to import the data into page_view more
    than 1 time ? Let us say, we import the data hourly, and with current
    dynamic partition implementation , the existing country partition will be
    overwritten!

    Is there any other way to avoid this without telling me to import the data
    once per day?




    2011/4/15 Ning Zhang <nzhang@fb.com>
    The INSERT OVERWRITE command will not overwrite the whole table. If you
    specify a partition in that table, it will only overwrite that partition. If
    you specify dynamic partitions, it will only create/overwrite partitions
    that will be seen from the input query (pvs.country in the example).


    On Apr 15, 2011, at 12:31 AM, Erix Yao wrote:

    Does this mean if I want the type field as the partition key , I will have
    to split the raw data by myself and load the files into the target table?

    I see there's an example in tutorial:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country

    but insert must overwrite the table whole table partition.
    Can I insert without the overwrite key word?


    2011/4/15 Ning Zhang <nzhang@fb.com>
    The LOAD DATA command only copy the files to the destination directory. It
    doesn't read the records of the input file, so it cannot do partitioning
    based on record values.

    On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

    hi,all
    The dynamic partition function is amazing ,but only works in insert
    clause. Can I use it while loading data into table?

    For example: load data LOAD DATA LOCAL INPATH
    `/tmp/pv_2008-06-08_us.txt` INTO TABLE page_view
    PARTITION(date='2008-06-08', country='US', type);
    type is the dynamic partition key in the raw data?

    This will be very cool! If is supported, I will not have to category the
    raw data according to the type column.



    --
    haitao.yao@Beijing





    --
    haitao.yao@Beijing





    --
    haitao.yao@Beijing
  • Ning Zhang at Apr 15, 2011 at 5:04 pm
    You can create an hourly partitioned table say H with the partition columns(dt, country, hour), then the INSERVER OVERWRITE command will become: ...

    INSERT OVERWRITE TABLE H... PARTITION (dt='...', country, hr) ... select ... country, hour ..

    You can also keep the old table page_view_stg's schema unchanged but make it an external table pointing to H. In this way your old queries on page_view_stg don't need to be changed.

    On Apr 15, 2011, at 12:49 AM, Erix Yao wrote:

    Oh, I see.

    just as the example we have:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country

    The dynamic partition we have is on country, and the other partition is dt.

    In this implementation, what if I want to import the data into page_view more than 1 time ? Let us say, we import the data hourly, and with current dynamic partition implementation , the existing country partition will be overwritten!

    Is there any other way to avoid this without telling me to import the data once per day?




    2011/4/15 Ning Zhang <nzhang@fb.com
    The INSERT OVERWRITE command will not overwrite the whole table. If you specify a partition in that table, it will only overwrite that partition. If you specify dynamic partitions, it will only create/overwrite partitions that will be seen from the input query (pvs.country in the example).


    On Apr 15, 2011, at 12:31 AM, Erix Yao wrote:

    Does this mean if I want the type field as the partition key , I will have to split the raw data by myself and load the files into the target table?

    I see there's an example in tutorial:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country

    but insert must overwrite the table whole table partition.
    Can I insert without the overwrite key word?


    2011/4/15 Ning Zhang <nzhang@fb.com
    The LOAD DATA command only copy the files to the destination directory. It doesn't read the records of the input file, so it cannot do partitioning based on record values.

    On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

    hi,all
    The dynamic partition function is amazing ,but only works in insert clause. Can I use it while loading data into table?

    For example: load data LOAD DATA LOCAL INPATH `/tmp/pv_2008-06-08_us.txt` INTO TABLE page_view PARTITION(date='2008-06-08', country='US', type);
    type is the dynamic partition key in the raw data?

    This will be very cool! If is supported, I will not have to category the raw data according to the type column.



    --
    haitao.yao@Beijing








    --
    haitao.yao@Beijing








    --
    haitao.yao@Beijing
  • Jasper Knulst at Apr 15, 2011 at 7:51 am
    Hi,

    I think you are looking for this:

    ALTER TABLE page_view ADD PARTITION (date='2008-06-08', country='US', type)
    LOCATION '/<hdfs path relative to location of base hdfs/hive table dir>';

    So you can go on adding new hdfs subdir's (based on the partition naming) to
    you base hive hdfs table dir, moving extra files to these subdirs and making
    them available to your hive users by adding new partitions afterwards.

    This works only in case of EXTERNAL tables I guess.


    Jasper


    2011/4/15 Ning Zhang <nzhang@fb.com>
    The INSERT OVERWRITE command will not overwrite the whole table. If you
    specify a partition in that table, it will only overwrite that partition. If
    you specify dynamic partitions, it will only create/overwrite partitions
    that will be seen from the input query (pvs.country in the example).


    On Apr 15, 2011, at 12:31 AM, Erix Yao wrote:

    Does this mean if I want the type field as the partition key , I will have
    to split the raw data by myself and load the files into the target table?

    I see there's an example in tutorial:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country

    but insert must overwrite the table whole table partition.
    Can I insert without the overwrite key word?


    2011/4/15 Ning Zhang <nzhang@fb.com>
    The LOAD DATA command only copy the files to the destination directory. It
    doesn't read the records of the input file, so it cannot do partitioning
    based on record values.

    On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

    hi,all
    The dynamic partition function is amazing ,but only works in insert
    clause. Can I use it while loading data into table?

    For example: load data LOAD DATA LOCAL INPATH
    `/tmp/pv_2008-06-08_us.txt` INTO TABLE page_view
    PARTITION(date='2008-06-08', country='US', type);
    type is the dynamic partition key in the raw data?

    This will be very cool! If is supported, I will not have to category the
    raw data according to the type column.



    --
    haitao.yao@Beijing





    --
    haitao.yao@Beijing





    --
    Kind Regards \ Met Vriendelijke Groet,





    Jasper Knulst

    BI Consultant





    VLC Den Haag
    Gildeweg 5B
    2632 BD Nootdorp


    M: +31 (0)6 19 66 75 11

    T: +31 (0)15 764 07 50
    ------------------------------------------------------------

    Skype: jasper_knulst_vlc
  • Ning Zhang at Apr 15, 2011 at 5:08 pm
    The open source Hive doesn't support this yet. Someone mentioned Amazone EMR extended it with this kind of functionality.

    On Apr 15, 2011, at 12:51 AM, Jasper Knulst wrote:

    Hi,

    I think you are looking for this:

    ALTER TABLE page_view ADD PARTITION (date='2008-06-08', country='US', type) LOCATION '/<hdfs path relative to location of base hdfs/hive table dir>';

    So you can go on adding new hdfs subdir's (based on the partition naming) to you base hive hdfs table dir, moving extra files to these subdirs and making them available to your hive users by adding new partitions afterwards.

    This works only in case of EXTERNAL tables I guess.


    Jasper


    2011/4/15 Ning Zhang <nzhang@fb.com
    The INSERT OVERWRITE command will not overwrite the whole table. If you specify a partition in that table, it will only overwrite that partition. If you specify dynamic partitions, it will only create/overwrite partitions that will be seen from the input query (pvs.country in the example).


    On Apr 15, 2011, at 12:31 AM, Erix Yao wrote:

    Does this mean if I want the type field as the partition key , I will have to split the raw data by myself and load the files into the target table?

    I see there's an example in tutorial:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country

    but insert must overwrite the table whole table partition.
    Can I insert without the overwrite key word?


    2011/4/15 Ning Zhang <nzhang@fb.com
    The LOAD DATA command only copy the files to the destination directory. It doesn't read the records of the input file, so it cannot do partitioning based on record values.

    On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

    hi,all
    The dynamic partition function is amazing ,but only works in insert clause. Can I use it while loading data into table?

    For example: load data LOAD DATA LOCAL INPATH `/tmp/pv_2008-06-08_us.txt` INTO TABLE page_view PARTITION(date='2008-06-08', country='US', type);
    type is the dynamic partition key in the raw data?

    This will be very cool! If is supported, I will not have to category the raw data according to the type column.



    --
    haitao.yao@Beijing








    --
    haitao.yao@Beijing








    --
    Kind Regards \ Met Vriendelijke Groet,





    Jasper Knulst

    BI Consultant

    <image001.gif>



    VLC Den Haag
    Gildeweg 5B
    2632 BD Nootdorp


    M: +31 (0)6 19 66 75 11

    T: +31 (0)15 764 07 50
    ------------------------------------------------------------

    Skype: jasper_knulst_vlc

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedApr 15, '11 at 5:53a
activeApr 15, '11 at 5:08p
posts8
users3
websitehive.apache.org

People

Translate

site design / logo © 2021 Grokbase