Hive user mailing list, August 2009: Dynamic Partitioning?
Hi

I was wondering if anyone has thought about the possibility of having
dynamic partitioning in Hive. Right now you typically use LOAD DATA or
ALTER TABLE to add new partitions. It would be great if applications
like Scribe, which load data into HDFS, could just place the data
into the correct folder structure for your partitions on HDFS. Has
anyone investigated this? What is everyone else doing about
things like this? It seems a little error-prone to have a cron job run
every day adding new partitions. Dynamic partitioning might not even be
possible, since partitions are read from the metadata. But I'd love to
hear your thoughts.

-Chris
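The cron-based approach described above boils down to issuing one ALTER TABLE ... ADD PARTITION statement per new partition. The Python sketch below only builds that statement text; the table name `scribe_logs` and the single string partition column `ds` are hypothetical, and actually executing the statement (e.g. by handing it to the Hive CLI) is left out.

```python
from datetime import date, timedelta
from typing import Dict


def add_partition_ddl(table: str, spec: Dict[str, str]) -> str:
    """Build `ALTER TABLE <table> ADD PARTITION (col='val', ...)` for one partition."""
    for value in spec.values():
        # Values are quoted naively; refuse anything that would break the quoting.
        if "'" in value:
            raise ValueError(f"unsupported quote in partition value: {value!r}")
    cols = ", ".join(f"{col}='{value}'" for col, value in spec.items())
    return f"ALTER TABLE {table} ADD PARTITION ({cols})"


def daily_ddl(table: str, day: date) -> str:
    """Statement a daily cron job would issue for one day (hypothetical `ds` column)."""
    return add_partition_ddl(table, {"ds": day.isoformat()})


if __name__ == "__main__":
    # Hypothetical table name; a cron job would run this once a day for yesterday.
    print(daily_ddl("scribe_logs", date.today() - timedelta(days=1)))
```

Run on 2009-08-12, this would print `ALTER TABLE scribe_logs ADD PARTITION (ds='2009-08-11')`. If a run is skipped, that day's partition is simply never registered, which is the fragility being pointed out.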


  • Frederick Oko at Aug 13, 2009 at 9:13 am
    Actually, this is what Hive originally did: it used to trust partitions it
    discovered via HDFS. This blind trust could be leveraged for exactly what you
    are requesting, since partitions follow a simple directory scheme (and there
    is precedent for such out-of-band data loading). However, this blind trust
    became incompatible with the extended feature set of external tables and
    per-partition schemas introduced earlier this year. Re-enabling this
    behavior through configuration is currently tracked as
    https://issues.apache.org/jira/browse/HIVE-493, 'automatically infer existing
    partitions of table from HDFS files'.
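Because partition directories follow the `col=value` naming scheme under the table's root directory (for example `.../ds=2009-08-11/hr=12`), inferring partitions from HDFS is mostly path parsing plus a comparison against what the metastore already knows. A minimal Python sketch, assuming the directory listing and the set of registered partitions are already available as plain strings and tuples (how they are fetched is out of scope); this is not the original Hive code mentioned in this thread.

```python
import posixpath
from typing import Dict, Iterable, List, Optional, Set, Tuple


def parse_partition_dir(table_root: str, path: str,
                        part_cols: List[str]) -> Optional[Dict[str, str]]:
    """Map <table_root>/c1=v1/c2=v2 to {"c1": "v1", "c2": "v2"}.

    Returns None when the path is not a partition directory of this table:
    wrong depth, columns out of order, or a segment without '='.
    Values are used verbatim; no unescaping of special characters is done.
    """
    rel = posixpath.relpath(path.rstrip("/"), table_root.rstrip("/"))
    segments = rel.split("/")
    if len(segments) != len(part_cols):
        return None
    spec: Dict[str, str] = {}
    for col, seg in zip(part_cols, segments):
        key, sep, value = seg.partition("=")
        if not sep or key != col or not value:
            return None
        spec[key] = value
    return spec


def new_partitions(table_root: str, dirs: Iterable[str], part_cols: List[str],
                   registered: Set[Tuple[str, ...]]) -> List[Dict[str, str]]:
    """Partition specs present on HDFS but missing from the registered set.

    `registered` holds value tuples in partition-column order, e.g. {("2009-08-10",)}.
    """
    found: List[Dict[str, str]] = []
    for d in dirs:
        spec = parse_partition_dir(table_root, d, part_cols)
        if spec is not None and tuple(spec[c] for c in part_cols) not in registered:
            found.append(spec)
    return found
```

Directories that do not match the scheme (such as a `_tmp` scratch directory) are ignored rather than registered, which is one small guard against trusting HDFS blindly.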
  • Prasad Chakka at Aug 18, 2009 at 1:19 am
    We could make this feature a per-table property, enabled only for tables that don't use the extended feature set...
  • Chris Goffinet at Aug 18, 2009 at 1:38 am
    How much work is involved for such a feature?

    -Chris
  • Prasad Chakka at Aug 18, 2009 at 1:47 am
    Well, the code to infer partitions from HDFS directories exists in an old version of Hive. You would need to bring that back (and possibly make some modifications to reflect the latest code). But the real work here is to disallow tables from being marked as EXTERNAL and to disallow setting partition properties. There may be a couple of other things that need to be taken care of that I can't think of right now.

    It doesn't look like much work.

    Prasad
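The restrictions listed above can be expressed as a simple check run before a table is allowed to trust HDFS-discovered partitions. This Python sketch models table and partition metadata with hypothetical dataclasses and a hypothetical opt-in table property name (`infer.partitions.from.hdfs`); neither comes from Hive or HIVE-493.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical opt-in table property; not a real Hive setting.
INFER_PROP = "infer.partitions.from.hdfs"


@dataclass
class PartitionMeta:
    values: Dict[str, str]  # e.g. {"ds": "2009-08-11"}
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class TableMeta:
    name: str
    external: bool
    properties: Dict[str, str] = field(default_factory=dict)
    partitions: List[PartitionMeta] = field(default_factory=list)


def inference_violations(table: TableMeta) -> List[str]:
    """Return reasons why this table's opt-in to partition inference is invalid.

    Tables that have not opted in are not checked and yield an empty list.
    """
    if table.properties.get(INFER_PROP, "false").lower() != "true":
        return []
    problems: List[str] = []
    if table.external:
        problems.append(f"{table.name}: EXTERNAL tables may not infer partitions")
    for part in table.partitions:
        if part.properties:
            problems.append(
                f"{table.name} {part.values}: partition-level properties are not allowed"
            )
    return problems
```

A metadata-changing command could run such a check and refuse the change whenever the returned list is non-empty.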

  • Schubert Zhang at Sep 28, 2009 at 4:47 pm
    We have the following use case:

    1. We have a periodic MapReduce job that pre-processes the source data
    (files) and we want to put its output files into an HDFS directory. The
    HDFS directory corresponds to a Hive table (the table should be
    partitioned). The MapReduce job outputs data into different partitions
    based on data analysis.

    2. We want Hive to recognise any newly created partitions from HDFS
    sub-directories under the table's root directory. The MapReduce job may
    add new files to newly created partitions or to existing partitions.

    3. We also need a compaction/merge process to periodically compact or
    merge the existing partitions into bigger files.
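Point 3 is essentially a packing problem: group a partition's small files into batches that can each be merged into one file of at most a target size. The sketch below uses first-fit decreasing as one simple heuristic; the target size and the `(name, size)` input format are assumptions, and the actual merge (rewriting the files) is not shown.

```python
from typing import List, Tuple


def plan_merges(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Group a partition's small files into merge batches of at most target_bytes.

    Files already >= target_bytes are left alone. The rest are placed largest
    first into the first batch with enough room (first-fit decreasing).
    Single-file batches are dropped because merging one file changes nothing.
    """
    small = sorted((f for f in files if f[1] < target_bytes),
                   key=lambda f: f[1], reverse=True)
    batches: List[Tuple[int, List[str]]] = []  # (bytes used, file names)
    for name, size in small:
        for i, (used, names) in enumerate(batches):
            if used + size <= target_bytes:
                names.append(name)
                batches[i] = (used + size, names)
                break
        else:
            batches.append((size, [name]))
    return [names for _, names in batches if len(names) > 1]
```

For example, with a 100-byte target, files `a`, `b`, `c`, `d` of 60, 50, 40 and 30 bytes end up in two batches, `["a", "c"]` and `["b", "d"]`, while a 200-byte file is left untouched.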




Discussion Overview
group: user @
categories: hive, hadoop
posted: Aug 11, '09 at 6:16p
active: Sep 28, '09 at 4:47p
posts: 6
users: 4
website: hive.apache.org

People

Translate

site design / logo © 2022 Grokbase