Well, the code to infer partitions from HDFS directories exists in an old version of Hive. You would need to bring that back (and possibly make some modifications to reflect the latest code). The remaining work is to disallow marking such tables as EXTERNAL and to disallow setting partition properties on them. There may be a couple of other things that need to be taken care of that I can't think of right now.
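The inference itself is just a directory walk over the table's HDFS location, parsing the key=value naming convention. A rough sketch of the idea (paths and names made up, single-level partitioning only, no validation against the table's partition columns):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InferPartitions {

  // List the table directory and turn each "col=value" subdirectory
  // into a one-column partition spec, e.g. dt=2009-08-17 -> {dt: 2009-08-17}.
  public static List<Map<String, String>> infer(FileSystem fs, Path tableDir)
      throws IOException {
    List<Map<String, String>> specs = new ArrayList<Map<String, String>>();
    for (FileStatus stat : fs.listStatus(tableDir)) {
      if (!stat.isDir()) {
        continue; // partitions are directories, skip stray files
      }
      String name = stat.getPath().getName();
      int eq = name.indexOf('=');
      if (eq > 0) {
        specs.add(Collections.singletonMap(
            name.substring(0, eq), name.substring(eq + 1)));
      }
    }
    return specs;
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    System.out.println(infer(fs, new Path("/user/hive/warehouse/logs")));
  }
}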
It doesn't look like much work.
Prasad
________________________________
From: Chris Goffinet <goffinet@digg.com>
Reply-To: <hive-user@hadoop.apache.org>
Date: Mon, 17 Aug 2009 18:38:40 -0700
To: <hive-user@hadoop.apache.org>
Subject: Re: Dynamic Partitioning?
How much work is involved for such a feature?
-Chris
On Aug 17, 2009, at 6:19 PM, Prasad Chakka wrote:
We could make this feature a per-table property, for tables that don't need the extended feature set...
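Concretely, something like a flag in the table's properties that is checked before trusting the HDFS layout (property name entirely hypothetical, nothing like this exists today):

import java.util.Map;

public class InferenceGate {
  // Hypothetical opt-in property name.
  private static final String INFER_PROP = "hive.infer.partitions";

  // Tables that opt in would give up external locations and
  // per-partition schemas in exchange for partition inference.
  public static boolean shouldInfer(Map<String, String> tableProps) {
    return "true".equalsIgnoreCase(tableProps.get(INFER_PROP));
  }
}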
________________________________
From: Frederick Oko <frederick.oko@gmail.com>
Reply-To: <hive-user@hadoop.apache.org>
Date: Thu, 13 Aug 2009 02:12:54 -0700
To: <hive-user@hadoop.apache.org>
Subject: Re: Dynamic Partitioning?
Actually, this is what Hive originally did -- it used to trust partitions it discovered via HDFS. That blind trust could be leveraged for just what you are requesting, since partitions follow a simple directory scheme (and there is precedent for such out-of-band data loading). However, the blind trust became incompatible with the extended feature set of external tables and per-partition schemas introduced earlier this year. Re-enabling this behavior via configuration is currently tracked as
https://issues.apache.org/jira/browse/HIVE-493 'automatically infer existing partitions of table from HDFS files'.
On Tue, Aug 11, 2009 at 11:15 AM, Chris Goffinet <goffinet@digg.com> wrote:
Hi
I was wondering if anyone has thought about the possibility of dynamic partitioning in Hive. Right now you typically use LOAD DATA or ALTER TABLE to add new partitions. It would be great if applications like Scribe, which can already load data into HDFS, could just place the data into the correct folder structure for your partitions on HDFS. Has anyone investigated this? What is everyone else doing about things like this? It seems a little error-prone to have a cron job run every day adding new partitions. It might not even be possible to do dynamic partitioning, since partition lookups are metadata reads. But I'd love to hear thoughts.
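For concreteness, here is the kind of out-of-band load I have in mind (all paths and names made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutOfBandLoad {
  public static void main(String[] args) throws Exception {
    // A collector like Scribe writes straight into the partition's
    // directory, with no LOAD DATA or ALTER TABLE step.
    FileSystem fs = FileSystem.get(new Configuration());
    Path part = new Path("/user/hive/warehouse/logs/dt=2009-08-11/events.log");
    FSDataOutputStream out = fs.create(part); // creates parent dirs
    out.write("click\t1250012096\n".getBytes());
    out.close();
    // Today a cron job still has to register the partition afterwards:
    //   ALTER TABLE logs ADD PARTITION (dt='2009-08-11');
    // and that registration step is what feels error-prone.
  }
}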
-Chris