Grokbase Groups Hive user July 2011
Hi,

I have data on HDFS that is already stored in directories by date,
for example /abc/xyz/yyyy-mm-d1, /abc/xyz/yyyy-mm-d2. How do I create an
external table, with the date as the partition key, that points to the
data in these directories?
Please advise.

Thanks,
Aniket

  • Prashanth R at Jul 1, 2011 at 11:12 pm
    Here is an example:

    CREATE EXTERNAL TABLE IF NOT EXISTS tablename (.......)
    PARTITIONED BY (insertdate string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';

    -- Point each partition at its own date directory (an 's3n://' path
    -- works the same way for data on S3).
    ALTER TABLE tablename ADD PARTITION (insertdate='2008-01-01')
    LOCATION 'hdfs://<path>/abc/xyz/2008-01-01/';

    - Prashanth
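You can generate the partition statement above for any date directory. The following is a minimal Python sketch. It assumes a single partition key (`insertdate`) and directories named yyyy-mm-dd, and `add_partition_sql` is a hypothetical helper name. It takes the partition value from the last path component and uses the date directory itself, not its parent, as the LOCATION.

```python
def add_partition_sql(table: str, partition_dir: str, key: str = "insertdate") -> str:
    """Build one ALTER TABLE ... ADD PARTITION statement for a date directory.

    The partition value is the last path component, so '/abc/xyz/2008-01-01/'
    becomes insertdate='2008-01-01', and LOCATION is that date directory.
    """
    # Strip trailing slashes only; a scheme such as 'hdfs://' stays intact.
    path = partition_dir.rstrip("/")
    value = path.rsplit("/", 1)[-1]
    return (
        f"ALTER TABLE {table} ADD PARTITION ({key}='{value}') "
        f"LOCATION '{path}';"
    )


if __name__ == "__main__":
    print(add_partition_sql("tablename", "/abc/xyz/2008-01-01/"))
    # ALTER TABLE tablename ADD PARTITION (insertdate='2008-01-01') LOCATION '/abc/xyz/2008-01-01';
```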


  • Aniket Mokashi at Jul 1, 2011 at 11:40 pm
    Thanks Prashanth,

    When I run

    select count(*) from segmentation_data where (dt='2011-07-01');

    I get:

    java.io.IOException: Not a file: hdfs://hadoop01:9000/data_feed/sophia/segmentation_data/1970-01-01
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:206)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:261)

    I am not sure why it is looking for a 1970 directory! Also, I assume I
    have to add all the partitions manually, but that seems reasonable.

    Thanks,
    Aniket
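Here is one likely cause, offered tentatively. In the Hadoop 0.20-era `org.apache.hadoop.mapred.FileInputFormat`, `getSplits` lists each input path (here, a partition's LOCATION) without recursing into it. It throws "Not a file" if any entry in that listing is a directory. The traceback names `.../segmentation_data/1970-01-01`, which suggests the partition pointed at the parent `segmentation_data` directory rather than at a date directory. The minimal Python check below assumes the directory is reachable as a local path. On HDFS you would inspect a `hadoop fs -ls` listing instead. `nested_dirs` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def nested_dirs(partition_dir: str) -> list[str]:
    """Names of sub-directories directly inside a partition directory.

    Assumption: the directory is a local path standing in for the HDFS one.
    A non-empty result means the old-API FileInputFormat.getSplits would
    likely fail with "Not a file" when this directory is a job input.
    """
    return sorted(p.name for p in Path(partition_dir).iterdir() if p.is_dir())


if __name__ == "__main__":
    # Mimic the layout from the traceback: a partition directory holding
    # a data file and a 1970-01-01 sub-directory.
    with tempfile.TemporaryDirectory() as base:
        part = Path(base, "segmentation_data")
        (part / "1970-01-01").mkdir(parents=True)
        (part / "part-00000").write_text("row\n")
        print(nested_dirs(str(part)))  # ['1970-01-01']
```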
  • Aniket Mokashi at Jul 5, 2011 at 9:23 pm
    Hi,

    I would like Hive to detect new partitions automatically as the
    directory gets updated with new data (by an MR job). Is it possible to
    avoid running the "alter table tablename add partition
    (insertdate='2008-01-01') LOCATION 's3n://' or 'hdfs://<path>/abc/xyz/'"
    command every time I get a new partition? Can I have

    CREATE EXTERNAL TABLE IF NOT EXISTS tablename (.......)
    PARTITIONED BY (insertdate string) LOCATION '/abc/xyz';

    and have Hive scan through all available partitions (the
    sub-directories inside /abc/xyz)?

    Thanks,
    Aniket
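Detecting new partitions amounts to comparing the sub-directories under /abc/xyz with the partitions Hive has already registered. The Python sketch below makes three assumptions: there is a single partition key, the output of `SHOW PARTITIONS tablename` has one `key=value` line per partition, and only yyyy-mm-dd directory names count. The helper name is hypothetical, and nothing here is a built-in Hive feature.

```python
import re

# Directory names that look like yyyy-mm-dd dates.
DATE_DIR = re.compile(r"\d{4}-\d{2}-\d{2}")


def new_partition_values(dir_names: list[str], show_partitions_output: str,
                         key: str = "insertdate") -> list[str]:
    """Date directory names that are not yet registered as partitions.

    dir_names: sub-directory names found under the table's base directory.
    show_partitions_output: text of `SHOW PARTITIONS tablename`, assumed to
    be one `key=value` line per partition (single partition key only).
    """
    prefix = key + "="
    known = {
        line.strip()[len(prefix):]
        for line in show_partitions_output.splitlines()
        if line.strip().startswith(prefix)
    }
    # Skip anything that is not a yyyy-mm-dd name (for example _logs).
    return sorted(d for d in dir_names if DATE_DIR.fullmatch(d) and d not in known)


if __name__ == "__main__":
    dirs = ["2011-07-01", "2011-07-02", "2011-07-03", "_logs"]
    shown = "insertdate=2011-07-01\ninsertdate=2011-07-02\n"
    print(new_partition_values(dirs, shown))  # ['2011-07-03']
```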

  • Prashanth R at Jul 6, 2011 at 4:18 pm
    Hey Aniket,

    Well, I don't think there is a way to load data the way your second
    command describes. However, you could have a cron job invoke a script
    that adds a partition for each new insertdate, pointing it at the
    directory that contains nothing but the data files to be loaded into
    Hive.

    Let me know.

    - Prashanth
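The cron suggestion above can be sketched as a small script that emits the statement for the previous day's directory. This assumes the MR job has finished writing yesterday's yyyy-mm-dd directory under /abc/xyz by the time cron fires. `daily_partition_sql`, the table name, and the paths are hypothetical.

```python
import datetime


def daily_partition_sql(table: str, base_dir: str, day: datetime.date,
                        key: str = "insertdate") -> str:
    """ALTER TABLE statement registering the directory for a single day."""
    value = day.isoformat()  # yyyy-mm-dd, matching the directory names
    base = base_dir.rstrip("/")
    return (
        f"ALTER TABLE {table} ADD PARTITION ({key}='{value}') "
        f"LOCATION '{base}/{value}';"
    )


if __name__ == "__main__":
    # Meant to run from cron after midnight to register yesterday's data.
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    print(daily_partition_sql("tablename", "/abc/xyz", yesterday))
```

A crontab entry such as `30 0 * * * python3 daily_partition.py > /tmp/add.hql && hive -f /tmp/add.hql` would then register one partition per day. The script name and file paths in that entry are hypothetical.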
  • Aniket Mokashi at Jul 6, 2011 at 7:53 pm
    Thanks Prashanth.

    But that means I have to fire one "alter table <tname> add partition"
    query for every date sub-directory inside '/abc/xyz'. That isn't
    unreasonable, but it would have been simpler if Hive could detect the
    arrival of new data automatically. There is a similar example of this,
    but it doesn't seem to work with partitions:
    http://www.simon-fortelny.com/?p=137

    Thanks,
    Aniket
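Some Hive releases may accept `IF NOT EXISTS` and several `PARTITION ... LOCATION` clauses in a single `ALTER TABLE ... ADD`; check the DDL documentation for your version, since older releases may need one statement per partition. If yours does, the per-directory queries can be folded into one statement that is safe to re-run. The Python sketch below uses a hypothetical helper name.

```python
def batch_add_partitions_sql(table: str, base_dir: str, dates: list[str],
                             key: str = "insertdate") -> str:
    """One ALTER TABLE ... ADD IF NOT EXISTS covering every date directory.

    Assumes a Hive version that accepts several PARTITION ... LOCATION
    clauses in one statement; returns "" when there is nothing to add.
    """
    if not dates:
        return ""
    base = base_dir.rstrip("/")
    clauses = "\n".join(
        f"  PARTITION ({key}='{d}') LOCATION '{base}/{d}'" for d in sorted(dates)
    )
    # IF NOT EXISTS skips partitions that are already registered.
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n{clauses};"


if __name__ == "__main__":
    print(batch_add_partitions_sql("tablename", "/abc/xyz", ["2011-07-02", "2011-07-01"]))
```

Because already registered partitions are skipped, the same statement could be regenerated from the full directory listing on every cron run.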

Discussion Overview
group: user @
categories: hive, hadoop
posted: Jul 1, '11 at 10:57p
active: Jul 6, '11 at 7:53p
posts: 6
users: 2
website: hive.apache.org

2 users in discussion

Aniket Mokashi: 4 posts
Prashanth R: 2 posts
