Grokbase Groups Hive user March 2011
FAQ
Met a problem that data in an external table didn't get read by hive.

Here's how the table was created and data loaded.

- Created an external table w/ a partition, pointing to an existing location
in hdfs as follows :

create external table order_external (item string, quantity int) partitioned
by (dt string) row format delimited fields terminated by '\t' stored as
textfile location '/user/usera/data/hivetables/order';

- Data from a local file system copied to hdfs

Have 2 data files in local file system

order.2011-03-01.01, which contains 2 entries
order.2011-03-01.02, which contains 1 entry

cd to data file folder
hadoop fs -copyFromLocal order.*
/user/usera/data/hivetables/order/dt=2011-03-01

verify data is there
hadoop fs -cat /user/usera/data/hivetables/order/dt=2011-03-01/*
returns 3 entries =>
android 2
iphone 3
ipad 1

- Now, query all items in partition dt='2011-03-01'

select * from order_external o where o.dt='2011-03-01';

this does not show any entry nor did "select * from order_external".

I also played with an external table created similar to above, the same
location (w/o 'dt=...' folder ofcourse) and data used, the same schema and
table name, etc., except that the only difference is this external table is
created without a partition. Querying the table shows correct results.
Didn't have this problem w/ "internal" table that has partitions.

So, what is wrong or missing? Any idea?

Thanks!
--mj

Search Discussions

  • Michael Jiang at Mar 23, 2011 at 10:03 pm
    solved.

    uh, thought that hive will by default look into the table folder in hdfs and
    match sub-folders with partition column names to recognize partitions
    automatically. But realized partition addition has to be done explicitly by
    giving partition name and location. So, by doing "alter table add partition
    (column=value)" solved this (no need to give location since "column=value"
    is a subfolder under table folder in hdfs ;) ...
    On Wed, Mar 23, 2011 at 12:41 PM, Michael Jiang wrote:

    Met a problem that data in an external table didn't get read by hive.

    Here's how the table was created and data loaded.

    - Created an external table w/ a partition, pointing to an existing
    location in hdfs as follows :

    create external table order_external (item string, quantity int)
    partitioned by (dt string) row format delimited fields terminated by '\t'
    stored as textfile location '/user/usera/data/hivetables/order';

    - Data from a local file system copied to hdfs

    Have 2 data files in local file system

    order.2011-03-01.01, which contains 2 entries
    order.2011-03-01.02, which contains 1 entry

    cd to data file folder
    hadoop fs -copyFromLocal order.*
    /user/usera/data/hivetables/order/dt=2011-03-01

    verify data is there
    hadoop fs -cat /user/usera/data/hivetables/order/dt=2011-03-01/*
    returns 3 entries =>
    android 2
    iphone 3
    ipad 1

    - Now, query all items in partition dt='2011-03-01'

    select * from order_external o where o.dt='2011-03-01';

    this does not show any entry nor did "select * from order_external".

    I also played with an external table created similar to above, the same
    location (w/o 'dt=...' folder ofcourse) and data used, the same schema and
    table name, etc., except that the only difference is this external table is
    created without a partition. Querying the table shows correct results.
    Didn't have this problem w/ "internal" table that has partitions.

    So, what is wrong or missing? Any idea?

    Thanks!
    --mj
  • Michael Jiang at Mar 23, 2011 at 10:12 pm
    btw, it seems that aws hive has a cool feature to recover all partitions
    from subfolders by name of (col=value) under table folder in S3 without
    explicitly specifying add partition for each ...
    On Wed, Mar 23, 2011 at 3:03 PM, Michael Jiang wrote:

    solved.

    uh, thought that hive will by default look into the table folder in hdfs
    and match sub-folders with partition column names to recognize partitions
    automatically. But realized partition addition has to be done explicitly by
    giving partition name and location. So, by doing "alter table add partition
    (column=value)" solved this (no need to give location since "column=value"
    is a subfolder under table folder in hdfs ;) ...
    On Wed, Mar 23, 2011 at 12:41 PM, Michael Jiang wrote:

    Met a problem that data in an external table didn't get read by hive.

    Here's how the table was created and data loaded.

    - Created an external table w/ a partition, pointing to an existing
    location in hdfs as follows :

    create external table order_external (item string, quantity int)
    partitioned by (dt string) row format delimited fields terminated by '\t'
    stored as textfile location '/user/usera/data/hivetables/order';

    - Data from a local file system copied to hdfs

    Have 2 data files in local file system

    order.2011-03-01.01, which contains 2 entries
    order.2011-03-01.02, which contains 1 entry

    cd to data file folder
    hadoop fs -copyFromLocal order.*
    /user/usera/data/hivetables/order/dt=2011-03-01

    verify data is there
    hadoop fs -cat /user/usera/data/hivetables/order/dt=2011-03-01/*
    returns 3 entries =>
    android 2
    iphone 3
    ipad 1

    - Now, query all items in partition dt='2011-03-01'

    select * from order_external o where o.dt='2011-03-01';

    this does not show any entry nor did "select * from order_external".

    I also played with an external table created similar to above, the same
    location (w/o 'dt=...' folder ofcourse) and data used, the same schema and
    table name, etc., except that the only difference is this external table is
    created without a partition. Querying the table shows correct results.
    Didn't have this problem w/ "internal" table that has partitions.

    So, what is wrong or missing? Any idea?

    Thanks!
    --mj

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedMar 23, '11 at 7:41p
activeMar 23, '11 at 10:12p
posts3
users1
websitehive.apache.org

1 user in discussion

Michael Jiang: 3 posts

People

Translate

site design / logo © 2021 Grokbase