Grokbase Groups Hive user April 2011
Hi,

The Hive documentation describes the keyword "external" as follows:

The EXTERNAL keyword lets you create a table and provide a LOCATION so that
Hive does not use a default location for this table. This comes in handy if
you already have data generated.

I have my data available in a directory in a bucket in s3. I am trying to
create a table like

CREATE EXTERNAL TABLE IF NOT EXISTS mslog ( TIME_STAMP STRING, SEQ
STRING) LOCATION 's3:// <bucket name>/processed/'

But the table isn't populated with the data available at the s3 location. Am
I missing something here?


--
- Prash


  • Christopher, Pat at Apr 11, 2011 at 10:26 pm
    Prash,

    1. You probably want to use the s3n filesystem, not the s3 one. If you use s3 you need to manage your file blocks manually. Swap it over to s3n, way easier.

    2. This could be Hive failing to read the files. Hive is probably finding no readable files in 'processed', so it's saying you have no data. Is the data compressed? If so, the S3 file names need to end in the matching extension (.gz, .bz2, etc.).
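
    A minimal sketch of the two suggestions above, assuming the same table definition and the <bucket name> placeholder from the original post (the bucket name stays a placeholder here). The dfs line is a Hive CLI command rather than HiveQL, and is only there to show what the file names under the location look like.

    ```sql
    -- Suggestion 1: same DDL, but with the s3n scheme instead of s3.
    -- <bucket name> is the placeholder from the original post.
    CREATE EXTERNAL TABLE IF NOT EXISTS mslog (
      time_stamp STRING,
      seq STRING
    )
    LOCATION 's3n://<bucket name>/processed/';

    -- Suggestion 2: list the files at the location from the Hive CLI and
    -- check that compressed files carry an extension such as .gz or .bz2.
    dfs -ls s3n://<bucket name>/processed/;

    -- Quick check that rows are visible through the table.
    SELECT * FROM mslog LIMIT 10;
    ```

    Note that this DDL has no ROW FORMAT clause, so Hive falls back to its default field delimiter (Ctrl-A); tab- or comma-separated files would need an explicit delimiter.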

    Pat

    From: Prashanth R
    Sent: Monday, April 11, 2011 2:10 PM
    To: user@hive.apache.org
    Subject: External table creation question

  • Avram Aelony at Apr 12, 2011 at 4:51 pm
    Hi Prash,

    Try this:

    create external table mslog
    (
    time_stamp string,
    seq string
    ) row format delimited fields terminated by '\t' stored as textfile location 's3://your/bucket/path/'
    ;

    Important: the S3 location the table points to can only contain files that share the same schema. Hive doesn't like it when the location contains files with a mixture of different columns.
    Also, check your logs if you don't think your data was successfully read.
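
    Besides the logs, a few queries can help confirm whether the data was read. This is only a sketch, assuming the mslog table defined above; what the results mean will depend on your files.

    ```sql
    -- Show the table definition, including the location Hive will read from.
    DESCRIBE EXTENDED mslog;

    -- A count of 0 suggests Hive found no readable files at the location.
    SELECT COUNT(*) FROM mslog;

    -- Rows where the second column is NULL often point to a delimiter
    -- mismatch: the whole line ends up in the first column.
    SELECT * FROM mslog WHERE seq IS NULL LIMIT 10;
    ```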

    Hope this helps,
    ~Avram


Discussion Overview
group: user @
categories: hive, hadoop
posted: Apr 11, '11 at 9:10p
active: Apr 12, '11 at 4:51p
posts: 3
users: 3
website: hive.apache.org
