Hey amigos,

I'm doing an EMR load of data from HDFS to S3. My example looks correct,
but I'm getting an odd error. Since all the EMR data is in one
directory, I'm copying the file to HDFS, then doing 'LOAD DATA INPATH'
to put it back into S3.

CREATE TABLE events(
..blahblah...
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://outputdir/table_out/events'
;

LOAD DATA INPATH '/user/hadoop/eos/events_20110107.csv.gz' overwrite
INTO TABLE events;

The error I get is:
FAILED: Error in semantic analysis: line 3:17 Path is not legal
'/user/hadoop/eos/events_20110430.csv.gz': Move from:
hdfs://domU-12-31-39-14-19-F1.compute-1.internal:9000/user/hadoop/eos/events_20110430.csv.gz
to: s3://outputdir/table_out/events is not valid. Please check that
values for params "default.fs.name" and "hive.metastore.warehouse.dir"
do not conflict.

This is EMR, and I've checked the params and see they do not conflict.


--
Bradford Stephens,
CEO and Founder, Drawn to Scale
http://drawntoscale.com
(530) 763-DATA

http://www.drawntoscale.com -- Spire, the "Heroku for Big Data"
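
For anyone reading the error above: the message suggests the move was rejected because the source file and the table location sit on different filesystems (hdfs:// versus s3://). Here is a minimal Python sketch of that kind of check. It assumes the LOAD step compares the scheme and authority of the two URIs, which is an inference from the error text rather than something confirmed in this thread. The function name same_filesystem is hypothetical; the two URIs are copied from the error message.

```python
from urllib.parse import urlparse


def same_filesystem(src_uri: str, dst_uri: str) -> bool:
    """Return True when both URIs share a scheme and an authority.

    Assumption: a non-local LOAD DATA is a move within one filesystem,
    so it can only succeed when this returns True.
    """
    src, dst = urlparse(src_uri), urlparse(dst_uri)
    return (src.scheme, src.netloc) == (dst.scheme, dst.netloc)


# URIs taken from the error message in the post above.
src = ("hdfs://domU-12-31-39-14-19-F1.compute-1.internal:9000"
       "/user/hadoop/eos/events_20110430.csv.gz")
dst = "s3://outputdir/table_out/events"

print(same_filesystem(src, dst))  # False: hdfs vs s3
```

If that assumption holds, the two parameters named in the error can be perfectly consistent with each other and the load will still fail, because the table's LOCATION is what puts the destination on S3.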


  • Jonathan Seidman at Sep 26, 2011 at 10:28 pm
    Hey Bradford - from my experience that error occurs when there's a conflict
    between the "fs.default.name" setting (the error message calls it
    "default.fs.name") and the value in the metastore.SDS.location column in the
    Hive metadata. For us this has occurred when either migrating to a new
    cluster or changing the NameNode hostname. Not sure how all this works with
    AWS/EMR, but that's the first thing I'd check.

    Jonathan
  • Miguel Cabero at Sep 27, 2011 at 12:11 am
    Hi Bradford,

    For tables stored on S3, you have to specify:
    create EXTERNAL table events …

    Regards,

    Miguel
  • Bradford Stephens at Sep 27, 2011 at 8:43 pm
    I've told it to CREATE EXTERNAL TABLE, but I'm still getting the same errors.

    fs.default.name=hdfs://ip-10-64-74-82.ec2.internal:9000
    metastore.SDS.location=s3://mapreduce.dev.evite.com/table_out/events

    Ideas? Should fs.default.name somehow point to S3?

    Cheers,
    B
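
On the fs.default.name question: a path written without a scheme, like the one passed to LOAD DATA INPATH, appears to be resolved against fs.default.name, which would explain why the error reports an hdfs:// source. The Python sketch below illustrates that resolution using the two values from the post above. The helper name qualify is hypothetical, and the resolution rule itself is an assumption based on the error output, not a statement of how Hive implements it.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Resolve a path that has no scheme against the default filesystem URI.

    Assumption: this mirrors how an unqualified LOAD DATA INPATH path
    picks up its scheme and host from fs.default.name.
    """
    if urlparse(path).scheme:
        return path  # already fully qualified, leave it untouched
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# Values copied from the post above.
default_fs = "hdfs://ip-10-64-74-82.ec2.internal:9000"
table_location = "s3://mapreduce.dev.evite.com/table_out/events"

source = qualify("/user/hadoop/eos/events_20110107.csv.gz", default_fs)
print(source)
# True would mean source and table are on the same kind of filesystem.
print(urlparse(source).scheme == urlparse(table_location).scheme)
```

Under that reading, switching the table to EXTERNAL leaves the mismatch in place, since the source still resolves to HDFS and the table still lives on S3. Getting the file onto the same filesystem as the table before loading would be one thing to try.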

Discussion Overview
group: user @
categories: hive, hadoop
posted: Sep 26, '11 at 10:17p
active: Sep 27, '11 at 8:43p
posts: 4
users: 3
website: hive.apache.org
