Grokbase Groups Hive user July 2009
FAQ
I'm trying to load data into a table using the commands below. However, I only
got a bunch of NULLs in the fields. The data fields are separated by tabs.

CREATE TABLE IF NOT EXISTS userweight(source INT, dist INT, weight DOUBLE)
row format delimited fields terminated by " \t";
load data local inpath "/tmp/Graph/edges_tag_jaccard_directed_2006.dat" into
table userweight

--
Thank you,
Keven Chen


  • Manhee Jo at Jul 22, 2009 at 7:24 am
    Hi all,

    What really happens when "LOAD DATA (LOCAL) INPATH ... INTO TABLE" is run
    on a huge file (e.g. some tens of TB)? Does Hive need to scan the entire
    file before processing anything, even something very simple (e.g. a select)?
    If so, are there any ways to decrease the number of disk accesses? Is
    partitioning a way to do it?

    Many Thanks,
    Manhee
  • Zheng Shao at Jul 22, 2009 at 8:58 pm
    If the huge file is already on HDFS (LOAD DATA without LOCAL), Hive
    will just *move* the file into the table's directory (note: that means you
    won't be able to see the file in its original location afterwards).

    If you don't want that to happen, you might want to use "CREATE
    EXTERNAL TABLE .... LOCATION '/user/myname/myfiledir';"

    If the huge file is on the local file system, you will have to use LOAD
    DATA with LOCAL, and Hive will copy the file.
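    The external-table alternative can be sketched in HiveQL like this (a
    minimal sketch; the table name is hypothetical and the path follows the
    example above):

```sql
-- Point Hive at an existing HDFS directory instead of moving files into
-- the warehouse. Queries read the files in place, and DROP TABLE removes
-- only the metadata, leaving the files where they are.
CREATE EXTERNAL TABLE myweights (source INT, dist INT, weight DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/myname/myfiledir';
```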


    Zheng

  • Manhee Jo at Jul 23, 2009 at 12:34 am
    Thank you!

  • Manhee Jo at Jul 23, 2009 at 2:11 am
    Hi Zheng,

    I've tried to load a sample file after creating an external table like
    below.

    hive> create external table extab (key int, val string)
    row format delimited fields terminated by '\t'
    lines terminated by '\n'
    location '/user/hive/warehouse/test/';
    Here, /user/hive/warehouse/test contains an HDFS file which I am going to
    load into table extab. This was OK. On load, though,

    hive> load data inpath '/user/hive/warehouse/test/kv1.txt'
    overwrite into table extab;
    I got the error below:

    FAILED: Error in semantic analysis: line 2:17 Path is not legal
    '/user/hive/warehouse/test/kv1.txt':
    Move from: hdfs://vm2:9000/user/hive/warehouse/test/kv1.txt to:
    /user/hive/warehouse/test/ is not valid.
    Please check that values for params "default.fs.name" and
    "hive.metastore.warehouse.dir" do not conflict.

    I've changed the directories to different ones, but to no avail. Can you
    suggest any solutions?

    By the way, is "default.fs.name" right? I could find "fs.default.name" but
    not "default.fs.name".

    Thank you,
    Manhee


  • Zheng Shao at Jul 24, 2009 at 9:37 am
    Hi Manhee,

    You don't need to do "load" for an external table. You already
    specified the location of the external table in the "create external
    table" command, so you can directly use that external table.
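    To illustrate the point (a hedged sketch, reusing the extab table from
    the message above): once the external table is created, the files already
    sitting at its LOCATION are its data, so it can be queried with no LOAD
    DATA step at all:

```sql
-- No LOAD DATA needed: the files under LOCATION already back the table.
SELECT key, val FROM extab LIMIT 10;
```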

    Zheng

  • Manhee Jo at Jul 27, 2009 at 2:57 am
    Excellent! Thank you, Zheng.

  • Zheng Shao at Jul 22, 2009 at 8:58 pm
    Hi Chen,

    Can you double check the format of the file? Is it plain text?

    CREATE TABLE IF NOT EXISTS userweight(source INT, dist INT, weight DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;
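    As a quick sanity check (a sketch, assuming a table like the one above has
    been created and loaded), selecting a few rows shows whether the declared
    delimiter matches the file: if it does not, Hive still reads the lines but
    the typed columns parse as NULL:

```sql
-- NULLs here usually mean the declared field delimiter does not match
-- the byte actually separating fields in the data file.
SELECT source, dist, weight FROM userweight LIMIT 5;
```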

    The optional "STORED AS" clause tells Hive the format of your file.
    Hive supports TEXTFILE and SEQUENCEFILE natively.

    If your file has a customized format, you need to write your own
    file format classes.
    Please take a look at the example added by:
    https://issues.apache.org/jira/browse/HIVE-639


    Zheng



    --
    Yours,
    Zheng

Discussion Overview
group: user
categories: hive, hadoop
posted: Jul 20, 2009 at 10:52 pm
active: Jul 27, 2009 at 2:57 am
posts: 8
users: 3
website: hive.apache.org
