FAQ
Hi All,
I am a hive newbie.

LOAD DATA *LOCAL* INPATH 'filepath' [OVERWRITE] INTO TABLE tablename

When I use the LOCAL keyword, does Hive create an HDFS file for it?

I used the above statement to put data into a Hive table,
but I could not see any HDFS file in my hive.metastore.warehouse.dir (which
comes from hive-default.xml: /user/hive/warehouse/).

However, when I ran

FROM a INSERT OVERWRITE TABLE
b PARTITION(dt='2011-01-06') SELECT *

I can see that an HDFS file is created.

Am I missing something?

Can somebody please explain?


  • Ajo Fod at Feb 1, 2011 at 1:21 pm
    Look up LOCAL here:
    http://wiki.apache.org/hadoop/Hive/GettingStarted

    -Ajo.
    On Tue, Feb 1, 2011 at 3:15 AM, Amlan Mandal wrote:

    When I use the LOCAL keyword, does Hive create an HDFS file for it?
  • Amlan Mandal at Feb 1, 2011 at 4:41 pm
    Thanks Ajo.
    Please confirm if my understanding is correct.
    That means when I do "LOAD DATA *LOCAL* INPATH 'filepath' [OVERWRITE] INTO
    TABLE tablename", the data is in the local file system. If I need to run Hive
    queries (which in turn would be converted to MapReduce jobs), I need to pull
    the data into some other table whose data is in HDFS, by means of something like

    INSERT OVERWRITE TABLE tablename_new SELECT * FROM tablename ...

    So those LOCAL tables are kind of temporary.

    Amlan

  • Ping Zhu at Feb 1, 2011 at 4:53 pm
    create table test_table (test string);
    load data local inpath '/root/1.csv' into table test_table;
    describe extended test_table;
    -- The output shows the HDFS directory of table test_table. In my case the
    -- data of table test_table is saved under hdfs /root/hive/warehouse/test_table
    dfs -ls /root/hive/warehouse/test_table;
    -rw-r--r-- 3 root supergroup 1445813 2011-02-01 11:50
    /root/hive/warehouse/test_table/1.csv


  • Ajo Fod at Feb 1, 2011 at 5:08 pm
    HDFS gives you the ability to distribute disk access for jobs across
    computers.

    You don't need to have the file in HDFS to run a hive job.

    Local tables are like hive tables in all other senses except that they are
    on the local disk rather than HDFS. The only other difference I know of is
    that when you call "drop table" on a local table, only the metadata on the
    table gets deleted. For tables on HDFS, the table data gets deleted with the
    metadata.

    -Ajo.
  • Thiruvel Thirumoolan at Feb 1, 2011 at 8:49 pm
    Ajo Fod wrote:

    Local tables are like hive tables in all other senses except that they are on the local disk rather than HDFS. The only other difference I know of is that when you call "drop table" on a local table, only the metadata on the table gets deleted. For tables on HDFS, the table data gets deleted with the metadata.

    Ajo,

    I think there is some confusion here. As far as I know, Hive has no concept of local tables. The behavior you mention is that of EXTERNAL tables. The data for external tables can be on the local file system or on HDFS, depending on configuration. All other tables are referred to as MANAGED tables, for which Hive creates a directory under the warehouse dir.

    Amlan Mandal wrote:

    So those LOCAL tables are kind of temporary.

    See http://wiki.apache.org/hadoop/Hive/LanguageManual/DML for details; that should clarify LOAD DATA LOCAL.

    Amlan Mandal wrote:

    When I use LOCAL keyword does hive create a hdfs file for it?

    Yes. Hive creates a file for it on HDFS.

    As Ping Zhu mentioned, run 'describe formatted <tablename>' or 'describe extended <tablename>' after loading the data, then check that location on HDFS.

    You can also check the logs (they are usually at /tmp/<username>/hive.log). You should see the local file being copied to the HDFS scratch directory and then moved to a directory under the warehouse. If you find anything strange, can you please post it here?
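
    The check described above can be sketched as a short Hive CLI session. This is only an illustration: the table name demo_load, the local path /tmp/sample.csv, and the warehouse path are assumed placeholders, so use the location that DESCRIBE FORMATTED reports on your own cluster.

    ```sql
    -- Hypothetical table and file names, assumed for illustration only.
    CREATE TABLE demo_load (line STRING);

    -- LOCAL: the file is read from the client's local file system and
    -- copied into the table's directory on the cluster file system.
    LOAD DATA LOCAL INPATH '/tmp/sample.csv' INTO TABLE demo_load;

    -- Look for the "Location:" field in the output; it is the table's directory.
    DESCRIBE FORMATTED demo_load;

    -- List that directory from inside the Hive CLI. The path below assumes the
    -- default warehouse dir; substitute the location reported above.
    dfs -ls /user/hive/warehouse/demo_load;
    ```

    Without the LOCAL keyword, INPATH refers to a path on the cluster file system, and the file is moved rather than copied into the table's directory.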
  • Amlan Mandal at Feb 2, 2011 at 1:21 pm
    On 02-Feb-2011, at 2:17 AM, Thiruvel Thirumoolan wrote:

    Guess there is a confusion here. No concept of Local tables in Hive AFAIK. The behavior you mention is for EXTERNAL tables. And the data for external tables can be on local file system or HDFS, depending on configuration.

    Could you please tell me the name of the configuration setting that controls that?
  • Edward Capriolo at Feb 2, 2011 at 5:29 pm

    Oh boy.
    Hive has a directory in HDFS called the warehouse directory; the default is
    /user/hive/warehouse
    When you run 'create table atable', the table is backed by the directory
    /user/hive/warehouse/atable
    If you load afile.txt into that table, it goes here:
    /user/hive/warehouse/atable/afile.txt

    The differences with EXTERNAL tables (to name a few):
    1) EXTERNAL tables do NOT have to be inside /user/hive/warehouse. They can
    be anywhere, because external tables allow you
    to specify LOCATION /user/edward/bla for the table (and for
    partitions that may be inside the table).
    2) If you DROP an EXTERNAL table, no data is deleted from HDFS.

    Your confusion might stem from the fact that tables are either normal
    'create table X' or external 'create external table X', but Hive has no
    'internal' keyword.
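
    The two differences listed above can be sketched in HiveQL as follows. This is a minimal illustration, not output from a real cluster: the table names and the /user/edward/external_demo path are hypothetical, and it assumes the default warehouse directory.

    ```sql
    -- Hypothetical names and paths, assumed for illustration only.

    -- Managed table: Hive creates and owns a directory under the warehouse
    -- dir, by default /user/hive/warehouse/managed_demo.
    CREATE TABLE managed_demo (line STRING);

    -- External table: Hive records the LOCATION but does not own the data.
    CREATE EXTERNAL TABLE external_demo (line STRING)
    LOCATION '/user/edward/external_demo';

    -- Dropping the managed table removes its metadata and its data directory.
    DROP TABLE managed_demo;

    -- Dropping the external table removes only the metadata.
    DROP TABLE external_demo;

    -- Any files under the external location are still there after the drop.
    dfs -ls /user/edward/external_demo;
    ```

    The design point is ownership: a managed table's directory lives and dies with the table, while an external table only points at data that something else is expected to manage.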

Discussion Overview
group: user @
categories: hive, hadoop
posted: Feb 1, '11 at 11:15a
active: Feb 2, '11 at 5:29p
posts: 8
users: 5
website: hive.apache.org
