Grokbase Groups Hive user July 2011
FAQ
Hi,

I am doing some simple tests to create table, load data using Hive. I
am working on the VM provided by cloudera
(https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).

I have a text file with each line containing an IP address and a name, e.g.,

123.45.67.89 tom
123.45.67.92 mark

I create a table using following command:

CREATE TABLE ip_name(
ip STRING,
name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
"input.regex" = "^([\d.]+) ([a-z]+)",
"output.format.string" = "%1$s %2$s"
)
STORED AS TEXTFILE;

Then, I use the following command to load data into the table:

LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE ip_name;

Table was successfully created and file was also loaded, but all are
NULL (the number of rows in the table is the same as the number of
rows in the file). What could be the problem?

Thanks a lot!

William

Search Discussions

  • Yichuan (William) Hu at Jul 1, 2011 at 6:32 pm
    I tried \\d instead of \d, problem solved. Thanks, Neerja!
    On Fri, Jul 1, 2011 at 2:28 PM, Neerja Bhatnagar wrote:
    Hi,
    When I was having this issue, it was due to 2 problems - 1. the slashes in
    the regex need to be escaped Java-style so \d would be \\d and my regex's
    were wrong.
    Hope this helps!
    Cheers, Neerja
    On Fri, Jul 1, 2011 at 11:16 AM, Yichuan (William) Hu wrote:

    Hi,

    I am doing some simple tests to create table, load data using Hive. I
    am working on the VM provided by cloudera
    (https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).

    I have a text file with each line containing an IP address and a name,
    e.g.,

    123.45.67.89 tom
    123.45.67.92 mark

    I create a table using following command:

    CREATE TABLE ip_name(
    ip STRING,
    name STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES(
    "input.regex" = "^([\d.]+) ([a-z]+)",
    "output.format.string" = "%1$s %2$s"
    )
    STORED AS TEXTFILE;

    Then, I use the following command to load data into the table:

    LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE
    ip_name;

    Table was successfully created and file was also loaded, but all are
    NULL (the number of rows in the table is the same as the number of
    rows in the file). What could be the problem?

    Thanks a lot!

    William


    --
    Thank you! Neerja
  • Edward Capriolo at Jul 1, 2011 at 6:35 pm

    On Fri, Jul 1, 2011 at 2:16 PM, Yichuan (William) Hu wrote:

    Hi,

    I am doing some simple tests to create table, load data using Hive. I
    am working on the VM provided by cloudera
    (https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).

    I have a text file with each line containing an IP address and a name,
    e.g.,

    123.45.67.89 tom
    123.45.67.92 mark

    I create a table using following command:

    CREATE TABLE ip_name(
    ip STRING,
    name STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES(
    "input.regex" = "^([\d.]+) ([a-z]+)",
    "output.format.string" = "%1$s %2$s"
    )
    STORED AS TEXTFILE;

    Then, I use the following command to load data into the table:

    LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE
    ip_name;

    Table was successfully created and file was also loaded, but all are
    NULL (the number of rows in the table is the same as the number of
    rows in the file). What could be the problem?

    Thanks a lot!

    William
    You do not need the regex serde for this. Specify the table normally and use
    space as the delimiter.

    CREATE EXTERNAL TABLE logdata(
    xxx STRING,
    yyy STRING,
    ...
    z_t)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\040'
    STORED AS TEXTFILE;

    http://www.mail-archive.com/common-user@hadoop.apache.org/msg11178.html
  • Yichuan (William) Hu at Jul 1, 2011 at 6:38 pm
    Thanks, Edward, I am just using this simple example to test regex
    serde. Ultimately I would like to use regex to parse various types of
    log files.
    On Fri, Jul 1, 2011 at 2:34 PM, Edward Capriolo wrote:

    On Fri, Jul 1, 2011 at 2:16 PM, Yichuan (William) Hu wrote:

    Hi,

    I am doing some simple tests to create table, load data using Hive. I
    am working on the VM provided by cloudera
    (https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).

    I have a text file with each line containing an IP address and a name,
    e.g.,

    123.45.67.89 tom
    123.45.67.92 mark

    I create a table using following command:

    CREATE TABLE ip_name(
    ip STRING,
    name STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES(
    "input.regex" = "^([\d.]+) ([a-z]+)",
    "output.format.string" = "%1$s %2$s"
    )
    STORED AS TEXTFILE;

    Then, I use the following command to load data into the table:

    LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE
    ip_name;

    Table was successfully created and file was also loaded, but all are
    NULL (the number of rows in the table is the same as the number of
    rows in the file). What could be the problem?

    Thanks a lot!

    William
    You do not need the regex serde for this. Specify the table normally and use
    space as the delimiter.

    CREATE EXTERNAL TABLE logdata(
    xxx STRING,
    yyy STRING,
    ...

    z_t)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\040'
    STORED AS TEXTFILE;

    http://www.mail-archive.com/common-user@hadoop.apache.org/msg11178.html

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedJul 1, '11 at 6:17p
activeJul 1, '11 at 6:38p
posts4
users2
websitehive.apache.org

People

Translate

site design / logo © 2021 Grokbase