Grokbase Groups Hive user July 2011
I'm new to Hive and I'm having trouble loading a simple data set using a
regex.

I have a data file called test.txt that contains the following:

TESTONE-1
TESTTWO-2
TESTTHREE-3
TESTFOUR-4
TESTFIVE-5

I have this Hive script:

hive> CREATE TABLE test
(
field_1 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "([^ ]*)",
"output.regex" = "%1$s"
)
STORED AS TEXTFILE;
Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe
OK
Time taken: 0.064 seconds

hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;
Copying data from file:/home/hadoop/test
Loading data to table test
OK
Time taken: 0.213 seconds

hive> SELECT * FROM test LIMIT 10;
OK
TESTONE-1
TESTTWO-2
TESTTHREE-3
TESTFOUR-4
TESTFIVE-5
Time taken: 0.153 seconds

This produces the expected output.

When I alter the Hive script to include two fields, every column comes back NULL:

hive> CREATE TABLE test
(
field_1 STRING,
field_2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "([a-z,A-Z]*)(-\d*)",
"output.regex" = "%1$s %2$s"
)
STORED AS TEXTFILE;
Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe
OK
Time taken: 0.025 seconds

hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;
Copying data from file:/home/hadoop/test
Loading data to table test
OK
Time taken: 0.187 seconds

hive> SELECT * FROM test LIMIT 10;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Time taken: 0.162 seconds

I've checked the regular expression against http://regexpal.com/ and it
seems to check out. I think there may be an issue with the SerDe, but I
don't know how to go about troubleshooting it.

I'm running this on Amazon's Elastic MapReduce.

Any help is appreciated.

-Sal

  • Yichuan Hu at Jul 1, 2011 at 11:25 pm
    Use \\d instead of \d.
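
    A minimal sketch of the corrected table definition, assuming the cause is
    Hive's string-literal unescaping: Hive treats a backslash inside a quoted
    string as an escape character, so \d reaches RegexSerDe as a plain d. The
    pattern then becomes ([a-z,A-Z]*)(-d*), which cannot match a line such as
    TESTONE-1 in full, and a row that fails to match comes back as all NULLs.
    Doubling the backslash passes \d through to the regex engine. The table
    name, columns, and path are copied from the original post. The
    "output.regex" property is also kept exactly as posted; I believe the
    contrib RegexSerDe documents that setting as "output.format.string", so
    treat that line as unverified.

    ```sql
    -- Sketch only: names and path are copied from the original post.
    -- Assumption: the earlier one-column `test` table has been dropped first.
    -- The only change from the failing script is \\d in place of \d.
    CREATE TABLE test
    (
    field_1 STRING,
    field_2 STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES
    (
    "input.regex" = "([a-z,A-Z]*)(-\\d*)",
    "output.regex" = "%1$s %2$s"
    )
    STORED AS TEXTFILE;

    LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;

    SELECT * FROM test LIMIT 10;
    ```

    If the pattern now matches each line in full, the SELECT should return the
    name in field_1 and the hyphenated number (for example -1) in field_2
    rather than NULLs. To leave the hyphen out of field_2, move it outside the
    second group: ([a-z,A-Z]*)-(\\d*).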

Discussion Overview
group: user @
categories: hive, hadoop
posted: Jul 1, '11 at 10:52p
active: Jul 1, '11 at 11:25p
posts: 2
users: 2
website: hive.apache.org
