regex.
I have a data file called test.txt that contains the following:
TESTONE-1
TESTTWO-2
TESTTHREE-3
TESTFOUR-4
TESTFIVE-5
I have this Hive script:
hive> CREATE TABLE test
(
field_1 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "([^ ]*)",
"output.regex" = "%1$s"
)
STORED AS TEXTFILE;
Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe
field_1 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "([^ ]*)",
"output.regex" = "%1$s"
)
STORED AS TEXTFILE;
OK
Time taken: 0.064 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;
Copying data from file:/home/hadoop/test
Loading data to table test
OK
Time taken: 0.213 seconds
hive> SELECT * FROM test LIMIT 10;
OK
TESTONE-1
TESTTWO-2
TESTTHREE-3
TESTFOUR-4
TESTFIVE-5
Time taken: 0.153 seconds
This produces the expected output.
When I alter the Hive script to define two fields, every value comes back as NULL:
hive> CREATE TABLE test
(
field_1 STRING,
field_2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "([a-z,A-Z]*)(-\d*)",
"output.regex" = "%1$s %2$s"
)
STORED AS TEXTFILE;
Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe
field_1 STRING,
field_2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "([a-z,A-Z]*)(-\d*)",
"output.regex" = "%1$s %2$s"
)
STORED AS TEXTFILE;
OK
Time taken: 0.025 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;
Copying data from file:/home/hadoop/test
Loading data to table test
OK
Time taken: 0.187 seconds
hive> SELECT * FROM test LIMIT 10;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Time taken: 0.162 seconds
I've checked the regular expression at http://regexpal.com/ and it
appears to match. I think there may be an issue with the SerDe, but I
don't know how to go about troubleshooting it.
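One way to narrow this down outside Hive is to run the same pattern through java.util.regex against the sample lines. The sketch below is only that, a sketch: it assumes RegexSerDe compiles "input.regex" as a Java regular expression and requires the whole line to match, and it assumes (not confirmed here) that the HiveQL string literal may drop the backslash in "\d" before the SerDe ever sees it. The class name RegexCheck and the helper groups() are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    // Returns the capture groups when the regex matches the WHOLE line,
    // or null when it does not (assumed to mirror a row of NULLs in Hive).
    public static String[] groups(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {"TESTONE-1", "TESTTWO-2", "TESTTHREE-3", "TESTFOUR-4", "TESTFIVE-5"};

        // The pattern as intended, with \d intact (Java source needs "\\d").
        String intended = "([a-z,A-Z]*)(-\\d*)";
        // Assumption: what the pattern would look like if the backslash were
        // stripped by string-literal escaping before reaching the SerDe.
        String backslashLost = "([a-z,A-Z]*)(-d*)";

        for (String line : lines) {
            System.out.println(line
                + " intended=" + Arrays.toString(groups(intended, line))
                + " backslashLost=" + Arrays.toString(groups(backslashLost, line)));
        }
    }
}
```

With the backslash intact, each sample line splits into two groups (for example TESTONE and -1). With the backslash lost, no line matches at all. If that second case is what Hive is actually doing, writing the digit class as "\\d" or "[0-9]" in the table definition would be the thing to try.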
I'm running this on Amazon's Elastic MapReduce.
Any help is appreciated.
-Sal