I have a custom InputFormat class, written against the Hadoop 0.20 API, that
reads our log format in our Hadoop jobs and in Pig. Now I'd like to reuse
this InputFormat to load data into a Hive table by specifying the InputFormat
and a SerDe when I create the table, like below:

CREATE TABLE rawlog_test (
user_id STRING,
my_timestamp STRING )
ROW FORMAT SERDE 'x.y.z.mySerDe'
STORED AS INPUTFORMAT 'x.y.z.myInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat' ;

Then I run:
load data inpath '/rawlog.txt' into table rawlog_test;

No error shows up on screen, but I found that the deserialize function never
gets called. And when I run select * from rawlog_test; an error is thrown:
FAILED: Error in semantic analysis: line 1:14 Input Format must implement
InputFormat rawlog_test

I searched for this online and found that it might be related to Hive using
the old InputFormat API (0.17). Does anybody know whether there is a way to
get the 0.20 API working with Hive? Adapting my code to the old API would
take a lot of work, and even if I got it done, maintaining two versions of
the code seems unnecessary (Pig 0.7 works well with the 0.20 version of my
InputFormat, and we need to use Pig and Hive in different situations). Is
there any way I can work around this? My version of Hive is 0.7, and Hadoop
is 0.20.1 from CDH2. Thanks.
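
To make the suspected mismatch concrete, here is a minimal self-contained
sketch in Java. It does not use the real Hadoop or Hive classes:
OldInputFormat, NewInputFormat, MyNewApiInputFormat, OldApiAdapter and
ApiMismatchDemo are all hypothetical stand-ins, and the sample record is made
up. It assumes the error comes from a plain type test against the old-API
interface, which I have not verified in the Hive source. It also shows the
kind of thin delegating wrapper I would rather write than port the whole
class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the old-API interface
// (the role played by org.apache.hadoop.mapred.InputFormat).
interface OldInputFormat {
    Iterator<String> getRecordReader(String path);
}

// Hypothetical stand-in for the new-API base class
// (the role played by org.apache.hadoop.mapreduce.InputFormat).
abstract class NewInputFormat {
    abstract List<String> readRecords(String path);
}

// Plays the role of x.y.z.myInputFormat: written against the new API only.
class MyNewApiInputFormat extends NewInputFormat {
    @Override
    List<String> readRecords(String path) {
        List<String> records = new ArrayList<>();
        // Made-up sample record: user_id <TAB> my_timestamp.
        records.add("user1\t2010-09-22 04:07:00");
        return records;
    }
}

// Thin adapter: implements the old interface and delegates the real work
// to the existing new-API class, so the parsing logic lives in one place.
class OldApiAdapter implements OldInputFormat {
    private final NewInputFormat delegate = new MyNewApiInputFormat();

    @Override
    public Iterator<String> getRecordReader(String path) {
        return delegate.readRecords(path).iterator();
    }
}

class ApiMismatchDemo {
    // Assumed shape of the check behind "Input Format must implement
    // InputFormat": the class must be assignable to the old interface.
    static boolean passesOldApiCheck(Class<?> candidate) {
        return OldInputFormat.class.isAssignableFrom(candidate);
    }

    public static void main(String[] args) {
        // The new-API class is rejected, the adapter is accepted.
        System.out.println(passesOldApiCheck(MyNewApiInputFormat.class)); // false
        System.out.println(passesOldApiCheck(OldApiAdapter.class));       // true
    }
}
```

With the real classes the adapter would also have to bridge splits, record
readers and job configuration between the two APIs, which this sketch leaves
out entirely.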


Discussion Overview
group: user @
categories: hive, hadoop
posted: Sep 22, '10 at 4:07a
active: Sep 22, '10 at 6:17a

2 users in discussion

Tianqiang Li: 3 posts Edward Capriolo: 1 post


