FAQ
  • Tianqiang Li at Sep 22, 2010 at 4:09 am
    Hi,
    I have a custom InputFormat class that reads our log format in our Hadoop
    jobs and in Pig. It is built on the Hadoop 0.20 API. I'd now like to reuse
    this InputFormat to load data into a Hive table by specifying the
    InputFormat and a SerDe when I create the table, like below:

    CREATE TABLE rawlog_test (
    user_id STRING,
    tag STRING,
    my_timestamp STRING )
    ROW FORMAT SERDE 'x.y.z.mySerDe'
    STORED AS INPUTFORMAT 'x.y.z.myInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat' ;

    Then I run:
    load data inpath '/rawlog.txt' into table rawlog_test;

    No error shows up on screen, but I found that the deserialize function
    never gets called. And when I run select * from rawlog_test; an error is
    thrown:

    FAILED: Error in semantic analysis: line 1:14 Input Format must implement
    InputFormat rawlog_test

    I searched for this online and found that it might be related to Hive using
    the old InputFormat API (0.17). Does anybody know whether there is a way to
    get a 0.20 API InputFormat working with Hive? Adapting my code to the old
    API would be a lot of work, and even if I get it done, maintaining two
    versions of the code seems unnecessary (Pig 0.7 works well with the 0.20
    version of my InputFormat, and we need to use Pig and Hive in different
    situations). Is there any way I can work around this? My version of Hive
    is 0.7, and Hadoop is 0.20.1 from CDH2. Thanks.

    Regards,
    Peter
  • Edward Capriolo at Sep 22, 2010 at 4:17 am

    You can make a 0.20 InputFormat work with Hive, but it's a real PITA. The
    HBase and Cassandra handlers both do it. Essentially, you have to extend
    the new mapreduce input format and then implement the methods of the old
    one, using final variables and chained method calls. Example here:
    https://issues.apache.org/jira/secure/attachment/12452140/hive-1434-4-patch.txt
    If your input format is simple enough, it is likely easier to write two
    separate classes, one for the old API and one for the new. Use the
    mapred.* InputFormat with Hive.
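
For readers who want to see the shape of the "extend the new one, implement the old one" approach, here is a minimal sketch. It is not the code from the linked patch. In Hadoop, org.apache.hadoop.mapred.InputFormat is an interface and org.apache.hadoop.mapreduce.InputFormat is an abstract class, which is what lets one class do both. To keep the sketch runnable without Hadoop on the classpath, those types are replaced by heavily simplified stand-ins: OldInputFormat, OldRecordReader, NewInputFormat, NewRecordReader and LogInputFormat are all hypothetical names, and their signatures are assumptions that only mimic the real ones.

```java
import java.util.Arrays;
import java.util.Iterator;

public class DualApiInputFormatSketch {

    /** Stand-in for an old-style (mapred) reader: fills a caller-supplied holder. */
    interface OldRecordReader {
        boolean next(StringBuilder value);
    }

    /** Stand-in for the old-style (mapred) input format, which is an interface. */
    interface OldInputFormat {
        OldRecordReader getRecordReader(String split);
    }

    /** Stand-in for a new-style (mapreduce) reader: advance, then fetch. */
    static abstract class NewRecordReader {
        abstract boolean nextKeyValue();
        abstract String getCurrentValue();
    }

    /** Stand-in for the new-style (mapreduce) input format, an abstract class. */
    static abstract class NewInputFormat {
        abstract NewRecordReader createRecordReader(String split);
    }

    /** One class that serves both APIs: extends the new, implements the old. */
    static class LogInputFormat extends NewInputFormat implements OldInputFormat {

        // New-API entry point: the parsing logic lives only here.
        @Override
        NewRecordReader createRecordReader(String split) {
            final Iterator<String> lines = Arrays.asList(split.split("\n")).iterator();
            return new NewRecordReader() {
                private String current;

                @Override
                boolean nextKeyValue() {
                    if (!lines.hasNext()) {
                        return false;
                    }
                    current = lines.next();
                    return true;
                }

                @Override
                String getCurrentValue() {
                    return current;
                }
            };
        }

        // Old-API entry point: wraps the new-style reader in a final variable
        // and forwards each call to it.
        @Override
        public OldRecordReader getRecordReader(String split) {
            final NewRecordReader delegate = createRecordReader(split);
            return value -> {
                if (!delegate.nextKeyValue()) {
                    return false;
                }
                value.setLength(0);
                value.append(delegate.getCurrentValue());
                return true;
            };
        }
    }

    public static void main(String[] args) {
        // A caller that only knows the old-style interface, as Hive does here.
        OldInputFormat format = new LogInputFormat();
        OldRecordReader reader = format.getRecordReader("u1\ttagA\t100\nu2\ttagB\t200");
        StringBuilder value = new StringBuilder();
        while (reader.next(value)) {
            System.out.println(value);
        }
    }
}
```

With the real Hadoop classes the same idea applies, but the old-style methods also have to translate between the two APIs' split, configuration and context types, which is where most of the pain comes from.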
  • Tianqiang Li at Sep 22, 2010 at 6:17 am
    Hi Edward,
    Thanks for your hints. Let me start with the old API first.
    Just curious, does Hive plan to support the 0.20 API?

    -Peter
