Grokbase Groups Hive user June 2010
Most of my Hadoop data is produced by Java MR jobs that store data as
custom Writable pairs in SequenceFiles. I'm excited to bring that
data into a Hive table so that I can start building out and
prototyping more derived analytics. Can anyone point me towards a
relevant example? Since I'm just getting started I've begun with
hive-0.5.0. Thus far I've started with the RegexSerDe example and
tried to whittle it down a bit to make it into what I want but I'm
lacking context.

Since I'm not trying to take data and write it back into these
SequenceFiles, I only need to implement the Deserializer interface,
right?

How do I tell Hive that the underlying data InputFormat is a
SequenceFile? What's the relationship between the Writable that
arrives as the parameter to the deserialize function and the contents
of the underlying SequenceFile?

regards, Andrew


  • Ning Zhang at Jun 6, 2010 at 8:24 pm
    Hi Andrew,

    You can specify that your input data is stored in SequenceFiles by defining an external table stored as SequenceFile.

    create external table T (a int, b double) stored as SequenceFile;

    assuming that the custom Writable pair consists of an IntWritable and a DoubleWritable, respectively. Hive also supports array, struct and map types for fields, so you may not need to write your own Deserializer.

    Ning
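
    The suggestion above can be sketched a little further for MR output that already sits in HDFS. This is an illustration under assumptions, not a tested recipe: the jar path, the LOCATION directory and the class name com.example.hive.PairDeserializer are hypothetical placeholders that do not come from this thread. One point worth knowing when reading SequenceFiles: Hive passes only the value of each record to the deserialize function and ignores the key, so a field that lives in the key would have to be carried in the value too. If the value is a custom Writable rather than delimited text, a custom Deserializer is probably still needed.

    ```sql
    -- Sketch only: every path and class name here is hypothetical.
    ADD JAR /path/to/pair-serde.jar;                      -- jar containing the custom Deserializer

    CREATE EXTERNAL TABLE T (a INT, b DOUBLE)
    ROW FORMAT SERDE 'com.example.hive.PairDeserializer'  -- hypothetical Deserializer class
    STORED AS SEQUENCEFILE                                -- selects the SequenceFile InputFormat
    LOCATION '/user/andrew/mr-output';                    -- hypothetical directory of existing files

    -- Quick check that rows come back with the expected column types.
    SELECT a, b FROM T LIMIT 10;
    ```

    Declaring the table EXTERNAL with a LOCATION leaves the existing files where they are, and dropping the table later removes only the metadata.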
    On Jun 6, 2010, at 11:44 AM, Andrew Rothstein wrote:

