Grokbase Groups Hive user June 2010
Hi Andrew,

You can specify that your input data is stored in SequenceFile by defining an external table stored as SequenceFile.

create external table T (a int, b double) stored as SequenceFile;

This assumes that the two Writables in your custom pair are IntWritable and DoubleWritable, respectively. Hive also supports array, struct, and map types for fields, so you may not need to write your own Deserializer.
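For reference, here is a slightly fuller sketch of the same idea. The LOCATION path, the jar path, and the SerDe class name are hypothetical placeholders and not anything from your setup. One caveat, as far as I understand it: Hive's SequenceFile reader hands only the value of each record to the SerDe's deserialize() call and ignores the key, so the second variant is only needed if the default SerDe cannot interpret your value Writable.

```sql
-- Variant 1: point an external table at existing SequenceFiles.
-- '/user/andrew/pairs' is a hypothetical HDFS directory.
CREATE EXTERNAL TABLE T (a INT, b DOUBLE)
STORED AS SEQUENCEFILE
LOCATION '/user/andrew/pairs';

-- Quick sanity check that rows decode as expected.
SELECT a, b FROM T LIMIT 10;

-- Variant 2: same files, but decoded by a custom SerDe.
-- The jar path and 'com.example.PairSerDe' are hypothetical names.
ADD JAR /tmp/pair-serde.jar;

CREATE EXTERNAL TABLE T_custom (a INT, b DOUBLE)
ROW FORMAT SERDE 'com.example.PairSerDe'
STORED AS SEQUENCEFILE
LOCATION '/user/andrew/pairs';
```

Because both tables are external, dropping either one removes only the table metadata and leaves the SequenceFiles in place.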

On Jun 6, 2010, at 11:44 AM, Andrew Rothstein wrote:

Most of my Hadoop data is produced by Java MR jobs that store data as
custom Writable pairs in SequenceFiles. I'm excited to bring that
data into a Hive table so that I can start building out and
prototyping more derived analytics. Can anyone point me towards a
relevant example? Since I'm just getting started I've begun with
hive-0.5.0. Thus far I've started with the RegexSerDe example and
tried to whittle it down a bit to make it into what I want but I'm
lacking context.

Since I'm not trying to take data and write it back into these
SequenceFiles, I only need to implement the Deserializer interface.

How do I tell Hive that the underlying data InputFormat is a
SequenceFile? What's the relationship between the Writable that
arrives as the parameter to the deserialize function and the contents
of the underlying SequenceFile?

regards, Andrew

Discussion Overview
group: user @
categories: hive, hadoop
posted: Jun 6, '10 at 6:45p
active: Jun 6, '10 at 8:24p

2 users in discussion: Ning Zhang (1 post), Andrew Rothstein (1 post)


