Most of my Hadoop data is produced by Java MR jobs that store data as
custom Writable pairs in SequenceFiles. I'm excited to bring that
data into a Hive table so that I can start building out and
prototyping more derived analytics. Can anyone point me towards a
relevant example? Since I'm just getting started, I've begun with
hive-0.5.0. So far I've taken the RegexSerDe example and tried to
whittle it down into what I want, but I'm lacking context.
Since I'm not trying to write data back into these SequenceFiles,
I only need to implement the Deserializer interface, right?
How do I tell Hive that the underlying data is stored in SequenceFiles
(that is, which InputFormat to use)? What's the relationship between
the Writable that
arrives as the parameter to the deserialize function and the contents
of the underlying SequenceFile?
regards, Andrew