Hey folks,

Not sure if this has been discussed already, or if it's due to some
limitation in Pig, Hadoop, or Java - but is there a particular reason
the PiggyBank SequenceFileLoader doesn't support the BytesWritable type
for sequence file keys/values?


Looking at the code, I see it maps the Pig-specific DataByteArray class
to the Pig type "bytearray", and I don't understand this choice. Why use
a Pig-specific class here? It's not very friendly for a mixed
Pig/non-Pig Hadoop ecosystem, where sequence files written outside Pig
carry BytesWritable keys and values rather than DataByteArray.
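
To make the contrast concrete, here's roughly what a plain Hadoop
producer looks like when it writes raw bytes to a sequence file (the
path and payload are made up for illustration) - it's BytesWritable end
to end, with no Pig classes in sight:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class BytesProducer {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical output path and payload, just for illustration.
    Path path = new Path("/tmp/example.seq");
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, path, Text.class, BytesWritable.class);
    try {
      writer.append(new Text("key1"),
                    new BytesWritable(new byte[] { 0x01, 0x02, 0x03 }));
    } finally {
      writer.close();
    }
  }
}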

In fact, if you look at the SequenceFileLoader code, you will see
something very strange:

protected Object translateWritableToPigDataType(Writable w, byte dataType) {
  switch(dataType) {
    case DataType.CHARARRAY: return ((Text) w).toString();
    case DataType.BYTEARRAY: return ((DataByteArray) w).get();  // <-- !?
    case DataType.INTEGER: return ((IntWritable) w).get();
    case DataType.LONG: return ((LongWritable) w).get();
    case DataType.FLOAT: return ((FloatWritable) w).get();
    case DataType.DOUBLE: return ((DoubleWritable) w).get();
    case DataType.BYTE: return ((ByteWritable) w).get();
  }
  return null;
}

This code smells: the method takes a Writable, which makes sense, but
for the BYTEARRAY type it casts that Writable to DataByteArray - which
doesn't implement Writable! WTF, mate?

I'm going to try my hand at switching this to use BytesWritable instead
and see what explodes.
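
For the record, the change I have in mind looks roughly like this -
completely untested, and the helper name is my own invention - but the
idea is to take the BytesWritable from the sequence file and hand Pig a
DataByteArray:

import org.apache.hadoop.io.BytesWritable;
import org.apache.pig.data.DataByteArray;

// Sketch (untested): what the BYTEARRAY case could call instead of
// casting w directly to DataByteArray.
static DataByteArray toPigByteArray(BytesWritable bw) {
  // BytesWritable's backing buffer can be longer than the valid data,
  // so copy only the first getLength() bytes.
  byte[] bytes = new byte[bw.getLength()];
  System.arraycopy(bw.getBytes(), 0, bytes, 0, bw.getLength());
  return new DataByteArray(bytes);
}

The BYTEARRAY case in the switch would then become something like:

case DataType.BYTEARRAY: return toPigByteArray((BytesWritable) w);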

