Oh, gosh, well, that makes me uneasy, since I was intending to actually use
this in production.

Is there something in particular about this class that makes it not
intended for real-world use? Performance? The way it's written (e.g. it
still depends on old APIs)?

Is there a loader you suggest I look at using instead that has been more
battle-tested?

-Zach

Dmitriy Ryaboy wrote:
Zach,
Perhaps I should've documented that better.
That class is *not intended for real use*. As far as I know, it's never been
used by anyone for anything in production.
It's a demo of how one would go about writing a real SequenceFileLoader for
whatever internal stuff you are using. Feel free to replace anything that
makes sense for you in your implementation.

-D

On Mon, Sep 27, 2010 at 1:23 PM, Zach Bailey wrote:
Hey folks,

Not sure if this has been discussed already or if this is due to some
limitation in pig, hadoop, or java - but is there a particular reason the
PiggyBank SequenceFileLoader doesn't support the BytesWritable type for
sequence file keys/values?


http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/BytesWritable.html

Looking at the code, it maps the pig-specific DataByteArray class to the
pig type "bytearray" - I don't understand this choice. Why use a
pig-specific class here (which is not very friendly for a mixed pig/non-pig
hadoop ecosystem)?

In fact, if you look at the SequenceFileLoader code you will see something
that looks very strange:

    protected Object translateWritableToPigDataType(Writable w, byte dataType) {
        switch(dataType) {
            case DataType.CHARARRAY: return ((Text) w).toString();
            case DataType.BYTEARRAY: return ((DataByteArray) w).get();
            case DataType.INTEGER: return ((IntWritable) w).get();
            case DataType.LONG: return ((LongWritable) w).get();
            case DataType.FLOAT: return ((FloatWritable) w).get();
            case DataType.DOUBLE: return ((DoubleWritable) w).get();
            case DataType.BYTE: return ((ByteWritable) w).get();
        }

        return null;
    }

This code smells - the method takes a Writable, which makes sense, but
then for the BYTEARRAY type it casts it to a DataByteArray, which
doesn't implement Writable! WTF, mate?
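
To make the complaint concrete, here is a minimal sketch of why a cast like this compiles yet cannot succeed at runtime. Every type in it is a hypothetical stand-in written for illustration, not the real Hadoop or Pig class; only the relationships between the types (one implements the interface, the other does not) mirror the situation described above.

```java
// Stand-in types for illustration only; NOT the real Hadoop/Pig classes.
interface Writable {}                        // plays the role of Hadoop's Writable

class BytesWritable implements Writable {    // a Writable wrapping raw bytes
    final byte[] bytes;
    BytesWritable(byte[] bytes) { this.bytes = bytes; }
}

class DataByteArray {                        // deliberately does NOT implement Writable
    final byte[] data;
    DataByteArray(byte[] data) { this.data = data; }
    byte[] get() { return data; }
}

class CastDemo {
    // Mirrors the BYTEARRAY branch. The compiler accepts the cast because
    // DataByteArray is not final, so a subclass could in principle implement
    // Writable; nothing is checked until the cast executes.
    static Object translate(Writable w) {
        return ((DataByteArray) w).get();
    }

    // Returns true if the cast throws ClassCastException at runtime.
    static boolean castFails(Writable w) {
        try {
            translate(w);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(new BytesWritable(new byte[] {1, 2, 3})));
    }
}
```

Under these assumptions, any Writable that actually reaches that branch fails the cast, which would explain why the branch reads as dead or broken code.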

I'm going to try my hand at switching this to use BytesWritable instead and
see what explodes.
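
One detail worth keeping in mind for such a switch: Hadoop's BytesWritable exposes its backing buffer through getBytes(), and only the first getLength() bytes of that buffer are valid data. The sketch below shows the trimming step in isolation. PaddedBytes is a hypothetical stand-in that mimics that buffer-plus-length layout so the example runs without Hadoop on the classpath, and toExactBytes is an illustrative helper name, not part of PiggyBank.

```java
import java.util.Arrays;

// Hypothetical stand-in mimicking a buffer whose valid data is shorter than
// the array itself (the layout BytesWritable uses). Not a real Hadoop class.
class PaddedBytes {
    private final byte[] buffer;
    private final int length;

    PaddedBytes(byte[] buffer, int length) {
        this.buffer = buffer;
        this.length = length;
    }

    byte[] getBytes() { return buffer; }   // may contain trailing padding
    int getLength() { return length; }     // number of valid bytes
}

class BytesConversion {
    // Copy only the valid prefix so padding never leaks into the result.
    static byte[] toExactBytes(PaddedBytes w) {
        return Arrays.copyOf(w.getBytes(), w.getLength());
    }

    public static void main(String[] args) {
        PaddedBytes padded = new PaddedBytes(new byte[] {1, 2, 3, 0, 0}, 3);
        System.out.println(Arrays.toString(toExactBytes(padded)));
    }
}
```

In a real loader the trimmed bytes would presumably be wrapped in a DataByteArray before being handed to Pig, and the type mapping mentioned earlier would likely need to recognize BytesWritable as well; both of those steps are assumptions here, not something the PiggyBank source is known to do.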

Cheers,
-Zach

Discussion Overview
group: user @
categories: pig, hadoop
posted: Sep 27, '10 at 8:30p
active: Sep 28, '10 at 12:11a
posts: 4
users: 3
website: pig.apache.org
