Hello all -
I am currently trying to integrate the numpy Python fast-arrays package with
Hadoop. I am basically looking for a way to read binary data similar to a
SequenceFile, except without keys. That is, similar to how the
TextInputFormat emits the position in the file as the key, I would like a
SequenceFileInputFormat that simply emits the position or index in the file
and the value.
Does such a facility exist? If not, how would I go about implementing this?