FAQ
Hello all -

I am currently trying to integrate the numpy Python fast-arrays package with
Hadoop. I am basically looking for a way to read binary data similar to a
SequenceFile, except without keys. That is, similar to how the
TextInputFormat emits the position in the file as the key, I would like a
SequenceFileInputFormat that simply emits the position or index in the file
and the value.

Does such a facility exist? If not, how would I go about implementing this?

Thanks,

Pete

Search Discussions

  • William Kinney at May 3, 2010 at 5:19 pm
    Not sure if anything else exists, but you can easily implement your own
    RecordReader that gets a FSDataInputStream from the FileSystem for the
    FileSplit, and then read records from that like you would any other
    InputStream (with offset, length, byte[], etc).

    On Thu, Apr 29, 2010 at 5:36 AM, Pete Hunt wrote:

    Hello all -

    I am currently trying to integrate the numpy Python fast-arrays package
    with
    Hadoop. I am basically looking for a way to read binary data similar to a
    SequenceFile, except without keys. That is, similar to how the
    TextInputFormat emits the position in the file as the key, I would like a
    SequenceFileInputFormat that simply emits the position or index in the file
    and the value.

    Does such a facility exist? If not, how would I go about implementing this?

    Thanks,

    Pete

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-user @
categorieshadoop
postedApr 29, '10 at 9:37a
activeMay 3, '10 at 5:19p
posts2
users2
websitehadoop.apache.org...
irc#hadoop

2 users in discussion

Pete Hunt: 1 post William Kinney: 1 post

People

Translate

site design / logo © 2022 Grokbase