Hi,
I have input files that contain NO carriage returns/line feeds. Each
record is a fixed length (e.g. 202 bytes).

Which FileInputFormat should I use so that each call to my
Mapper receives one K,V pair, where the KEY is null or something (I
don't care) and the VALUE is the 202-byte record?

thanks!

  • Aaron Kimball at Oct 21, 2009 at 5:01 am
    You'll need to write your own, I'm afraid. You should subclass
    FileInputFormat and go from there. You may want to look at TextInputFormat /
    LineRecordReader for an example of how an InputFormat/RecordReader (IF/RR)
    pair gets put together, but there isn't an existing fixed-length record reader.

    - Aaron
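
A rough sketch of the core loop such a record reader would need, written against plain java.io so it runs without Hadoop on the classpath. The names FixedLengthRecords, nextRecord and readAll are made up for illustration; they are not part of Hadoop or of any patch mentioned in this thread. A real RecordReader would presumably wrap similar logic around the stream opened for its split.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; not a Hadoop class. */
public class FixedLengthRecords {

    /**
     * Reads the next record of exactly recordLength bytes.
     * Returns null at a clean end of stream and throws EOFException
     * if the stream ends in the middle of a record.
     */
    public static byte[] nextRecord(InputStream in, int recordLength) throws IOException {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        byte[] record = new byte[recordLength];
        int filled = 0;
        // read() may return fewer bytes than requested, so loop until full.
        while (filled < recordLength) {
            int n = in.read(record, filled, recordLength - filled);
            if (n < 0) {
                break; // end of stream
            }
            filled += n;
        }
        if (filled == 0) {
            return null; // no more records
        }
        if (filled < recordLength) {
            throw new EOFException("partial record: " + filled + " of " + recordLength + " bytes");
        }
        return record;
    }

    /** Collects every record in the stream; the key is ignored, as in the question. */
    public static List<byte[]> readAll(InputStream in, int recordLength) throws IOException {
        List<byte[]> records = new ArrayList<>();
        byte[] record;
        while ((record = nextRecord(in, recordLength)) != null) {
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Three 202-byte records with no separators between them.
        byte[] data = new byte[202 * 3];
        List<byte[]> records = readAll(new ByteArrayInputStream(data), 202);
        System.out.println(records.size() + " records of " + records.get(0).length + " bytes");
    }
}
```

Throwing on a trailing partial record is a design choice in this sketch; silently dropping it or passing it through would be equally easy, depending on how strict the job should be about malformed input.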
  • Yz5od2 at Oct 28, 2009 at 6:16 pm
    Hi all,
    I am working on writing a FixedLengthInputFormat class and a
    corresponding FixedLengthRecordReader.

    Would the Hadoop Common project be interested in these? Basically
    these are for reading inputs of textual record data, where each record
    has a fixed length (no carriage returns, separators, etc.).

    thanks
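
One detail a fixed-length input format has to settle is what happens when a split boundary falls in the middle of a record. The sketch below assumes one possible convention: a record belongs to the split in which its first byte lies, so a reader skips forward to the next record boundary at the start of its split. The class FixedLengthSplits and its methods are hypothetical and only do the offset arithmetic; this is not taken from the classes discussed in this thread.

```java
/** Hypothetical helper showing split/record alignment arithmetic only. */
public class FixedLengthSplits {

    /** Offset of the first record boundary at or after splitStart. */
    public static long firstRecordStart(long splitStart, int recordLength) {
        long remainder = splitStart % recordLength;
        return remainder == 0 ? splitStart : splitStart + (recordLength - remainder);
    }

    /**
     * Number of records whose first byte lies in
     * [splitStart, splitStart + splitLength).
     */
    public static long recordsStartingInSplit(long splitStart, long splitLength, int recordLength) {
        long first = firstRecordStart(splitStart, recordLength);
        long end = splitStart + splitLength;
        if (first >= end) {
            return 0;
        }
        // Ceiling division: a record that starts before 'end' counts,
        // even if its last bytes lie beyond the split.
        return (end - first + recordLength - 1) / recordLength;
    }

    public static void main(String[] args) {
        int recordLength = 202;
        // A 2020-byte file (10 records) cut into two uneven splits.
        long a = recordsStartingInSplit(0, 1000, recordLength);
        long b = recordsStartingInSplit(1000, 1020, recordLength);
        System.out.println(a + " + " + b + " = " + (a + b) + " records");
    }
}
```

Under this convention every record is counted by exactly one split, at the cost of a reader sometimes reading a little past the end of its split to finish its last record.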

  • Aaron Kimball at Oct 28, 2009 at 7:59 pm
    I think these would be good to add to mapreduce in the
    {{org.apache.hadoop.mapreduce.lib.input}} package. Please file a JIRA and
    apply a patch!
    - Aaron
  • Yz5od2 at Nov 1, 2009 at 6:44 pm
    Hi all,
    I've contributed a couple of classes to support fixed length/width
    records in input files. The JIRA issue and attachments are located here:

    https://issues.apache.org/jira/browse/MAPREDUCE-1176

    thanks, and I hope this helps others out.

Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 20, '09 at 8:53p
active: Nov 1, '09 at 6:44p
posts: 5
users: 2
website: hadoop.apache.org...
irc: #hadoop
