I'm using Hadoop Streaming and currently have these properties on my
command line:
-Dstream.map.output.field.separator=' ' \
-Dstream.num.map.output.key.fields=1 \

This works for me because my test data happens to have a space at column 14.
If I want a fixed-length split instead, is there a simple cut-like function I
could use, such as leaving the separator undefined and counting 13 bytes?
-Dstream.map.output.field.separator= \
-Dstream.num.map.output.key.fields=13 \

I have searched the forum for discussions on fixed-length keys or key
splitting but have not found an answer. Perhaps this is not possible, at
least on the command line?

Thanks,
Kevin


  • Harsh Chouraria at Apr 5, 2011 at 5:38 am
    Hello Kevin,
    On Fri, Mar 25, 2011 at 12:52 AM, wrote:
    > -Dstream.map.output.field.separator= \
    > -Dstream.num.map.output.key.fields=13 \
    >
    > I have searched the forum for discussions on fixed length or splitting
    > keys but have not found my answer. Perhaps this is not possible, at
    > least on the command line?

    I'm not aware of any functionality in streaming itself that supports
    this. I think your mapper code will have to do the split on its own
    before emitting (or perhaps your InputFormat can do it at read
    time).

    --
    Harsh J
    Support Engineer, Cloudera
  • Elton sky at Apr 5, 2011 at 7:21 am
    I agree with Harsh.

    I think you need to write your own RecordReader.
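
Harsh's suggestion above, that the mapper split the key itself before emitting, could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the 13-byte width comes from the question, while the names KEY_WIDTH, split_fixed, and main are made up for the example. It assumes the job leaves streaming's map output separator at its default (a tab) with a single key field, so the tab written here marks the key boundary.

```python
#!/usr/bin/env python3
"""Streaming mapper sketch: emit a fixed-width key, a tab, then the rest of the line."""
import sys

KEY_WIDTH = 13  # assumed key length in bytes, taken from the question


def split_fixed(line, width=KEY_WIDTH):
    """Return (key, value): the first `width` bytes and everything after them."""
    return line[:width], line[width:]


def main(stdin=None, stdout=None):
    # Work on raw bytes so that "13 bytes" means bytes, not decoded characters.
    stdin = sys.stdin.buffer if stdin is None else stdin
    stdout = sys.stdout.buffer if stdout is None else stdout
    for raw in stdin:
        record = raw.rstrip(b"\r\n")
        if not record:
            continue  # skip blank lines
        key, value = split_fixed(record)
        # With the default settings, streaming treats everything up to the
        # first tab as the key, so the key itself must not contain a tab.
        stdout.write(key + b"\t" + value + b"\n")


if __name__ == "__main__":
    main()
```

A line shorter than 13 bytes becomes a key with an empty value. If the data can contain tabs within the first 13 bytes, the key would be cut short at the first tab, so a different output separator would be needed in that case.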

Discussion Overview
group: common-user @
categories: hadoop
posted: Mar 24, '11 at 7:23p
active: Apr 5, '11 at 7:21a
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop