Grokbase Groups Pig user May 2011
I am new to Hadoop, and from what I understand, by default Hadoop splits
the input into blocks. This might cut a single record (a line) into
two pieces that end up in two different maps. For example, the line
"abcd" might get split into "ab" and "cd". How can one prevent this in
Hadoop and Pig? I am looking for examples that show how to
specify my own split so that the input is split logically on the
record delimiter rather than on the block size. For some reason I have
not been able to find the right examples online.

  • Alex Rovner at May 27, 2011 at 7:19 pm
    Hadoop works out where each record starts and ends from the record delimiter, so a line that crosses a block boundary is still read whole by a single map. You don't have to do that manually.
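
To make the answer above concrete, here is a minimal Python sketch of the boundary rule that a line-oriented record reader can follow. It is a simulation for illustration only, not Hadoop or Pig code: the names `read_split` and `read_all_splits` are hypothetical, the input is an in-memory byte string, and the delimiter is assumed to be a single newline byte. The idea is that every split except the first discards everything up to and including the first delimiter it sees, and every split keeps reading past its own end until the record it started is complete.

```python
def read_split(data: bytes, start: int, length: int) -> list:
    """Return the records owned by the split covering bytes [start, start + length).

    Hypothetical helper that simulates a line-oriented reader; assumes a
    single-byte newline delimiter.
    """
    delimiter = b"\n"
    end = start + length
    pos = start

    if start != 0:
        # Not the first split: the bytes up to and including the first
        # delimiter belong to a record that the previous split finishes.
        idx = data.find(delimiter, pos)
        if idx == -1:
            return []
        pos = idx + 1

    records = []
    # A record that begins at or before `end` belongs to this split, even if
    # its bytes run past `end` into the next block.
    while pos <= end and pos < len(data):
        idx = data.find(delimiter, pos)
        if idx == -1:
            records.append(data[pos:])  # last record has no trailing newline
            break
        records.append(data[pos:idx])
        pos = idx + 1
    return records


def read_all_splits(data: bytes, split_size: int) -> list:
    """Cut `data` into fixed-size splits and read each one independently."""
    return [
        read_split(data, start, min(split_size, len(data) - start))
        for start in range(0, len(data), split_size)
    ]


if __name__ == "__main__":
    # A split size of 2 puts a boundary in the middle of "abcd".
    for number, records in enumerate(read_all_splits(b"abcd\nefgh\n", 2)):
        print(number, records)
```

With a split size of 2, the boundary falls between "ab" and "cd", yet the first split returns the whole line "abcd" and the later splits skip the fragments they start in. Each record is produced exactly once, which is why no custom split logic is needed for plain line-delimited input.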

Discussion Overview
group: user @
categories: pig, hadoop
posted: May 27, '11 at 4:59p
active: May 27, '11 at 7:19p
posts: 2
users: 2
website: pig.apache.org

2 users in discussion

Mohit Anchlia: 1 post Alex Rovner: 1 post
