Hi guys,
Suppose I have a 10 GB text file that I want to convert to a SequenceFile
using MapReduce, with file splits of 1 GB each so that 10 mappers work on it
in parallel and write SequenceFile output.
Can I combine these 10 mapper outputs into a single 10 GB SequenceFile in the
reduce stage? Is that possible? If so, would it be a very slow operation?

-thanks
JJ

  • Harsh J at Feb 26, 2011 at 5:46 am
    Unless you also need to apply some transformation to your text input
    in the map phase, this conversion could be done without MapReduce at
    all, since your requirement of a single output file will incur
    unnecessary intermediate-phase costs.

    --
    Harsh J
    www.harshj.com
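
A minimal sketch of the single-writer idea in the reply above, assuming plain Python and a toy record layout that is NOT the real Hadoop SequenceFile format (on a cluster, Hadoop's own `SequenceFile.Writer` would do the writing). The function names `text_to_records` and `read_records` are hypothetical. Each line is stored under its byte offset, similar to the keys that Hadoop's text input format hands to a mapper:

```python
import io
import struct

# Toy record header (an assumption for illustration, not the SequenceFile layout):
# 8-byte big-endian offset key, then 4-byte big-endian value length.
HEADER = ">qi"


def text_to_records(text_stream, out_stream):
    """Stream lines from text_stream and append (offset, line) records to out_stream."""
    offset = 0
    count = 0
    for line in text_stream:
        payload = line.rstrip(b"\n")
        out_stream.write(struct.pack(HEADER, offset, len(payload)))
        out_stream.write(payload)
        offset += len(line)  # advance by the full line, newline included
        count += 1
    return count


def read_records(in_stream):
    """Yield (offset, line) pairs back from a stream written by text_to_records."""
    header_size = struct.calcsize(HEADER)
    while True:
        header = in_stream.read(header_size)
        if len(header) < header_size:
            break
        offset, length = struct.unpack(HEADER, header)
        yield offset, in_stream.read(length)


if __name__ == "__main__":
    source = io.BytesIO(b"alpha\nbeta\ngamma\n")
    target = io.BytesIO()
    print(text_to_records(source, target), "records written")
```

The input is read once and written once, with no shuffle or sort in between, which is the cost the reply suggests avoiding when a single output file is required.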
  • Mapred Learn at Feb 28, 2011 at 5:51 pm
    Hey Harsh,
    I was trying to use the parallelism of the mappers to do it quickly.

    Without MapReduce, wouldn't converting a 10 GB text file to a
    SequenceFile be very slow?

    Also, what kind of transformation in the map phase are you referring to?

    -JJ
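
The mapper count in the question can be checked with a little arithmetic. This is a rough sketch loosely modeled on how Hadoop's `FileInputFormat` sizes splits (minimum size, maximum size, block size, plus a 10% slack on the last split). The 64 MB block size and the function names `compute_split_size` and `split_lengths` are assumptions for illustration:

```python
GB = 1024 ** 3
MB = 1024 ** 2
SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size, max_size):
    """Pick the split size: the block size clamped between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_size, split_size):
    """Return the length in bytes of each split for a file of file_size bytes."""
    lengths = []
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    if remaining:
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    # Assumed 64 MB block size with the minimum split size raised to 1 GB.
    size = compute_split_size(64 * MB, 1 * GB, 2 ** 63 - 1)
    print(len(split_lengths(10 * GB, size)), "splits, one mapper each")
```

Under these assumptions the 10 GB file yields 10 splits and therefore 10 mappers. In a map-only job each mapper writes its own output file, so the result would be 10 SequenceFiles rather than 1; funnelling them into one file through a reduce stage means every record passes through a single reducer.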

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Feb 26, '11 at 1:52a
active: Feb 28, '11 at 5:51p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
