I am using CombineFileInputFormat and CombineFileSplit to group small input files before they are fed to the mappers. The job runs properly and the output is correct, but I get only one mapper task, so I lose all my parallelization in the map stage.

I realize I'm not providing much detail yet because I'm not sure what to say. Feel free to ask questions for clarification.

What might cause this problem, and how might I diagnose it, much less fix it?

Thank you.

________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com www.keithwiley.com

"And what if we picked the wrong religion? Every week, we're just making God
madder and madder!"
-- Homer Simpson
________________________________________________________________________________


  • Aleksandar Stupar at Apr 30, 2010 at 7:56 am
    Hi,

    If mapred.max.split.size is not set (and it is not set by default), then CombineFileInputFormat
    only takes racks into account when grouping blocks. If you set this property, it will also take
    block placement on machines into account, and you should get multiple mappers.

    Hope this helps,
    Aleksandar Stupar.




    ________________________________
    From: Keith Wiley <kwiley@keithwiley.com>
    To: common-user@hadoop.apache.org
    Sent: Thu, April 29, 2010 11:23:35 PM
    Subject: CombineFileInputFormat not producing multiple mappers

  • Keith Wiley at Apr 30, 2010 at 7:59 am
    Yep, that was part of it. Thank you. Also, I was not setting
    splittable to true for the Combined Input because I knew the contained
    files themselves were not splittable. Setting the Combined Input's
    splittable to true appears to have been important as well.

    Thank you.

    ________________________________________________________________________________
    Keith Wiley kwiley@keithwiley.com keithwiley.com
    music.keithwiley.com

    "Luminous beings are we, not this crude matter."
    -- Yoda
    ________________________________________________________________________________
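
To illustrate why setting mapred.max.split.size changes the mapper count, here is a toy model in plain Java. It is only a sketch under stated assumptions: it is not Hadoop's code, it ignores racks and node locality entirely, and the class name SplitModel, the method group, and the file sizes are all made up for the example. The only idea it borrows from the thread is that a combined split is closed once it reaches the configured maximum, and that an unset maximum (modelled here as 0) never closes a split early, so everything lands in one split and therefore one mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; NOT the real CombineFileInputFormat logic.
public class SplitModel {

    // Packs file sizes (in bytes) into splits, in input order.
    // A split is closed as soon as its total size reaches maxSplitSize.
    // maxSplitSize == 0 models "property not set": no split is closed early.
    static List<List<Long>> group(long[] fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : fileSizes) {
            current.add(size);
            currentSize += size;
            if (maxSplitSize != 0 && currentSize >= maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        // Whatever is left over becomes the final split.
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Made-up sizes for six small input files.
        long[] files = {10 * mb, 20 * mb, 30 * mb, 40 * mb, 5 * mb, 60 * mb};
        System.out.println("max unset: " + group(files, 0).size() + " split(s)");
        System.out.println("max 64 MB: " + group(files, 64 * mb).size() + " split(s)");
    }
}
```

In a real job the property would be set on the job configuration rather than passed to a helper like this. If the driver goes through ToolRunner, passing -D mapred.max.split.size=<bytes> on the command line should have the same effect as setting it in code.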

Discussion Overview
group: common-user @
categories: hadoop
posted: Apr 29, '10 at 9:24p
active: Apr 30, '10 at 7:59a
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
