I realized that I made a mistake in my earlier post. So here is the correct one.

I have a job ("loadgen") with only one input file, say part-00000, of size
1368654 bytes.

So when I submit this job, I get the following output:

INFO mapred.FileInputFormat: Total input paths to process : 1

However, in the JobTracker log, I see the following entry:

Split info for job:job_201003131110_0043 with 2 splits

and subsequently 2 map tasks are started to process these two splits.
The input splits for these 2 map tasks total 6843283 bytes, so the input
is divided equally between the two splits.

My question is: why are two map tasks created instead of one, and why
is the combined size of the two splits greater than the size of my
input?

I also noticed that if I run the same job with 2 inputs (say)
part-00000 and part-00001, then only 2 map tasks are created.

To my knowledge, the number of map tasks should be the same as the
number of input files.

Thanks,


  • Ravi Phulari at Mar 25, 2010 at 4:33 am
    Hello Abhishek,

    Unless you have modified conf/mapred-site.xml, MapReduce uses the configuration values specified in $HADOOP_HOME/src/mapred/mapred-default.xml.
    In that file, mapred.map.tasks is set to 2, which is why your job runs 2 map tasks.

    <property>
    <name>mapred.map.tasks</name>
    <value>2</value>
    <description>The default number of map tasks per job.
    Ignored when mapred.job.tracker is "local".
    </description>
    </property>
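    For what it's worth, in the old mapred API this value acts as a hint: FileInputFormat divides the total input size by the requested number of splits to get a goal size per split. A rough sketch of that arithmetic (a simplified, hypothetical model; the real getSplits() also honors a slop factor, per-file block boundaries, and unsplittable files):

    ```python
    def compute_split_size(goal_size, min_size, block_size):
        # Mirrors the shape of FileInputFormat.computeSplitSize in the
        # old mapred API: clamp the goal size between the minimum split
        # size and the HDFS block size.
        return max(min_size, min(goal_size, block_size))

    def split_sizes(total_size, num_splits_hint, min_size=1,
                    block_size=64 * 1024 * 1024):
        # num_splits_hint comes from mapred.map.tasks.
        goal = total_size // max(num_splits_hint, 1)
        split = compute_split_size(goal, min_size, block_size)
        sizes = []
        remaining = total_size
        while remaining > 0:
            chunk = min(split, remaining)
            sizes.append(chunk)
            remaining -= chunk
        return sizes

    # With the 1368654-byte input and the default mapred.map.tasks = 2:
    print(split_sizes(1368654, 2))  # → [684327, 684327]
    ```

    So a single small file still comes back as two equal splits when the hint is 2, because the goal size is half the file, well under the block size.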

    Hope this helps.
    -
    Ravi


Discussion Overview

group: common-dev @ hadoop
posted: Mar 25, '10 at 2:27a
active: Mar 25, '10 at 4:33a
posts: 2
users: 2 (Abhishek sharma: 1 post, Ravi Phulari: 1 post)
website: hadoop.apache.org
irc: #hadoop