FAQ
Hi,

I want to control the number of mapper tasks running simultaneously. Is there a way to do that if I run Pig jobs on Hadoop?

Any input is helpful.

Thanks,
Rahul


  • Medha Atre at Sep 16, 2010 at 5:59 pm
    The number of mappers can be set in code through
    conf.setNumMapTasks(). But from what I have read in the Hadoop
    documentation, Hadoop treats this number only as a hint and is not
    guaranteed to obey it.

    You can read more on
    http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#Mapper
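
    For illustration, a minimal sketch of what setting that hint looks like
    with the old "mapred" API; JobConf and its setters are standard Hadoop
    0.20 classes, but the class name and the job setup around them are only
    hypothetical here:

        import org.apache.hadoop.mapred.JobConf;

        public class MapHintSketch {
            public static void main(String[] args) {
                // Old-API job configuration object.
                JobConf conf = new JobConf(MapHintSketch.class);

                // Suggest 10 map tasks for the job. This is only a hint:
                // the actual number of maps equals the number of input
                // splits, so the framework may ignore the value.
                conf.setNumMapTasks(10);

                // The reduce count, by contrast, is honored exactly.
                conf.setNumReduceTasks(4);
            }
        }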

    Hope that helps.
    On Tue, Sep 14, 2010 at 5:32 PM, Rahul Malviya wrote:
    Hi,

    I want to control the number of mapper tasks running simultaneously. Is there a way to do that if I run Pig jobs on Hadoop?

    Any input is helpful.

    Thanks,
    Rahul
  • Amogh Vasekar at Sep 17, 2010 at 5:55 am
    Hi Rahul,
    Can you please be more specific? Do you want to control the mappers running simultaneously for your job (I guess), or for the cluster as a whole?
    If it is for your job and you want to control it on a per-node basis, one way is to allocate more memory to each of your mappers so that each occupies more than one slot. If a slot is free, a task will be scheduled on it, and that's more or less out of your control, especially so in Pig.
    In case you want a global cap on simultaneous mappers, it's a little more complicated, and inefficient too. A little more detail on your use case should get you a better response on the list.
    Sorry if I misunderstood your question.
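
    To make the "more than one slot" idea concrete, here is a hedged sketch.
    It assumes a cluster whose scheduler does memory-based slot allocation
    (e.g. the capacity scheduler with memory limits configured); without
    that, the property below has no such effect, and the 2048 MB value is
    made up:

        import org.apache.hadoop.mapred.JobConf;

        public class WideMapSketch {
            public static void main(String[] args) {
                JobConf conf = new JobConf(WideMapSketch.class);

                // If the admin has configured the per-slot memory (for
                // example mapred.cluster.map.memory.mb = 1024), asking for
                // 2 GB per map task makes each map occupy two slots, so
                // fewer maps run on a node at once.
                conf.set("mapred.job.map.memory.mb", "2048");
            }
        }

    With Pig the same property would have to be passed through to the
    underlying Hadoop job (for example with -D on the command line),
    assuming your Pig version forwards it.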

    Amogh


    On 9/15/10 3:02 AM, "Rahul Malviya" wrote:

    Hi,

    I want to control the number of mapper tasks running simultaneously. Is there a way to do that if I run Pig jobs on Hadoop?

    Any input is helpful.

    Thanks,
    Rahul
  • Allen Wittenauer at Sep 17, 2010 at 3:34 pm

    On Sep 16, 2010, at 10:54 PM, Amogh Vasekar wrote:
    If it is for your job and you want to control it on a per-node basis, one way is to allocate more memory to each of your mappers so that each occupies more than one slot. If a slot is free, a task will be scheduled on it, and that's more or less out of your control, especially so in Pig.
    I'm fairly certain this is only true if you are running Y!'s build or Apache 0.21+ with capacity scheduler.
  • Rahul at Sep 17, 2010 at 9:13 pm
    Hi Allen,

    Thank you for your input. I am using Hadoop 0.20.2 along with Pig 0.7.0, so is it possible in this scenario? Also, please let me know exactly how we can allocate more memory to each mapper.

    Thanks,
    Rahul
    On Sep 17, 2010, at 8:33 AM, Allen Wittenauer wrote:

    On Sep 16, 2010, at 10:54 PM, Amogh Vasekar wrote:
    If it is for your job and you want to control it on a per-node basis, one way is to allocate more memory to each of your mappers so that each occupies more than one slot. If a slot is free, a task will be scheduled on it, and that's more or less out of your control, especially so in Pig.
    I'm fairly certain this is only true if you are running Y!'s build or Apache 0.21+ with capacity scheduler.
  • Rahul at Sep 17, 2010 at 6:39 pm
    Hi Amogh,

    Thanks for the input. When I run a Pig job on Hadoop and monitor it through the job tracker, I always see 4 mappers in the Running section. So I just wanted to know whether there is any parameter through which we can control the number of mappers in the Running section, to speed up the process. As for the number of reducers, if I set default_parallel 4; in my Pig script the number of reducers increases. So I was wondering whether there is a way to increase the total number of mappers as well as the number running simultaneously.

    Also, is there a rough calculation through which I can work out the total simultaneous mappers for my job?

    So I think you got my question right. But you mentioned we can allocate more memory to each mapper; how is that possible if I run a Pig job?

    Just one more verification: I should be able to control the number of mappers for my job if I write my own mappers and reducers using the Hadoop API and drive the jobs through my custom code. Please let me know if my understanding is correct in this case.
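
    For what it's worth, a minimal sketch of the kind of driver you describe,
    using the plain Hadoop 0.20 ("old mapred") API; the identity
    mapper/reducer, the paths and the 256 MB split size are placeholders,
    not a recommendation:

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobClient;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.lib.IdentityMapper;
        import org.apache.hadoop.mapred.lib.IdentityReducer;

        public class MapControlDriver {
            public static void main(String[] args) throws Exception {
                JobConf conf = new JobConf(MapControlDriver.class);
                conf.setJobName("map-control-sketch");

                // Placeholder identity mapper/reducer; substitute your own.
                conf.setMapperClass(IdentityMapper.class);
                conf.setReducerClass(IdentityReducer.class);
                conf.setOutputKeyClass(LongWritable.class);
                conf.setOutputValueClass(Text.class);

                FileInputFormat.setInputPaths(conf, new Path(args[0]));
                FileOutputFormat.setOutputPath(conf, new Path(args[1]));

                // The reduce count is honored exactly; this is the knob
                // Pig's default_parallel drives on the reduce side.
                conf.setNumReduceTasks(4);

                // The map count is only a hint: the real number of maps is
                // the number of input splits. Raising the minimum split
                // size is one way to get fewer, larger maps.
                conf.setNumMapTasks(8);
                conf.set("mapred.min.split.size",
                         String.valueOf(256L * 1024 * 1024)); // 256 MB

                // How many maps run at the same time is still bounded by
                // the cluster's map slots (roughly: nodes x map slots per
                // TaskTracker), not by anything in the job itself.
                JobClient.runJob(conf);
            }
        }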

    Thanks,
    Rahul

    On Sep 16, 2010, at 10:54 PM, Amogh Vasekar wrote:

    Hi Rahul,
    Can you please be more specific? Do you want to control the mappers running simultaneously for your job (I guess), or for the cluster as a whole?
    If it is for your job and you want to control it on a per-node basis, one way is to allocate more memory to each of your mappers so that each occupies more than one slot. If a slot is free, a task will be scheduled on it, and that's more or less out of your control, especially so in Pig.
    In case you want a global cap on simultaneous mappers, it's a little more complicated, and inefficient too. A little more detail on your use case should get you a better response on the list.
    Sorry if I misunderstood your question.

    Amogh


    On 9/15/10 3:02 AM, "Rahul Malviya" wrote:

    Hi,

    I want to control the number of mapper tasks running simultaneously. Is there a way to do that if I run Pig jobs on Hadoop?

    Any input is helpful.

    Thanks,
    Rahul

Discussion Overview
group: common-user
categories: hadoop
posted: Sep 14, '10 at 9:33p
active: Sep 17, '10 at 9:13p
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
