On Sep 17, 2010, at 6:39 PM:
Thanks for the input. Basically, when I run a Pig job on Hadoop and monitor it through the job tracker, I always see 4 mappers in the Running section. So I just wanted to know whether there is any parameter through which we can control the number of mappers in the Running section to speed up the process. As for the number of reducers: if I set default_parallel 4; in my Pig script, the number of reducers increases. So I was wondering whether there is a way to increase the total number of mappers as well as the number running simultaneously.
Also, is there a rough calculation by which I can estimate the total number of simultaneous mappers for my job?
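For reference, here is one rough way to estimate it (an illustration, not from the thread; all the numbers below are hypothetical). On classic MapReduce (MRv1), each TaskTracker exposes mapred.tasktracker.map.tasks.maximum map slots, so the cluster-wide ceiling on simultaneous mappers is roughly nodes x slots-per-node, and the number that can actually run is also capped by the job's total map tasks, which is approximately one per input split:

```java
// Back-of-the-envelope estimate of simultaneous mappers on MRv1.
// All values are hypothetical examples, not numbers from this thread.
public class MapperEstimate {
    public static void main(String[] args) {
        int nodes = 4;                // TaskTrackers in the cluster
        int mapSlotsPerNode = 2;      // mapred.tasktracker.map.tasks.maximum
        long inputBytes = 1L << 30;   // 1 GiB of input data
        long splitBytes = 64L << 20;  // 64 MiB split (typical HDFS block size)

        // Roughly one map task per input split (ceiling division).
        long totalMapTasks = (inputBytes + splitBytes - 1) / splitBytes;
        long clusterMapSlots = (long) nodes * mapSlotsPerNode;
        long simultaneous = Math.min(totalMapTasks, clusterMapSlots);

        System.out.println("total map tasks:      " + totalMapTasks);  // 16
        System.out.println("simultaneous mappers: " + simultaneous);   // 8
    }
}
```

So with these example numbers the job would have 16 map tasks, but only 8 could ever run at once, regardless of anything set in the Pig script.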
So I think you got my question right. But you mentioned we can allocate more memory to each mapper; how is that possible if I run a Pig job?
Just one more verification: I should be able to control the number of mappers for my job if I write my own mappers and reducers using the Hadoop API and drive the jobs through my custom code. Please let me know if my understanding is correct in this case.
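For what it's worth, even with the raw Hadoop API the mapper count is only controlled indirectly: the framework creates one map task per input split, so setNumMapTasks() is just a hint, while the reducer count is honored exactly (which is what default_parallel maps to in Pig). A minimal sketch using the old 0.20-era mapred API (MyJob, the paths, and the values are hypothetical; this needs the Hadoop jars on the classpath and is not from the thread itself):

```java
// Hypothetical driver sketch using the old org.apache.hadoop.mapred API.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyJob.class);
        conf.setJobName("my-job");

        // Only a hint: the actual map-task count is the number of input splits.
        conf.setNumMapTasks(20);
        // Raising the minimum split size yields fewer, larger mappers
        // (128 MB here, as an example value).
        conf.set("mapred.min.split.size", String.valueOf(128L * 1024 * 1024));
        // Reducers, unlike mappers, are set exactly.
        conf.setNumReduceTasks(4);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

Note that even here, how many of those map tasks run at the same time is still decided by the scheduler and the per-node slot configuration, not by the job code.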
On Sep 16, 2010, at 10:54 PM, Amogh Vasekar wrote:
Can you please be more specific? Do you want to control the mappers running simultaneously for your job (I guess), or for the cluster as a whole?
If it's for your job, and you want to control it on a per-node basis, one way is to allocate more memory to each of your mappers so that each one occupies more than one slot. Otherwise, if a slot is free, a task will be scheduled on it, and that's more or less out of your control, especially so in Pig.
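As an illustration of the knobs involved here (example values, not from the thread; the exact properties vary by Hadoop version and scheduler): on MRv1 the TaskTracker's slot count caps concurrent mappers per node, and with the Capacity Scheduler's memory-based scheduling a job that requests more memory than one slot provides will occupy multiple slots per task:

```xml
<!-- mapred-site.xml on each TaskTracker (MRv1); example values only,
     changing the slot count requires a TaskTracker restart -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- at most 4 map tasks run at once on this node -->
</property>

<!-- With the Capacity Scheduler, memory per slot vs. memory per task: -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>1024</value> <!-- memory represented by one map slot -->
</property>
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>2048</value> <!-- a 2048 MB request makes each mapper occupy 2 slots -->
</property>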
If you want a global cap on simultaneous mappers, it's a little more complicated, and inefficient too. A little more detail on your use case should get you a better response on the list.
Sorry if I misunderstood your question.
On 9/15/10 3:02 AM, "Rahul Malviya" wrote:
I want to control the number of mapper tasks running simultaneously. Is there a way to do that if I run Pig jobs on Hadoop?
Any input is helpful.