I'm not sure that is possible. You can use NLineInputFormat with a control file as the job input, with one line per node in the cluster. I've used that technique for a data generation program and it works well. This runs a pre-determined number of mappers, one per line. However, it's up to the scheduler to decide when and where they run. If other jobs are running concurrently, I don't believe you can be guaranteed a distinct mapper per node.
Running my data generator job on a quiet cluster did run one mapper per node, as I wanted. But unless you have that level of control over your cluster, I believe the behavior is not deterministic.
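For what it's worth, here is a rough sketch of how such a control file could be produced. It is plain Java with no Hadoop dependency. The class name, the "task-N" line contents, and the file name control.txt are placeholders I made up for illustration; the lines can carry whatever per-mapper parameters your job needs. It assumes the job is then configured with NLineInputFormat at one line per split, so each line becomes one map task.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes a control file with one line per cluster node.
class ControlFileWriter {

    // Builds one control line per node. The "task-N" payload is a placeholder;
    // a real job would put its own per-mapper parameters on each line.
    static List<String> controlLines(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            lines.add("task-" + i);
        }
        return lines;
    }

    // Writes the control lines to the given local path, one per line.
    static Path writeControlFile(Path target, int nodeCount) throws IOException {
        return Files.write(target, controlLines(nodeCount), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Node count is supplied by the caller; 4 is only a default for the demo.
        int nodeCount = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        Path out = writeControlFile(Paths.get("control.txt"), nodeCount);
        System.out.println("Wrote " + nodeCount + " control lines to " + out);
    }
}
```

You would then copy the file into HDFS and use it as the job's input path. Note that this only fixes the number of map tasks; where each one runs is still the scheduler's decision.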
From: Massimo Schiavon
Sent: Friday, April 15, 2011 10:04 AM
Subject: Force single map task execution per node for a job
I need a particular job to run at most one map task on each cluster node at a time.
I've tried setting mapred.tasktracker.map.tasks.maximum=1 in the job configuration, but it does not seem to work.
Could anyone help?