I am trying to run 8 map tasks and 2 reduce tasks on 3 machines. Each
map task processes one 6 MB text file, and there are 500 such files. The
monitoring page shows far fewer map tasks running than intended.
Sometimes some nodes don't get any tasks assigned at all, even though a
large number of files still need to be scheduled for the map operation.
Is this due to the way the files are distributed across nodes? Note that
my file system is set to local.
Some important parameters are listed below:
io.sort.factor=100
io.sort.mb=1000
io.file.buffer.size=4096000
io.bytes.checksum=128
mapred.map.tasks=16
mapred.reduce.tasks=2
mapred.tasktracker.tasks.maximum=4
mapred.combine.buffer.size=100000
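As a rough sanity check of what these settings imply, here is a small Python sketch. It rests on assumptions rather than confirmed behavior: that mapred.tasktracker.tasks.maximum caps the number of concurrent tasks per node, that each map task allocates its own sort buffer of io.sort.mb megabytes, and that there is one map task per input file. The function and variable names are my own, for illustration only, and are not Hadoop APIs.

```python
# Back-of-the-envelope check of the settings listed above.
# All names here are illustrative; nothing below calls Hadoop.

NODES = 3
TASKS_PER_NODE_MAX = 4   # mapred.tasktracker.tasks.maximum (assumed per-node cap)
IO_SORT_MB = 1000        # io.sort.mb (assumed to be a per-task buffer, in MB)
INPUT_FILES = 500        # assumed: one map task per 6 MB file


def cluster_slots(nodes, tasks_per_node_max):
    """Upper bound on tasks running at once across the cluster."""
    return nodes * tasks_per_node_max


def sort_buffer_mb_per_node(tasks_per_node_max, io_sort_mb):
    """Sort-buffer memory requested on one node if every slot is busy."""
    return tasks_per_node_max * io_sort_mb


def scheduling_waves(input_files, slots):
    """Number of rounds needed to run all map tasks (ceiling division)."""
    return -(-input_files // slots)


slots = cluster_slots(NODES, TASKS_PER_NODE_MAX)
print("concurrent task slots:", slots)
print("sort buffer per node (MB):",
      sort_buffer_mb_per_node(TASKS_PER_NODE_MAX, IO_SORT_MB))
print("waves of map tasks:", scheduling_waves(INPUT_FILES, slots))
```

Under those assumptions, at most 12 tasks can run at once, and four busy slots on one node would request about 4000 MB of sort buffer for inputs of only 6 MB each.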
Is there a parameter I am missing that would maximize the use of all CPUs?
Thanks,
VJ