To find the bottleneck, I tried to figure out whether some processes/threads are often blocked waiting on disk or network I/O, and if so why, whenever a mapper or reducer runs slowly. In my case, up to 12 mappers are allowed to run simultaneously on each slave. The CPUs are idle more than 90% of the time, with at most about 2% in iowait. But I found that most mappers (according to "top" and "jps") were sleeping, and strace shows that they (as well as the tasktracker and datanode) were blocked on futex(0x4035b9d0, FUTEX_WAIT, 12566, NULL,
Here's a list of accumulated open files (including network, pipe, socket, etc.) of the datanode, grouped by type:
Here's a list of accumulated open files (including network, pipe, socket, etc.) of the tasktracker, grouped by type:
Here's a typical mapper thread:
A mapper would block on futex for about a minute or so. It seems to me that the various kinds of I/O cannot keep up with the CPU. Would it help to increase some buffer parameters to handle this? Or do these stats imply something else? BTW, what is an effective way to analyze the performance of a Hadoop cluster, and are there any good tools you would recommend?
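Since a futex wait only says that a thread is parked on a lock or condition, not what it is waiting for, one possible next step is to take a Java thread dump of a mapper with jstack and tally the thread states. Below is a minimal sketch of that idea, not a tested tool for this cluster. It assumes the dump contains the usual `java.lang.Thread.State: ...` lines; the function name `count_thread_states` and the file name `dump.txt` are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:   java.lang.Thread.State: WAITING (parking)
# Only the state keyword (RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, ...) is captured.
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")


def count_thread_states(dump_text):
    """Return a Counter mapping each Java thread state to how many threads are in it."""
    return Counter(STATE_RE.findall(dump_text))
```

To use it, save a dump with `jstack <pid> > dump.txt`, then call `count_thread_states(open("dump.txt").read()).most_common()`. The stack traces under the WAITING or BLOCKED threads in the dump would then show which Java-level call each thread is parked in.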