For one long-running job, we are noticing that the mapper JVMs do not exit
even after the mapper is done. Any suggestions on why this could be
happening?
The Java processes get cleaned up if I run hadoop job -kill <job_id>. They
also get cleaned up if I run the job on a smaller batch that finishes fairly
quickly (say, in half an hour). For larger inputs, the nodes eventually run
out of memory because of these Java processes, which the cluster thinks are
gone but which have not actually been cleaned up. I suspect the TaskTrackers
are, for some reason, failing to kill these JVMs on their own.
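To confirm which task JVMs are still alive on a node after their tasks have
finished, a small check like the one below may help. This is only a sketch
under stated assumptions: the class name LingeringJvmFinder is hypothetical,
it relies on java.lang.ProcessHandle (Java 9 or later, which is newer than the
JDKs typically used with Hadoop at the time, so it would have to run under a
separate JDK or be replaced with `ps`), and it assumes the child task JVM's
command line contains the string "org.apache.hadoop.mapred.Child".

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical diagnostic helper (not part of Hadoop): lists live processes
 * on this node whose command line contains a marker string.
 * Assumes Java 9+ for java.lang.ProcessHandle.
 */
public class LingeringJvmFinder {

    /** Returns the PIDs of live processes whose command line contains marker. */
    public static List<Long> findLiveProcesses(String marker) {
        return ProcessHandle.allProcesses()
                .filter(ProcessHandle::isAlive)
                // commandLine() may be empty if the OS does not expose it
                .filter(p -> p.info().commandLine()
                        .map(cmd -> cmd.contains(marker))
                        .orElse(false))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    /** Returns true if a process with the given PID exists and is running. */
    public static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) {
        // Assumed marker for Hadoop child task JVMs; override via the first argument.
        String marker = args.length > 0 ? args[0] : "org.apache.hadoop.mapred.Child";
        for (long pid : findLiveProcesses(marker)) {
            System.out.println(pid);
        }
    }
}
```

Comparing the printed PIDs against the task attempts the JobTracker reports as
completed would show whether the "finished" JVMs are really still running.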
The following exceptions can be seen in the Hadoop logs:
2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process
2011-05-12 13:52:08,071 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -11061: No such process
2011-05-12 13:52:09,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -11151: No such process
2011-05-12 13:52:12,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -25057: No such process
2011-05-12 13:52:13,306 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -19805: No such process
2011-05-12 13:52:14,996 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -11103: No such process
2011-05-12 15:51:41,105 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -17202: No such process
2011-05-12 15:51:43,481 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -15981: No such process
2011-05-12 15:51:45,916 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -17931: No such process
2011-05-12 15:52:06,328 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -14867: No such process
2011-05-12 15:52:34,503 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -29376: No such process
2011-05-12 15:52:38,607 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -32491: No such process
2011-05-12 15:52:39,292 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -31529: No such process
2011-05-12 15:52:46,547 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -15140: No such process
Some other exceptions that also appear in the logs may or may not be related
to the above problem:
2011-05-12 16:01:20,534 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:48,869 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 80 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:53,922 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 59 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:58,977 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:02:04,040 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 37 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:02:09,095 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 100 on 33465 caught: java.nio.channels.ClosedChannelException
Thanks.
-Adi