Devaraj Das updated HADOOP-6029:
Jothi and I came across another TestReduceFetch failure.
Testcase: testReduceFromDisk took 78.436 sec
Testcase: testReduceFromPartialMem took 60.701 sec
Expected at least 1MB fewer bytes read from local (21159650) than written to HDFS (21036680)
junit.framework.AssertionFailedError: Expected at least 1MB fewer bytes read from local (21159650) than written to HDFS (21036680)
Testcase: testReduceFromMem took 52.097 sec
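For reference, the failing assertion above boils down to a comparison between two counters. This is a hypothetical reconstruction of that check (the real TestReduceFetch pulls these values from job counters; the class and method names here are illustrative only):

```java
// Hypothetical sketch of the testReduceFromPartialMem check:
// the test expects at least 1MB fewer bytes read from local disk
// than were written to HDFS. Names are illustrative, not Hadoop's.
public class PartialMemCheck {
    static final long MB = 1L << 20;

    // Fails unless localRead is at least 1MB below hdfsWritten.
    static void check(long localRead, long hdfsWritten) {
        if (localRead + MB > hdfsWritten) {
            throw new AssertionError(
                "Expected at least 1MB fewer bytes read from local ("
                + localRead + ") than written to HDFS (" + hdfsWritten + ")");
        }
    }

    public static void main(String[] args) {
        check(19L << 20, 21L << 20);  // passes: ~2MB of headroom
        try {
            check(21159650L, 21036680L);  // the values from the log above
        } catch (AssertionError expected) {
            // local reads actually exceed HDFS writes, so the check fails
            System.out.println(expected.getMessage());
        }
    }
}
```

With the logged values, bytes read from local (21159650) already exceed bytes written to HDFS (21036680), so the 1MB margin cannot hold.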
The above failure actually looks like a memory issue. In ReduceTask.ReduceCopier.ShuffleRamManager, memory is reserved for the in-memory shuffle based on Runtime.getRuntime().maxMemory(), and the value returned by that call seems to be machine-dependent. In the run that failed with the exception trace above, Runtime.maxMemory returned a smaller value than in the runs where the test passes. When that happens, shuffled files start hitting the disk, and the testcase fails since it doesn't expect that many files to hit the disk. I am attaching two logs: one from a successful run (all tests pass) and one from the failed testReduceFromPartialMem run. In both logs, job_0002 is the job for the testReduceFromPartialMem test.
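The mechanism can be sketched as follows. This is a simplified illustration, not the actual ShuffleRamManager source; the 0.70 buffer fraction and 0.25 single-shuffle limit follow the defaults I believe the shuffle code uses, but treat them as assumptions:

```java
// Simplified sketch of how ShuffleRamManager sizes the in-memory
// shuffle buffer from Runtime.maxMemory(), and why a smaller heap
// report pushes map outputs to disk. Not the real Hadoop class.
public class ShuffleRamSketch {
    // Assumed fraction of the heap reserved for in-memory shuffle
    // (mapred.job.shuffle.input.buffer.percent default).
    static final float BUFFER_PERCENT = 0.70f;
    // Assumed limit: a single map output larger than this fraction
    // of the buffer is shuffled straight to disk.
    static final float SINGLE_SHUFFLE_LIMIT = 0.25f;

    final long maxInMemSize;

    ShuffleRamSketch(long maxHeap) {
        // In the real code, maxHeap is Runtime.getRuntime().maxMemory(),
        // which is what makes the behavior machine-dependent.
        this.maxInMemSize = (long) (maxHeap * BUFFER_PERCENT);
    }

    // Decide whether a map output of the given size fits in memory.
    boolean fitsInMemory(long outputSize) {
        return outputSize < (long) (maxInMemSize * SINGLE_SHUFFLE_LIMIT);
    }

    public static void main(String[] args) {
        long mapOutput = 20L << 20;  // a 20MB map output
        // On a machine where maxMemory() reports a 512MB heap,
        // the output stays in memory...
        System.out.println(new ShuffleRamSketch(512L << 20).fitsInMemory(mapOutput)); // true
        // ...but on one reporting a 64MB heap it spills to disk,
        // inflating the local bytes-read counter the test checks.
        System.out.println(new ShuffleRamSketch(64L << 20).fitsInMemory(mapOutput)); // false
    }
}
```

So the same job can pass or fail depending purely on what maxMemory() reports on that machine.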
Nicholas, could you please upload the logs of the test failure you saw? Thanks!
Project: Hadoop Core
Issue Type: Bug
Reporter: Tsz Wo (Nicholas), SZE
Attachments: FAILING-PARTIALMEM-TEST-org.apache.hadoop.mapred.TestReduceFetch.txt, TEST-org.apache.hadoop.mapred.TestReduceFetch.txt
Testcase: testReduceFromMem took 23.625 sec
Non-zero read from local: 83
junit.framework.AssertionFailedError: Non-zero read from local: 83
Ran TestReduceFetch a few times on a clean trunk. It failed consistently.