[ https://issues.apache.org/jira/browse/HADOOP-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658455#action_12658455 ]
Devaraj Das commented on HADOOP-4927:
-------------------------------------
Okay, so I realized I was referring to the old MapReduce API *smile*
There seem to be two approaches anyway. For the old API:
Today, the getRecordWriter calls relevant to the tasks are made in two places: in DirectMapOutputCollector (in its constructor) and in ReduceTask.java (just before the user's reduce method is first called). We could probably move these calls into the respective OutputCollector.collect implementations, so that the RecordWriter is created only on the first collect call:
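{code}
// Create the RecordWriter (and with it the part file) only when the first
// record is collected, instead of up front.
if (out == null) {
  out = job.getOutputFormat().getRecordWriter(fs, job, finalName, reporter);
}
out.write(key, value);
{code}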
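A minimal sketch of this lazy-creation pattern, kept free of Hadoop dependencies so it runs on its own. SimpleRecordWriter, WriterFactory and LazyOutputCollector are hypothetical stand-ins, not Hadoop classes; WriterFactory.create plays the role of job.getOutputFormat().getRecordWriter(...), i.e. the call that would create the part file.

{code:java}
import java.io.IOException;

// Hypothetical, Hadoop-free stand-in for the old-API RecordWriter
// (the real org.apache.hadoop.mapred.RecordWriter.close takes a Reporter).
interface SimpleRecordWriter<K, V> {
  void write(K key, V value) throws IOException;
  void close() throws IOException;
}

// Hypothetical factory standing in for
// job.getOutputFormat().getRecordWriter(fs, job, finalName, reporter):
// calling create() is what would create the part file.
interface WriterFactory<K, V> {
  SimpleRecordWriter<K, V> create() throws IOException;
}

// OutputCollector-style class that defers creating the writer (and hence
// the part file) until the first collect() call.
class LazyOutputCollector<K, V> {
  private final WriterFactory<K, V> factory;
  private SimpleRecordWriter<K, V> out; // stays null until the first record

  LazyOutputCollector(WriterFactory<K, V> factory) {
    this.factory = factory;
  }

  void collect(K key, V value) throws IOException {
    if (out == null) {
      out = factory.create();
    }
    out.write(key, value);
  }

  // A collector that never received a record has nothing to close,
  // so no part file is ever created for it.
  void close() throws IOException {
    if (out != null) {
      out.close();
    }
  }
}
{code}

Because close() only touches a writer that actually exists, a task that never calls collect never invokes the factory and so never leaves an empty part file behind.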
For the new API, I am not yet sure what a good approach would be. Maybe we could delay creating the RecordWriter until TaskInputOutputContext.write is invoked.
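For the new API the same idea could live in the context rather than in a collector. The sketch below assumes simplified, hypothetical types (NewApiRecordWriter, IOSupplier, LazyWriteContext); the real org.apache.hadoop.mapreduce.RecordWriter is an abstract class whose close method takes a TaskAttemptContext, which is left out here.

{code:java}
import java.io.IOException;

// Hypothetical stand-in for a new-API RecordWriter. The real
// org.apache.hadoop.mapreduce.RecordWriter is an abstract class whose
// close() takes a TaskAttemptContext; that parameter is omitted here.
interface NewApiRecordWriter<K, V> {
  void write(K key, V value) throws IOException;
  void close() throws IOException;
}

// Hypothetical factory that, like getRecordWriter, may throw IOException.
interface IOSupplier<T> {
  T get() throws IOException;
}

// Hypothetical context whose write() creates the RecordWriter on first use,
// mirroring the idea of deferring creation until
// TaskInputOutputContext.write is invoked.
class LazyWriteContext<K, V> {
  private final IOSupplier<NewApiRecordWriter<K, V>> writerFactory;
  private NewApiRecordWriter<K, V> writer; // null until the first write()

  LazyWriteContext(IOSupplier<NewApiRecordWriter<K, V>> writerFactory) {
    this.writerFactory = writerFactory;
  }

  void write(K key, V value) throws IOException {
    if (writer == null) {
      writer = writerFactory.get();
    }
    writer.write(key, value);
  }

  // Lets the caller check whether this task produced any output at all.
  boolean writerCreated() {
    return writer != null;
  }

  void close() throws IOException {
    if (writer != null) {
      writer.close();
    }
  }
}
{code}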
The other approach is to delay creating the files on the output filesystem until they are actually needed, inside the respective RecordWriter implementations. But this requires users (who have implemented RecordWriters or will implement them in the future) to be aware of the change, which makes it fragile: any RecordWriter that still creates its file eagerly would keep producing empty part files.
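To illustrate this second approach, here is a hypothetical writer (DeferredFileWriter, not a Hadoop class) that opens its output file only on the first write, using plain java.nio.file on the local filesystem as an assumption in place of Hadoop's FileSystem. Under this approach, every RecordWriter implementation would have to follow the same pattern.

{code:java}
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical key/value text writer that creates its output file only
// when the first record arrives. It is not a Hadoop RecordWriter; it only
// shows the pattern each RecordWriter implementation would need to adopt.
class DeferredFileWriter {
  private final Path path;
  private BufferedWriter out; // null until the first write()

  DeferredFileWriter(Path path) {
    this.path = path; // nothing is created on disk here
  }

  void write(String key, String value) throws IOException {
    if (out == null) {
      // Only now does the file appear on disk.
      out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
    }
    out.write(key);
    out.write('\t');
    out.write(value);
    out.newLine();
  }

  void close() throws IOException {
    if (out != null) {
      out.close();
    }
  }
}
{code}

The pattern itself is small; the fragility lies in relying on every implementation, including third-party ones, to adopt it.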
Thoughts?
Part files on the output filesystem are created irrespective of whether the corresponding task has anything to write there
--------------------------------------------------------------------------------------------------------------------------
Key: HADOOP-4927
URL: https://issues.apache.org/jira/browse/HADOOP-4927
Project: Hadoop Core
Issue Type: Bug
Reporter: Devaraj Das
Fix For: 0.20.0
When OutputFormat.getRecordWriter is invoked, a part file is created on the output filesystem. However, the returned RecordWriter is not used until the task (the user's code) calls OutputCollector.collect. As a result, an empty part file is left behind for every task that never invokes OutputCollector.collect.