[ https://issues.apache.org/jira/browse/HADOOP-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658455#action_12658455 ]

Devaraj Das commented on HADOOP-4927:
-------------------------------------

Okay, so I figured out that I was referring to the old MapReduce API *smile*
There seem to be two approaches anyway. For the old API:
Today, the getRecordWriter calls relevant to the tasks are made in two places: in DirectMapOutputCollector (in the constructor) and in ReduceTask.java (just before it starts calling the user's reduce method). We can probably move the calls into the respective OutputCollector.collect implementations:
{code}
// Create the RecordWriter lazily, on the first collect() call.
if (out == null) {
  out = job.getOutputFormat().getRecordWriter(fs, job, finalName, reporter);
}
{code}
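
To make the idea concrete, here is a minimal, self-contained sketch of a collector that defers writer creation to the first collect call. SimpleRecordWriter, CountingOutputFormat and LazyCollector are hypothetical stand-ins invented for illustration, not the real Hadoop classes; the counter only models the side effect of creating a part file. Note that close() has to tolerate a writer that was never created.

```java
// Hypothetical stand-ins for the Hadoop types; not the real org.apache.hadoop.mapred API.
interface SimpleRecordWriter {
    void write(String key, String value);
    void close();
}

// Models OutputFormat.getRecordWriter, which creates the part file as a side effect.
class CountingOutputFormat {
    int filesCreated = 0;
    int recordsWritten = 0;

    SimpleRecordWriter getRecordWriter() {
        filesCreated++; // stands in for creating the part file
        return new SimpleRecordWriter() {
            @Override public void write(String key, String value) { recordsWritten++; }
            @Override public void close() { }
        };
    }
}

class LazyCollector {
    private final CountingOutputFormat format;
    private SimpleRecordWriter out; // stays null until the first record arrives

    LazyCollector(CountingOutputFormat format) {
        this.format = format;
    }

    void collect(String key, String value) {
        if (out == null) {
            out = format.getRecordWriter(); // the "part file" is created only now
        }
        out.write(key, value);
    }

    void close() {
        if (out != null) { // nothing to close if the task never wrote anything
            out.close();
        }
    }
}
```

A task that never calls collect leaves filesCreated at zero, while any number of collect calls creates exactly one writer.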

For the new API, I am not yet sure what a good approach would be. Maybe we could delay creating the RecordWriter until TaskInputOutputContext.write is invoked.

The other approach is to delay creating the files on the output filesystem until it is necessary, inside the respective RecordWriter implementations. But this requires users (who have implemented RecordWriters or will implement them in the future) to be aware of the change, so it is error-prone.
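
As a rough illustration of this second approach, the sketch below shows a writer that opens its file only on the first write. LazyTextRecordWriter is a hypothetical class that uses plain java.nio on the local filesystem rather than the Hadoop FileSystem and RecordWriter APIs, so it only models the pattern each RecordWriter author would have to follow.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer; a stand-in for a user-implemented RecordWriter.
class LazyTextRecordWriter implements AutoCloseable {
    private final Path partFile;
    private BufferedWriter out; // null until the first record is written

    LazyTextRecordWriter(Path partFile) {
        this.partFile = partFile; // only remembers the path; no file is created here
    }

    void write(String key, String value) throws IOException {
        if (out == null) {
            out = Files.newBufferedWriter(partFile); // creates the file on first use
        }
        out.write(key + "\t" + value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        if (out != null) { // a writer that was never used has nothing to close
            out.close();
        }
    }
}
```

Every writer implementation has to repeat the null checks in both write and close, which is the burden on users that the paragraph above is concerned about.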

Thoughts?

Part files on the output filesystem are created irrespective of whether the corresponding task has anything to write there
--------------------------------------------------------------------------------------------------------------------------

Key: HADOOP-4927
URL: https://issues.apache.org/jira/browse/HADOOP-4927
Project: Hadoop Core
Issue Type: Bug
Reporter: Devaraj Das
Fix For: 0.20.0


When OutputFormat.getRecordWriter is invoked, a part file is created on the output filesystem. But the created RecordWriter is not used until the task (user's code) calls OutputCollector.collect. This results in empty part files whenever the corresponding task never invokes OutputCollector.collect.
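
The behaviour described above can be reproduced in miniature on a local filesystem. In this sketch, EagerPartFileDemo and its methods are hypothetical names, and Files.newBufferedWriter stands in for the file creation that getRecordWriter performs; the real Hadoop code path is assumed, not shown.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class EagerPartFileDemo {
    // Hypothetical stand-in for OutputFormat.getRecordWriter:
    // opening the writer creates the part file immediately.
    static BufferedWriter getRecordWriter(Path partFile) throws IOException {
        return Files.newBufferedWriter(partFile);
    }

    // Simulates a task that produces no output. Returns the size of the
    // part file left behind, or -1 if no file exists.
    static long runTaskWithNoOutput(Path outputDir) throws IOException {
        Path partFile = outputDir.resolve("part-00000");
        try (BufferedWriter out = getRecordWriter(partFile)) {
            // The task never calls collect(), so nothing is written.
        }
        return Files.exists(partFile) ? Files.size(partFile) : -1L;
    }
}
```

Calling runTaskWithNoOutput on an empty directory leaves behind a zero-length part-00000, which is the empty part file this issue is about.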
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Posted: Dec 22, '08 at 6:01a
Active: Feb 23, '09 at 3:19p