[ https://issues.apache.org/jira/browse/HADOOP-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669812#action_12669812 ]

Chris Douglas commented on HADOOP-4927:

bq. this feature is a sort of generic across the different output formats and having the framework support this would be useful.

True. Still, while it is generic functionality, it's neither difficult nor inefficient to implement in user-space. Since neither of those criteria is met, putting it into the framework seems premature, at least. If this should be abstracted, wouldn't it make sense as an OutputFormat in lib? That seems no less brittle than using a framework configuration variable, and I have difficulty seeing this setting scoped to the whole cluster...
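To make the "user-space" point concrete, here is a minimal sketch of a wrapper that defers creating the underlying writer until the first record arrives. The `RecordWriter` and `WriterFactory` interfaces and the `LazyRecordWriter` class below are simplified, hypothetical stand-ins. They are not Hadoop's `org.apache.hadoop.mapred.RecordWriter` API. The factory is assumed to be whatever would otherwise open the part file eagerly.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LazyWriterSketch {

    // Hypothetical, simplified stand-in for a record writer (not the Hadoop interface).
    interface RecordWriter<K, V> {
        void write(K key, V value) throws IOException;
        void close() throws IOException;
    }

    // Hypothetical factory: whatever would otherwise create the output eagerly.
    interface WriterFactory<K, V> {
        RecordWriter<K, V> create() throws IOException;
    }

    // Defers factory.create() until the first write; close() does nothing if nothing was written.
    static final class LazyRecordWriter<K, V> implements RecordWriter<K, V> {
        private final WriterFactory<K, V> factory;
        private RecordWriter<K, V> delegate; // stays null until the first record arrives

        LazyRecordWriter(WriterFactory<K, V> factory) {
            this.factory = factory;
        }

        @Override
        public void write(K key, V value) throws IOException {
            if (delegate == null) {
                delegate = factory.create(); // the real output is created only now
            }
            delegate.write(key, value);
        }

        @Override
        public void close() throws IOException {
            if (delegate != null) {
                delegate.close();
            }
        }
    }

    // Counts how many underlying writers have been created.
    static int created = 0;

    // Factory whose writers append "key<TAB>value" lines to an in-memory sink.
    static WriterFactory<String, Integer> countingFactory(List<String> sink) {
        return () -> {
            created++;
            return new RecordWriter<String, Integer>() {
                public void write(String k, Integer v) { sink.add(k + "\t" + v); }
                public void close() { }
            };
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> sink = new ArrayList<>();

        // A task that never collects anything: no underlying writer is created.
        RecordWriter<String, Integer> idle = new LazyRecordWriter<>(countingFactory(sink));
        idle.close();
        System.out.println("after idle task: created=" + created);

        // A task that writes two records: exactly one underlying writer is created.
        RecordWriter<String, Integer> busy = new LazyRecordWriter<>(countingFactory(sink));
        busy.write("a", 1);
        busy.write("b", 2);
        busy.close();
        System.out.println("after busy task: created=" + created + ", records=" + sink.size());
    }
}
```

The wrapper costs one null check per record. That cost is why the lazy behavior can live outside the framework without a performance penalty.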

bq. I am okay with the current patch in terms of the behavior and the framework support it adds for lazy creation of the destination output "something" (where "something" is easy to explain when interpreted as a file).

Unless there's a non-FileOutputFormat use case that's also easy to explain, it remains unclear that this is the correct abstraction. Creating a setting on FileOutputFormat seems like a good idea. Whether that's implemented in FileOutputFormat only or via an OutputFormat wrapper class depends on how generally applicable the abstraction is. Since its motivation is expensive clutter in HDFS, it's not obvious to me that the latter is justified, let alone tight integration with the framework.
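One way to read "a setting on FileOutputFormat" is as a per-job flag that is consulted when the record writer is handed out. The sketch below uses local temporary files to show the difference. With eager creation, tasks that never write still leave part files behind. With the flag set, only tasks that emit records produce files. `PartFileOutputFormat`, `PartFileWriter`, and the `lazyOutput` flag are hypothetical names for illustration only. They are not Hadoop classes or configuration keys.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LazyPartFileSketch {

    // Hypothetical file-backed writer; opens its file eagerly or on first write.
    static final class PartFileWriter {
        private final Path file;
        private BufferedWriter out; // null until opened

        PartFileWriter(Path file, boolean lazy) throws IOException {
            this.file = file;
            if (!lazy) {
                out = Files.newBufferedWriter(file, StandardCharsets.UTF_8); // eager: file exists immediately
            }
        }

        void write(String key, String value) throws IOException {
            if (out == null) {
                out = Files.newBufferedWriter(file, StandardCharsets.UTF_8); // lazy: created on first record
            }
            out.write(key + "\t" + value);
            out.newLine();
        }

        void close() throws IOException {
            if (out != null) {
                out.close();
            }
        }
    }

    // Hypothetical "output format" whose only knob is the lazy flag.
    static final class PartFileOutputFormat {
        private final Path outputDir;
        private final boolean lazyOutput; // the per-job setting under discussion (hypothetical)

        PartFileOutputFormat(Path outputDir, boolean lazyOutput) {
            this.outputDir = outputDir;
            this.lazyOutput = lazyOutput;
        }

        PartFileWriter getRecordWriter(int partition) throws IOException {
            Path part = outputDir.resolve(String.format("part-%05d", partition));
            return new PartFileWriter(part, lazyOutput);
        }
    }

    // Runs four "tasks" where only even-numbered ones emit a record.
    // Returns how many part files remain (the temp directory is left in place for inspection).
    static long runJob(boolean lazyOutput) throws IOException {
        Path dir = Files.createTempDirectory("out");
        PartFileOutputFormat format = new PartFileOutputFormat(dir, lazyOutput);
        for (int task = 0; task < 4; task++) {
            PartFileWriter writer = format.getRecordWriter(task);
            if (task % 2 == 0) {
                writer.write("key" + task, "value");
            }
            writer.close();
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("eager: " + runJob(false) + " part files"); // includes two empty ones
        System.out.println("lazy:  " + runJob(true) + " part files");
    }
}
```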

Part files on the output filesystem are created irrespective of whether the corresponding task has anything to write there

Key: HADOOP-4927
URL: https://issues.apache.org/jira/browse/HADOOP-4927
Project: Hadoop Core
Issue Type: New Feature
Components: mapred
Reporter: Devaraj Das
Assignee: Jothi Padmanabhan
Fix For: 0.21.0

Attachments: hadoop-4927-v1.patch, hadoop-4927-v2.patch, hadoop-4927.patch

When OutputFormat.getRecordWriter is invoked, a part file is created on the output filesystem. But the created RecordWriter is not used until the OutputCollector.collect call is made by the task (user's code). This results in empty part files even if the OutputCollector.collect is never invoked by the corresponding tasks.

Discussion Overview
group: common-dev @
posted: Dec 22, '08 at 6:01a
active: Feb 23, '09 at 3:19p
1 user in discussion: Hudson (JIRA), 48 posts