Combiner that aggregates all the mappers from a machine

Key: HADOOP-5340
URL: https://issues.apache.org/jira/browse/HADOOP-5340
Project: Hadoop Core
Issue Type: New Feature
Affects Versions: 0.19.1
Reporter: Nathan Marz

From what I can tell, the Combiner only aggregates data from a single map task. It would be useful, especially for map-only jobs, to have a combiner that aggregates data from all the map tasks on a given machine. My use case is vertically partitioning a set of records that start out in the same files. Doing this in a map-only job creates far too many files (about 50 per input split). Pumping all the data through a reducer instead adds a lot of unnecessary overhead. With the proposed feature, this use case would produce 50 × (number of machines) files rather than 50 × (number of input splits) files.
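To make the file-count arithmetic concrete, here is a minimal in-memory sketch. It is not Hadoop code and uses no Hadoop API: `group_outputs`, the `Split` shape, and the cluster numbers (4 machines, 40 splits) are hypothetical and chosen only for illustration; the figure of about 50 partitions per split comes from the issue. Each output "file" is keyed by its owner (the map task today, or the machine under the proposed machine-wide combiner) and the partition it holds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical model: a split is (machine name, records); a record is
# (partition id, value). This simulates output grouping, not a Hadoop API.
Split = Tuple[str, List[Tuple[int, str]]]


def group_outputs(splits: List[Split], per_machine: bool) -> Dict[tuple, List[str]]:
    """Group map output into output 'files' keyed by (owner, partition).

    per_machine=False: the owner is the map task, as in a map-only job today.
    per_machine=True: the owner is the machine, modeling the proposed
    combiner that merges the output of every map task on a node.
    """
    files: Dict[tuple, List[str]] = defaultdict(list)
    for task_id, (machine, records) in enumerate(splits):
        owner = machine if per_machine else task_id
        for partition, value in records:
            files[(owner, partition)].append(value)
    return dict(files)


if __name__ == "__main__":
    PARTITIONS = 50  # about 50 output files per input split, per the issue
    MACHINES = 4     # illustrative cluster size (assumption)
    SPLITS = 40      # illustrative number of input splits (assumption)

    splits = [
        (f"node{t % MACHINES}", [(p, f"split{t}-part{p}") for p in range(PARTITIONS)])
        for t in range(SPLITS)
    ]
    per_task = group_outputs(splits, per_machine=False)
    per_node = group_outputs(splits, per_machine=True)
    print(len(per_task))  # 2000 = 50 * 40 splits
    print(len(per_node))  # 200 = 50 * 4 machines
```

Both groupings hold the same records; only the number of output files changes, from 50 × splits to 50 × machines.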

Discussion Overview
group: common-dev @
posted: Feb 26, '09 at 7:07p
active: Feb 26, '09 at 7:07p

1 user in discussion

Nathan Marz (JIRA): 1 post