If your reduce function can't assume that all values for a key fit in
memory, what's the preferred way to output a collection of values?
I've been using ArrayWritable, but that requires first building up an
array of values in memory. This worked until I ramped up the size of
the input and started getting out-of-memory errors.

IdentityReducer would work, but it seems wasteful to output the key
for each value. Right now I'm doing emit(key, "") for the key and
emit("", value) for each value, but this feels like a hack. It also
makes for additional work to deserialize the output back into
key/value pairs, unlike the (memory-consuming) ArrayWritable approach.
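
For concreteness, here is a minimal plain-Java sketch of the marker-record
workaround described above. It does not use the Hadoop API: the class name
MarkerRecordSketch, the methods reduce and regroup, and the BiConsumer
standing in for emit are all hypothetical names made up for illustration.
It assumes real keys are never empty and that the records for one key stay
contiguous and in order in the output. The writer holds only one value at a
time; the reader has to carry the most recent marker key forward, which is
the extra work mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

public class MarkerRecordSketch {

    /**
     * Hypothetical stand-in for a reduce call (not the Hadoop Reducer API).
     * Emits one marker record carrying only the key, then one record per
     * value with an empty key. No collection of values is ever built.
     */
    public static void reduce(String key, Iterator<String> values,
                              BiConsumer<String, String> emit) {
        emit.accept(key, "");                 // marker record: key only
        while (values.hasNext()) {
            emit.accept("", values.next());   // value record: empty key
        }
    }

    /**
     * Reader side: walks the records in order and hands each value to the
     * callback together with the key from the most recent marker record.
     * Assumes a non-empty first field always means "marker".
     */
    public static void regroup(Iterator<String[]> records,
                               BiConsumer<String, String> onValue) {
        String currentKey = null;
        while (records.hasNext()) {
            String[] record = records.next();
            if (!record[0].isEmpty()) {
                currentKey = record[0];       // new group starts here
            } else {
                onValue.accept(currentKey, record[1]);
            }
        }
    }

    public static void main(String[] args) {
        // An in-memory list stands in for the job output, for the demo only.
        List<String[]> out = new ArrayList<>();
        reduce("a", Arrays.asList("1", "2").iterator(),
               (k, v) -> out.add(new String[] {k, v}));
        reduce("b", Arrays.asList("3").iterator(),
               (k, v) -> out.add(new String[] {k, v}));
        regroup(out.iterator(), (k, v) -> System.out.println(k + " -> " + v));
    }
}
```

The sketch also shows why this feels like a hack: the pairing of key and
values exists only by position in the output, so anything that reorders or
splits the records between a marker and its values breaks the grouping.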

Ed

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Jan 16, '10 at 10:16p
active: Jan 16, '10 at 10:16p
posts: 1
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Ed Mazur: 1 post
