FAQ
Hi!

I've noticed that Hadoop Streaming has big problems handling long lines.

In my case the output of a reducer process takes a very long time to produce,
and the job sometimes crashes with a number of random effects, a Java
OutOfMemoryError being the nicest one.

(Measured: a reducer outputting 10,000 lines of 32,000 bytes each takes ~11
minutes to run; a reducer outputting 10 lines of 32,000,000 bytes each takes
~110 minutes.)

So my questions are:
-) Are the sorts used by Hadoop stable?
-) How does Hadoop arrive at the allocation of input lines to reducers? I've
seen situations where 19 reducers each processed 20M input lines, while one
reducer had to process 80M input lines.
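On the second question: with the default configuration, Hadoop routes each key to a reducer via its HashPartitioner, which computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. A minimal sketch of how that arithmetic produces skew (the key names and line counts below are hypothetical, and a custom partitioner would change the behavior entirely):

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionSkewDemo {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the value modulo the reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 20;

        // Hypothetical input: one "hot" key carries far more lines than
        // the others. All of its lines land on a single partition.
        Map<String, Long> linesPerKey = new HashMap<>();
        linesPerKey.put("hot-key", 80_000_000L);
        linesPerKey.put("key-a", 20_000_000L);
        linesPerKey.put("key-b", 20_000_000L);

        // Sum the line counts that each reducer partition would receive.
        Map<Integer, Long> linesPerReducer = new HashMap<>();
        for (Map.Entry<String, Long> e : linesPerKey.entrySet()) {
            int p = partitionFor(e.getKey(), reducers);
            linesPerReducer.merge(p, e.getValue(), Long::sum);
        }
        System.out.println("lines per reducer partition: " + linesPerReducer);
    }
}
```

So a 19-vs-1 imbalance like the one described above is exactly what you would see if one key (or a handful of keys hashing to the same partition) accounts for most of the input lines.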

TIA,

Andreas

Discussion Overview
group: common-user
category: hadoop
posted: Jul 8, '08 at 11:17a
active: Jul 8, '08 at 11:17a
posts: 1
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Andreas Kostyrka: 1 post
