What is mapred.child.ulimit set to? This configuration option specifies
how much memory child processes are allowed to use. You may want to raise
this limit and see what happens.

Let me know if that doesn't get you anywhere.

Alex
On Wed, Jun 10, 2009 at 9:40 AM, Scott wrote:

Complete newbie map/reduce question here. I am using Hadoop streaming as I
come from a Perl background, and am trying to prototype/test a process to
load/clean up ad server log lines from multiple input files into one large
file on HDFS that can then be used as the source of a Hive table.
I have a perl map script that reads an input line from stdin, does the
needed cleanup/manipulation, and writes back to stdout. I don't really
need a reduce step, as I don't care what order the lines are written in, and
there is no summary data to produce. When I run the job with -reducer NONE
I get valid output, however I get multiple part-xxxxx files rather than one
big file.
So I wrote a trivial 'reduce' script that reads from stdin and simply
splits the key/value, and writes the value back to stdout.
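
[Illustration: the Perl reduce script itself is not shown in this post. A pass-through reducer of the kind described, one that drops the key and emits only the value, might look roughly like the Python sketch below. It assumes the streaming default of a tab between key and value; the function name strip_key is hypothetical.]

```python
import sys


def strip_key(line: str) -> str:
    """Return the value part of a 'key<TAB>value' line.

    Assumes the key and value are separated by the first tab.
    Lines that contain no tab are returned unchanged.
    """
    _key, sep, value = line.partition("\t")
    return value if sep else line


def main() -> None:
    # Hadoop streaming feeds reducer input on stdin, one record per line.
    for line in sys.stdin:
        sys.stdout.write(strip_key(line))


if __name__ == "__main__":
    main()
```
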

I am executing the code as follows:

./hadoop jar ../contrib/streaming/hadoop-0.19.1-streaming.jar \
  -mapper "/usr/bin/perl /home/hadoop/scripts/map_parse_log_r2.pl" \
  -reducer "/usr/bin/perl /home/hadoop/scripts/reduce_parse_log.pl" \
  -input /logs/*.log \
  -output test9

The code I have works when given a small set of input files. However, I
get the following error when attempting to run the code on a large set of
input files:

hadoop-hadoop-jobtracker-testdw0b00.log.2009-06-09:2009-06-09 15:43:00,905
WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node
tracker_testdw0b00:localhost.localdomain/127.0.0.1:53245 has 2004049920
bytes free; but we expect reduce input to take 22138478392
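
[Illustration: to put the two byte counts in the warning above into perspective, they can be converted to GiB and compared. This is only a quick arithmetic check in Python on the figures quoted in the log line; the variable names are made up for the example.]

```python
# Byte counts copied from the JobInProgress warning above.
free_bytes = 2004049920        # reported as free on the node
expected_bytes = 22138478392   # expected reduce input size

GIB = 2 ** 30  # bytes per GiB

free_gib = free_bytes / GIB
expected_gib = expected_bytes / GIB
ratio = expected_bytes / free_bytes

print(f"free:     {free_gib:.2f} GiB")
print(f"expected: {expected_gib:.2f} GiB")
print(f"expected / free: {ratio:.1f}x")
```

In other words, the expected reduce input is roughly eleven times larger than what the node reports as free.
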

I assume this is because all the map output is being buffered in memory
prior to running the reduce step? If so, what can I change to stop the
buffering? I just need the map output to go directly to one large file.

Thanks,
Scott

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 10, '09 at 4:40p
active: Jun 11, '09 at 6:13a
posts: 4
users: 4
website: hadoop.apache.org...
irc: #hadoop
