I'm curious how folks are sizing memory for task nodes.
It didn't seem to me that either map tasks (memory needed ~ chunk size) or
reduce tasks (memory needed ~ io.sort.mb, which Yahoo's benchmark sort run
sets to the low hundreds of MB) consume a lot of memory in the normal course
of affairs.
(There could be exceptions, I guess, if the reduce group size is extremely
large, but that seems like an outlier.)
So I'm curious why we might want to configure more than 1-1.5 GB per core
for task nodes.
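
To make the arithmetic concrete, here's a rough sketch of the knobs I have in
mind, written against the mapred JobConf API. The property names are the ones
I see in hadoop-default.xml (worth double-checking against your version), and
the values are purely illustrative, not a recommendation:

    import org.apache.hadoop.mapred.JobConf;

    public class TaskMemorySketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();

            // Per-task child JVM heap. With a handful of map/reduce slots on a
            // quad-core box, a few 512m heaps stay well inside 1-1.5 GB per core.
            conf.set("mapred.child.java.opts", "-Xmx512m");

            // Map-side sort buffer (in MB); the biggest steady per-task consumer
            // I can think of. The benchmark sort runs put it in the low hundreds.
            conf.setInt("io.sort.mb", 100);

            // Number of concurrent task slots the TaskTracker runs per node.
            conf.setInt("mapred.tasktracker.tasks.maximum", 4);
        }
    }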
-----Original Message-----
From: Allen Wittenauer
Sent: Tuesday, September 25, 2007 10:02 AM
To: hadoop-user@lucene.apache.org
Subject: Re: hardware specs for hadoop nodes
On 9/25/07 9:27 AM, "Bob Futrelle" wrote:

> I'm in the market to buy a few machines to set up a small cluster and am
> wondering what I should consider.
If it helps, we're using quad-core x86s with anywhere from 4 GB to 16 GB of
RAM. We've got 4x500 GB SATA drives per box, no RAID, with swap and root
taking a chunk out of each drive and the rest used for HDFS and/or MR work.
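
(As a purely hypothetical illustration, not Allen's actual settings: a layout
like that usually shows up in the config as comma-separated lists of mount
points, e.g. assuming the drives are mounted as /d1 through /d4:

    import org.apache.hadoop.conf.Configuration;

    public class DataDirSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // HDFS block storage spread across all four drives.
            conf.set("dfs.data.dir",
                     "/d1/hdfs/data,/d2/hdfs/data,/d3/hdfs/data,/d4/hdfs/data");
            // Local scratch space for map/reduce intermediate output.
            conf.set("mapred.local.dir",
                     "/d1/mapred/local,/d2/mapred/local,/d3/mapred/local,/d4/mapred/local");
        }
    }

In practice these would live in hadoop-site.xml rather than in code.)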
While you can certainly go a much more heterogeneous route than we have, it
should be noted that the more differences there are in the hardware/software
layout, the more difficult it is going to be to maintain the machines. This
is especially true for large grids, where hand-tuning individual machines
just isn't worth the return on effort.
> Or should I just spread Hadoop over some friendly machines already in my
> College, buying nothing?
Given the current lack of a security model in Hadoop and the direction that a
smattering of JIRAs are heading, "friendly" could go either way: either not
friendly enough or too friendly. :)