Hi people,
I have a cluster where about 40% of the nodes are low on disk space. The map output is too big for mapred.local.dir on those nodes, and Hadoop often tries to flush the output to one of them, fails, and retries on another node until it finally writes the data to one of the nodes with plenty of disk space.

So my question is whether it's possible for Hadoop to pick the nodes with more disk space, or for us to tell Hadoop which nodes those are, so that it doesn't waste time on the nodes with little disk space.

Many thanks

H


  • Allen Wittenauer at Feb 12, 2010 at 10:03 pm

    On 2/12/10 1:42 PM, "himanshu chandola" wrote:
    > So my question is whether its possible for hadoop to select or for us to be
    > able to notify hadoop of the nodes which have larger disk space so that it
    > doesn't waste time on nodes with low disk space.
    The only way I know of is for you to build a custom scheduler that takes
    free disk space into consideration.

    Another possibility is to run two JobTrackers, one for the nodes with large
    disks and one for the nodes with small disks. Then submit each job to the
    appropriate JobTracker.
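Both suggestions come down to the same check: does a node have enough free space in mapred.local.dir for the map output it would receive? The sketch below illustrates that check, assuming each node's free local-dir space and a per-task output estimate are known. `TrackerStatus`, `eligible_trackers`, `split_pools`, the 1.2 headroom factor, and the byte figures are all hypothetical illustrations, not part of Hadoop's scheduler API. `eligible_trackers` is the dynamic filter a space-aware scheduler might apply before assigning a map task; `split_pools` is the static partition behind the two-JobTracker setup.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrackerStatus:
    """Hypothetical snapshot of one node: host name and free bytes in mapred.local.dir."""
    host: str
    free_local_bytes: int


def eligible_trackers(trackers: List[TrackerStatus],
                      est_map_output_bytes: int,
                      headroom: float = 1.2) -> List[TrackerStatus]:
    """Keep only nodes that can hold the estimated map output plus a safety
    margin, ordered by free space (largest first), so a scheduler could skip
    nodes whose spill would fail."""
    needed = int(est_map_output_bytes * headroom)
    fits = [t for t in trackers if t.free_local_bytes >= needed]
    return sorted(fits, key=lambda t: t.free_local_bytes, reverse=True)


def split_pools(trackers: List[TrackerStatus],
                threshold_bytes: int) -> Tuple[List[str], List[str]]:
    """Static alternative: partition hosts into a 'big' pool and a 'small'
    pool, e.g. to decide which of two JobTrackers each node should report to."""
    big = [t.host for t in trackers if t.free_local_bytes >= threshold_bytes]
    small = [t.host for t in trackers if t.free_local_bytes < threshold_bytes]
    return big, small


if __name__ == "__main__":
    GB = 1024 ** 3
    nodes = [
        TrackerStatus("node1", 5 * GB),
        TrackerStatus("node2", 200 * GB),
        TrackerStatus("node3", 40 * GB),
    ]
    # 30 GB of output needs about 36 GB with headroom: node1 is skipped.
    print([t.host for t in eligible_trackers(nodes, est_map_output_bytes=30 * GB)])
    # Static split at 50 GB: node2 is "big", node1 and node3 are "small".
    print(split_pools(nodes, threshold_bytes=50 * GB))
```

The dynamic filter adapts as disks fill up, but it requires changing the scheduler. The static split needs no code inside Hadoop; the trade-off is that one pool can sit idle while the other is busy.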

Discussion Overview
group: common-user @
categories: hadoop
posted: Feb 12, '10 at 9:42p
active: Feb 12, '10 at 10:03p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

