Hi,

I am working on a project that is well suited to Hadoop, so I want to create a
small cluster (only 5 machines!) on our servers. The servers are, however,
used during the day and (mostly) idle at night.

So, I want Hadoop to run at full throttle at night and either scale back or
suspend itself during certain times.

Is it possible to do this? I've found very little information on it.

Thanks for your help!
John


  • Piotr Praczyk at May 19, 2009 at 1:42 pm
    Hi John

    I don't know whether Hadoop has built-in support for such a thing, but you
    can do this easily with a crontab script. It could start Hadoop at a
    specified hour and shut it down (or disable some nodes) at another.

    There can be some problems with HDFS, however: if you disable all the nodes
    holding replicas of some blocks, those files will become inaccessible.

    Piotr
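
    A minimal sketch of the crontab approach above, assuming the stock
    start-all.sh/stop-all.sh scripts that ship with Hadoop; the install path,
    the user, and the hours are placeholders:

        # /etc/crontab entries on the master node
        # format: min hour dom mon dow user command
        # 20:00 - bring the whole cluster up for the night
        0 20 * * *  hadoop  /opt/hadoop/bin/start-all.sh
        # 06:00 - take it back down before the servers are needed for day work
        0 6  * * *  hadoop  /opt/hadoop/bin/stop-all.sh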

  • John Clarke at May 19, 2009 at 1:47 pm
    Hi Piotr,

    Thanks for the prompt reply.

    If the cron script shuts down Hadoop, surely it won't pick up where it left
    off when it is restarted?

    All the machines will be used during the day, so it is not an option to turn
    Hadoop off on only some of them.

    John
  • Steve Loughran at May 19, 2009 at 2:24 pm

    You could add/remove task trackers on idle systems, but
    * you don't want to take away datanodes, as there's a risk that data
    will become unavailable;
    * there's nothing in the scheduler to warn that machines will go away at
    a certain time.
    If you only want to run the cluster at night, I'd just configure the
    entire cluster to go up and down.
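
    A rough sketch of the add/remove-task-trackers-only variant, assuming the
    hadoop-daemon.sh script that ships with Hadoop is run on each worker node
    (the install path is illustrative):

        # Evening, on each idle worker: add compute capacity without touching
        # the datanode, so HDFS block replicas stay available.
        /opt/hadoop/bin/hadoop-daemon.sh start tasktracker

        # Morning, on each worker: withdraw the compute capacity again. Tasks
        # still running on this node are re-executed elsewhere by the JobTracker.
        /opt/hadoop/bin/hadoop-daemon.sh stop tasktracker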
  • Kevin Weil at May 19, 2009 at 2:35 pm
    Will your jobs be running night and day, or just over a specified period?
    Depending on your setup, and on what you mean by "scale back" (CPU vs disk
    IO vs memory), you could potentially restart your cluster with different
    settings at different times of the day via cron. This will kill any running
    jobs, so it'll only work if you can find or create a few free minutes. But
    then you could scale back on CPU by running with HADOOP_NICENESS nonzero
    (see conf/hadoop-env.sh), you could scale back on memory by setting the
    various process memory limits low in conf/hadoop-site.xml, and you could
    scale back on datanode work entirely by setting the maximum number of
    mappers or reducers to 1 per node during the day (also in
    conf/hadoop-site.xml).

    Kevin
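
    Illustrative snippets of the knobs Kevin mentions (values are placeholders;
    the property names below are the 0.19/0.20-era ones, and
    mapred.child.java.opts is just one common memory limit, not necessarily the
    only one he has in mind):

        # conf/hadoop-env.sh: run the Hadoop daemons at a lower CPU priority
        export HADOOP_NICENESS=19

        <!-- conf/hadoop-site.xml: one task slot per node during the day,
             and a modest heap for each task JVM -->
        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>1</value>
        </property>
        <property>
          <name>mapred.tasktracker.reduce.tasks.maximum</name>
          <value>1</value>
        </property>
        <property>
          <name>mapred.child.java.opts</name>
          <value>-Xmx200m</value>
        </property>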
  • John Clarke at May 19, 2009 at 4:01 pm
    The jobs will be of different sizes and some may take days to complete with
    only 5 machines, so yes some will run night and day.

    By scale back, I mean scale back on system resources (CPU, IO, RAM) so the
    machines can be used for other tasks during the day.

    I understand (as you pointed out) that I can reduce the resources Hadoop
    uses by editing hadoop-env.sh and hadoop-site.xml, but only at startup;
    there is no way to do this on the fly, so to speak. Is that correct?

    I think a way to suspend and resume a job would ideally be preferable to
    scaling back on resources, i.e. write current progress to disk in the
    morning, suspend processing, and then start up again where it left off at
    night.

    Cheers,
    John
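
    On the "on the fly" question: the stock scripts only read these settings
    when the daemons start, so the closest approximation is Kevin's
    restart-with-different-settings idea. A hedged sketch, assuming two
    otherwise-identical config directories (conf.day and conf.night are made-up
    names) that differ only in the throttling settings, and remembering that,
    as noted above, a restart kills any running jobs, which must be resubmitted:

        #!/bin/sh
        # switch-conf.sh (hypothetical helper): restart the cluster with the
        # config directory given as the first argument, e.g. conf.day.
        HADOOP_HOME=/opt/hadoop
        "$HADOOP_HOME"/bin/stop-all.sh
        HADOOP_CONF_DIR="$HADOOP_HOME/$1" "$HADOOP_HOME"/bin/start-all.sh

        # crontab entries on the master (times/paths illustrative):
        # 0 6  * * *  hadoop  /opt/hadoop/bin/switch-conf.sh conf.day
        # 0 20 * * *  hadoop  /opt/hadoop/bin/switch-conf.sh conf.night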




Discussion Overview
group: common-user @
categories: hadoop
posted: May 19, '09 at 1:36p
active: May 19, '09 at 4:01p
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
