Adding Elasticity to Hadoop MapReduce
Hi all,

I'm a newcomer to Hadoop development, and I'm planning to work on an idea
that I wanted to run by the dev community.

My apologies if this is not the right place to post this.

Amazon has an "Elastic MapReduce" Service (
http://aws.amazon.com/elasticmapreduce/) that runs on Hadoop.
The service allows dynamic/runtime changes in resource allocation: more
specifically, varying the number of
compute nodes that a job is running on.

I was wondering if such a facility could be added to the publicly available
Hadoop MapReduce.

Does this idea make sense, and has any previous work been done on it?
I'd appreciate it if someone could point me in the right direction to find out more!

Thanks a lot in advance!
--
Bharath Ravi


  • Amandeep Khurana at Sep 14, 2011 at 9:19 pm
    Hi Bharath,

    Amazon EMR has two kinds of nodes: task and core. Core nodes run HDFS and
    MapReduce, while task nodes run only MapReduce. In a running cluster you can
    only add core nodes, but you can both add and remove task nodes. In other
    words, you can't reduce the size of HDFS; you can only increase it.

    There is nothing that stops you from doing that in other Hadoop clusters.
    You can configure new nodes that point to the master (for HDFS and
    MapReduce) and they will get added to the cluster. In order to remove nodes
    from the cluster, you can decommission them. Is there a specific use case
    you are trying to solve that the existing mechanisms do not solve?
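
    As a rough sketch of the decommission path (the property and commands below
    are the standard 0.20-era ones; the exclude-file path is just an example):

      # hdfs-site.xml on the NameNode should already point dfs.hosts.exclude at this file
      echo "node-to-remove.example.com" >> /etc/hadoop/conf/dfs.exclude

      # Ask the NameNode to re-read the exclude list; blocks on the excluded node
      # are re-replicated elsewhere before it is finally marked Decommissioned.
      hadoop dfsadmin -refreshNodes

      # Watch progress ("Decommission in progress" -> "Decommissioned")
      hadoop dfsadmin -report

      # On versions that have it, the same pattern works for the JobTracker via
      # mapred.hosts.exclude and: hadoop mradmin -refreshNodes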

    -ak
    On Wed, Sep 14, 2011 at 1:27 PM, Bharath Ravi wrote:

    Hi all,

    I'm a newcomer to Hadoop development, and I'm planning to work on an idea
    that I wanted to run by the dev community.

    My apologies if this is not the right place to post this.

    Amazon has an "Elastic MapReduce" Service (
    http://aws.amazon.com/elasticmapreduce/) that runs on Hadoop.
    The service allows dynamic/runtime changes in resource allocation: more
    specifically, varying the number of
    compute nodes that a job is running on.

    I was wondering if such a facility could be added to the publicly available
    Hadoop MapReduce.

    Does this idea make sense, and has any previous work been done on it?
    I'd appreciate it if someone could point me in the right direction to find out more!

    Thanks a lot in advance!
    --
    Bharath Ravi
  • Ted Dunning at Sep 14, 2011 at 9:21 pm
    This makes a bit of sense, but you have to worry about the inertia of the
    data. Adding compute resources is easy. Adding data resources, not so
    much. And if the computation is not near the data, then it is likely to be
    much less effective.
    On Wed, Sep 14, 2011 at 4:27 PM, Bharath Ravi wrote:

    Hi all,

    I'm a newcomer to Hadoop development, and I'm planning to work on an idea
    that I wanted to run by the dev community.

    My apologies if this is not the right place to post this.

    Amazon has an "Elastic MapReduce" Service (
    http://aws.amazon.com/elasticmapreduce/) that runs on Hadoop.
    The service allows dynamic/runtime changes in resource allocation: more
    specifically, varying the number of
    compute nodes that a job is running on.

    I was wondering if such a facility could be added to the publicly available
    Hadoop MapReduce.

    Does this idea make sense, and has any previous work been done on it?
    I'd appreciate it if someone could point me in the right direction to find out more!

    Thanks a lot in advance!
    --
    Bharath Ravi
  • Steve Loughran at Sep 15, 2011 at 9:25 am

    On 14/09/11 22:20, Ted Dunning wrote:
    This makes a bit of sense, but you have to worry about the inertia of the
    data. Adding compute resources is easy. Adding data resources, not so
    much.
    I've done it. As Ted says, pure compute nodes generate more network
    traffic on both reads and writes, and if you bring up DataNodes then you
    have to leave them around. The strength is that the infrastructure can
    sell them to you at a lower $/hour on the condition that they can
    take them back when demand gets high; these compute-only nodes would be
    infrastructure-pre-emptible. Look for the presentation "Farming Hadoop
    in the Cloud" for more details, though I don't discuss pre-emption or
    infrastructure specifics.

    if the computation is not near the data, then it is likely to be
    much less effective.
    Which implies your infrastructure needs to be data-aware and know to
    bring up the new VMs on the same racks as the HDFS nodes.
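
    For context, Hadoop's own notion of rack placement comes from the standard
    topology-script hook (the topology.script.file.name property in
    core-site.xml); the script and the rack.map file below are just a minimal
    illustrative example:

      #!/bin/sh
      # Minimal rack-awareness script. Hadoop passes one or more IPs/hostnames
      # and expects one rack path per argument. rack.map is a made-up example
      # file of "host rack" pairs maintained by the admin.
      for host in "$@"; do
        rack=$(awk -v h="$host" '$1 == h { print $2 }' /etc/hadoop/conf/rack.map)
        echo "${rack:-/default-rack}"
      done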

    There are some infrastructure-aware runtimes -I am thinking of the
    Technical University of Berlin's Stratosphere project- which takes a job
    at a higher level than MR commands -more like Pig or Hive- and comes up
    with an execution plan that can schedule for optimal acquisition and
    re-use of VMs, knowing both the cost of machines and the hysteresis of
    VM setup/teardown and hourly costs. You can then impose policies like
    "fast execution" or "lowest cost", and have different plans
    created/executed.

    This is all PhD-grade research with a group of highly skilled postgrads
    led by a professor who has worked in VLDBs; I would not attempt to
    retrofit this into Hadoop on my own. That said, if you want a PhD, you
    could contact that team or the UC Irvine people working on Algebricks
    and Hyracks and convince them that you should join their team.

    -steve
  • Arun C Murthy at Sep 14, 2011 at 9:25 pm

    On Sep 14, 2011, at 1:27 PM, Bharath Ravi wrote:

    Hi all,

    I'm a newcomer to Hadoop development, and I'm planning to work on an idea
    that I wanted to run by the dev community.

    My apologies if this is not the right place to post this.

    Amazon has an "Elastic MapReduce" Service (
    http://aws.amazon.com/elasticmapreduce/) that runs on Hadoop.
    The service allows dynamic/runtime changes in resource allocation: more
    specifically, varying the number of
    compute nodes that a job is running on.

    I was wondering if such a facility could be added to the publicly available
    Hadoop MapReduce.
    For a long while now you have been able to bring up either DataNodes or TaskTrackers, point them (via config) at the NameNode/JobTracker, and they will become part of the cluster.

    Similarly you can just kill the DataNode or TaskTracker and the respective masters will deal with their loss.
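
    As a rough sketch of what that looks like on a new worker node (conf paths
    and hostnames are illustrative; the property names are the 0.20-era ones):

      # /etc/hadoop/conf/core-site.xml needs fs.default.name pointing at the NameNode:
      #   <property>
      #     <name>fs.default.name</name>
      #     <value>hdfs://namenode.example.com:8020</value>
      #   </property>
      #
      # /etc/hadoop/conf/mapred-site.xml needs mapred.job.tracker pointing at the JobTracker:
      #   <property>
      #     <name>mapred.job.tracker</name>
      #     <value>jobtracker.example.com:8021</value>
      #   </property>

      # With those in place, start the daemons; they register themselves
      # with the NameNode/JobTracker and join the cluster.
      hadoop-daemon.sh start datanode
      hadoop-daemon.sh start tasktracker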

    Arun
  • Bharath Ravi at Sep 15, 2011 at 1:02 am
    Thanks a lot, all!

    An end goal of mine was to make Hadoop as flexible as possible.
    Along similar lines, though unrelated to the idea above, is another one I
    came across, courtesy of
    http://hadoopblog.blogspot.com/2010/11/hadoop-research-topics.html

    The blog mentions the ability to dynamically append Input.
    Specifically, can I append input to the Map and Reduce tasks after they've
    been started?

    I haven't been able to find something like this at a cursory glance, but
    could someone
    advise me on this before I dig deeper?

    1. Does such functionality exist, or is it being attempted?
    2. I would assume most cases would simply require starting a second job for
    the new input.
    However, are there practical use cases for such a feature?
    3. Are there any other ideas on such "flexibility" of the system that I
    could contribute to?

    Thanks again for your help!

    On 14 September 2011 17:24, Arun C Murthy wrote:

    On Sep 14, 2011, at 1:27 PM, Bharath Ravi wrote:

    Hi all,

    I'm a newcomer to Hadoop development, and I'm planning to work on an idea
    that I wanted to run by the dev community.

    My apologies if this is not the right place to post this.

    Amazon has an "Elastic MapReduce" Service (
    http://aws.amazon.com/elasticmapreduce/) that runs on Hadoop.
    The service allows dynamic/runtime changes in resource allocation: more
    specifically, varying the number of
    compute nodes that a job is running on.

    I was wondering if such a facility could be added to the publicly available
    Hadoop MapReduce.
    For a long while now you have been able to bring up either DataNodes or TaskTrackers,
    point them (via config) at the NameNode/JobTracker, and they will become part of
    the cluster.

    Similarly you can just kill the DataNode or TaskTracker and the respective
    masters will deal with their loss.

    Arun

    --
    Bharath Ravi
  • Steve Loughran at Sep 15, 2011 at 9:17 am

    On 15/09/11 02:01, Bharath Ravi wrote:
    Thanks a lot, all!

    An end goal of mine was to make Hadoop as flexible as possible.
    Along similar lines, though unrelated to the idea above, is another one I
    came across, courtesy of
    http://hadoopblog.blogspot.com/2010/11/hadoop-research-topics.html

    The blog mentions the ability to dynamically append Input.
    Specifically, can I append input to the Map and Reduce tasks after they've
    been started?
    Dhruba is referring to something that they've actually implemented in
    their version of Hive, which is the ability to gradually increase the
    data input to a running Hive job.

    This lets them do a query like "find 8 friends in California" without
    searching the entire dataset: pick a subset, search that, and if there
    are enough results, stop. If not, feed in some more data.

    I have a paper on it showing that for data with little or no skew
    this is much faster than a full scan; for skewed data, where all the
    results are in a subset of blocks, it is about the same as a full scan
    -it depends on which blocks the results turn up in.
    I haven't been able to find something like this at a cursory glance, but
    could someone
    advise me on this before I dig deeper?

    1. Does such functionality exist, or is it being attempted?
    It exists for Hive, though not in trunk; getting it in there would be
    mostly a matter of taking the existing code and slotting it in.
    2. I would assume most cases would simply require starting a second Job for
    the new input.
    No, because that loses all existing work and requires rescheduling more
    work. The goal of this is to execute one job that can bail out early.

    The Facebook code runs with Hive; for classic MR jobs the first step
    would be to allow Map tasks to finish early. I think there may be a way
    to do that and plan to do some experiments to see if I'm right.

    What would be more dramatic would be for the JT to be aware that jobs
    may finish early, and have it slowly ramp up the map operations if they
    don't set some "finished" flag (which would presumably be a shared
    counter), until the entire dataset gets processed if the early finish
    doesn't work. This slow start could be taken into account in the
    scheduler, which could then know that the initial resource needs of the
    job are quite low but may increase.
    However, are there practical use cases for such a feature?
    See above.
    3. Are there any other ideas on such "flexibility" of the system that I
    could contribute to?
    While it's great that you want to do big things in Hadoop, I'd recommend
    you start using it and learning your way around the codebase -especially
    SVN trunk or the unreleased 0.23 branch, as that is where all major
    changes will go, and the MR engine has been radically reworked for
    better scheduling.

    Start writing MR jobs that work under the new engine, using existing
    public datasets, or look at the layers above, then think how things
    could be improved.
  • Junping Du at Sep 15, 2011 at 9:14 am
    Hello Arun and all,
    I think current Hadoop has a good capability to scale out, but it is not so good at scaling in. Since it was designed for dedicated clusters and machines, "scale in" has not received much attention for a long time. However, I've noticed more and more users deploying Hadoop clusters in the cloud (EC2, Eucalyptus, etc.) or on shared infrastructure (VMware, Xen), where a "scale in" capability would free up resources for other clusters and applications. The current "scale in" solution (as you proposed in the previous mail) has some significant drawbacks:
    1. It doesn't handle the scale-in case in a formal way, but rather as a temporary workaround built on a disaster-recovery mechanism.
    2. It is not convenient: the Hadoop admin has to manually kill DataNodes a few at a time (at most N-1 at a time, where N is the replica count, to avoid possible data loss) and wait for replication to catch up. To shrink a cluster from 1000 nodes to 500 nodes, how much time and effort would that take?
    3. It is not efficient, because it is not planned ahead. Say nodes A, B and C should all be eliminated from the cluster. If A and B are eliminated first (with N = 3), it is possible that C receives some of the replicas of blocks from A or B. This problem becomes serious when a big shrink happens.
    Thus, I think it is worth a proper discussion on giving Hadoop this cool "elastic" feature. I'll volunteer one possible solution and welcome better ones:
    1. We can consider breaking the assumption that the DataNode and TaskTracker coexist on every machine, and let some machines run only a task node. Network traffic inside a rack is not that expensive, though you might say this wastes some local I/O capacity on task-only machines. But don't look at these machines as dedicated resources for this Hadoop cluster: they can be used by other clusters and applications (and so will be reclaimed at some point). To this cluster, these machines are better than nothing, right?
    2. The percentage of task-only machines in the whole cluster is an "elastic factor" for the cluster. For example, if the cluster should scale between 500 and 1000 nodes, the elastic factor could be 1/2: 500 normal machines with both data and task nodes, and another 500 machines with task nodes only.
    3. The elastic factor can be configured by the Hadoop admin, and the non-dedicated machines can be marked through a script, much as is done for rack awareness.
    4. A single command is provided for the Hadoop admin to shrink the cluster directly to a target size (a rough sketch follows below). A policy can be applied for waiting, or not waiting, for running tasks to complete. If the target size is smaller than elastic factor * current size, some DataNodes will be killed too, but in a well-planned way.
    My 2 cents.
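
    As a purely hypothetical sketch of what that single command could look like
    (nothing below exists in Hadoop today; the script name and node-list files
    are made up, and dfs.hosts.exclude / mapred.hosts.exclude are assumed to
    already point at dfs.exclude / mapred.exclude):

      #!/bin/sh
      # shrink-cluster.sh TARGET_SIZE -- hypothetical planned scale-in.
      # elastic.nodes lists TaskTracker-only machines, dedicated.nodes lists
      # machines running both a DataNode and a TaskTracker.
      TARGET=$1
      ELASTIC=$(wc -l < elastic.nodes)
      DEDICATED=$(wc -l < dedicated.nodes)
      TO_REMOVE=$(( ELASTIC + DEDICATED - TARGET ))
      [ "$TO_REMOVE" -le 0 ] && exit 0   # nothing to shrink

      # Drop TaskTracker-only nodes first: no block re-replication is needed.
      head -n "$TO_REMOVE" elastic.nodes >> mapred.exclude
      hadoop mradmin -refreshNodes 2>/dev/null || true   # only on versions that have mradmin

      # If that is not enough, schedule all remaining DataNode removals in one
      # batch, so the NameNode plans re-replication once rather than node by node.
      REMAINING=$(( TO_REMOVE - ELASTIC ))
      if [ "$REMAINING" -gt 0 ]; then
        head -n "$REMAINING" dedicated.nodes >> dfs.exclude
        hadoop dfsadmin -refreshNodes
      fi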

    Thanks,

    Junping


    ________________________________
    From: Arun C Murthy <acm@hortonworks.com>
    To: common-dev@hadoop.apache.org
    Sent: Thursday, September 15, 2011 5:24 AM
    Subject: Re: Adding Elasticity to Hadoop MapReduce

    On Sep 14, 2011, at 1:27 PM, Bharath Ravi wrote:

    Hi all,

    I'm a newcomer to Hadoop development, and I'm planning to work on an idea
    that I wanted to run by the dev community.

    My apologies if this is not the right place to post this.

    Amazon has an "Elastic MapReduce" Service (
    http://aws.amazon.com/elasticmapreduce/) that runs on Hadoop.
    The service allows dynamic/runtime changes in resource allocation: more
    specifically, varying the number of
    compute nodes that a job is running on.

    I was wondering if such a facility could be added to the publicly available
    Hadoop MapReduce.
    For a long while now you have been able to bring up either DataNodes or TaskTrackers, point them (via config) at the NameNode/JobTracker, and they will become part of the cluster.

    Similarly you can just kill the DataNode or TaskTracker and the respective masters will deal with their loss.

    Arun
  • Steve Loughran at Sep 15, 2011 at 9:27 am

    On 15/09/11 10:14, Junping Du wrote:
    Hello Arun and all,
    I think current Hadoop has a good capability to scale out, but it is not so good at scaling in. Since it was designed for dedicated clusters and machines, "scale in" has not received much attention for a long time. However, I've noticed more and more users deploying Hadoop clusters in the cloud (EC2, Eucalyptus, etc.) or on shared infrastructure (VMware, Xen), where a "scale in" capability would free up resources for other clusters and applications. The current "scale in" solution (as you proposed in the previous mail) has some significant drawbacks:
    1. It doesn't handle the scale-in case in a formal way, but rather as a temporary workaround built on a disaster-recovery mechanism.
    2. It is not convenient: the Hadoop admin has to manually kill DataNodes a few at a time (at most N-1 at a time, where N is the replica count, to avoid possible data loss) and wait for replication to catch up. To shrink a cluster from 1000 nodes to 500 nodes, how much time and effort would that take?
    3. It is not efficient, because it is not planned ahead. Say nodes A, B and C should all be eliminated from the cluster. If A and B are eliminated first (with N = 3), it is possible that C receives some of the replicas of blocks from A or B. This problem becomes serious when a big shrink happens.
    Thus, I think it is worth a proper discussion on giving Hadoop this cool "elastic" feature. I'll volunteer one possible solution and welcome better ones:
    1. We can consider breaking the assumption that the DataNode and TaskTracker coexist on every machine, and let some machines run only a task node. Network traffic inside a rack is not that expensive, though you might say this wastes some local I/O capacity on task-only machines. But don't look at these machines as dedicated resources for this Hadoop cluster: they can be used by other clusters and applications (and so will be reclaimed at some point). To this cluster, these machines are better than nothing, right?
    2. The percentage of task-only machines in the whole cluster is an "elastic factor" for the cluster. For example, if the cluster should scale between 500 and 1000 nodes, the elastic factor could be 1/2: 500 normal machines with both data and task nodes, and another 500 machines with task nodes only.
    3. The elastic factor can be configured by the Hadoop admin, and the non-dedicated machines can be marked through a script, much as is done for rack awareness.
    4. A single command is provided for the Hadoop admin to shrink the cluster directly to a target size. A policy can be applied for waiting, or not waiting, for running tasks to complete. If the target size is smaller than elastic factor * current size, some DataNodes will be killed too, but in a well-planned way.
    My 2 cents.
    These are all good ideas. The other trick -which has been discussed
    recently in the context of the Platform Scheduler- is to run HDFS across
    all nodes, but switch the workload of the cluster between Hadoop jobs
    (MR, Graph, Hamster), and other work (Grid jobs). That way the
    filesystem is just a very large FS for anything. If some grid jobs don't
    use the HDFS, the nodes can still serve up their data.
  • Milind Bhandarkar at Sep 19, 2011 at 7:38 pm



    These are all good ideas. The other trick -which has been discussed
    recently in the context of the Platform Scheduler- is to run HDFS across
    all nodes, but switch the workload of the cluster between Hadoop jobs
    (MR, Graph, Hamster), and other work (Grid jobs). That way the
    filesystem is just a very large FS for anything. If some grid jobs don't
    use the HDFS, the nodes can still serve up their data.
    This used to be called Hadoop On Demand (HoD), which would deploy a
    MapReduce cluster on demand, using Torque to allocate nodes. :-)

    - milind

    ---
    Milind Bhandarkar
    Greenplum Labs, EMC
    (Disclaimer: Opinions expressed in this email are those of the author, and
    do not necessarily represent the views of any organization, past or
    present, the author might be affiliated with.)
  • Arun C Murthy at Sep 19, 2011 at 7:47 pm

    On Sep 15, 2011, at 2:26 AM, Steve Loughran wrote:
    These are all good ideas. The other trick -which has been discussed recently in the context of the Platform Scheduler- is to run HDFS across all nodes, but switch the workload of the cluster between Hadoop jobs (MR, Graph, Hamster), and other work (Grid jobs). That way the filesystem is just a very large FS for anything. If some grid jobs don't use the HDFS, the nodes can still serve up their data.
    Or, one can port other work to run on MR2 ;)

    Arun

Discussion Overview
group: common-dev @ hadoop.apache.org
categories: hadoop
posted: Sep 14, '11 at 8:28p
active: Sep 19, '11 at 7:47p
posts: 11
users: 7
website: hadoop.apache.org...
irc: #hadoop

People

Translate

site design / logo © 2022 Grokbase