Hi,

I was actually the person who mentioned the idea of a ZooKeeper nameservice
to Mr. Dunning at Hadoop Summit, which he mentioned in the "Hadoop Master
and Slave Discovery" thread. My idea was a little more complex than just a
DNS server. A DNS server would be useful for finding web frontends for
various Hadoop services (JobTracker, TaskTracker, NameNode, etc.). However,
the real magic would be creating a ZooKeeper-specific naming service that
could notify listeners whenever an address:port changes. Keeping in mind
that I am not very good with ZooKeeper yet, consider the following (using
next-gen MapReduce names):


- /zns/$cluster/resource_manager/elected_master - contains the ip:port of
the current resource manager
- /zns/$cluster/resource_manager/managers/* - each node is named after
one of the machines in the resource manager group and contains the
ip:port of that resource manager
- /zns/$cluster/node_managers/* - each node is named after one of the
node managers and contains the ip:port of that node manager
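The layout above can be made concrete with a few path helpers. This is a minimal Python sketch that assumes each znode's payload is the ASCII string ip:port; the function names (elected_master_path, manager_path, node_manager_path, parse_endpoint) and the cluster name "prod" are hypothetical and not part of ZooKeeper or Hadoop.

```python
# Hypothetical helpers for the /zns layout; not an existing ZooKeeper or Hadoop API.
from typing import Tuple

ZNS_ROOT = "/zns"


def elected_master_path(cluster: str) -> str:
    """Znode holding the ip:port of the currently elected resource manager."""
    return f"{ZNS_ROOT}/{cluster}/resource_manager/elected_master"


def manager_path(cluster: str, member: str) -> str:
    """Znode for one machine in the resource manager group."""
    return f"{ZNS_ROOT}/{cluster}/resource_manager/managers/{member}"


def node_manager_path(cluster: str, host: str) -> str:
    """Znode a node manager registers for itself."""
    return f"{ZNS_ROOT}/{cluster}/node_managers/{host}"


def parse_endpoint(data: bytes) -> Tuple[str, int]:
    """Decode a payload such as b"10.0.0.5:8032" into ("10.0.0.5", 8032)."""
    host, sep, port = data.decode("ascii").rpartition(":")
    if not sep:
        raise ValueError(f"expected ip:port, got {data!r}")
    return host, int(port)


print(elected_master_path("prod"))        # /zns/prod/resource_manager/elected_master
print(parse_endpoint(b"10.0.0.5:8032"))   # ('10.0.0.5', 8032)
```

A client resolving the current master would read the znode at elected_master_path(cluster) and pass its bytes to parse_endpoint.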

Then one could use ZooKeeper instead of DNS to discover everything and be
notified when it changes (thus avoiding the DNS TTL issue). For example,
when a new node manager comes up, it could create an ephemeral node under
the node_managers/ hierarchy. A master watching that hierarchy would then be
notified and could contact and configure the machine. All that the
resource and node managers would need to know is where the root of the
ZooKeeper node hierarchy is.
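The registration-and-notification flow relies on two ZooKeeper behaviors: ephemeral znodes disappear when the session that created them ends, and watches are one-shot, so a client must re-register after each notification. The sketch below models only those two behaviors in memory so it runs without a ZooKeeper server. The InMemoryZNS class and its methods are hypothetical stand-ins, not the ZooKeeper client API, and the sketch skips details such as requiring parent znodes to exist.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Watch = Callable[[str], None]


class InMemoryZNS:
    """Toy stand-in for a ZooKeeper ensemble (hypothetical, not a client API).

    Models ephemeral znodes tied to a session and one-shot child watches.
    """

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.owner: Dict[str, str] = {}  # ephemeral path -> owning session
        self.child_watches: Dict[str, List[Watch]] = defaultdict(list)

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def _fire(self, parent: str) -> None:
        # Watches are one-shot: remove them before invoking the callbacks.
        for callback in self.child_watches.pop(parent, []):
            callback(parent)

    def create(self, path: str, data: bytes, session: Optional[str] = None) -> None:
        """Create a znode; passing a session makes it ephemeral."""
        self.data[path] = data
        if session is not None:
            self.owner[path] = session
        self._fire(self._parent(path))

    def get_children(self, parent: str, watch: Optional[Watch] = None) -> List[str]:
        """List direct children, optionally leaving a one-shot watch."""
        if watch is not None:
            self.child_watches[parent].append(watch)
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.data
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def close_session(self, session: str) -> None:
        """End a session, deleting its ephemeral znodes and firing watches."""
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.data[path]
            del self.owner[path]
            self._fire(self._parent(path))


# A master watching node_managers/ re-arms its watch on every notification.
zns = InMemoryZNS()
parent = "/zns/prod/node_managers"
seen: List[List[str]] = []


def on_change(path: str) -> None:
    seen.append(zns.get_children(path, watch=on_change))


zns.get_children(parent, watch=on_change)
zns.create(parent + "/nm1", b"10.0.0.11:8041", session="s1")  # node manager joins
zns.close_session("s1")                                        # its session ends
print(seen)  # [['nm1'], []]
```

Against a real ensemble, the same pattern would map onto the Java client's create with CreateMode.EPHEMERAL and getChildren with a Watcher; re-arming the watch inside the callback is what keeps the master informed of every join and departure.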

There could also be a really nice way to access web services identified by
those nodes. A DNS server could be authoritative for names in the
$cluster.zns.example domain. It could answer with SRV, A, or AAAA records
for names like elected_master.resource_manager.$cluster.zns.example. In
the case of an SRV record, you can include a target host and port. In the
case of A or AAAA, you could respond with the address of a web proxy that
serves up the appropriate ip:port, or of a web redirector that redirects to
http://ip:port/$query_str.
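The naming scheme amounts to reversing the path components under /zns and appending zns.example. Here is a minimal sketch of the two directions a hypothetical authoritative server would need, plus the redirect target, assuming every path component is already a legal DNS label. Building actual SRV, A, or AAAA responses is left out, and all names here are illustrative.

```python
# Hypothetical mapping between /zns znode paths and names in zns.example.
ZNS_ROOT = "/zns"
DNS_SUFFIX = "zns.example"


def znode_to_dns(path: str) -> str:
    """/zns/<cluster>/a/b -> b.a.<cluster>.zns.example"""
    parts = path.strip("/").split("/")
    if parts[0] != ZNS_ROOT.strip("/") or len(parts) < 2:
        raise ValueError(f"not under {ZNS_ROOT}: {path}")
    return ".".join(reversed(parts[1:])) + "." + DNS_SUFFIX


def dns_to_znode(name: str) -> str:
    """Inverse of znode_to_dns, as the authoritative server would need."""
    suffix = "." + DNS_SUFFIX
    if not name.endswith(suffix):
        raise ValueError(f"not in {DNS_SUFFIX}: {name}")
    labels = name[: -len(suffix)].split(".")
    return ZNS_ROOT + "/" + "/".join(reversed(labels))


def redirect_url(endpoint: str, query_str: str = "") -> str:
    """Target for a redirector: http://ip:port/$query_str"""
    return f"http://{endpoint}/{query_str.lstrip('/')}"


name = znode_to_dns("/zns/prod/resource_manager/elected_master")
print(name)                 # elected_master.resource_manager.prod.zns.example
print(dns_to_znode(name))   # /zns/prod/resource_manager/elected_master
print(redirect_url("10.0.0.5:8088", "cluster/apps"))  # http://10.0.0.5:8088/cluster/apps
```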

I could also imagine having /zns/$cluster/jobs/$username/$jobname/$taskid
(and $taskid.$jobname.$username.jobs.$cluster.zns.example via DNS/web proxy)
link to a specific task in a job. The /zns/$cluster/jobs/$username/$jobname
node could contain a list of all tasks for a particular job.
The /zns/$cluster/jobs/$username node could contain a list of all jobs
running under a specific user. /zns/$cluster/jobs could contain a list of
all jobs managed by the resource master.
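A similar sketch for the jobs hierarchy treats the registered task znodes as the source of truth and derives the per-job, per-user, and cluster-wide listings from them. The helper names and the cluster, user, job, and task values are hypothetical; in a real deployment these listings might instead be stored as znode data, as the paragraph above suggests.

```python
# Hypothetical helpers for /zns/$cluster/jobs/$username/$jobname/$taskid.
from typing import List, Set, Tuple

ZNS_ROOT = "/zns"


def task_path(cluster: str, user: str, job: str, task: str) -> str:
    return f"{ZNS_ROOT}/{cluster}/jobs/{user}/{job}/{task}"


def task_dns_name(cluster: str, user: str, job: str, task: str) -> str:
    return f"{task}.{job}.{user}.jobs.{cluster}.zns.example"


def children(paths: Set[str], parent: str) -> List[str]:
    """Direct child names of `parent`, inferred from the registered task paths."""
    prefix = parent.rstrip("/") + "/"
    return sorted({p[len(prefix):].split("/")[0] for p in paths if p.startswith(prefix)})


def tasks_for_job(paths: Set[str], cluster: str, user: str, job: str) -> List[str]:
    return children(paths, f"{ZNS_ROOT}/{cluster}/jobs/{user}/{job}")


def jobs_for_user(paths: Set[str], cluster: str, user: str) -> List[str]:
    return children(paths, f"{ZNS_ROOT}/{cluster}/jobs/{user}")


def all_jobs(paths: Set[str], cluster: str) -> List[Tuple[str, str]]:
    users = children(paths, f"{ZNS_ROOT}/{cluster}/jobs")
    return [(u, j) for u in users for j in jobs_for_user(paths, cluster, u)]


registered = {
    task_path("prod", "alice", "wordcount", "task_0001"),
    task_path("prod", "alice", "wordcount", "task_0002"),
    task_path("prod", "bob", "sort", "task_0001"),
}
print(tasks_for_job(registered, "prod", "alice", "wordcount"))  # ['task_0001', 'task_0002']
print(all_jobs(registered, "prod"))  # [('alice', 'wordcount'), ('bob', 'sort')]
```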

I'm sure not all of what I have said is sound design, but I am hoping it
conveys my message. Also, I think something like this could be really cool.

Thanks,
wt


  • Eric Yang at Jul 6, 2011 at 9:50 pm
    Hi Warren,

    A variant of the same idea has been prototyped and proven to work. In
    https://issues.apache.org/jira/browse/HADOOP-7417, the proposed Hadoop
    deployment system uses mDNS to locate ZooKeeper, and the cluster
    topology is described as
    /clusters/$cluster_name/$hostname/[$action_queue|$status_queue]. Each
    agent uses its own hostname to look up its place in that path structure
    and resolve which software installation and configuration procedures to
    work on. It is a great working model for coordinating large numbers of
    machines to perform staged procedures. From the prototype, we know that
    ZooKeeper and the design work well and can scale to tens of thousands of
    machines.
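    A rough illustration of the agent-side lookup follows. It assumes the queue znodes are literally named action_queue and status_queue and uses an in-memory dict of deques in place of ZooKeeper; the function names and the cluster and host values are hypothetical, and this is a sketch of the pattern described, not code from HADOOP-7417.

```python
import socket
from collections import deque
from typing import Callable, Deque, Dict, Optional


def agent_paths(cluster_name: str, hostname: Optional[str] = None) -> Dict[str, str]:
    """This agent's queue znodes under /clusters/$cluster_name/$hostname."""
    host = hostname or socket.gethostname()
    base = f"/clusters/{cluster_name}/{host}"
    return {"action": f"{base}/action_queue", "status": f"{base}/status_queue"}


def drain_actions(store: Dict[str, Deque[str]], cluster_name: str, hostname: str,
                  handler: Callable[[str], str]) -> None:
    """Run this host's queued actions in order, reporting each outcome.

    `store` maps a queue path to a deque and stands in for ZooKeeper here.
    """
    paths = agent_paths(cluster_name, hostname)
    actions = store.setdefault(paths["action"], deque())
    statuses = store.setdefault(paths["status"], deque())
    while actions:
        step = actions.popleft()
        statuses.append(f"{step}: {handler(step)}")


store = {"/clusters/c1/host-01/action_queue": deque(["install", "configure"])}
drain_actions(store, "c1", "host-01", lambda step: "ok")
print(list(store["/clusters/c1/host-01/status_queue"]))  # ['install: ok', 'configure: ok']
```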

    regards,
    Eric
    On Wed, Jul 6, 2011 at 1:46 PM, Warren Turkal wrote:

Discussion Overview
group: common-dev @
categories: hadoop
posted: Jul 6, '11 at 9:10p
active: Jul 6, '11 at 9:50p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Eric Yang: 1 post
Warren Turkal: 1 post
