Hadoop Master and Slave Discovery
Hi,

Instead of depending on locally synced configuration files, would it be a
nice approach to adopt the Jini discovery model, wherein masters and slaves
can discover each other dynamically through UDP broadcast/heartbeat methods?

This would mean that any machine could come up, announce itself as a slave,
automatically discover the master, and start supporting the master within
<x seconds.

Regards,
Raja Nagendra Kumar,
C.T.O
www.tejasoft.com
-Hadoop Adoption Consulting



  • Steve Loughran at Jul 4, 2011 at 10:45 am

    On 03/07/11 03:11, Raja Nagendra Kumar wrote:
    > Instead of depending on locally synced configuration files, would it be a
    > nice approach to adopt the Jini discovery model, wherein masters and slaves
    > can discover each other dynamically through UDP broadcast/heartbeat methods?
    That assumes that UDP broadcast is supported through the switches (many
    turn it off as it creates too much traffic), or that UDP multicast is
    supported (for an example of an infrastructure that does not, play with EC2).

    > This would mean that any machine could come up, announce itself as a slave,
    > automatically discover the master, and start supporting the master within
    > <x seconds.

    How will your slave determine that the master it has bonded to is
    the master that it should bond to, and not something malicious within the
    same multicast range? It's possible, but you generally have to have
    configuration files on the worker nodes.

    There's nothing to stop your configuration management layer using
    discovery or central config servers (ZooKeeper, Anubis, LDAP, DNS, ...),
    which then pushes desired state information to the client nodes. These
    can deal with auth and may support infrastructures that don't support
    broadcast or multicast. Such tooling also gives you the ability to push
    out host table config, JVM options, and logging parameters, and to bounce
    worker nodes into new states without waiting for them to time out and try
    to rediscover new masters.
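
    As a rough sketch of the central-config-server approach, a worker could pull
    the master's address from a ZooKeeper znode at startup instead of relying on
    local files; the ensemble address and znode path below are hypothetical:

        import org.apache.zookeeper.WatchedEvent;
        import org.apache.zookeeper.Watcher;
        import org.apache.zookeeper.ZooKeeper;
        import org.apache.zookeeper.data.Stat;

        public class MasterLookup {
            public static void main(String[] args) throws Exception {
                // Hypothetical ensemble address and znode path, for illustration only.
                ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
                        new Watcher() {
                            public void process(WatchedEvent event) { /* ignored in this sketch */ }
                        });
                byte[] data = zk.getData("/cluster/jobtracker", false, new Stat());
                System.out.println("JobTracker is at " + new String(data, "UTF-8"));
                zk.close();
            }
        }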
  • Ted Dunning at Jul 4, 2011 at 5:22 pm
    One reasonable suggestion that I have heard recently was to do what Google
    does and put a DNS front end onto ZooKeeper. Machines would need to have
    DNS set up properly, and requests for a special ZK-based domain would have
    to be delegated to the fancy DNS setup, but this would allow all kinds of
    host-targeted configuration settings to be moved by a very standardized
    network protocol. There are difficulties moving port numbers and all you
    really get are hostnames, but it is a nice trick, since configuring
    which nameserver to use is a common admin task.
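
    A minimal sketch of the client side of that idea: resolve a name served by the
    ZK-backed DNS front end and pair the returned hosts with a well-known port
    (the domain name and port below are hypothetical):

        import java.net.InetAddress;

        public class ZkDnsLookup {
            public static void main(String[] args) throws Exception {
                // Hypothetical ZK-backed domain; DNS only hands back hostnames/addresses,
                // so the port has to be agreed on out of band.
                InetAddress[] masters = InetAddress.getAllByName("jobtracker.cluster1.zk.example.com");
                for (InetAddress addr : masters) {
                    System.out.println(addr.getHostAddress() + ":9001");
                }
            }
        }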
  • Steve Loughran at Jul 5, 2011 at 9:42 am

    On 04/07/11 18:22, Ted Dunning wrote:
    > One reasonable suggestion that I have heard recently was to do what Google
    > does and put a DNS front end onto ZooKeeper. [...]
    good point

    1. You could use DNS proper, by way of Bonjour/Avahi. You don't need to
    be running an mDNS server to support .local, and I would strongly
    advise against it in a large cluster (because .local resolution puts a
    lot of CPU load on every server in the subnet). What you can do is have
    the DNS server register some .local entries and have the clients use
    these to bind. You probably also need to set the DNS TTLs in the JVM
    (see the snippet below). In a large cluster that will just add to the
    DNS traffic, which is where host tables start to look appealing.

    2. Apache MINA is set up to serve its directory data in lots of ways,
    including what appears to be text files over NFS. This is an even nicer
    trick: if you could get MINA to serve up the ZK data, life would be very simple.
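
    For the JVM DNS TTLs mentioned in point 1, the relevant knob is the security
    property networkaddress.cache.ttl (seconds to cache successful lookups); a
    short sketch:

        import java.security.Security;

        public class DnsCacheTtl {
            public static void main(String[] args) {
                // Cache successful DNS lookups for 30 seconds rather than the default,
                // which can be "forever" when a security manager is installed.
                Security.setProperty("networkaddress.cache.ttl", "30");
                // Failed lookups are cached under a separate property.
                Security.setProperty("networkaddress.cache.negative.ttl", "10");
            }
        }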

  • Eric Yang at Jul 5, 2011 at 10:00 pm
    In another project, I have implemented a Bonjour beacon (jmdns) which
    sits on the ZooKeeper nodes to advertise the location of the ZooKeeper
    servers. When clients start up, they discover the location of
    ZooKeeper through multicast DNS. Once the server locations are
    resolved (ip:port and TXT records), the clients shut down their mDNS
    resolvers and proceed to use the resolved list for ZooKeeper
    access. There does not seem to be any CPU overhead incurred by the
    beacon or the clients. If a client can no longer connect to ZooKeeper,
    it starts the mDNS resolver again to look for a new list of
    ZooKeeper servers. The code for the project is located at:

    http://github.com/macroadster/hms

    It may be possible to use a similar approach for location resolution,
    and load the rest of the config through ZooKeeper.

    regards,
    Eric
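
    A minimal JmDNS sketch of this advertise-and-discover pattern; the service
    type, instance name, and TXT payload here are hypothetical and not taken
    from the HMS code:

        import java.net.InetAddress;
        import javax.jmdns.JmDNS;
        import javax.jmdns.ServiceInfo;

        public class ZkBeaconSketch {
            public static void main(String[] args) throws Exception {
                // Advertise side (would run on a ZooKeeper node): publish the server over mDNS.
                JmDNS beacon = JmDNS.create(InetAddress.getLocalHost());
                ServiceInfo info = ServiceInfo.create(
                        "_zookeeper._tcp.local.",                         // hypothetical service type
                        "zk-" + InetAddress.getLocalHost().getHostName(),
                        2181,
                        "ensemble=prod");                                 // hypothetical TXT record
                beacon.registerService(info);

                // Resolve side (would run on a client): list the advertised servers,
                // then drop the resolver once the list is cached.
                JmDNS resolver = JmDNS.create(InetAddress.getLocalHost());
                for (ServiceInfo zk : resolver.list("_zookeeper._tcp.local.")) {
                    System.out.println(zk.getHostAddresses()[0] + ":" + zk.getPort());
                }
                resolver.close();
                beacon.close();
            }
        }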
  • Steve Loughran at Jul 6, 2011 at 9:53 am

    On 05/07/11 23:00, Eric Yang wrote:
    > In another project, I have implemented a Bonjour beacon (jmdns) which
    > sits on the ZooKeeper nodes to advertise the location of the ZooKeeper
    > servers. [...]
    That's interesting; I think it's more important for clients to be able
    to bind dynamically than it is for the cluster machines, as they should
    be managed anyway.

    When I was doing the Hadoop-in-VM clustering stuff, I had a well-known
    URL to serve up the relevant XML file for the cluster from the JT: all
    it did was relay the request to the JT at whatever host it had been
    assigned. All the clients needed to know was the URL of the config
    server, and they could bootstrap to working against clusters whose FS
    and JT URLs were different from run to run.

    ZooKeeper discovery would benefit a lot of projects.
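
    A minimal sketch of that bootstrap pattern with the Hadoop client API: load
    the cluster's XML config from a well-known URL and let it supply the FS and
    JT addresses. The config-server URL is hypothetical; the property names are
    the 2011-era ones:

        import java.net.URL;
        import org.apache.hadoop.conf.Configuration;

        public class ConfigBootstrap {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Hypothetical well-known config server; it returns the current
                // cluster's site XML, whatever hosts the NN and JT landed on.
                conf.addResource(new URL("http://config.example.com/cluster/current-site.xml"));
                System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
                System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
            }
        }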
  • Patrick Hunt at Jul 6, 2011 at 4:10 pm
    There's a long-standing "ZooKeeper DNS server" JIRA which can be found
    here; someone has already created a basic implementation:
    https://issues.apache.org/jira/browse/ZOOKEEPER-703

    Patrick
  • Eric Yang at Jul 6, 2011 at 4:15 pm
    It would be nicer if it was written in Java. I think something wrapped
    on top of jmdns would be a better fit for ZooKeeper.

    regards,
    Eric
  • Patrick Hunt at Jul 6, 2011 at 4:18 pm
    Eric, I'd be happy to work with you to get it committed if you'd like
    to take a whack. Would be a great addition to contrib.

    Patrick
  • Eric Yang at Jul 6, 2011 at 6:26 pm
    I am currently working on RPM packages for ZooKeeper, Pig, Hive and
    HCat. It may take a while for me to circle back to this. Nevertheless,
    it is interesting work that I would like to contribute. Thanks

    regards,
    Eric
  • Allen Wittenauer at Jul 6, 2011 at 11:02 pm

    On Jul 5, 2011, at 2:40 AM, Steve Loughran wrote:
    > 1. You could use DNS proper, by way of Bonjour/Avahi. You don't need to
    > be running an mDNS server to support .local, and I would strongly advise
    > against it in a large cluster (because .local resolution puts a lot of
    > CPU load on every server in the subnet).
    +1 mDNS doesn't scale to large sizes. The only number I've ever heard is up to 1000 hosts (not services!) before the whole system falls apart. I don't think it was meant to scale past something like a class C subnet.

    Something else to keep in mind: a lot of network gear gets multicast really really wrong. There are reasons why network admins are very happy that Hadoop doesn't use multicast.

    ... and all that's before one talks about the security implications.
  • Eric Yang at Jul 7, 2011 at 12:06 am
    Did you know that almost all Linux desktop systems come with Avahi
    pre-installed and turned on by default? What is more interesting is
    that there are thousands of those machines broadcasting in large
    corporations without anyone noticing them. I recently built a
    multicast DNS browser and looked into the number of machines running in
    a large company environment. The number of desktop, laptop and
    printer machines running multicast DNS far exceeds 1000 machines
    in the local subnet. They are all happily working fine without
    causing any issues. Printing works fine, iTunes sharing from someone
    else works fine. For some reason, things tend to work better on my
    side of the universe. :) Allen, if you want to stay stuck on stone-age
    tools, I won't stop you.

    regards,
    Eric
  • Allen Wittenauer at Jul 7, 2011 at 12:49 am

    On Jul 6, 2011, at 5:05 PM, Eric Yang wrote:

    > Did you know that almost all Linux desktop systems come with Avahi
    > pre-installed and turned on by default?

    ... which is why most admins turn those services off by default. :)

    > What is more interesting is that there are thousands of those machines
    > broadcasting in large corporations without anyone noticing them.

    That's because many network teams turn off multicast past the subnet boundary and many corporate desktops are in class C subnets. This automatically limits the host count down to 200-ish per network. Usually just the unicast traffic is bad enough. Throwing multicast into the mix just makes it worse.

    > I recently built a multicast DNS browser and looked into the number of
    > machines running in a large company environment. The number of desktop,
    > laptop and printer machines running multicast DNS far exceeds 1000
    > machines in the local subnet.

    From my understanding of Y!'s network, the few /22's they have (which would get you 1022 potential hosts on a subnet) have multicast traffic dropped at the router and switch levels. Additionally, DNS-SD (the service discovery portion of mDNS) offers unicast support as well. So there is a very good chance that the traffic you are seeing is from unicast, not multicast.

    The 1000 number, BTW, comes from Apple. I'm sure they'd be interested in your findings given their role in ZC.

    BTW, I'd much rather hear that you set up a /22 with many many machines running VMs trying to actually use mDNS for something useful. A service browser really isn't that interesting.

    > They are all happily working fine without causing any issues.

    ... that you know of. Again, I'm 99% certain that Y! is dropping multicast packets into the bit bucket at the switch boundaries. [I remember having this conversation with them when we set up the new data centers.]

    > Printing works fine,

    Most admins turn SLP and other broadcast services on printers off. For large networks, one usually sees print services enabled via AD or master print servers broadcasting the information on the local subnet. This allows a central point of control rather than randomness. Snow Leopard (I don't think Leopard did this) actually tells you where the printer is coming from now, so that's handy to see if they are ZC or AD or whatever.

    > iTunes sharing from someone else works fine.

    iTunes specifically limits its reach so that it can't extend beyond the local subnet and definitely does unicast in addition to ZC, so that doesn't really say much of anything, other than potentially invalidating your results.

    > For some reason, things tend to work better on my side of the universe. :)

    I'm sure it does, but not for the reasons you think they do.

    > Allen, if you want to stay stuck on stone-age tools, I won't stop you.

    Multicast has a time and place (mainly for small, non-busy networks). Using it without understanding the network impact is never a good idea.

    FWIW, I've seen multicast traffic bring down an entire campus of tens of thousands of machines due to routers and switches having bugs where they didn't subtract from the packet's TTL. I'm not the only one with these types of experiences. Anything multicast is going to have a very large uphill battle for adoption because of these widespread problems. Many network vendors really don't get this one right, for some reason.
  • Eric Yang at Jul 7, 2011 at 5:17 pm
    The Internet Assigned Numbers Authority has allocated 169.254.1.0 through
    169.254.254.255 for the purpose of communication between nodes on the local
    link. That is 65024 IP addresses designed for local area network use only;
    they are not allowed to be routed. Zeroconf randomly selects one address
    out of the 65024 available addresses and broadcasts an ARP message. If no
    one is using that address, the machine uses the selected IP address and
    communicates with the Zeroconf service for name resolution. If the
    address is already in use, the system restarts from scratch and picks
    another address. Hence, the actual limit is not 1000 but
    65024. In real life, it is unlikely to use all 65024 for name
    resolution, due to the chance of packet loss on modern Ethernet (10^-12)
    or the delay from repeated selection of the same IP address by
    different hosts. It can easily push the limit to 10,000-20,000 nodes
    without losing reliability in server farm settings.
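
    A sketch of the selection loop described above; the conflict check stands in
    for the real ARP probe, which plain Java cannot send:

        import java.util.Random;

        public class LinkLocalPicker {
            // 169.254.1.0 - 169.254.254.255: 254 * 256 = 65024 candidate addresses.
            static final int POOL_SIZE = 254 * 256;

            public static void main(String[] args) {
                Random random = new Random();
                String address;
                do {
                    int pick = random.nextInt(POOL_SIZE);             // 0 .. 65023
                    address = "169.254." + (1 + pick / 256) + "." + (pick % 256);
                } while (isInUse(address));                           // retry on conflict
                System.out.println("claimed " + address);
            }

            // Stand-in for the ARP probe; a real implementation needs raw sockets
            // or OS support, which is outside the scope of this sketch.
            static boolean isInUse(String address) {
                return false;
            }
        }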

    It would be nice to support both dynamic discovery of masters and
    slaves and to preserve the existing configuration-style management for
    EC2-like deployments. This is one innovation worth having.

    regards,
    Eric
  • Warren Turkal at Jul 7, 2011 at 6:04 pm
    Stable infrastructures require deterministic behavior to be understood. I
    believe that mDNS limits the determinism of a system by requiring that I
    accept that a machine will pick a random address. When I set up my DCs,
    I want machines to have a single constant IP address so that I don't have
    to do a bunch of work to figure out which machine I am trying to talk to
    when things go wrong.

    As such, I find it hard to believe that mdns belongs in large DCs.

    wt
  • Warren Turkal at Jul 7, 2011 at 5:57 pm
    Frankly, I agree with Allen's comments.

    I think that discovering ZooKeeper should be done with a well-known DNS
    address (e.g. zookeeper.$cluster.prod.example.com). It would be pretty rare
    for something like the address of ZooKeeper to change in a stable
    infrastructure. When it does, DNS can be updated as part of the procedure
    for the change.

    Using multicast, on the other hand, introduces a higher barrier to getting
    a Hadoop cluster running, as one must then troubleshoot any multicast
    issues that come up.

    wt

Discussion Overview
group: common-dev
categories: hadoop
posted: Jul 3, '11 at 2:12a
active: Jul 7, '11 at 6:04p
posts: 16
users: 7
website: hadoop.apache.org...
irc: #hadoop
