We have a DNS installation with HA logic that may fail for, say, 10
In such a case we experience the following:
* DNS goes down
* The master logs: "Received report from unknown server -- telling
it to MSG_CALL_SERVER_STARTUP" (presumably the IP is "unknown")
* The regionservers do as directed; the ZooKeeper logs state that the
/hbase/rs/ nodes are updated
* DNS comes back up
At this point master selection either fails or picks the wrong master,
and no region can be served anymore. No further MSG_CALL_SERVER_STARTUP
messages appear either, which might otherwise reanimate the cluster...
We use host names in the regionservers file.
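To make the dependency on DNS concrete, here is a minimal sketch (not part of HBase) of how one might check which of those host names currently fail to resolve. The function names `read_hosts` and `unresolved_hosts`, the example host names, and the handling of `#` comment lines are assumptions made for illustration only; the file is assumed to hold one host name per line.

```python
import socket


def read_hosts(text):
    """Return the host names from regionservers-style content.

    Assumes one host name per line; blank lines and lines starting
    with '#' are skipped (the comment handling is an assumption).
    """
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def unresolved_hosts(hosts, resolver=socket.gethostbyname):
    """Return the subset of hosts that the resolver cannot look up.

    The resolver is injectable so the check can be exercised
    without a live DNS server.
    """
    failed = []
    for host in hosts:
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Run periodically against the contents of the regionservers file, a check like this would at least show when the cluster is operating through a DNS outage, e.g. `unresolved_hosts(read_hosts(open("conf/regionservers").read()))` (the path is hypothetical).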
What could we change to be more robust against such a problem?