Grokbase Groups HBase user May 2016
Hi all,

I'm trying to run a large CopyTable job between clusters in totally
different datacenters and I'm trying to determine what network connectivity
is required here.

As per the Cloudera blog post about CopyTable, I understand that the
network should be such that "MR TaskTrackers can access all the HBase and
ZK nodes in the destination cluster." So in practice that means that the
source TaskTrackers should be able to access:

* ZooKeeper on port 2181
* the Master on its RPC port (16000)
* the RegionServers on their RPC port (16020)
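One way to confirm this list from the source side is to attempt a plain TCP connection to each endpoint from every source worker node. Below is a minimal Python 3 sketch. The hostnames are hypothetical placeholders, and the ports assume the defaults listed above (adjust them if `hbase.master.port` or `hbase.regionserver.port` were changed on the destination). A successful connect only shows the port is open, not that HBase RPC will succeed.

```python
import socket

# Hypothetical destination endpoints: replace with the real ZooKeeper quorum,
# Master and RegionServer hostnames of the destination cluster.
REQUIRED_ENDPOINTS = [
    ("zk1.dest.example.com", 2181),      # ZooKeeper client port
    ("master.dest.example.com", 16000),  # HMaster RPC port
    ("rs1.dest.example.com", 16020),     # RegionServer RPC port
]


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The timeout bounds each connect attempt, not the DNS lookup.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False


def unreachable(endpoints, timeout: float = 3.0):
    """Return the (host, port) pairs that could not be reached from here."""
    return [(host, port) for host, port in endpoints
            if not check_tcp(host, port, timeout)]


# Usage, from each source worker node:
#     for host, port in unreachable(REQUIRED_ENDPOINTS):
#         print(f"cannot reach {host}:{port}")
```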

Anything else I need to configure here? Does Hadoop on the source need to
talk directly to the destination Hadoop, etc.?

Also, what's unclear to me is what I should be doing with DNS. I'm guessing
that the source cluster needs to be able to resolve the hostnames of the remote
RegionServers and Master nodes as stored in ZooKeeper. Anything else I need
to configure here?
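Because the client connects to whatever hostnames the destination Master and RegionServers registered, a forward lookup of exactly those names from each source worker node is a quick sanity check. A minimal sketch using only the Python 3 standard library follows. The names are hypothetical; the real ones must match the destination's registered hostnames exactly (typically FQDNs). Entries in `/etc/hosts` on the source nodes typically work as well as DNS here, since lookups go through the system resolver.

```python
import socket

# Hypothetical hostnames as registered by the destination cluster; replace
# them with the names your destination Master and RegionServers actually use.
DEST_HOSTNAMES = [
    "master.dest.example.com",
    "rs1.dest.example.com",
    "rs2.dest.example.com",
]


def resolve_all(hostnames):
    """Map each hostname to the sorted list of IP addresses it resolves to
    from this machine, or to None if resolution fails."""
    results = {}
    for name in hostnames:
        try:
            infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
            # info[4] is the sockaddr tuple; its first element is the address.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[name] = None
    return results


# Usage, from each source worker node:
#     failed = [n for n, ips in resolve_all(DEST_HOSTNAMES).items() if ips is None]
```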

Thanks for your time!

--
Lex Toumbourou
Lead engineer at scrunch.com <http://scrunch.com/>


  • Michael Stack at May 29, 2016 at 10:57 pm

    On Sat, May 28, 2016 at 9:23 PM, Lex Toumbourou wrote:

    > * ZooKeeper on port 2181
    > * the Master on its RPC port (16000)
    > * the RegionServers on their RPC port (16020)
    You'd have access to the UIs?

    > Anything else I need to configure here? Does Hadoop on the source need to
    > talk directly to the destination Hadoop, etc.?
    Looking at the code, it looks like it is just the source MR tasks doing bulk
    mutations against the remote cluster.
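For reference, CopyTable is launched on the source cluster and is pointed at the destination only through its `--peer.adr` option, whose value is the destination's ZooKeeper quorum, client port and parent znode in the form `quorum:port:znode.parent`. That fits the answer above: the remote side is only ever reached as an HBase client, starting from its ZooKeeper quorum. The helper below is a hypothetical convenience that just assembles the command line; the table name and ZooKeeper hosts are placeholders, and it assumes the `hbase` launcher script is on the `PATH` of the node submitting the job.

```python
import shlex


def copytable_command(table, zk_quorum, zk_port=2181, znode_parent="/hbase",
                      new_name=None):
    """Build the argv for a CopyTable run that writes to a remote peer cluster.

    The peer is identified by its ZooKeeper quorum, client port and parent
    znode, joined as quorum:port:znode.parent for the --peer.adr option.
    """
    peer_adr = f"{','.join(zk_quorum)}:{zk_port}:{znode_parent}"
    argv = ["hbase", "org.apache.hadoop.hbase.mapreduce.CopyTable",
            f"--peer.adr={peer_adr}"]
    if new_name is not None:
        # Optional: write into a differently named table on the destination.
        argv.append(f"--new.name={new_name}")
    argv.append(table)  # the source table name always comes last
    return argv


# Example with placeholder ZooKeeper hosts:
cmd = copytable_command("TestTable", ["zk1.dest.example.com", "zk2.dest.example.com"])
print(shlex.join(cmd))
# hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zk1.dest.example.com,zk2.dest.example.com:2181:/hbase TestTable
```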


    > Also, what's unclear to me is what I should be doing with DNS. I'm guessing
    > that the source cluster needs to be able to resolve the hostnames of the remote
    > RegionServers and Master nodes as stored in ZooKeeper. Anything else I need
    > to configure here?
    Yeah. The source HBase client is doing puts against the remote cluster, so
    it needs to be able to read the remote meta table (hbase:meta) and then to
    address whatever RegionServers it finds listed there for the destination
    cluster.
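The RegionServers the client ends up talking to are identified by HBase ServerNames, which HBase renders as `host,port,startcode` (for instance as the child znodes under `/hbase/rs` in ZooKeeper, assuming the default `zookeeper.znode.parent`). Whatever host appears there is the name the source nodes must resolve and reach. A small hypothetical helper for turning such a list into the distinct host/port pairs worth checking, with made-up sample values:

```python
def parse_server_name(server_name: str):
    """Split an HBase ServerName string 'host,port,startcode' into (host, port).

    The startcode only distinguishes restarts of the same server, so it is
    not needed to check connectivity.
    """
    host, port, _startcode = server_name.strip().split(",")
    return host, int(port)


def regionserver_endpoints(server_names):
    """Return the distinct (host, port) pairs, sorted, that the source-side
    HBase client must be able to resolve and connect to."""
    return sorted({parse_server_name(name) for name in server_names})


# Made-up example values, in the format HBase prints ServerNames:
sample = [
    "rs1.dest.example.com,16020,1464480000000",
    "rs2.dest.example.com,16020,1464480000123",
    "rs1.dest.example.com,16020,1464480000000",  # duplicates collapse
]
print(regionserver_endpoints(sample))
# [('rs1.dest.example.com', 16020), ('rs2.dest.example.com', 16020)]
```

Each resulting pair can then be fed to an ordinary DNS lookup and TCP connect test run from the source worker nodes.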

    St.Ack


  • Lex Toumbourou at May 30, 2016 at 9:58 am
    Great. Thank you, St.Ack.
    --
    Lex Toumbourou
    Lead engineer at scrunch.com <http://scrunch.com/>

Discussion Overview
group: user @
categories: hbase, hadoop
posted: May 29, '16 at 4:24a
active: May 30, '16 at 9:58a
posts: 3
users: 2
website: hbase.apache.org

2 users in discussion

Lex Toumbourou: 2 posts, Michael Stack: 1 post
