Hey guys,

I have another Hadoop cluster that has Hive installed with its own metastore
and all. I would like to move/copy/export data from a bunch of Hive tables
from a different Hadoop cluster into this one.

Is this possible? What's the best way to do it? The hadoop/hive/derby
versions are the same.

Thanks!

Ryan


  • Edward Capriolo at Feb 11, 2010 at 5:25 pm
    I think it is pretty simple
    1)distcp the warehouse
    2)rsync your derby DB
    --or--
    backup restore derby

    This assumes you are not going to edit anything while moving.
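
As a rough illustration of step 1, the sketch below only builds one `hadoop distcp <source> <destination>` command per table and does not run anything. The namenode URIs, the table names, and the helper `build_distcp_commands` are hypothetical; `/user/hive/warehouse` is assumed to be the warehouse path on both clusters. It also assumes the destination table directories do not exist yet, since distcp may treat an existing destination directory differently.

```python
from typing import List


def build_distcp_commands(
    tables: List[str],
    src_namenode: str,
    dst_namenode: str,
    warehouse: str = "/user/hive/warehouse",  # assumed warehouse path
) -> List[List[str]]:
    """Return one `hadoop distcp` argv list per table (nothing is executed)."""
    commands = []
    for table in tables:
        src = f"{src_namenode}{warehouse}/{table}"
        dst = f"{dst_namenode}{warehouse}/{table}"
        commands.append(["hadoop", "distcp", src, dst])
    return commands


if __name__ == "__main__":
    # Hypothetical cluster addresses and table names, for illustration only.
    for cmd in build_distcp_commands(
        ["page_views", "users"],
        "hdfs://old-nn:8020",
        "hdfs://new-nn:8020",
    ):
        print(" ".join(cmd))
```

Each returned list can be reviewed first and then passed to `subprocess.run` on a machine that has the `hadoop` client configured.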

  • Namit Jain at Feb 11, 2010 at 5:30 pm
    Is it a one-time operation or a continuous one?

    If it is a one-time operation, the steps Edward suggested (distcp the
    warehouse, then rsync or backup/restore the Derby DB) should work.

    Otherwise, you need to set up a process that continuously feeds changes from the
    source and applies them in the destination. Let me know if this is the case. We have a
    similar requirement at Facebook and have set up a replication process (which is not
    open source), but I can describe the main design points.


    Thanks,
    -namit
  • Ryan LeCompte at Feb 11, 2010 at 5:32 pm
    I see. Here's my problem, though. I already have a new derby metastoredb
    set up and configured in the NEW cluster. I don't want to blow that one away,
    since it has some tables that I don't want to get rid of.

    I essentially want to do this:

    1) Grab tables from the old cluster and move them into the new cluster (one
    time operation)

    I guess I would need to somehow merge table information from the old derby
    metastoredb into the new one?

    I guess I could easily distcp the table directories under
    /user/hive/warehouse from the old cluster and upload them to
    /user/hive/warehouse in the new cluster.

    But what about Derby?
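
One check that applies before copying directories into a warehouse that already holds tables can be sketched as follows. This is only an illustration, not something proposed in the thread: the helper `plan_table_copy` and the table names are hypothetical, and names are compared case-insensitively on the assumption that Hive treats table names that way. It covers the data directories only; the copied directories do not show up as tables until matching definitions exist in the new metastore, which is the open question here.

```python
from typing import Iterable, List, Tuple


def plan_table_copy(
    old_tables: Iterable[str], new_tables: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split old-cluster tables into (safe_to_copy, conflicts).

    A conflict is a table name that already exists in the new cluster,
    compared case-insensitively.
    """
    existing = {name.lower() for name in new_tables}
    wanted = {name.lower() for name in old_tables}
    safe = sorted(wanted - existing)
    conflicts = sorted(wanted & existing)
    return safe, conflicts


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    safe, conflicts = plan_table_copy(
        ["logs", "Users", "clicks"], ["users", "sessions"]
    )
    print("copy:", safe)
    print("needs renaming or manual review:", conflicts)
```

Tables in the conflict list would need to be renamed or reviewed by hand so that the copy does not overwrite or mix with data already in the new cluster.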

  • Edward Capriolo at Feb 12, 2010 at 4:15 pm

    I opened https://issues.apache.org/jira/browse/HIVE-1161. I think
    whipping up tools to handle simple master/slave replication between
    two clusters should be fun, and it might also help quell some SPOF questions.
    +1

Discussion Overview
group: user @
categories: hive, hadoop
posted: Feb 11, '10 at 5:01p
active: Feb 12, '10 at 4:15p
posts: 5
users: 3
website: hive.apache.org
