Grokbase Groups Hive user May 2011
We are planning a migration from a large PostgreSQL-based DWH to
Hadoop/Hive. The main reason for this migration is the massive
growth of the data to analyze (5.6 TB and growing): PostgreSQL, like
any MVCC-based RDBMS, has its pitfalls with heavy updates and with query
execution over large quantities of data. (We have done a lot of query
tuning and server optimization, with only a minor effect on the latency
of the queries.)

So we have looked at Hadoop and run some tests combined with Hive
and HBase, and the performance we obtained is awesome.

Can you give us some advice on developing a good plan for this?

Environment:
- OS: CentOS 5.5, 64-bit
- Java version: 1.6 Update 20
- Hardware: 8 nodes, AMD Opteron quad-core 4130
  8 GB RAM
  1 TB HDD

Regards

--
Marcos Luís Ortíz Valmaseda
Software Engineer (Large-Scaled Distributed Systems)
University of Information Sciences,
La Habana, Cuba
Linux User # 418229
http://about.me/marcosortiz

Search Discussions

  • Alexandre "TAZ" dos Santos Andrade at May 4, 2011 at 8:45 pm
    Hi Marcos,

    I'm doing exactly the same migration. First of all, you have to remember
    that Hive is going to run MapReduce for each query whose result you don't
    write to a table. Second, migrating the data is a little bit annoying:
    there's no direct connector, so I used a simple dump, stripped the header
    and footer, and loaded it into the Hive structure.

    I hope this helps.

    Alexandre dos Santos Andrade

  • Marcos Ortiz at May 4, 2011 at 9:47 pm

    Thanks a lot, Alexandre.
    Did you use Sqoop to load the data from PostgreSQL to Hive?
  • Alexandre "TAZ" dos Santos Andrade at May 4, 2011 at 9:48 pm
    I wrote a shell script to do that.
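
Alexandre's script is not shown in the thread. As a rough illustration of the "dump, strip the header and footer, load" approach he describes, here is a minimal Python sketch (not his shell script). It assumes a plain-text pg_dump file in which each table's rows sit between a `COPY ... FROM stdin;` line and a `\.` terminator line; the function name `extract_copy_rows` and the sample table `sales` are hypothetical.

```python
import io


def extract_copy_rows(dump_lines):
    """Yield only the data rows of a plain-text dump.

    Everything outside a COPY block (the SQL "header" and "footer")
    is skipped. Rows are passed through unchanged: tab-separated,
    with NULL written as \\N.
    """
    in_data = False
    for line in dump_lines:
        line = line.rstrip("\n")
        if not in_data:
            # A COPY statement marks the start of a data section.
            if line.startswith("COPY ") and line.endswith("FROM stdin;"):
                in_data = True
            continue
        if line == "\\.":
            # End-of-data marker: back to skipping SQL statements.
            in_data = False
            continue
        yield line


# Hypothetical miniature dump used only to exercise the function.
SAMPLE_DUMP = """\
SET client_encoding = 'UTF8';
COPY sales (id, region, amount) FROM stdin;
1\tnorth\t10.50
2\tsouth\t\\N
\\.
ALTER TABLE ONLY sales ADD CONSTRAINT sales_pkey PRIMARY KEY (id);
"""

if __name__ == "__main__":
    for row in extract_copy_rows(io.StringIO(SAMPLE_DUMP)):
        print(row)
```

The extracted rows could then be written to a file and loaded with Hive's `LOAD DATA LOCAL INPATH` into a table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`. The sketch does not separate rows by table when a dump holds several, and it does not decode backslash escapes inside values, so it is only a starting point.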

Discussion Overview
group: user @
categories: hive, hadoop
posted: May 4, '11 at 8:24p
active: May 4, '11 at 9:48p
posts: 4
users: 2
website: hive.apache.org
