FAQ
Hi,

I'm using CM 4.5 Free Edition on a test cluster - currently just HDFS and
MR are running, and I would like to add Hive with MySQL as the metastore.
My understanding was that Derby used to be the default, and I'd seen the instructions
for switching Derby to MySQL<https://ccp.cloudera.com/display/CDHDOC/Hive+Installation#HiveInstallation-ConfiguringtheHiveMetastore>,
but with 4.5 it now appears<http://blog.cloudera.com/blog/2013/03/how-to-set-up-cloudera-manager-4-5-for-apache-hive/>that PostgresQL is the default metastore: "Cloudera
Manager will: Create a database for the Hive Metastore if you are using the
Cloudera Manager Embedded PostgreSQL Database" Also, all the "switchover"
instructions assume the Hive distribution has already been loaded on the
nodes, i.e. that the /usr/lib/hive/lib directory already exists for you to
copy the MySQL JDBC jar into.

In short, I had three questions:


1. All the instructions I've seen for using MySQL as the metastore
assume that the Hive distro is already on the machines, i.e. that the
/usr/lib/hive/lib directory already exists. Is there any way to do a clean
install Hive - that is, install Hive for the first time ever on that
machine - using MySQL as a "Custom Database", instead of starting Hive on
the "Embedded Database" and then switching over to MySQL?
2. If I manually created a /usr/lib/hive/lib folder just for the
purposes of starting a MySQL db before going back to CM and starting Hive
for the first time, would it cause a problem when CM goes to install Hive
and sees that /usr/lib/hive/ already exists?
3. If it's true that PostgresQL is now the embedded db for the Hive
metastore, are there instructions for switching from PostgreSQL to MySQL?


Thank you in advance for your help.

Best,
LK

Search Discussions

  • Darren Lo at Mar 13, 2013 at 3:20 pm
    Hi LK,

    Normally CM will install the hive binaries for you before you even add any
    services. Did you install CDH using CM? Did you use Parcels (which is
    recommended) or Packages? Note that if you installed using Parcels, then
    there won't be a /usr/lib/hive/lib, since Parcels installs all CDH binaries
    somewhere under /opt (by default, it can be configured)

    When using Hive through CM, you do not need to copy the MySQL jar (or
    anything else) into the Hive lib directory. You should just need to install
    the jar to /usr/share/java/mysql-connector-java.jar. Make sure it has that
    exact name, with no numbers or versions in the file name. This is the
    default location when installing the mysql connector via RPMs.

    Since you have no information in your Hive metastore, you just need to
    configure Hive to point to the correct database in the first place, which
    is handled on the Database Setup page in the CM wizard. You don't need to
    worry about doing a "clean install" on the correct database or to "switch"
    databases.

    Also, when granting privileges to your mysql user, be sure that the user
    can create tables, indices, etc. CM will create the Hive tables as part of
    setting up your new Hive service, and you don't want this to fail. You can
    revoke these privileges after CM has created the tables.

    Thanks,
    Darren

    On Wed, Mar 13, 2013 at 7:33 AM, L. K. wrote:

    Hi,

    I'm using CM 4.5 Free Edition on a test cluster - currently just HDFS and
    MR are running, and I would like to add Hive with MySQL as the metastore.
    My understanding was that Derby used to be the default, and I'd seen the instructions
    for switching Derby to MySQL<https://ccp.cloudera.com/display/CDHDOC/Hive+Installation#HiveInstallation-ConfiguringtheHiveMetastore>,
    but with 4.5 it now appears<http://blog.cloudera.com/blog/2013/03/how-to-set-up-cloudera-manager-4-5-for-apache-hive/>that PostgresQL is the default metastore: "Cloudera
    Manager will: Create a database for the Hive Metastore if you are using
    the Cloudera Manager Embedded PostgreSQL Database" Also, all the
    "switchover" instructions assume the Hive distribution has already been
    loaded on the nodes, i.e. that the /usr/lib/hive/lib directory already
    exists for you to copy the MySQL JDBC jar into.

    In short, I had three questions:


    1. All the instructions I've seen for using MySQL as the metastore
    assume that the Hive distro is already on the machines, i.e. that the
    /usr/lib/hive/lib directory already exists. Is there any way to do a clean
    install Hive - that is, install Hive for the first time ever on that
    machine - using MySQL as a "Custom Database", instead of starting Hive on
    the "Embedded Database" and then switching over to MySQL?
    2. If I manually created a /usr/lib/hive/lib folder just for the
    purposes of starting a MySQL db before going back to CM and starting Hive
    for the first time, would it cause a problem when CM goes to install Hive
    and sees that /usr/lib/hive/ already exists?
    3. If it's true that PostgresQL is now the embedded db for the Hive
    metastore, are there instructions for switching from PostgreSQL to MySQL?


    Thank you in advance for your help.

    Best,
    LK


    --
    Thanks,
    Darren
  • L. K. at Mar 13, 2013 at 9:07 pm
    Hi Darren,

    Thanks for your reply. I started MySQL and configured the metastore with
    the Hive 0.10.0 schema, confirmed the "mysqld" process was running on port
    3306, but Cloudera Manager was still unable to connect to it. Do you have
    any suggestions? The steps I took are below:

    Did you install CDH using CM? Did you use Parcels (which is recommended)
    or Packages?

    I installed the CDH4 Parcel:
    /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10

    - Downloaded mysql-connector-java-5.1.15-bin.jar and copied it as
    /usr/share/java/mysql-connector-java.jar on all nodes (just to be
    safe) in the cluster. *Should I have used a different version?*

    $ ls /usr/share/java/
    libintl.jar mysql-connector-java.jar

    - Installed mysql (apt-get install mysql-server) and confirmed that
    "mysqld" was running on port 3306 (CM had apparently already started the
    PostgreSQL db on 7432):

    $ sockstat
    USER PROCESS PID PROTO SOURCE ADDRESS
    FOREIGN ADDRESS STATE
    [...]
    cloudera postgres 2445 tcp4 127.0.0.1:7432
    127.0.0.1:43450 ESTABLISHED
    cloudera postgres 2446 tcp4 127.0.0.1:7432
    127.0.0.1:43451 ESTABLISHED
    cloudera postgres 2447 tcp4 127.0.0.1:7432
    127.0.0.1:43452 ESTABLISHED
    cloudera postgres 2448 tcp4 127.0.0.1:7432
    127.0.0.1:43453 ESTABLISHED
    cloudera postgres 2449 tcp4 127.0.0.1:7432
    127.0.0.1:43454 ESTABLISHED
    [...]
    mysql mysqld 17142 tcp4 127.0.0.1:3306 *:*
    LISTEN


    - After starting the mysql server, I created a metastore using the Hive
    0.10.0 schema, taking my cue from
    /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/ hive-hwi-*
    0.10.0*-cdh4.2.0.jar:

    mysql
    mysql> CREATE DATABASE metastore;
    mysql> USE metastore;
    mysql> SOURCE
    /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql
    mysql> CREATE USER 'user'@'%' IDENTIFIED BY 'password';
    mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'user'@'%';
    mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'user'@'%';

    - I then went back to Cloudera Manager to the "Database Setup" page for
    Hive, selected "Custom Databases and entered "metastore" and the
    username/password, but the connection test still failed:

    No database server found running on host server:3306

    - The Database Setup page also provided a default database
    name/username/password combo (hive/hive/[randomstring]) so I ran all the
    same MySQL commands with that info, but the connection still failed

    Best,
    Lauren

    On Wednesday, March 13, 2013 11:20:27 AM UTC-4, Darren Lo wrote:

    Hi LK,

    Normally CM will install the hive binaries for you before you even add any
    services. Did you install CDH using CM? Did you use Parcels (which is
    recommended) or Packages? Note that if you installed using Parcels, then
    there won't be a /usr/lib/hive/lib, since Parcels installs all CDH binaries
    somewhere under /opt (by default, it can be configured)

    When using Hive through CM, you do not need to copy the MySQL jar (or
    anything else) into the Hive lib directory. You should just need to install
    the jar to /usr/share/java/mysql-connector-java.jar. Make sure it has that
    exact name, with no numbers or versions in the file name. This is the
    default location when installing the mysql connector via RPMs.

    Since you have no information in your Hive metastore, you just need to
    configure Hive to point to the correct database in the first place, which
    is handled on the Database Setup page in the CM wizard. You don't need to
    worry about doing a "clean install" on the correct database or to "switch"
    databases.

    Also, when granting privileges to your mysql user, be sure that the user
    can create tables, indices, etc. CM will create the Hive tables as part of
    setting up your new Hive service, and you don't want this to fail. You can
    revoke these privileges after CM has created the tables.

    Thanks,
    Darren


    On Wed, Mar 13, 2013 at 7:33 AM, L. K. <lkus...@revelytix.com<javascript:>
    wrote:
    Hi,

    I'm using CM 4.5 Free Edition on a test cluster - currently just HDFS and
    MR are running, and I would like to add Hive with MySQL as the metastore.
    My understanding was that Derby used to be the default, and I'd seen the instructions
    for switching Derby to MySQL<https://ccp.cloudera.com/display/CDHDOC/Hive+Installation#HiveInstallation-ConfiguringtheHiveMetastore>,
    but with 4.5 it now appears<http://blog.cloudera.com/blog/2013/03/how-to-set-up-cloudera-manager-4-5-for-apache-hive/>that PostgresQL is the default metastore: "Cloudera
    Manager will: Create a database for the Hive Metastore if you are using
    the Cloudera Manager Embedded PostgreSQL Database" Also, all the
    "switchover" instructions assume the Hive distribution has already been
    loaded on the nodes, i.e. that the /usr/lib/hive/lib directory already
    exists for you to copy the MySQL JDBC jar into.

    In short, I had three questions:


    1. All the instructions I've seen for using MySQL as the metastore
    assume that the Hive distro is already on the machines, i.e. that the
    /usr/lib/hive/lib directory already exists. Is there any way to do a clean
    install Hive - that is, install Hive for the first time ever on that
    machine - using MySQL as a "Custom Database", instead of starting Hive on
    the "Embedded Database" and then switching over to MySQL?
    2. If I manually created a /usr/lib/hive/lib folder just for the
    purposes of starting a MySQL db before going back to CM and starting Hive
    for the first time, would it cause a problem when CM goes to install Hive
    and sees that /usr/lib/hive/ already exists?
    3. If it's true that PostgresQL is now the embedded db for the Hive
    metastore, are there instructions for switching from PostgreSQL to MySQL?


    Thank you in advance for your help.

    Best,
    LK


    --
    Thanks,
    Darren
  • Darren Lo at Mar 13, 2013 at 9:31 pm
    Hi Lauren,

    Your mysql driver should be fine. That's definitely not causing your error.

    Your steps all look correct (though you didn't have to SOURCE the hive sql,
    since CM can do that for you, but what you did is perfectly fine, CM will
    just skip that step).

    Can you verify that your host name and other database configs are correct?
    You should be able to successfully run:
    mysql --user=USERNAME --host=HOST --port=PORT -p
    (prompts for password)

    If that doesn't work, it can be due to firewall / networking issues, or
    simply providing the wrong username/password/host/port. Once you get that
    working, then enter these values into the CM wizard and it should work. If
    you used the default mysql port (3306), then you don't need to specify port
    in the CM wizard.

    Thanks,
    Darren

    On Wed, Mar 13, 2013 at 2:07 PM, L. K. wrote:

    Hi Darren,

    Thanks for your reply. I started MySQL and configured the metastore with
    the Hive 0.10.0 schema, confirmed the "mysqld" process was running on port
    3306, but Cloudera Manager was still unable to connect to it. Do you have
    any suggestions? The steps I took are below:

    Did you install CDH using CM? Did you use Parcels (which is recommended)
    or Packages?

    I installed the CDH4 Parcel:
    /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10

    - Downloaded mysql-connector-java-5.1.15-bin.jar and copied it as
    /usr/share/java/mysql-**connector-java.jar on all nodes (just to be
    safe) in the cluster. *Should I have used a different version?*

    $ ls /usr/share/java/
    libintl.jar mysql-connector-java.jar

    - Installed mysql (apt-get install mysql-server) and confirmed that
    "mysqld" was running on port 3306 (CM had apparently already started the
    PostgreSQL db on 7432):

    $ sockstat
    USER PROCESS PID PROTO SOURCE ADDRESS
    FOREIGN ADDRESS STATE
    [...]
    cloudera postgres 2445 tcp4 127.0.0.1:7432
    127.0.0.1:43450 ESTABLISHED
    cloudera postgres 2446 tcp4 127.0.0.1:7432
    127.0.0.1:43451 ESTABLISHED
    cloudera postgres 2447 tcp4 127.0.0.1:7432
    127.0.0.1:43452 ESTABLISHED
    cloudera postgres 2448 tcp4 127.0.0.1:7432
    127.0.0.1:43453 ESTABLISHED
    cloudera postgres 2449 tcp4 127.0.0.1:7432
    127.0.0.1:43454 ESTABLISHED
    [...]
    mysql mysqld 17142 tcp4 127.0.0.1:3306
    *:* LISTEN


    - After starting the mysql server, I created a metastore using the
    Hive 0.10.0 schema, taking my cue from
    /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/ hive-hwi-
    *0.10.0*-cdh4.2.0.jar:

    mysql
    mysql> CREATE DATABASE metastore;
    mysql> USE metastore;
    mysql> SOURCE
    /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql
    mysql> CREATE USER 'user'@'%' IDENTIFIED BY 'password';
    mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'user'@'%';
    mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'user'@'%';

    - I then went back to Cloudera Manager to the "Database Setup" page
    for Hive, selected "Custom Databases and entered "metastore" and the
    username/password, but the connection test still failed:

    No database server found running on host server:3306

    - The Database Setup page also provided a default database
    name/username/password combo (hive/hive/[randomstring]) so I ran all the
    same MySQL commands with that info, but the connection still failed

    Best,
    Lauren

    On Wednesday, March 13, 2013 11:20:27 AM UTC-4, Darren Lo wrote:

    Hi LK,

    Normally CM will install the hive binaries for you before you even add
    any services. Did you install CDH using CM? Did you use Parcels (which is
    recommended) or Packages? Note that if you installed using Parcels, then
    there won't be a /usr/lib/hive/lib, since Parcels installs all CDH binaries
    somewhere under /opt (by default, it can be configured)

    When using Hive through CM, you do not need to copy the MySQL jar (or
    anything else) into the Hive lib directory. You should just need to install
    the jar to /usr/share/java/mysql-**connector-java.jar. Make sure it has
    that exact name, with no numbers or versions in the file name. This is the
    default location when installing the mysql connector via RPMs.

    Since you have no information in your Hive metastore, you just need to
    configure Hive to point to the correct database in the first place, which
    is handled on the Database Setup page in the CM wizard. You don't need to
    worry about doing a "clean install" on the correct database or to "switch"
    databases.

    Also, when granting privileges to your mysql user, be sure that the user
    can create tables, indices, etc. CM will create the Hive tables as part of
    setting up your new Hive service, and you don't want this to fail. You can
    revoke these privileges after CM has created the tables.

    Thanks,
    Darren

    On Wed, Mar 13, 2013 at 7:33 AM, L. K. wrote:

    Hi,

    I'm using CM 4.5 Free Edition on a test cluster - currently just HDFS
    and MR are running, and I would like to add Hive with MySQL as the
    metastore. My understanding was that Derby used to be the default, and I'd
    seen the instructions for switching Derby to MySQL<https://ccp.cloudera.com/display/CDHDOC/Hive+Installation#HiveInstallation-ConfiguringtheHiveMetastore>,
    but with 4.5 it now appears<http://blog.cloudera.com/blog/2013/03/how-to-set-up-cloudera-manager-4-5-for-apache-hive/>that PostgresQL is the default metastore: "Cloudera
    Manager will: Create a database for the Hive Metastore if you are using
    the Cloudera Manager Embedded PostgreSQL Database" Also, all the
    "switchover" instructions assume the Hive distribution has already been
    loaded on the nodes, i.e. that the /usr/lib/hive/lib directory already
    exists for you to copy the MySQL JDBC jar into.

    In short, I had three questions:


    1. All the instructions I've seen for using MySQL as the metastore
    assume that the Hive distro is already on the machines, i.e. that the
    /usr/lib/hive/lib directory already exists. Is there any way to do a clean
    install Hive - that is, install Hive for the first time ever on that
    machine - using MySQL as a "Custom Database", instead of starting Hive on
    the "Embedded Database" and then switching over to MySQL?
    2. If I manually created a /usr/lib/hive/lib folder just for the
    purposes of starting a MySQL db before going back to CM and starting Hive
    for the first time, would it cause a problem when CM goes to install Hive
    and sees that /usr/lib/hive/ already exists?
    3. If it's true that PostgresQL is now the embedded db for the Hive
    metastore, are there instructions for switching from PostgreSQL to MySQL?


    Thank you in advance for your help.

    Best,
    LK


    --
    Thanks,
    Darren

    --
    Thanks,
    Darren

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupscm-users @
categorieshadoop
postedMar 13, '13 at 2:33p
activeMar 13, '13 at 9:31p
posts4
users2
websitecloudera.com
irc#hadoop

2 users in discussion

L. K.: 2 posts Darren Lo: 2 posts

People

Translate

site design / logo © 2022 Grokbase