Hi Darren,
Thanks for your reply. I started MySQL and configured the metastore with
the Hive 0.10.0 schema, confirmed the "mysqld" process was running on port
3306, but Cloudera Manager was still unable to connect to it. Do you have
any suggestions? The steps I took are below:
Did you install CDH using CM? Did you use Parcels (which is recommended)
or Packages?
I installed the CDH4 Parcel:
/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10
- Downloaded mysql-connector-java-5.1.15-bin.jar and copied it as
/usr/share/java/mysql-connector-java.jar on all nodes (just to be
safe) in the cluster. *Should I have used a different version?*
$ ls /usr/share/java/
libintl.jar mysql-connector-java.jar
- Installed mysql (apt-get install mysql-server) and confirmed that
"mysqld" was running on port 3306 (CM had apparently already started the
PostgreSQL db on 7432):
$ sockstat
USER PROCESS PID PROTO SOURCE ADDRESS
FOREIGN ADDRESS STATE
[...]
cloudera postgres 2445 tcp4 127.0.0.1:7432
127.0.0.1:43450 ESTABLISHED
cloudera postgres 2446 tcp4 127.0.0.1:7432
127.0.0.1:43451 ESTABLISHED
cloudera postgres 2447 tcp4 127.0.0.1:7432
127.0.0.1:43452 ESTABLISHED
cloudera postgres 2448 tcp4 127.0.0.1:7432
127.0.0.1:43453 ESTABLISHED
cloudera postgres 2449 tcp4 127.0.0.1:7432
127.0.0.1:43454 ESTABLISHED
[...]
mysql mysqld 17142 tcp4 127.0.0.1:3306 *:*
LISTEN
- After starting the mysql server, I created a metastore using the Hive
0.10.0 schema, taking my cue from
/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/ hive-hwi-*
0.10.0*-cdh4.2.0.jar:
mysql
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE
/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql
mysql> CREATE USER 'user'@'%' IDENTIFIED BY 'password';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'user'@'%';
mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'user'@'%';
- I then went back to Cloudera Manager to the "Database Setup" page for
Hive, selected "Custom Databases and entered "metastore" and the
username/password, but the connection test still failed:
No database server found running on host server:3306
- The Database Setup page also provided a default database
name/username/password combo (hive/hive/[randomstring]) so I ran all the
same MySQL commands with that info, but the connection still failed
Best,
Lauren
On Wednesday, March 13, 2013 11:20:27 AM UTC-4, Darren Lo wrote:Hi LK,
Normally CM will install the hive binaries for you before you even add any
services. Did you install CDH using CM? Did you use Parcels (which is
recommended) or Packages? Note that if you installed using Parcels, then
there won't be a /usr/lib/hive/lib, since Parcels installs all CDH binaries
somewhere under /opt (by default, it can be configured)
When using Hive through CM, you do not need to copy the MySQL jar (or
anything else) into the Hive lib directory. You should just need to install
the jar to /usr/share/java/mysql-connector-java.jar. Make sure it has that
exact name, with no numbers or versions in the file name. This is the
default location when installing the mysql connector via RPMs.
Since you have no information in your Hive metastore, you just need to
configure Hive to point to the correct database in the first place, which
is handled on the Database Setup page in the CM wizard. You don't need to
worry about doing a "clean install" on the correct database or to "switch"
databases.
Also, when granting privileges to your mysql user, be sure that the user
can create tables, indices, etc. CM will create the Hive tables as part of
setting up your new Hive service, and you don't want this to fail. You can
revoke these privileges after CM has created the tables.
Thanks,
Darren
On Wed, Mar 13, 2013 at 7:33 AM, L. K. <lkus...@revelytix.com<javascript:>
wrote:
Hi,
I'm using CM 4.5 Free Edition on a test cluster - currently just HDFS and
MR are running, and I would like to add Hive with MySQL as the metastore.
My understanding was that Derby used to be the default, and I'd seen the instructions
for switching Derby to MySQL<
https://ccp.cloudera.com/display/CDHDOC/Hive+Installation#HiveInstallation-ConfiguringtheHiveMetastore>,
but with 4.5 it now appears<
http://blog.cloudera.com/blog/2013/03/how-to-set-up-cloudera-manager-4-5-for-apache-hive/>that PostgresQL is the default metastore: "Cloudera
Manager will: Create a database for the Hive Metastore if you are using
the Cloudera Manager Embedded PostgreSQL Database" Also, all the
"switchover" instructions assume the Hive distribution has already been
loaded on the nodes, i.e. that the /usr/lib/hive/lib directory already
exists for you to copy the MySQL JDBC jar into.
In short, I had three questions:
1. All the instructions I've seen for using MySQL as the metastore
assume that the Hive distro is already on the machines, i.e. that the
/usr/lib/hive/lib directory already exists. Is there any way to do a clean
install Hive - that is, install Hive for the first time ever on that
machine - using MySQL as a "Custom Database", instead of starting Hive on
the "Embedded Database" and then switching over to MySQL?
2. If I manually created a /usr/lib/hive/lib folder just for the
purposes of starting a MySQL db before going back to CM and starting Hive
for the first time, would it cause a problem when CM goes to install Hive
and sees that /usr/lib/hive/ already exists?
3. If it's true that PostgresQL is now the embedded db for the Hive
metastore, are there instructions for switching from PostgreSQL to MySQL?
Thank you in advance for your help.
Best,
LK
--
Thanks,
Darren