Hi,

I am new to Cloudera. I have installed Cloudera Manager 4.5, and now I am trying
to install Hive, Impala, and Beeswax, but I am facing some issues.

With Hive, I am able to run my HQL from the CLI.

But Impala is showing a bad condition.

Through the Beeswax console I am not able to browse my HDFS.

These are the steps I followed for installation:

Installing Impala with Cloudera Manager Free Edition<https://ccp.cloudera.com/display/FREE45BETADOC/Installing+Impala+with+Cloudera+Manager+Free+Edition>

Step 1: Install CDH and Hive

To use Cloudera Impala, you must install CDH, Hive, and Impala. Install
CDH, Hive and Impala on the nodes that will run Impala as described in Automated
Installation of Cloudera Manager and CDH<https://ccp.cloudera.com/display/FREE45BETADOC/Automated+Installation+of+Cloudera+Manager+and+CDH>.

You must perform your installation using packages; installation of Impala
is not currently available using parcels.
Impala only supports RHEL/CentOS 6.2.


Step 2: Install a Database for the Hive Metastore

In order to run Impala, you must have the Hive metastore configured to use
either a MySQL or a PostgreSQL database. The default Hive metastore Derby
database is not supported with Impala.

The following instructions describe how to install a MySQL database to use
for the Hive metastore. Install this database on a single machine in your
cluster.

*To install and configure a MySQL database:*

1. Install the MySQL server.
$ sudo yum install mysql-server

After issuing the command to install MySQL, you may need to respond to
prompts to confirm that you do want to complete the installation.

2. After installation completes, start the mysql daemon.
$ sudo service mysqld start

In the following step, your current root password is blank. Press the
Enter key when you're prompted for the root password.

3. Set the MySQL root password:
$ sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!

4. Configure MySQL server to start at boot:
$ sudo /sbin/chkconfig mysqld on
$ sudo /sbin/chkconfig --list mysqld
mysqld 0:off 1:off 2:on 3:on 4:on 5:on 6:off

Step 3: Configure a Remote Database as the Hive Metastore

The recommended production environment for Hive is to use a database on one
or more remote servers as the metastore, and MySQL is the most popular
database to use. To set this up:

Note: Impala does not support Derby.

Configuring the Remote MySQL Database

Before you can run the Hive metastore with a remote MySQL database, you
must configure a connector to the remote MySQL database, set up the initial
database schema, and configure the MySQL user account for the Hive user.

*Install the MySQL JDBC Connector<http://www.mysql.com/downloads/connector/j> in the Hive lib directory:*
$ curl -L 'http://www.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.22.tar.gz/from/http://mysql.he.net/' | tar xz
$ sudo cp mysql-connector-java-5.1.22/mysql-connector-java-5.1.22-bin.jar /usr/lib/hive/lib/

The MySQL administrator should create the initial database schema using the
hive-schema-0.9.0.mysql.sql file located in the
/usr/lib/hive/scripts/metastore/upgrade/mysql directory.
$ mysql -u root -p
mysql> CREATE DATABASE hivemetastoredb;
mysql> USE hivemetastoredb;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.9.0.mysql.sql;
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON hivemetastoredb.* TO 'hive'@'%' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;
mysql> quit;

Take note of the settings you applied to the remote MySQL database. You
will need these values, such as the database name and database user name,
as you configure Impala.
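
Before moving on, it may be worth confirming that the hive account can actually reach the new database and that the schema script created some tables. The following is only a sketch: metastore_check_sql is a made-up helper name (not part of MySQL or Hive), and it assumes the database name hivemetastoredb used above.

```shell
# Hypothetical helper: prints a query that counts the tables in the
# metastore database. The default name matches the CREATE DATABASE above.
metastore_check_sql() {
  local db="${1:-hivemetastoredb}"
  printf "SELECT COUNT(*) AS metastore_tables FROM information_schema.tables WHERE table_schema = '%s';\n" "$db"
}

# Usage (prompts for the hive password; replace <db hostname>):
#   metastore_check_sql hivemetastoredb | mysql -u hive -p -h '<db hostname>'
```

A count of zero, or an access-denied error, would suggest that the SOURCE or GRANT step did not take effect.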


Step 4: Add the Impala Service

As you configure Impala, you will need to modify HDFS and Hive settings.
Configurations that are recommended for many environments are as follows:

*HDFS Configurations*
Property                                                                    Value
DataNode Local Path Access Users (dfs.block.local-path-access.user)         impala
DataNode Data Directory Permissions (dfs.datanode.data.dir.perm)            755
Enable HDFS Block Metadata API (dfs.datanode.hdfs-blocks-metadata.enabled)  true
Enable HDFS Short Circuit Read (dfs.client.read.shortcircuit)               true

*Hive Configurations*
Property                                                                Value
Hive Metastore Database Type                                            mysql
Hive Metastore Database Name                                            hivemetastoredb
Hive Metastore Database Host                                            <db hostname>
Hive Metastore Database Port                                            3306
Hive Metastore Database User (javax.jdo.option.ConnectionUserName)      hive
Hive Metastore Database Password (javax.jdo.option.ConnectionPassword)  hive
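
For reference, the type, host, port, and database name above correspond to a standard MySQL JDBC URL, which is what a hand-edited hive-site.xml would carry in javax.jdo.option.ConnectionURL. A small sketch of how the pieces fit together; metastore_jdbc_url is a hypothetical helper name, and how Cloudera Manager assembles the URL internally is an assumption here.

```shell
# Hypothetical helper: builds a MySQL JDBC URL from host, port, and
# database name. Defaults follow the values in the table above.
metastore_jdbc_url() {
  local host="$1" port="${2:-3306}" db="${3:-hivemetastoredb}"
  printf 'jdbc:mysql://%s:%s/%s\n' "$host" "$port" "$db"
}

# Usage (db01.example.com is a placeholder hostname):
#   metastore_jdbc_url db01.example.com
```

If the host or port in this URL is wrong, Hive queries tend to fail with metastore connection errors, so it is a useful thing to double-check.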

*To add and configure the Impala service*

1. Connect to the Cloudera Manager Admin Console using a browser.
The server URL takes the following form:

http://<Server host>:<port>

<Server host> is the fully-qualified domain name or IP address of the
host machine where the Cloudera Manager Server is installed.
<port> is the port configured for the Cloudera Manager Server. The
default port is 7180.

For example, use a URL such as the following:
http://myhost.example.com:7180/
2. Click the *Services* tab, then choose *All Services*.

3. Click the *Add a Service* button.

4. Select *Impala* and click *Continue*.

5. Select the dependencies for your service and click *Continue*.
Usually this will be an HDFS, HBase, and Hive service, indicating that the
Impala service depends on these services.

6. Confirm the host assignment for the Impala services and click
*Continue*.

7. Review the configuration changes to be made during the Impala
installation and click *Accept*.

8. The results of the installation process are displayed. Click
*Continue* to return to the Cloudera Manager Admin Console.

9. Use the HDFS service page to configure the HDFS service for
Impala. Click the instance of HDFS that supports Impala.

10. Pull down the *Configuration* tab at the top of the window and
select *Edit*.

1. Search for dfs.block.local-path-access.user, which is located under
the *Service-Wide > Performance* category in the left hand column.
Add impala to this list, where impala is the System user configured
in the Impala service.

2. Click *Save Changes*.

3. Search for dfs.datanode.data.dir.perm, which is under the *Security*
category under each DataNode role configuration group. The search
function should display all occurrences of this property if you have
multiple DataNode role configuration groups. Set the *DataNode Data
Directory Permissions* property to 755. This is recommended for
Impala performance.

4. Click *Save Changes*.

5. Click the *Configuration* tab and click *Service Wide*. Enable the
following HDFS properties:
- Search for dfs.datanode.hdfs-blocks-metadata.enabled and verify
that it is enabled.
- Search for dfs.client.read.shortcircuit and verify that it is
enabled.
Both of these are normally enabled by default.

6. Click *Save Changes*.

11. Restart the HDFS service.

1. Click *Services* and click *All Services*.

2. For the HDFS service you modified, click *Actions* and click
*Restart*.

12. Set the Hive metastore.

1. Go to the Hive service page (by selecting the Hive service from the
*Services* menu or from the *All Services* page).

2. Pull down the *Configuration* tab at the top of the window and
select *Edit*.

3. Under *Service-Wide* select *Hive Metastore* in the left hand
column.

4. Update the properties as appropriate to your metastore
configuration, then *Save Changes*.

13. (Optional) Once you have configured and saved your Hive metastore
settings, it is recommended that you validate the Hive metastore by
executing a Hive query. You can validate the metastore using Hue's Beeswax
application.

14. Click *Services* and click the new Impala service. Click *Actions* and click
*Restart*.


Configuring Hue Beeswax to Connect with Impala (Optional)

You may want to connect with Impala and execute queries from the Hue
Beeswax interface. By default, Hue Beeswax is not configured to use RPC,
which is required for querying Impala. To enable RPC connections, you must
add configuration settings in the *Hue Service Configuration Safety Valve*,
which adds the settings to the hue.ini file. Impala requires Hue 2.1 or
later in CDH 4.1 or later.

*To configure Hue to connect with Impala:*

1. In the Cloudera Manager Admin Console, go to the *Hue service >
Configuration* page.

2. In the *Service-Wide > Advanced* section, set the following values
for the *Hue Service Configuration Safety Valve*:
[beeswax]
beeswax_server_host=<Impala daemon hostname>
beeswax_server_port=<Impala daemon port>

where:
<Impala daemon hostname> refers to any host where any Impala daemon is
running. Hue must connect with only one Impala daemon, so any of them
will work.
<Impala daemon port> refers to the port on that host to use to connect to
the Impala daemon. The default is 21000.

For example:
[beeswax]
beeswax_server_host=impaladhostname
beeswax_server_port=21000

*Important*
If Hue and the Impala daemon are installed on the same host, and you are
using CDH4.1, then you must add the beeswax_meta_server_only=9004
configuration value to the *Hue Service Configuration Safety Valve* as
shown below to avoid a port conflict in Hue:
[beeswax]
beeswax_server_host=<Impala daemon hostname>
beeswax_server_port=<Impala daemon port>
beeswax_meta_server_only=9004

3. Click *Save*.

4. Click *Services* and click the Hue service. Click *Actions* and click
*Restart*.
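
To avoid typos when pasting into the safety valve, the [beeswax] text from step 2 can be generated rather than typed. This is just a sketch under the settings quoted above; hue_beeswax_snippet is a hypothetical helper name, and impalad01 in the usage line is a placeholder hostname.

```shell
# Hypothetical helper: prints the [beeswax] section for the Hue safety valve.
#   $1 = Impala daemon hostname (required)
#   $2 = Impala daemon port (defaults to 21000)
#   $3 = beeswax_meta_server_only port (optional; only for the same-host
#        CDH4.1 case described in the Important note)
hue_beeswax_snippet() {
  local host="$1" port="${2:-21000}" meta_port="${3:-}"
  printf '[beeswax]\n'
  printf 'beeswax_server_host=%s\n' "$host"
  printf 'beeswax_server_port=%s\n' "$port"
  if [ -n "$meta_port" ]; then
    printf 'beeswax_meta_server_only=%s\n' "$meta_port"
  fi
}

# Usage:
#   hue_beeswax_snippet impalad01              # Hue and impalad on different hosts
#   hue_beeswax_snippet impalad01 21000 9004   # same host, CDH4.1
```

The output can then be pasted as-is into the *Hue Service Configuration Safety Valve* field.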

For information about using Beeswax for queries, see Beeswax<https://ccp.cloudera.com/display/CDH4DOC/Beeswax>.


  • Saurav Sinha at Feb 1, 2013 at 7:21 pm
    Hi Romain,

    I am getting an error in the Impala shell even though Impala is showing
    good condition in CMS.

    I am including a screenshot of the CMS.

    [saurav@localhost ~]$ impala-shell -i impala-host -r

    Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to impala-host:21000
    Welcome to the Impala shell. Press TAB twice to see a list of available commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Build version: Impala v0.4 (f360eba) built on Thu Jan 17 10:50:01 PST 2013)
    [Not connected] >
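
A "Could not connect" error like the one above can come from the hostname or the network rather than from Impala itself, so it may help to test the port directly. A minimal sketch, assuming bash (for /dev/tcp) and the coreutils timeout command are available; port_open is a made-up helper name, and impala-host should be replaced with the real hostname of a node running an Impala daemon.

```shell
# Hypothetical helper: returns 0 if a TCP connection to host:port succeeds
# within 3 seconds, non-zero otherwise. The port defaults to 21000.
port_open() {
  local host="$1" port="${2:-21000}"
  # Inside the child bash, $0 is the host and $1 is the port.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage:
#   port_open impala-host 21000 && echo "reachable" || echo "not reachable"
```

If the port is not reachable, the usual suspects are a hostname that does not resolve, an Impala daemon that is not running on that host, or a firewall in between.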

    On Fri, Feb 1, 2013 at 10:48 PM, Romain Rigaux wrote:

    + hue-user

    About Hue:
    About not being able to browse HDFS in Hue, have you done
    https://ccp.cloudera.com/display/CDH4DOC/Hue+Installation#HueInstallation-ConfiguringHuetouseWebHDFSorHttpFS?
    (Hue also has a URL listing configuration problems at
    http://HUE_HOST:HUE_PORT/debug/check_config<http://127.0.0.1:8000/debug/check_config>)

    We need to see if Impala or Hue are misconfigured. Have you checked the
    Impala logs in order to see why its status was bad? Is Impala Shell working?

    Hue-wise, if Hue is running and has these properties correctly entered in
    "Hue Service Configuration Safety Valve" it should be good:

    [beeswax]
    beeswax_server_host=<Impala daemon hostname>
    beeswax_server_port=<Impala daemon port>
    beeswax_meta_server_only=9004

    Romain

    On Fri, Feb 1, 2013 at 2:46 AM, Saurav Sinha wrote:

    [...]
