Ahoy,

I think I have Impala correctly configured, but I cannot see my Hive
metadata... so maybe I don't :P

Everything is the latest CDH4 RPMs (2.0.0+552-1.cdh4.1.2.p0.27.el5) from
Cloudera's repo: CentOS 5.9, x86_64, Oracle JDK 1.6.0_32, MySQL Java
connector JAR 5.1.22.

I have a bastion node running:
* Hive 0.9.0+155-1.cdh4.1.2.p0.21.el5
* the Hive metastore service
* MySQL 5.5.21 (from Oracle's RPMs)
* statestored

I have created a sample table from public-domain weather data and can
query it from Hive without problems.

hive> show tables;
OK
weather_data
Time taken: 0.092 seconds


hive> SELECT count(*) FROM weather_data WHERE Max_TemperatureF > 70;
<snip>
Total MapReduce CPU Time Spent: 4 seconds 120 msec
OK
163
Time taken: 17.728 seconds

I have 12 DataNodes, all running impalad; HDFS is set up as specified here:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance

I have pointed /etc/impala/conf at /etc/hadoop/conf, rather than keep
multiple copies of my Hadoop config everywhere:

(root@hdp2-10001-prod-nydc1:~)# update-alternatives --display impala-conf
impala-conf - status is auto.
link currently points to /etc/hadoop/conf
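update-alternatives implements this as a chain of symlinks (/etc/impala/conf points into /etc/alternatives/impala-conf, which points at /etc/hadoop/conf). A minimal Python sketch of a check that follows the whole chain; `resolves_to` and `demo` are hypothetical helpers, and the demo rebuilds a similar chain in a scratch directory instead of touching /etc. On a real node the call would be `resolves_to("/etc/impala/conf", "/etc/hadoop/conf")`. One consequence worth keeping in mind: anything Impala expects to find in its configuration directory (hive-site.xml included, if you follow the Impala setup docs) then has to actually live in /etc/hadoop/conf.

```python
import os
import tempfile


def resolves_to(link_path: str, expected_dir: str) -> bool:
    """Return True if link_path, after following every symlink, is expected_dir."""
    return os.path.realpath(link_path) == os.path.realpath(expected_dir)


def demo() -> bool:
    # Rebuild an alternatives-style chain in a throwaway directory:
    #   impala-conf -> alternatives/impala-conf -> hadoop-conf
    with tempfile.TemporaryDirectory() as root:
        hadoop_conf = os.path.join(root, "hadoop-conf")
        alternatives = os.path.join(root, "alternatives")
        os.makedirs(hadoop_conf)
        os.makedirs(alternatives)
        alt_link = os.path.join(alternatives, "impala-conf")
        impala_conf = os.path.join(root, "impala-conf")
        os.symlink(hadoop_conf, alt_link)
        os.symlink(alt_link, impala_conf)
        return resolves_to(impala_conf, hadoop_conf)


if __name__ == "__main__":
    print(demo())  # True
```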

I have the MySQL Java connector where the docs specify it, on all nodes.

In /etc/default/impala I have:

export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- -state_store_port=24000}
export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
    -use_statestore \
    -state_store_host=<proper fqdn of the state store node> \
    -state_store_port=24000 -be_port=22000 -nn=nydc1-research -nn_port=8020}

(The -nn value above is the service name for an HA NameNode setup.)
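Impala's daemons take gflags-style `-name=value` flags, so one way to double-check what impalad receives once the shell default above is expanded is to split the string into flags and look for gaps. A minimal Python sketch; `parse_flags`, `missing_flags` and the `REQUIRED` list are hypothetical, chosen from the config above for illustration, `statestore.example.com` stands in for the elided FQDN, and the parser only mirrors the simple syntax used here, not every rule gflags accepts.

```python
import shlex


def parse_flags(arg_string: str) -> dict:
    """Split a gflags-style argument string into {name: value}.

    '-name=value' maps name to 'value'; a bare '-name' maps to 'true'.
    """
    flags = {}
    for token in shlex.split(arg_string):
        name, sep, value = token.lstrip("-").partition("=")
        flags[name] = value if sep else "true"
    return flags


# Flags this particular setup relies on (illustrative, taken from the config above).
REQUIRED = ("use_statestore", "state_store_host", "state_store_port", "nn", "nn_port")


def missing_flags(arg_string: str) -> list:
    """Return the REQUIRED flags absent from arg_string, in REQUIRED order."""
    flags = parse_flags(arg_string)
    return [name for name in REQUIRED if name not in flags]


if __name__ == "__main__":
    # statestore.example.com is a hypothetical stand-in for the real FQDN.
    server_args = ("-use_statestore -state_store_host=statestore.example.com "
                   "-state_store_port=24000 -be_port=22000 "
                   "-nn=nydc1-research -nn_port=8020")
    print(missing_flags(server_args))  # []
```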

I cannot get impala to see my metastore data.

(Build version: Impala v0.6 (720f93c) built on Sat Feb 23 18:52:43 PST 2013)
[Not connected] > connect hdp2-10001-prod-nydc1
Connected to hdp2-10001-prod-nydc1:21000
[hdp2-10001-prod-nydc1:21000] > refresh
Successfully refreshed catalog
[hdp2-10001-prod-nydc1:21000] > show tables;
Query: show tables
Query finished, fetching results ...

Returned 0 row(s) in 0.01s

Do I need Hive installed, with a proper config, on every DataNode/impalad
node?

Do the impalad nodes get metastore information proxied somehow through the
statestore, which in turn gets it from the metastore service?

Or does each impalad talk directly to the MySQL metastore database? In that
case I will need to open up access to the MySQL instance.

I'd be grateful for anyone's thoughts on what is awry.

-n


  • Mwc at Mar 22, 2013 at 9:54 pm
    Hello,

    1. You do need the proper configuration, but you don't need Hive installed
    separately on each DataNode.
    2. Each impalad gets its catalog metadata directly from the Hive metastore.
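Point 2 suggests the "proper configuration" on each impalad node includes a hive-site.xml telling the metastore client where to go. A minimal Python sketch, assuming Hive's standard `hive.metastore.uris` property and the metastore's default Thrift port 9083; the helper names and the host are hypothetical. It reports whether a given hive-site.xml sends the client to a remote metastore service or, when that property is empty or absent, leaves Hive's client to open the backing MySQL database itself, which is the case where the impalad nodes would need MySQL access.

```python
import xml.etree.ElementTree as ET

# Illustrative hive-site.xml; metastore.example.com is a hypothetical host.
EXAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>"""


def hive_property(site_xml: str, name: str):
    """Return a property's value from Hadoop-style *-site.xml text, or None."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def metastore_mode(site_xml: str) -> str:
    # A non-empty hive.metastore.uris points the client at a remote
    # metastore service; without it, Hive opens the backing database itself.
    uris = hive_property(site_xml, "hive.metastore.uris")
    return "remote" if uris else "local (direct database access)"


if __name__ == "__main__":
    # On a real impalad node one might read the file instead, e.g.
    # open("/etc/impala/conf/hive-site.xml").read()  (path is an assumption)
    print(hive_property(EXAMPLE_HIVE_SITE, "hive.metastore.uris"))
    print(metastore_mode(EXAMPLE_HIVE_SITE))  # remote
```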

    I see that you are running Impala v0.6 and have CDH 4.1.2 installed.
    Currently, Impala v0.6 works with CDH 4.2, not CDH 4.1.2: the Hive version
    changed between CDH 4.1.2 and CDH 4.2, and that change causes the
    incompatibility.
    My suggestion to get Impala working would be to upgrade to CDH 4.2. The
    RPMs for CDH 4.2 are available here:
    http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/
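Since the fix comes down to the CDH release, a quick check against the installed package versions can help. A minimal Python sketch, assuming the `cdhX.Y.Z` tag convention visible in the two RPM version strings quoted in this thread; `cdh_version`, `supports_impala_06` and the 4.2 threshold (taken from the reply above) are illustrative, not an official compatibility check.

```python
import re

# Minimum CDH release Impala v0.6 works with, per the reply above.
REQUIRED_CDH = (4, 2)


def cdh_version(package_version: str) -> tuple:
    """Extract the CDH release from an RPM version string.

    Assumes the 'cdhX.Y.Z' tag seen in this thread,
    e.g. '2.0.0+552-1.cdh4.1.2.p0.27.el5' -> (4, 1, 2).
    """
    match = re.search(r"cdh(\d+(?:\.\d+)*)", package_version)
    if match is None:
        raise ValueError(f"no cdh tag in {package_version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_impala_06(package_version: str) -> bool:
    # Tuple comparison: (4, 1, 2) < (4, 2) <= (4, 2, 0)
    return cdh_version(package_version) >= REQUIRED_CDH


if __name__ == "__main__":
    for pkg in ("2.0.0+552-1.cdh4.1.2.p0.27.el5", "0.9.0+155-1.cdh4.1.2.p0.21.el5"):
        print(pkg, cdh_version(pkg), supports_impala_06(pkg))  # ... (4, 1, 2) False
```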

    Thanks,
    Miklos
    (In reply to Nathan Milford's original post above, Friday, March 22, 2013,
    11:45:05 AM UTC-7.)

Discussion Overview
group: impala-user
categories: hadoop
posted: Mar 22, '13 at 6:45p
active: Mar 22, '13 at 9:54p
posts: 2
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion: Nathan Milford (1 post), Mwc (1 post)
