Ahoy,
I think I have Impala correctly configured, but I cannot see my hive
metadata... so maybe I don't :P
Everything is the latest CDH4 RPMs (2.0.0+552-1.cdh4.1.2.p0.27.el5) from
Cloudera's repo. CentOS 5.9, x86_64, Oracle JDK 1.6.0_32, mysql java
connector jar 5.1.22.
I have a bastion node with:
* hive 0.9.0+155-1.cdh4.1.2.p0.21.el5
* hive's metastore service
* mysql 5.5.21 (from Oracle's RPMs)
* statestored
I have created a sample table from public-domain weather data and can
query it from hive fine.
hive> show tables;
OK
weather_data
Time taken: 0.092 seconds
hive> SELECT count(*) FROM weather_data WHERE Max_TemperatureF > 70;
<snip>
Total MapReduce CPU Time Spent: 4 seconds 120 msec
OK
163
Time taken: 17.728 seconds
I have 12 data nodes, all running impalad; hdfs is set up as specified here:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance
I have taken the step of pointing /etc/impala/conf to /etc/hadoop/conf
(root@hdp2-10001-prod-nydc1:~)# update-alternatives --display impala-conf
impala-conf - status is auto.
link currently points to /etc/hadoop/conf
I did this rather than keep multiple copies of my hadoop config everywhere.
I have the mysql java connector where the docs specify it, on all nodes.
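Since /etc/impala/conf now resolves to /etc/hadoop/conf, one thing I can check on each node is whether the config files impalad might want are actually in that directory. Below is a rough sketch of that check. The function name is made up, and the list of three file names is my assumption about what impalad reads, not something I have confirmed.

```shell
# Print the name of each expected config file that is absent from a directory.
# ASSUMPTION: the three names below are my guess at what impalad looks for.
missing_conf_files() {
    conf_dir=$1
    for f in core-site.xml hdfs-site.xml hive-site.xml; do
        # -e follows symlinks, so this works through the alternatives link
        [ -e "$conf_dir/$f" ] || echo "$f"
    done
}

# Example (run on each impalad node):
#   missing_conf_files /etc/impala/conf
```

If this prints hive-site.xml, then the shared hadoop conf directory has no metastore settings in it at all, which could be relevant to the empty "show tables" result.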
In /etc/default/impala I have:
export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- -state_store_port=24000}
export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- -use_statestore -state_store_host=<proper fqdn of the state store node> -state_store_port=24000 -be_port=22000 -nn=nydc1-research -nn_port=8020}
(The -nn value above is the service name for an HA namenode setup.)
I cannot get Impala to see my metastore data:
(Build version: Impala v0.6 (720f93c) built on Sat Feb 23 18:52:43 PST 2013)
[Not connected] > connect hdp2-10001-prod-nydc1
Connected to hdp2-10001-prod-nydc1:21000
[hdp2-10001-prod-nydc1:21000] > refresh
Successfully refreshed catalog
[hdp2-10001-prod-nydc1:21000] > show tables;
Query: show tables
Query finished, fetching results ...
Returned 0 row(s) in 0.01s
Do I need hive installed with a proper config on every datanode/impalad
node?
Do the impalad nodes get metastore information proxied through the state
store, which in turn gets it from the metastore service or directly from
mysql?
Or does each impalad talk directly to the mysql metastore? In which case I
will need to open up access to the mysql instance.
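To make the two possibilities concrete: hive-site.xml can carry either hive.metastore.uris (a thrift metastore service) or javax.jdo.option.ConnectionURL (a direct JDBC connection to the database). Here is a small sketch for pulling one property out of a hive-site.xml so I can see which of these a given node has. The function name is hypothetical, and the parsing assumes the usual layout where each <name> is followed by its <value> and neither spans multiple lines.

```shell
# Print the value of one property from a Hadoop-style XML config file.
# $1 = path to the XML file, $2 = property name.
# ASSUMPTION: <name> and <value> each sit on a single line, name first.
hive_site_prop() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            val = $0
            sub(/.*<value>[ \t]*/, "", val)
            sub(/[ \t]*<\/value>.*/, "", val)
            print val
            exit
        }
    ' "$1"
}

# Example (the path is an assumption about where the file lives):
#   hive_site_prop /etc/hive/conf/hive-site.xml hive.metastore.uris
#   hive_site_prop /etc/hive/conf/hive-site.xml javax.jdo.option.ConnectionURL
```

Whether impalad honors either setting the same way hive does is exactly what I am unsure about.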
I'd be grateful for anyone's thoughts on what is awry.
-n