Hi DK,
First, there is no requirement that the machine running statestored is the
same machine you run queries on. You can actually run queries targeting any
impalad instance in your cluster. The state store can run on any machine in
the cluster. However, we recommend that you run the state store on a
different machine than the NameNode due to the NameNode's memory
requirements. For optimal performance, it is also critical that there is an
impalad instance installed and running on every DataNode in the cluster.
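
The placement rules above can be sketched as a small check over a planned layout. This is only an illustration: the host names, the role labels, and the `check_placement` helper are hypothetical and are not part of Impala or CDH.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A planned machine and the roles assigned to it (labels are illustrative)."""
    name: str
    roles: set = field(default_factory=set)


def check_placement(hosts):
    """Return warnings for a planned layout, based on the two rules above."""
    warnings = []
    for h in hosts:
        # Rule: an impalad should run on every DataNode.
        if "datanode" in h.roles and "impalad" not in h.roles:
            warnings.append(f"{h.name}: DataNode without impalad")
        # Recommendation: keep statestored off the NameNode host.
        if "statestored" in h.roles and "namenode" in h.roles:
            warnings.append(f"{h.name}: statestored shares a host with the NameNode")
    return warnings


# Hypothetical plan; the host names are made up for this example.
plan = [
    Host("master1", {"namenode"}),
    Host("master2", {"secondarynamenode", "statestored"}),
    Host("worker1", {"datanode", "impalad"}),
    Host("worker2", {"datanode"}),
]

for warning in check_placement(plan):
    print(warning)
```

With this plan, the only warning is for worker2, which is a DataNode with no impalad assigned.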

Let me know if you have any additional questions.

Software Engineer - Cloudera

On Thu, Feb 14, 2013 at 11:48 AM, DK wrote:

Thanks, Lenni!
This is helpful. I just wanted to confirm a full deployment in a distributed
1 server - ResourceManager
1 server - NameNode
1 server - Secondary NameNode
At least 1 server to host Hive and the MySQL server; this can be anywhere (it
can go on any of the three above)
The rest of the servers with CDH components and Impala
Also from wherever the query will be invoked needs the statestored to be

Do you think this is the right deployment plan?


On Wednesday, February 13, 2013 11:42:45 AM UTC-8, lskuff wrote:

Hi DK,
I answered your questions inline.

Software Engineer - Cloudera
On Wed, Feb 13, 2013 at 11:35 AM, DK wrote:

Hi All,

I am trying to install and set up Impala on a multi-machine cluster in
distributed mode. I have the following questions related to Hive:
1. I understand a Hive installation is mandatory on all nodes.
Hive is *not* required on all nodes. Currently, Impala does not
support DDL syntax (CREATE/ALTER DATABASE/TABLE) so you only need Hive
installed on nodes where you want to perform these operations.

2. I am wondering whether Hive needs to be set up on only one node with a
MySQL installation, or whether this needs to be done on every node.
You only need a single MySQL metastore installation. Every impalad
instance should be configured to point to the same metastore. Note that
other databases (such as PostgreSQL) can also be used as a Hive metastore;
there is no specific requirement for MySQL.

3. For 2, I am guessing one metastore is fine because they all connect to
the same Hadoop nodes and will have the same information.
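
As a rough sketch of the answer to question 2, the snippet below compares the metastore setting found in each node's hive-site.xml and reports any disagreement. It assumes the metastore location is given by the standard Hive property `hive.metastore.uris`; which property your Impala version actually reads is an assumption here, and the host names and helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample hive-site.xml content; the metastore host name is made up.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>"""


def metastore_setting(hive_site_xml, prop="hive.metastore.uris"):
    """Return the value of `prop` from hive-site.xml text, or None if absent."""
    root = ET.fromstring(hive_site_xml)
    for p in root.findall("property"):
        if p.findtext("name", "").strip() == prop:
            return p.findtext("value", "").strip()
    return None


def nodes_disagreeing(configs, prop="hive.metastore.uris"):
    """Map each distinct setting to its hosts; empty dict if all hosts agree.

    `configs` maps a host name to the text of that host's hive-site.xml.
    """
    seen = {}
    for host, xml_text in configs.items():
        seen.setdefault(metastore_setting(xml_text, prop), []).append(host)
    return seen if len(seen) > 1 else {}


# Hypothetical cluster in which worker2 points at a different metastore.
configs = {
    "worker1": SAMPLE,
    "worker2": SAMPLE.replace("metastore-host", "other-host"),
}
print(nodes_disagreeing(configs))
```

An empty result means every node reports the same setting, which is the state the answer above asks for.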

Thanks for your help in advance.

Discussion Overview
group: impala-user @
posted: Feb 13, '13 at 7:42p
active: Feb 18, '13 at 4:43a


