On 26.08.2011 01:00, Ashutosh Chauhan wrote:
Christian,
It looks like the setup you are looking for is not possible.
The problem arises because HiveServer extends HMSHandler directly
instead of accessing the metastore through HiveMetaStoreClient, so
the metastore thrift interface is bypassed entirely. HiveServer will
contact mysql directly and won't go through the external metastore
service shown in your diagram. If you consider this a blocker, please
open a jira for more discussion.
Hope it helps,
Ashutosh
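
The difference between inheriting the handler and going through a client can be sketched with a toy model. This is only an illustration of the explanation above, not Hive code: every class name here (DirectHandler, RemoteClient, InheritingServer, DelegatingServer) is a hypothetical stand-in, and the "hops" list simply records which layers a metadata call passes through.

```java
import java.util.ArrayList;
import java.util.List;

public class MetastorePathSketch {

    /** Minimal stand-in for a metastore API; records the layers a call touches. */
    interface Metastore {
        void listTables(List<String> hops);
    }

    /** Hypothetical handler that talks to the backing database directly. */
    static class DirectHandler implements Metastore {
        @Override
        public void listTables(List<String> hops) {
            hops.add("database");
        }
    }

    /** Hypothetical client that forwards every call to a remote service. */
    static class RemoteClient implements Metastore {
        private final Metastore service;

        RemoteClient(Metastore service) {
            this.service = service;
        }

        @Override
        public void listTables(List<String> hops) {
            hops.add("thrift");
            service.listTables(hops);
        }
    }

    /** Server that inherits the handler: it never sees the thrift layer. */
    static class InheritingServer extends DirectHandler { }

    /** Server that delegates to a client: calls go through the thrift layer. */
    static class DelegatingServer implements Metastore {
        private final Metastore client;

        DelegatingServer(Metastore client) {
            this.client = client;
        }

        @Override
        public void listTables(List<String> hops) {
            client.listTables(hops);
        }
    }

    /** Runs one call and returns the layers it passed through, in order. */
    static String trace(Metastore server) {
        List<String> hops = new ArrayList<>();
        server.listTables(hops);
        return String.join(" -> ", hops);
    }

    public static void main(String[] args) {
        System.out.println(trace(new InheritingServer()));
        System.out.println(trace(new DelegatingServer(new RemoteClient(new DirectHandler()))));
    }
}
```

In this model the inheriting server reaches the database in a single hop, while the delegating server passes through the thrift layer first, which is the path the diagram in the original question assumed.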
On Wed, Aug 24, 2011 at 23:21, Christian Kurz wrote:
Thanks, Edward and Ashutosh
Ashutosh,
yes, I do not understand why the service "hiveserver" still uses a
Derby instance even though it should be talking to the service
"metastore". Btw, if I run the hiveserver without having started
the metastore service, the hiveserver complains when I try to let
it execute a HiveQL command through JDBC:
...
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:Could not connect to meta store using any of
the URIs provided)
at
org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
...
(full stacktrace at the end of this post)
which is exactly what I expect and which makes me somewhat
confident that I have configured things correctly.
The entire issue came up, because the hiveserver service did not
work, when started from the same directory, from which the
metastore service had been started. It turned out that this was
because both services were trying to set up a Derby instance in the
current dir and therefore ran into a file locking situation. I
have worked around this by starting the two services from
different directories, but I am worried that I'd be missing an
important point in my setup.
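
The locking conflict described above can be sketched as plain path resolution. This assumes the embedded Derby database is configured with a relative name ("metastore_db", as seen in the file listing in this thread) and that a relative name is resolved against the directory the JVM was started from. The second directory, /home/hadoop/hive_server, is hypothetical and used only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DerbyDirSketch {

    /**
     * Resolves a database name against a working directory.
     * A relative name lands inside the working directory;
     * an absolute name is returned unchanged.
     */
    static Path resolveDb(String workingDir, String databaseName) {
        return Paths.get(workingDir).resolve(databaseName).normalize();
    }

    public static void main(String[] args) {
        String dbName = "metastore_db"; // assumed relative database name

        // Both services started from the same directory: same files, so they
        // compete for the same Derby lock file.
        Path metastore = resolveDb("/home/hadoop/hive_admin", dbName);
        Path hiveserver = resolveDb("/home/hadoop/hive_admin", dbName);
        System.out.println(metastore.equals(hiveserver)); // true

        // Started from a different (hypothetical) directory: separate files.
        Path moved = resolveDb("/home/hadoop/hive_server", dbName);
        System.out.println(metastore.equals(moved)); // false
    }
}
```

Under these assumptions, starting the services from different directories avoids the lock conflict only because each one creates its own separate database, which matches the workaround described above.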
When I run "pfiles <pid of hiveserver>" it lists these files for
the hiveserver service (which should not need a Derby instance, as
far as I understood):
...tons of jars...
/home/hadoop/hive_admin/derby.log
/home/hadoop/hive_admin/metastore_db/log/log1.dat
/home/hadoop/hive_admin/metastore_db/dbex.lck
/home/hadoop/hive_admin/metastore_db/seg0/c191.dat
/home/hadoop/hive_admin/metastore_db/seg0/c1a1.dat
...
/home/hadoop/hive_admin/metastore_db/seg0/c431.dat
/home/hadoop/hive_admin/metastore_db/seg0/c451.dat
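
A check like the one above can be automated with a small filter. This is a minimal sketch that assumes the open-file paths have already been extracted one per line, as in the excerpt above; raw pfiles output contains additional fields that this sketch does not parse. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DerbyFileFilter {

    /** Returns the paths that look like embedded Derby files (derby.log or metastore_db contents). */
    static List<String> derbyFiles(List<String> openFiles) {
        List<String> hits = new ArrayList<>();
        for (String line : openFiles) {
            String path = line.trim();
            if (path.endsWith("/derby.log") || path.contains("/metastore_db/")) {
                hits.add(path);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> open = Arrays.asList(
                "/home/hadoop/hive_admin/derby.log",
                "/home/hadoop/hive_admin/metastore_db/dbex.lck",
                "/tmp/unrelated.txt"); // hypothetical non-Derby entry
        for (String path : derbyFiles(open)) {
            System.out.println(path);
        }
    }
}
```

A non-empty result for the hiveserver process would suggest that it has opened its own embedded database.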
Any pointers appreciated. If anybody thinks this is a bug, I can
file one.
Thanks,
Christian
full stacktrace:
Hive history
file=/tmp/hadoop/hive_job_log_hadoop_201108242305_155100916.txt
FAILED: Error in semantic analysis: Table not found weblog
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:Could not connect to meta store using any of
the URIs provided)
at
org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
at
org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:904)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:7074)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6573)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
at
org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:116)
at
org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:699)
at
org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:677)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: MetaException(message:Could not connect to meta store
using any of the URIs provided)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:183)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:151)
at
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1855)
at
org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1865)
at
org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:917)
... 13 more
FAILED: Error in metadata: MetaException(message:Could not connect
to meta store using any of the URIs provided)
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask
On 25.08.2011 01:29, Ashutosh Chauhan wrote:
Edward,
Apart from recommended best practices, what Christian is asking
is why HiveServer is still trying to interact with a local db
instance even after setting the config variables. AFAIK it should
not. Christian, you found that out by looking at the files opened by
the HiveServer JVM. Can you provide more info there, like how you
found that out and which files these are?
Ashutosh
On Wed, Aug 24, 2011 at 14:20, Edward Capriolo wrote:
On Wed, Aug 24, 2011 at 3:02 PM, Christian Kurz wrote:
Thanks for the quick reply, Edward
I am not sure I got you: My HiveService has been started
with hive.metastore.local=false. So shouldn't it use
thrift instead of its own local Derby instance?
Thanks,
Christian
On 24.08.2011 at 19:33, Edward Capriolo
<edlinuxguru@gmail.com> wrote:
On Wed, Aug 24, 2011 at 10:53 AM, Christian Kurz wrote:
Greetings,
could somebody confirm/correct my understanding of a
fully distributed Hive setup, please?
My setup is as follows
* Java application using Hive JDBC driver
  connects to
* hive --service hiveserver, which connects to
* hive --service metastore, which uses an
  embedded Derby database for metadata storage
Please find more details in the image attached.
The thing I find confusing is that JVM2 (Hive
Server) starts up a Derby database instance. I can
see that from the files the JVM has opened.
Does anybody know, why the Hive Server needs a Derby
instance even though hive-site.xml says:
hive.metastore.local=false ?
Any hints are much appreciated.
Thanks,
Christian
btw,
I have not been able to access the picture on the wiki
<https://cwiki.apache.org/Hive/adminmanual-metastoreadmin.html#AdminManualMetastoreAdmin-MetastoreDeploymentOptionsinPictures>
("Not permitted", even though I have registered on the wiki).
hive.metastore.local is really misnamed.
local=true means communicate using datanucleus/JPOX and
talking directly to the metastore.
local=false means use thrift which is essentially a
level of indirection.
Talking about HiveService can confuse things because
HiveService is a different thrift interface.
You could be set up like this:
HiveServiceClient->HiveService->metastore.local=true->derby
or
HiveServiceClient->HiveService->metastore.local=false->thrift->hive_metastore
Most people are set up like this:
HiveServiceClient->HiveService->metastore.local=true->mysql
cli->metastore.local=true->mysql
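
The two modes described above can be summarized in a short sketch. It models only the explanation in this message and does not call any Hive API: the "backend" key is a hypothetical placeholder for the configured database, and the default of "true" for hive.metastore.local is an assumption.

```java
import java.util.Properties;

public class MetastoreModeSketch {

    /** Describes the metadata path implied by the hive.metastore.local setting. */
    static String describe(Properties conf) {
        // Assumption: an unset flag is treated as local=true.
        boolean local = Boolean.parseBoolean(conf.getProperty("hive.metastore.local", "true"));
        if (local) {
            // local=true: talk to the backing database directly via DataNucleus/JPOX.
            return "datanucleus -> " + conf.getProperty("backend", "derby");
        }
        // local=false: go through the thrift metastore service instead.
        return "thrift -> hive_metastore";
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hive.metastore.local", "true");
        conf.setProperty("backend", "mysql"); // hypothetical key, for illustration only
        System.out.println(describe(conf));

        conf.setProperty("hive.metastore.local", "false");
        System.out.println(describe(conf));
    }
}
```

With local=true the backend setting decides where the metadata lives; with local=false the backend is whatever the remote metastore service itself uses.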