Grokbase Groups Hive user August 2011
Greetings,

could somebody confirm/correct my understanding of a fully distributed
Hive setup, please?

My setup is as follows

* Java application using Hive JDBC driver connects to
* hive --service hiveserver, which connects to
* hive --service metastore, which uses an embedded Derby database
  for metadata storage
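
For reference, the first hop in this chain (Java application to hiveserver) could look roughly like the sketch below. It assumes the original HiveServer JDBC driver class `org.apache.hadoop.hive.jdbc.HiveDriver` and a `jdbc:hive://` URL; the host, port, and database name are placeholders, and the class name `HiveJdbcClientSketch` is made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class HiveJdbcClientSketch {
    // Driver class of the original HiveServer (not HiveServer2).
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Builds a URL of the form jdbc:hive://host:port/database.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Placeholder values: adjust to where "hive --service hiveserver" listens.
        String url = jdbcUrl("localhost", 10000, "default");
        try {
            Class.forName(DRIVER);
        } catch (ClassNotFoundException e) {
            System.err.println("Hive JDBC driver not on the classpath: " + e.getMessage());
            return;
        }
        // The HiveQL statement is executed by the hiveserver process, which in
        // turn needs metadata from the metastore.
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

The client only ever sees the hiveserver endpoint; how hiveserver reaches the metadata is decided entirely by the server-side configuration.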

Please find more details in the image attached.

The thing I find confusing is that JVM2 (Hive Server) starts up a Derby
database instance. I can see that from the files the JVM has opened.

Does anybody know why the Hive Server needs a Derby instance even
though hive-site.xml says hive.metastore.local=false?

Any hints are much appreciated.

Thanks,
Christian

btw,
I have not been able to access the picture on the wiki
<https://cwiki.apache.org/Hive/adminmanual-metastoreadmin.html#AdminManualMetastoreAdmin-MetastoreDeploymentOptionsinPictures>.
("Not permitted"; even though I have registered on the wiki)


  • Edward Capriolo at Aug 24, 2011 at 5:34 pm

    hive.metastore.local is really misnamed.

    local=true means Hive communicates through DataNucleus/JPOX, talking
    directly to the metastore database.

    local=false means Hive uses Thrift, which is essentially a level of
    indirection.
  • Christian Kurz at Aug 24, 2011 at 7:03 pm
    Thanks for the quick reply, Edward

    I am not sure I got you: My HiveService has been started with hive.metastore.local=false. So shouldn't it use thrift instead of its own local Derby instance?

    Thanks,
    Christian

  • Edward Capriolo at Aug 24, 2011 at 9:21 pm

    Talking about HiveService can confuse things, because HiveService is a
    different Thrift interface.

    You could be set up like this:
    HiveServiceClient->HiveService->metastore.local=true->derby
    or
    HiveServiceClient->HiveService->metastore.local=false->thrift->hive_metastore

    Most people are set up like this:

    HiveServiceClient->HiveService->metastore.local=true->mysql
    cli->metastore.local=true->mysql
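
The chains listed in this message can be written down as a small model. This is only a sketch of the description given here, not of Hive's actual code; the class and method names are hypothetical, and the database behind the metastore service is passed in as a plain label.

```java
final class MetastorePathSketch {
    // Describes how a Hive process reaches its metadata, following the
    // meaning of hive.metastore.local given in this thread.
    static String describe(String client, boolean metastoreLocal, String backingDb) {
        if (metastoreLocal) {
            // local=true: DataNucleus/JPOX talks to the database directly.
            return client + "->metastore.local=true->" + backingDb;
        }
        // local=false: one extra hop over Thrift to a separate metastore
        // service, which in turn owns the database connection.
        return client + "->metastore.local=false->thrift->hive_metastore->" + backingDb;
    }

    public static void main(String[] args) {
        System.out.println(describe("HiveService", true, "derby"));
        System.out.println(describe("HiveService", false, "derby"));
        System.out.println(describe("cli", true, "mysql"));
    }
}
```

In this model the flag only selects the transport; the database itself is the same in both cases.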
  • Ashutosh Chauhan at Aug 24, 2011 at 11:29 pm
    Edward,

    Apart from recommended best practices, what Christian is asking is why
    HiveServer is still trying to interact with a local db instance even after
    setting the config variables. AFAIK it should not. Christian, you found that
    out by looking at the files opened by the HiveServer JVM. Can you provide more
    info there, such as how you found that out and which files these are?

    Ashutosh
  • Christian Kurz at Aug 25, 2011 at 6:22 am
    Thanks, Edward and Ashutosh

    Ashutosh,
    yes, I do not understand why the service "hiveserver" still uses a Derby
    instance even though it should be talking to the service "metastore".
    Btw, if I run the hiveserver without having started the metastore
    service, the hiveserver complains when I try to have it execute a HiveQL
    command through JDBC:

    ...
    org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Could not connect to meta store using any of the URIs provided)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
    ...
    (full stacktrace at the end of this post)

    which is exactly what I expect and which makes me somewhat confident
    that I have configured things correctly.

    The entire issue came up because the hiveserver service did not work
    when started from the same directory from which the metastore service
    had been started. It turned out that both services were trying to set up
    a Derby instance in the current directory and therefore ran into a
    file-locking conflict. I have worked around this by starting the two
    services from different directories, but I am worried that I am
    missing an important point in my setup.

    When I run "pfiles <pid of hiveserver>" it lists these files for the
    hiveserver service (which should not need a Derby instance, as far as I
    understood):
    ...tons of jars...
    /home/hadoop/hive_admin/derby.log
    /home/hadoop/hive_admin/metastore_db/log/log1.dat
    /home/hadoop/hive_admin/metastore_db/dbex.lck
    /home/hadoop/hive_admin/metastore_db/seg0/c191.dat
    /home/hadoop/hive_admin/metastore_db/seg0/c1a1.dat
    ...
    /home/hadoop/hive_admin/metastore_db/seg0/c431.dat
    /home/hadoop/hive_admin/metastore_db/seg0/c451.dat
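
The locking conflict is consistent with how an embedded Derby database is located. The sketch below assumes Hive's default `javax.jdo.option.ConnectionURL` of `jdbc:derby:;databaseName=metastore_db;create=true` and that `derby.system.home` is unset, so the relative database name resolves against the process's working directory. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DerbyLocationSketch {
    // Assumed default value of javax.jdo.option.ConnectionURL.
    static final String DEFAULT_URL = "jdbc:derby:;databaseName=metastore_db;create=true";

    // Extracts the databaseName attribute from an embedded-Derby JDBC URL.
    static String databaseName(String url) {
        for (String part : url.split(";")) {
            if (part.startsWith("databaseName=")) {
                return part.substring("databaseName=".length());
            }
        }
        throw new IllegalArgumentException("no databaseName in " + url);
    }

    // Resolves the database directory against the given working directory,
    // mirroring how a relative database name is interpreted.
    static Path databaseDir(String url, String workingDir) {
        return Paths.get(workingDir).resolve(databaseName(url)).normalize();
    }

    public static void main(String[] args) {
        // Both services started from the same directory, as in this post.
        Path metastoreService = databaseDir(DEFAULT_URL, "/home/hadoop/hive_admin");
        Path hiveServer = databaseDir(DEFAULT_URL, "/home/hadoop/hive_admin");
        System.out.println(metastoreService);
        System.out.println("same directory: " + metastoreService.equals(hiveServer));
    }
}
```

Two JVMs that resolve to the same directory compete for the same lock files, which is why starting the services from different directories avoids the clash without explaining why hiveserver opens a database at all.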

    Any pointers appreciated. If anybody thinks this is a bug, I can file one.

    Thanks,
    Christian


    full stacktrace:

    Hive history file=/tmp/hadoop/hive_job_log_hadoop_201108242305_155100916.txt
    FAILED: Error in semantic analysis: Table not found weblog
    org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Could not connect to meta store using any of the URIs provided)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:904)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:7074)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6573)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:116)
        at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:699)
        at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:677)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
    Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:183)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:151)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1855)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1865)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:917)
        ... 13 more
    FAILED: Error in metadata: MetaException(message:Could not connect to meta store using any of the URIs provided)
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask


  • Ashutosh Chauhan at Aug 25, 2011 at 11:00 pm
    Christian,

    It looks like it's not possible to do the setup that you are looking for.
    The problem arises because HiveServer extends HMSHandler directly instead of
    accessing the metastore through HiveMetaStoreClient, and because of this the
    metastore Thrift interface is bypassed entirely. HiveServer will contact mysql
    directly and won't go through an external metastore service as you have in your
    diagram. If you consider this a blocker, please open a JIRA for more
    discussion.

    Hope it helps,
    Ashutosh
  • Christian Kurz at Aug 26, 2011 at 9:52 am
    Ashutosh,

    thank you for the explanation. I have changed the setup from embedded
    Derby to stand-alone Derby. Neither the hiveserver nor the metastore
    service opens any Derby files any longer, and things are working fine.
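
The exact settings used are not given in this message. As a sketch only, a stand-alone (network server) Derby setup is typically expressed through the two `hive-site.xml` properties shown below; the host, port 1527 (Derby's default network port), and database name are assumptions, and the class and method names are hypothetical.

```java
import java.util.Properties;

final class StandaloneDerbySketch {
    // Builds the two metastore connection properties for a Derby network
    // server. Values are illustrative, not taken from the thread.
    static Properties standaloneDerby(String host, int port, String database) {
        Properties p = new Properties();
        p.setProperty("javax.jdo.option.ConnectionURL",
                "jdbc:derby://" + host + ":" + port + "/" + database + ";create=true");
        p.setProperty("javax.jdo.option.ConnectionDriverName",
                "org.apache.derby.jdbc.ClientDriver");
        return p;
    }

    // An embedded URL has no "//host:port" part, so the database files are
    // opened by the connecting JVM itself.
    static boolean isEmbedded(String url) {
        return url.startsWith("jdbc:derby:") && !url.startsWith("jdbc:derby://");
    }

    public static void main(String[] args) {
        Properties p = standaloneDerby("localhost", 1527, "metastore_db");
        String url = p.getProperty("javax.jdo.option.ConnectionURL");
        System.out.println(url);
        System.out.println("embedded: " + isEmbedded(url));
    }
}
```

With a network URL, only the Derby server process holds the database files, which matches the observation that neither Hive service opens them any longer.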

    Thanks again for your help,
    Christian
Discussion Overview
group: user
categories: hive, hadoop
posted: Aug 24, '11 at 2:53p
active: Aug 26, '11 at 9:52a
posts: 8
users: 3
website: hive.apache.org
