https://issues.apache.org/jira/browse/HIVE-1293 : Is this JIRA truly fixed and included in 0.7.0?
If so, can the patch be applied separately on top of 0.5.0 or 0.6.0?
Are there instructions somewhere for enabling and integrating ZooKeeper with Hive so that this patch works?
The JIRA comments indicate that the patch was tested and committed; however, the wiki page the JIRA points to, http://wiki.apache.org/hadoop/Hive/Locking, implies that concurrency will not be supported. Hence the confusion.
Is there a simple way in Hive to query which tables are currently being accessed?

More detail:
What I'm trying to do is run daily Sqoop imports into Hive from an external database. Jobs are running against the Hive warehouse much of the time. I import the data into temporary tables in Hive, and then want to drop the permanent tables and rename the (just-imported) temporary ones to the permanent names WITHOUT IMPACTING THE JOBS. At the moment, of course, an ALTER TABLE RENAME causes any running job that is accessing the table to die on its next fetch. So I thought that if the above JIRA was indeed fixed, 0.7.0 should either let the job complete before the rename gets its X (exclusive) lock or, if the rename is already in progress, keep the job from getting its S (shared) lock until the rename is done. However, our test on 0.7.0 trunk (pulled in late September) shows that the rename happens instantly even while a query is accessing the table, without waiting for any locks.

Barring this patch, are there any other ideas anyone can suggest for accomplishing what I want? Some ideas we have considered:
- Parse Hive logs/XML files looking for a table name to determine whether a job is currently accessing the table. If not, then rename.
- Create views on temporary tables named by day. Have jobs go against the views. When we are ready to rename, replace the view so that it points to today's new table. The key question here is: is the view metadata consulted only at query startup, or is it looked at repeatedly during query execution? If only at startup, we might be able to get away with this trick until concurrency truly works.
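
The first idea above could be sketched roughly as follows. This is only an illustration: the function name `table_mentioned` is hypothetical, the log lines are assumed to contain the query text, and a match shows only that the table was mentioned, not that a job is still running against it.

```python
import re
from typing import Iterable


def table_mentioned(lines: Iterable[str], table: str) -> bool:
    """Return True if any line mentions `table` as a whole identifier."""
    # Reject matches that are part of a longer identifier, so that
    # "orders" does not match "orders_tmp" or "tmp_orders".
    pattern = re.compile(r"(?<!\w)" + re.escape(table) + r"(?!\w)", re.IGNORECASE)
    return any(pattern.search(line) for line in lines)


# Hypothetical log lines, for illustration only.
sample = [
    "Starting query: SELECT count(*) FROM default.orders",
    "Starting query: SELECT * FROM orders_tmp",
]
print(table_mentioned(sample, "orders"))
```

A real check would also need to tell finished queries from running ones, which depends on the log format of the Hive version in use.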

Thanks
Jay

  • Namit Jain at Jan 26, 2011 at 7:05 pm
    The patch below has been committed.


    https://issues.apache.org/jira/browse/HIVE-1865 was a follow-up patch which should help concurrency.
    I have not tried backporting the patch to Hive 0.5 or Hive 0.6, but I don't think it will work, since the code
    has changed significantly and a number of bug fixes that update the inputs and outputs went in.

    By default, concurrency is disabled. To enable it, set hive.support.concurrency to true.
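
As a rough sketch of how that setting might be passed on the command line: only `hive.support.concurrency` comes from the reply above. The `hive.zookeeper.quorum` property, the `-hiveconf` and `-e` options, and the `SHOW LOCKS` statement are not mentioned in this thread and should be checked against the documentation for the Hive version in use. The host names are placeholders and `hive_command` is a hypothetical helper.

```python
import shlex
from typing import Dict, List

# hive.support.concurrency is the setting named in the reply above.
# The ZooKeeper quorum value is a placeholder, not a real host list.
LOCK_SETTINGS: Dict[str, str] = {
    "hive.support.concurrency": "true",
    "hive.zookeeper.quorum": "zk1.example.com,zk2.example.com",
}


def hive_command(statement: str, settings: Dict[str, str]) -> List[str]:
    """Build an argument list that runs one statement with per-session settings."""
    args = ["hive"]
    for key, value in settings.items():
        args += ["-hiveconf", f"{key}={value}"]
    args += ["-e", statement]
    return args


# Print the command instead of running it, so the sketch needs no Hive install.
print(shlex.join(hive_command("SHOW LOCKS;", LOCK_SETTINGS)))
```

The same properties could instead go in hive-site.xml so that every session picks them up.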


    Thanks,
    -namit


    From: Jay Ramadorai <jramadorai@tripadvisor.com>
    Reply-To: <user@hive.apache.org>
    Date: Wed, 26 Jan 2011 13:52:58 -0500
    To: <user@hive.apache.org>
    Subject: Hive Concurrency Model - does it work?

  • John Sichi at Jan 26, 2011 at 7:52 pm

    On Jan 26, 2011, at 10:52 AM, Jay Ramadorai wrote:
    - Create views on temporary tables named by day. Have jobs go against the views. When we are ready to rename, replace the view so that it points to today's new table. The key question here is: is the view metadata consulted only at query startup, or is it looked at repeatedly during query execution? If only at startup, we might be able to get away with this trick until concurrency truly works.

    View metadata is consulted only while the query is being compiled, not during execution.

    JVS
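
Given that answer, the daily view swap Jay describes might look something like the sketch below. The view name, the table prefix, and the helper `swap_view_statement` are hypothetical, and `CREATE OR REPLACE VIEW` is assumed to be available in the Hive version in use; where it is not, a `DROP VIEW` followed by a `CREATE VIEW` would be needed instead.

```python
from datetime import date


def swap_view_statement(view: str, table_prefix: str, day: date) -> str:
    """Return a HiveQL statement that repoints `view` at the table for `day`."""
    # Daily import tables are assumed to be named <prefix>_YYYYMMDD.
    table = f"{table_prefix}_{day:%Y%m%d}"
    return f"CREATE OR REPLACE VIEW {view} AS SELECT * FROM {table};"


print(swap_view_statement("orders", "orders_import", date(2011, 1, 26)))
```

Queries compiled before the swap would keep reading the previous day's table, so it is probably safer to delay dropping that table until those jobs have finished.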
