Grokbase Groups Hive user July 2009
Is there an UPDATE statement in Hive? If not, are there any plans for adding
support for it in the future?

This is why I ask: I want to maintain a table which, against each user ID,
stores the first visit & last visit time. This is across the entire year,
not a day -- basically to understand how many visitors we got in the last
1/3/6 months, etc.

I can add new users into a separate partition to get around the limitation
of not being able to append rows to a table. However, I don't know how to
update the last_visited_at column for each user.

Is this best achieved by storing this table outside of Hive in a traditional
RDBMS? I could query Hive over JDBC for a list of today's distinct visitors
and update the 'external' table based on that list.

Saurabh.

  • Ashish Thusoo at Jul 28, 2009 at 2:02 pm
    There is no UPDATE statement at this time. Because files in Hadoop cannot be updated in place, an UPDATE in Hive, though possible, would just be syntactic sugar for merging the new values into the old data in the table and then rewriting the table with the merged output. You can achieve this today by staging the new data in another table, merging it with the old table through a left outer join, and doing an INSERT OVERWRITE on the old table from the result of that merge. Also note that queries running against the table while you are updating it may fail.
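
The merge-and-rewrite idea can be sketched in plain Python. This is only an illustration of the logic, not HiveQL and not anything from Hive itself: the `Visit` record, the `merge_left_outer` helper, and the sample user IDs are all hypothetical. Note that a left outer join from the old table only touches users that already exist, so brand-new users still have to be added separately.

```python
from typing import Dict, NamedTuple, Optional


class Visit(NamedTuple):
    first_visit: str  # ISO dates, so plain string comparison orders them
    last_visit: str


def merge_left_outer(old: Dict[str, Visit], staged: Dict[str, str]) -> Dict[str, Visit]:
    """Left outer join of `old` against `staged`, returning the rewritten table.

    Every row of `old` is kept. Rows with a match in `staged` get the newer
    last-visit value; staged users that are missing from `old` are ignored.
    """
    merged: Dict[str, Visit] = {}
    for user_id, row in old.items():
        new_last: Optional[str] = staged.get(user_id)  # None means "no join match"
        if new_last is not None and new_last > row.last_visit:
            row = row._replace(last_visit=new_last)
        merged[user_id] = row
    return merged


# Hypothetical sample data: the existing table and today's staged visits.
old_table = {
    "u1": Visit("2009-01-05", "2009-06-30"),
    "u2": Visit("2009-03-12", "2009-03-12"),
}
staged_today = {"u1": "2009-07-28", "u9": "2009-07-28"}  # u9 is a new user

# Replacing the whole table with the merged result plays the role of the overwrite.
old_table = merge_left_outer(old_table, staged_today)
```

Because the whole table is replaced in one step, anything still reading the old copy during the swap is exposed to the failure mode mentioned above.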

    Another option is to change your schema so that the table contains the changes to each row instead of the row values themselves, and then change your queries to take the new schema into account.
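
A minimal sketch of this second option, again in plain Python rather than HiveQL: the table only ever receives appended visit events, and the first/last visit times are computed at query time. The function names `first_last_visits` and `visitors_since` and the sample data are assumptions made up for illustration.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def first_last_visits(events: Iterable[Tuple[str, date]]) -> Dict[str, Tuple[date, date]]:
    """Aggregate an append-only visit log into (first_visit, last_visit) per user."""
    summary: Dict[str, Tuple[date, date]] = {}
    for user_id, day in events:
        if user_id in summary:
            first, last = summary[user_id]
            summary[user_id] = (min(first, day), max(last, day))
        else:
            summary[user_id] = (day, day)
    return summary


def visitors_since(summary: Dict[str, Tuple[date, date]], cutoff: date) -> int:
    """Count users whose last visit falls on or after `cutoff`."""
    return sum(1 for _first, last in summary.values() if last >= cutoff)


# Hypothetical append-only log: new visits are only ever added, never updated.
visit_log = [
    ("u1", date(2009, 1, 5)),
    ("u2", date(2009, 3, 12)),
    ("u1", date(2009, 7, 28)),
]

summary = first_last_visits(visit_log)
recent = visitors_since(summary, date(2009, 5, 1))
```

The trade-off is that no row is ever rewritten, but every query pays for the aggregation over the full log.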

    Ashish

  • Amr Awadallah at Jul 29, 2009 at 12:07 am
    Saurabh, I think you're better off with HBase for this kind of use; see:

    http://hadoop.apache.org/hbase/

    In a nutshell, HBase is a layer on top of HDFS which supports two
    things: (1) quick lookups based on keys (e.g. a userid), and (2)
    transaction semantics at the row-level (update/delete/insert values for
    a given key).
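
To make the two properties concrete, here is a toy in-memory stand-in for a keyed row store. It is emphatically not the HBase client API: `ToyRowStore`, `record_visit`, and the column names are hypothetical, and the sketch only mirrors the shape of key lookups and per-row updates for the visit-tracking use case.

```python
from typing import Dict, Optional


class ToyRowStore:
    """In-memory stand-in for a keyed row store (NOT the HBase API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, key: str, column: str, value: str) -> None:
        """Insert or update a single column value for the given row key."""
        self._rows.setdefault(key, {})[column] = value

    def get(self, key: str) -> Optional[Dict[str, str]]:
        """Look up a row by key; returns a copy, or None if the key is absent."""
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def delete(self, key: str) -> None:
        """Remove a row if it exists."""
        self._rows.pop(key, None)


def record_visit(store: ToyRowStore, user_id: str, day: str) -> None:
    """Set first_visited_at once, and always move last_visited_at forward."""
    if store.get(user_id) is None:
        store.put(user_id, "first_visited_at", day)
    store.put(user_id, "last_visited_at", day)


store = ToyRowStore()
record_visit(store, "u1", "2009-01-05")
record_visit(store, "u1", "2009-07-28")
```

Here each visit touches only the row for that user, in contrast to rewriting an entire table.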

    Ashish, is there any way to run Hive queries on top of HBase? Pig has
    support for that via this patch:

    https://issues.apache.org/jira/browse/PIG-6

    -- amr

  • Peter Skomoroch at Jul 29, 2009 at 12:10 am
    +1 for Hive queries on HBase - that would be a powerful combination.

    --
    Peter N. Skomoroch
    617.285.8348
    http://www.datawrangling.com
    http://delicious.com/pskomoroch
    http://twitter.com/peteskomoroch
  • He Yongqiang at Jul 29, 2009 at 2:03 am
    The contributor of the patch at https://issues.apache.org/jira/browse/PIG-6
    is a student here at our institute, though in another laboratory.
    If Hive is interested in this, I will get in touch with him to see if he
    would like to make a similar contribution to Hive.
  • Ashish Thusoo at Jul 29, 2009 at 2:17 am
    That would be great, Yongqiang.

    Amr, we don't have that kind of support but would love to add it.

    Ashish

  • He Yongqiang at Jul 29, 2009 at 3:51 am
    Talked with Samuel Guo, and I am sure he will work on it soon.
  • Abhijit Pol at Jul 29, 2009 at 4:09 am
    +1, if more support is needed for this feature. I think this will be a
    very powerful and useful addition to Hive.

  • Saurabh Nanda at Jul 29, 2009 at 4:21 am
    Sorry for the newbie question, but how is this going to work? Will I be
    able to read from and write to an HBase datastore using 'normal' Hive
    queries, from within the Hive CLI?
    Saurabh.
