Grokbase Groups Hive user March 2010
It looks like we can add columns to existing tables via:

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type
[COMMENT col_comment], ...)

However, I see the following comment in the Hive docs:

"NOTE: These commands will only modify Hive's metadata, and will NOT
reorganize or reformat existing data. Users should make sure the actual
data layout conforms with the metadata definition."


Question: If we already have a table that has lots of data in it, and
I execute the above statement to add a column, will I still be able to
query existing data? Or do I need to somehow re-import all of the data
and fill in a value for the new column? The idea is to be able to add
a new column, and make sure that the column value exists for all NEW
partitions in the same table. I would hate to have to reload all of
the old data just to specify a NULL value for the new column.

Will this work as expected, or is a data reload necessary every time
we add a new column in order to still query older data?

Thanks!

Ryan


  • Prasad Chakka at Mar 9, 2010 at 9:25 pm
    The note only means that when you change the metadata, the underlying data is not rewritten or rearranged to fit the new metadata.

    E.g., if your data has 3 columns named a, b, c in the metadata and you replace that set of names with "new_a, d, new_b, new_c", you shouldn't expect columns new_b and new_c to hold the same values as the old columns b and c.

    But in most cases you will be adding a column at the end of the existing list, and it will return NULL for rows where that column doesn't exist in the data.
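
The two cases in this reply can be illustrated with a small toy model of how a metadata-only schema is applied to delimited text rows at read time. This is only a sketch, not Hive's implementation: the `read_row` helper, the tab delimiter, and the sample row are assumptions made for the example, and values stay as strings because no type conversion is modelled.

```python
# Toy model of schema-on-read for a delimited text table.
# NOT Hive's code: read_row, the tab delimiter, and the sample row are hypothetical.

def read_row(line, columns, delimiter="\t"):
    """Map the fields of one stored line onto column names by position."""
    fields = line.split(delimiter)
    # Pad with None (standing in for NULL) when the stored row has
    # fewer fields than the current column list.
    padded = fields + [None] * (len(columns) - len(fields))
    # zip stops at the shorter input, so extra stored fields are dropped.
    return dict(zip(columns, padded))

# A row written when the table's columns were a, b, c.
old_row = "1\t2\t3"

# Case 1: a column added at the end reads as NULL for old data.
added = read_row(old_row, ["a", "b", "c", "d"])
print(added)

# Case 2: replacing the column list only relabels positions, so
# new_b and new_c do not line up with the old b and c values.
replaced = read_row(old_row, ["new_a", "d", "new_b", "new_c"])
print(replaced)
```

In this model the old row needs no rewrite in the first case, which is the situation asked about in the question. In the second case the data is still readable, but the value that used to be b now appears under d.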


    ________________________________
    From: Ryan LeCompte <lecompte@gmail.com>
    Reply-To: <hive-user@hadoop.apache.org>
    Date: Tue, 9 Mar 2010 12:24:50 -0800
    To: <hive-user@hadoop.apache.org>
    Subject: Adding new columns to existing Hive tables

