Hi all,

I'm testing whether Impala still works with old data after new columns are
added to the table. Querying the old columns works, but querying the new
columns does not behave correctly. Is Impala supposed to work in this
situation, or should I manually convert all old data to the new schema, with
NULLs in the new columns?

Files:
parquet.v1
parquet.v2 (v2 = v1 + new_column)

Query:
select count(new_column), or any other query on new_column

Impala version: 1.3.1


Thanks!

To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.

  • Tivona Hu at May 19, 2014 at 7:35 am
    By the way, I think it's supposed to work according to this:
    https://issues.cloudera.org/browse/IMPALA-779

    "Currently, the only supported schema evolution is adding columns at the
    end. This is supported in both directions.

    If you add a column to the end of the table schema (not in the file),
    Impala will populate the missing columns with NULLs."
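
The behavior described in the quote can be pictured with a small Python model. This is only an illustrative sketch, not Impala's implementation: `pad_row_to_schema`, the sample rows, and the column names are hypothetical, and the handling of extra trailing file columns is an assumption about what "both directions" means.

```python
from typing import Any, Dict, List, Sequence


def pad_row_to_schema(row: Sequence[Any], table_columns: Sequence[str]) -> Dict[str, Any]:
    """Map a row read from a data file onto the current table schema.

    Columns appended to the table after the file was written have no value
    in the file, so they are filled with None (standing in for SQL NULL).
    """
    values: List[Any] = list(row)
    if len(values) > len(table_columns):
        # Assumption: trailing file columns the table does not declare are ignored.
        values = values[: len(table_columns)]
    # Pad the missing trailing columns with None.
    values += [None] * (len(table_columns) - len(values))
    return dict(zip(table_columns, values))


# Hypothetical v1 file written before new_column was added to the table.
v1_rows = [("a",), ("b",)]
table_columns = ["old_column", "new_column"]
rows = [pad_row_to_schema(r, table_columns) for r in v1_rows]
```

Under this model, every row from the v1 file gets `new_column` set to `None`, so a query on the new column should return a finite set of NULLs, one per existing row.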

  • Nong Li at May 19, 2014 at 4:41 pm
    That should work. What's the error you see?

  • Tivona Hu at May 21, 2014 at 9:12 am
    I've tested on tables ranging from a few rows to thousands of rows, and
    they all give the same result.
    "select *" and queries on the old columns work fine, with or without a
    limit.

    If I run "select new_column from tbl limit 10", it shows:
    +------------+
    | new_column |
    +------------+
    | NULL       |
    | NULL       |
    | NULL       |
    | NULL       |
    | NULL       |
    | NULL       |
    | NULL       |
    | NULL       |
    | NULL       |
    | NULL       |
    +------------+

    But if I "select distinct(new_column) from tbl limit 10", it will hang.


    On Wednesday, May 21, 2014 at 11:35:45 AM UTC+8, Nong wrote:
    How many rows in your table? Can you try select * from tbl limit 10?


    On Mon, May 19, 2014 at 6:54 PM, Tivona Hu <huti...@gmail.com> wrote:
    There's no error message, but queries such as distinct new_column or
    count(new_column) hang without returning anything, and "select new_column"
    keeps printing NULL rows without stopping:

    ....
    | NULL               |
    | NULL               |
    +--------------------+
    | new_column         |
    +--------------------+
    | NULL               |
    | NULL               |
    | NULL               |
    ...

  • Nong Li at May 23, 2014 at 7:03 pm
    Thanks for the detailed repro. I've filed
    https://issues.cloudera.org/browse/IMPALA-1016 to track the issue.

    On Fri, May 23, 2014 at 9:31 AM, Tivona Hu wrote:

    I've created a very simple test case to reproduce this issue:

    Impala commands:
    1. create table schema_test2 (string1 string) stored as parquet;
    2. insert into schema_test2 values ('test'); --> all queries work fine after this
    3. alter table schema_test2 add columns (string2 string);
    4. select * from schema_test2 --> works fine
    5. select string2 from schema_test2 --> prints "NULL" without stopping
    6. select distinct string2 from schema_test2 --> query hangs

    Can anyone else reproduce this, or did I do something wrong here?
    BTW, my Impala version is 1.3.1.
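
For comparison with the repro steps above, here is a rough Python sketch of what the queries on string2 would be expected to return if the single existing row reads string2 as NULL. It models only standard SQL NULL semantics (COUNT(col) skips NULLs, DISTINCT collapses NULLs into one row); `sql_count` and `sql_distinct` are hypothetical helper names, not Impala functions.

```python
from typing import Any, List, Sequence


def sql_count(values: Sequence[Any]) -> int:
    """Model COUNT(col): only non-NULL values are counted."""
    return sum(1 for v in values if v is not None)


def sql_distinct(values: Sequence[Any]) -> List[Any]:
    """Model SELECT DISTINCT col: duplicates, including NULLs, collapse to one row."""
    seen: List[Any] = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen


# The table holds one row ('test') inserted before string2 was added,
# so string2 is assumed to read as NULL (None) for that row.
string2_values = [None]

expected_select = list(string2_values)            # one NULL row, then the query ends
expected_distinct = sql_distinct(string2_values)  # a single NULL row
expected_count = sql_count(string2_values)        # 0, since NULLs are not counted
```

In other words, each of these queries should finish quickly on a one-row table, which is what makes the endless NULL output and the hang stand out as a bug.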


Discussion Overview
group: impala-user
categories: hadoop
posted: May 19, '14 at 4:15a
active: May 23, '14 at 7:03p
posts: 5
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Tivona Hu: 3 posts
Nong Li: 2 posts
