Avro in thirdparty in the Impala GitHub repository has version 1.7.1-CDH4.2.0,
while the others in that folder use CDH4.3.0. Why does Avro use 1.7.1?

I guess the Impala C++ code uses 1.7.1, but the frontend Java code uses 1.7.3,
because I can find avro-1.7.3.jar in the 1.0 parcel:

./IMPALA-1.0-1.p0.371/lib/impala/lib/avro-1.7.3.jar

I'm not sure whether this version difference causes the following problem
related to Avro schema evolution:

I'm using avro-1.7.3.jar to generate a set of Avro files to test whether Impala
supports Avro schema evolution. See the attached Scala file.

All the attached .avro files are copied to /user/benyiw/t_avro, and table
t_avro is created in Hive using t_avro.sql.

When I query in Hive, the result is correct:

hive> desc t_avro;
OK
a string from deserializer
e double from deserializer
d boolean from deserializer
Time taken: 1.86 seconds
hive> select * from t_avro;
OK
46 6.0 true
49 9.0 false
47 7.0 false
45 5.0 NULL
48 8.0 true
Time taken: 5.207 seconds

But in Impala, the result is wrong:

[:21000] > select * from t_avro;
Query: select * from t_avro
Query finished, fetching results ...
+----+------------------------+-------+
| a  | e                      | d     |
+----+------------------------+-------+
| 46 | 1.576104156640076e-52  | true  |
| 49 | 9                      | false |
| 45 | 3.09721136873014e-316  | true  |
| 48 | 3.952525166729972e-323 | true  |
| 47 | 3.458459520888726e-323 | false |
+----+------------------------+-------+
Returned 5 row(s) in 1.57s

Is it a bug in Impala?
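
A note on the odd values in column e: for rows 48 and 47, Hive returns 8.0 and 7.0, while Impala returns 3.952525166729972e-323 and 3.458459520888726e-323. Those are exactly the doubles you get when the 64-bit integers 8 and 7 are reinterpreted bit-for-bit as IEEE 754 doubles. The sketch below only demonstrates that arithmetic. It assumes (the attached Scala file is not shown here) that some files stored e as an integer type; the helper name is made up for illustration and is not Impala code.

```python
import struct

def long_bits_as_double(n):
    """Reinterpret the 8 bytes of a signed 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]

# Format with 16 significant digits, the precision seen in the Impala output above.
for n in (7, 8):
    print(n, "->", "%.16g" % long_bits_as_double(n))
# 7 -> 3.458459520888726e-323
# 8 -> 3.952525166729972e-323
```

If that assumption holds, the integer values themselves are being read correctly and only the promotion to double is missing.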


  • Alex Behm at Jun 6, 2013 at 5:37 pm
    Thanks for digging into this and filing a thorough bug report!

    Cheers,

    Alex
    On Thu, Jun 6, 2013 at 10:13 AM, wrote:
    Impala's hdfs-avro-scanner doesn't use Avro's ResolvingDecoder, so it
    doesn't support Avro schema evolution.
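
For readers unfamiliar with what a resolving decoder adds, here is a rough, standard-library-only sketch of one piece of Avro schema resolution: reading a field that the reader schema declares as double when the writer schema stored it as int, long, float, or double. It follows the binary encoding and promotion rules in the Avro specification, but it is not Avro's ResolvingDecoder and not Impala's scanner; all function names are hypothetical.

```python
import struct

def encode_long(n):
    """Avro binary encoding of int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def decode_long(buf, pos):
    """Decode a zigzag varint starting at pos; return (value, new_pos)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos

def read_as_double(buf, pos, writer_type):
    """Decode with the writer's type, then promote to the reader's double."""
    if writer_type == "double":
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    if writer_type == "float":
        return float(struct.unpack_from("<f", buf, pos)[0]), pos + 4
    if writer_type in ("int", "long"):
        value, pos = decode_long(buf, pos)
        return float(value), pos  # numeric promotion, not a bit reinterpretation
    raise ValueError("cannot promote %s to double" % writer_type)

print(read_as_double(encode_long(8), 0, "long"))           # (8.0, 1)
print(read_as_double(struct.pack("<d", 9.0), 0, "double"))  # (9.0, 8)
```

The key point is that the bytes are decoded with the writer's type first and converted afterwards. A reader that skips this step has no correct way to interpret a field whose type changed between schema versions.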

    On Thursday, June 6, 2013 8:36:33 AM UTC-7, bewan...@gmail.com wrote:


Discussion Overview
group: impala-user
categories: hadoop
posted: Jun 6, '13 at 3:36p
active: Jun 6, '13 at 5:37p
posts: 2
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Alex Behm: 1 post Bewang Tech: 1 post
