I have an Avro table containing two rows. When I query it using Impala, it
doesn't see any data:

$ impala-shell -q 'select count(*) from sessions'
Connected to localhost.localdomain:21000
Server version: impalad version 1.0 RELEASE (build d1bf0d1dac339af3692ffa17a5e3fdae0aed751f)
Query: select count(*) from sessions
Query finished, fetching results ...
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Returned 1 row(s) in 0.26s

But if I use Hive, it sees both rows:

$ hive -e 'select count(*) from sessions'
...
OK
2
Time taken: 30.629 seconds

The data in the table looks like this:

$ hadoop fs -ls -R /tmp/data/sessions
-rw-rw-rw- 3 cloudera supergroup 1057 2013-06-03 18:39 /tmp/data/sessions/part-r-00000.avro

Any idea what is going on, or how to debug it? I'm using Impala 1.0 (d1bf0d1,
released April 28) on the QuickStart VM. I also tried Impala 1.0.1 and got
the same result.

Thanks,
Tom
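One way to approach "how to debug?" without going through either Impala or Hive is to read the data file directly. The following is a minimal Python 3 sketch, not something from this thread. It assumes the file is a standard Avro object container file: the magic bytes `Obj\x01`, a metadata map whose `avro.schema` entry holds the writer schema, a 16-byte sync marker, then data blocks that each begin with a record count and a byte size. It also assumes a local copy of the file exists, for example via `hadoop fs -get /tmp/data/sessions/part-r-00000.avro .`. The helper names `read_long`, `read_bytes`, and `inspect_avro` are hypothetical. Because it only reads block headers and skips the payloads, it never needs to decompress a codec.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file
SYNC_SIZE = 16      # the sync marker follows the header and every data block


def read_long(stream):
    """Decode one Avro long: a zig-zag encoded variable-length integer."""
    shift = accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated varint")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:                  # high bit clear: last byte
            return (accum >> 1) ^ -(accum & 1)  # undo zig-zag
        shift += 7


def read_bytes(stream):
    """Decode an Avro bytes value: a long length, then that many raw bytes."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated bytes value")
    return data


def inspect_avro(stream):
    """Return (writer schema parsed from JSON, total record count).

    `stream` must be a seekable binary file object."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:                      # header metadata: an Avro map of bytes
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:                # negative count means a byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    sync = stream.read(SYNC_SIZE)

    records = 0
    while stream.read(1):            # anything left means another data block
        stream.seek(-1, io.SEEK_CUR)
        records += read_long(stream)     # records in this block
        size = read_long(stream)         # payload size after any codec
        stream.seek(size, io.SEEK_CUR)   # skip the payload without decoding it
        if stream.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch; file may be truncated or corrupt")
    return json.loads(meta["avro.schema"].decode("utf-8")), records
```

For example, `with open("part-r-00000.avro", "rb") as f: print(inspect_avro(f))` prints the writer schema and the record count. If the count comes back as 2, the data itself looks intact, which points toward how Impala interprets the file (such as its schema) rather than toward the file contents.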


  • Tom White at Jun 5, 2013 at 7:57 pm
    Hi Lenni,

    I tried that, but unfortunately it had no effect.

    Thanks,
    Tom
    On Wed, Jun 5, 2013 at 6:40 PM, Lenni Kuff wrote:
    Hi Tom,
    Have you tried executing a "refresh" command from the Impala shell? To do
    this, just run "refresh <table name>".

    This is needed when files are added or removed outside of Impala.

    Thanks,
    Lenni

  • Bewang Tech at Jun 5, 2013 at 10:38 pm
    I don't have any problem querying Avro tables on my Impala 1.0 cluster, but
    this problem really worries me, because we plan to use Avro to store our
    data until Impala's Parquet support is mature.

    The only catch is that I have to run the CREATE TABLE statement in Hive,
    because Impala doesn't support ROW FORMAT SERDE.

    Here is my Avro table:

    create external table t_avro (
       A string,
       B int,
       C double)
    ROW FORMAT SERDE
       'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT
       'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT
       'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/user/benyiw/t_avro'
    TBLPROPERTIES (
       'avro.schema.literal'='{ "type": "record",
           "name": "TestRecord",
           "version": 2,
           "fields": [
             { "name": "A", "type": ["null", "string"] },
             { "name": "B", "type": ["null", "int" ] },
             { "name": "C", "type": ["null", "double"], "default": null }
           ] }
    ');

    [:21000] > desc t_avro
    ;
    Query: describe t_avro
    Query finished, fetching results ...
    +------+--------+-------------------+
    | name | type   | comment           |
    +------+--------+-------------------+
    | a    | string | from deserializer |
    | b    | int    | from deserializer |
    | c    | double | from deserializer |
    +------+--------+-------------------+
    Returned 3 row(s) in 0.45s
    [:21000] > select * from t_avro;
    Query: select * from t_avro
    Query finished, fetching results ...
    +-------+---+----+
    | a     | b | c  |
    +-------+---+----+
    | Hello | 5 | 13 |
    +-------+---+----+
    Returned 1 row(s) in 0.56s
    [:21000] > select count(*) from t_avro;
    Query: select count(*) from t_avro
    Query finished, fetching results ...
    +----------+
    | count(*) |
    +----------+
    | 1        |
    +----------+
    Returned 1 row(s) in 0.27s
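In the `describe` output above, every column is marked "from deserializer", which suggests that Hive derives the column list from `avro.schema.literal`. The field names appear lowercased, and each nullable union such as `["null", "string"]` is shown as its non-null branch. The following minimal Python sketch illustrates that mapping, covering only primitive fields and two-branch nullable unions. `AVRO_TO_HIVE` and `hive_columns` are hypothetical names, and the type table is an assumption based on Hive's AvroSerDe documentation, not something stated in this thread.

```python
import json

# Assumed Hive column type for each supported Avro primitive (AvroSerDe mapping).
AVRO_TO_HIVE = {
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "string": "string",
}


def hive_columns(schema_literal):
    """Return (column name, Hive type) pairs for a flat Avro record schema."""
    schema = json.loads(schema_literal)
    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        if isinstance(avro_type, list):
            # A nullable union like ["null", "int"] is shown as its non-null branch.
            branches = [b for b in avro_type if b != "null"]
            if len(branches) != 1:
                raise ValueError(f"unsupported union for {field['name']}: {avro_type!r}")
            avro_type = branches[0]
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_HIVE:
            raise ValueError(f"unsupported type for {field['name']}: {avro_type!r}")
        columns.append((field["name"].lower(), AVRO_TO_HIVE[avro_type]))
    return columns
```

Applied to the `avro.schema.literal` in the DDL above, it returns `[("a", "string"), ("b", "int"), ("c", "double")]`, which matches the `describe` output.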

  • Skye Wanderman-Milne at Jun 5, 2013 at 11:24 pm
    Hi Tom,

    Can you send the impalad log file? Or, if you're able to send the data
    file, I'll see if I can reproduce the problem. Also, what's the Avro
    schema of your table?

    Thanks,
    Skye

  • Tom White at Jun 6, 2013 at 10:14 am
    I think I've found the underlying problem: it looks like a limitation in
    the Impala Avro schema parser, which can't parse

    { "name": "A", "type":{"type":"string"} }

    The more common case of

    { "name": "A", "type":"string" }

    works fine. The nested case is typically used when you want to add
    extra properties to the type, e.g. the Java code generator adds
    "avro.java.string":"String" to say that the type is a String (rather
    than Avro Java's Utf8 type).
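Tom's diagnosis can be turned into a quick check on a schema. The following minimal Python sketch walks a parsed Avro schema and reports every place where a primitive is written in the nested object form, for example `{"type": "string", "avro.java.string": "String"}`, instead of as a bare name. `find_wrapped_primitives` and `check_schema_text` are hypothetical helpers. The sketch only descends into records, arrays, maps, and unions, and a reported path marks a candidate for the parser limitation described here, not a confirmed failure. In the returned paths, `$.B|1` means branch 1 of the union in field `B`.

```python
import json

# Avro primitive type names, per the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def find_wrapped_primitives(schema, path="$"):
    """Return paths where a primitive is written as an object,
    e.g. {"type": "string"}, instead of the bare name "string"."""
    hits = []
    if isinstance(schema, list):                        # union: check each branch
        for i, branch in enumerate(schema):
            hits += find_wrapped_primitives(branch, f"{path}|{i}")
    elif isinstance(schema, dict):
        t = schema.get("type")
        if isinstance(t, str) and t in PRIMITIVES:
            hits.append(path)                           # the nested form
        elif t == "record":
            for field in schema["fields"]:
                hits += find_wrapped_primitives(field["type"], f"{path}.{field['name']}")
        elif t == "array":
            hits += find_wrapped_primitives(schema["items"], f"{path}[]")
        elif t == "map":
            hits += find_wrapped_primitives(schema["values"], f"{path}{{}}")
    return hits


def check_schema_text(text):
    """Parse a schema given as JSON text (e.g. avro.schema.literal) and check it."""
    return find_wrapped_primitives(json.loads(text))
```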

    I opened https://issues.cloudera.org/browse/IMPALA-399 - hopefully
    someone can fix it soon :)
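Until IMPALA-399 is fixed, one possible workaround is to use a schema in which every nested primitive is collapsed back to its bare name. This is an untested assumption, not something proposed in this thread. Properties such as `avro.java.string` guide Java code generation and do not change how values are encoded, so the collapsed schema should describe the same data. It would only help if the schema Impala fails to parse is the table's schema, which could be updated from Hive with `ALTER TABLE ... SET TBLPROPERTIES ('avro.schema.literal'='...')`. If the failing schema is the one embedded in the data files, the files would have to be rewritten with the collapsed schema instead. The helper names below are hypothetical, and the input schema is left unmodified.

```python
import json

# Avro primitive type names, per the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def collapse_wrapped_primitives(schema):
    """Return a copy of an Avro schema in which {"type": "<primitive>", ...}
    objects are replaced by the bare primitive name."""
    if isinstance(schema, list):                        # union: collapse each branch
        return [collapse_wrapped_primitives(b) for b in schema]
    if not isinstance(schema, dict):                    # already a bare name
        return schema
    t = schema.get("type")
    if isinstance(t, str) and t in PRIMITIVES:
        return t                                        # drops e.g. "avro.java.string"
    out = dict(schema)                                  # shallow copy; input untouched
    if t == "record":
        out["fields"] = [dict(f, type=collapse_wrapped_primitives(f["type"]))
                         for f in schema["fields"]]
    elif t == "array":
        out["items"] = collapse_wrapped_primitives(schema["items"])
    elif t == "map":
        out["values"] = collapse_wrapped_primitives(schema["values"])
    return out


def rewrite_schema_text(text):
    """Parse, collapse, and re-serialize a schema given as JSON text."""
    return json.dumps(collapse_wrapped_primitives(json.loads(text)))
```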

    Thanks, bewang, for the working example; it helped me narrow down the problem.

    Cheers,
    Tom

Discussion Overview
group: impala-user
categories: hadoop
posted: Jun 5, '13 at 11:50a
active: Jun 6, '13 at 10:14a
posts: 5
users: 3
website: cloudera.com
irc: #hadoop
