I have no problem querying an Avro table on my Impala 1.0 cluster. But
this issue really worries me, because we are going to use Avro to
store data until Parquet support in Impala is mature.
The only catch is that I have to run the CREATE TABLE statement in Hive,
because Impala doesn't support ROW FORMAT SERDE.
Here is my Avro table:
create external table t_avro (
A string,
B int,
C double)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/benyiw/t_avro'
TBLPROPERTIES (
'avro.schema.literal'='{ "type": "record",
"name": "TestRecord",
"version": 2,
"fields": [
{ "name": "A", "type": ["null", "string"] },
{ "name": "B", "type": ["null", "int" ] },
{ "name": "C", "type": ["null", "double"], "default": null }
] }
');
[:21000] > desc t_avro;
Query: describe t_avro
Query finished, fetching results ...
+------+--------+-------------------+
| name | type   | comment           |
+------+--------+-------------------+
| a    | string | from deserializer |
| b    | int    | from deserializer |
| c    | double | from deserializer |
+------+--------+-------------------+
Returned 3 row(s) in 0.45s
[:21000] > select * from t_avro;
Query: select * from t_avro
Query finished, fetching results ...
+-------+---+----+
| a     | b | c  |
+-------+---+----+
| Hello | 5 | 13 |
+-------+---+----+
Returned 1 row(s) in 0.56s
[:21000] > select count(*) from t_avro;
Query: select count(*) from t_avro
Query finished, fetching results ...
+----------+
| count(*) |
+----------+
| 1        |
+----------+
Returned 1 row(s) in 0.27s
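
For reference, the workflow described above (create the table in Hive, then query it from Impala) could be scripted roughly as follows. This is only a sketch under assumptions: the DDL file name in the usage line is hypothetical, the HIVE and IMPALA_SHELL variables are hypothetical overrides for the two commands, and the refresh step assumes Impala has to reload its metadata before it sees a table created through Hive.

```shell
# Sketch only. HIVE and IMPALA_SHELL are hypothetical overrides so the
# real commands can be swapped out; they default to the usual binaries.
HIVE="${HIVE:-hive}"
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

create_in_hive_then_query() {
    ddl_file="$1"   # file holding the CREATE EXTERNAL TABLE statement
    table="$2"      # table name, e.g. t_avro

    # Impala 1.0 cannot parse ROW FORMAT SERDE, so the DDL goes through Hive.
    "$HIVE" -f "$ddl_file" &&
    # Assumption: Impala needs a refresh to pick up the new table.
    "$IMPALA_SHELL" -q "refresh $table" &&
    "$IMPALA_SHELL" -q "select count(*) from $table"
}
```

Usage, with a hypothetical file name: create_in_hive_then_query t_avro.hql t_avro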
On Wednesday, June 5, 2013 12:57:34 PM UTC-7, Tom White wrote:
Hi Lenni,
I tried that, but unfortunately it had no effect.
Thanks,
Tom
On Wed, Jun 5, 2013 at 6:40 PM, Lenni Kuff wrote:
Hi Tom,
Have you tried executing a "refresh" command from the Impala shell? To do
this just run "refresh <table name>".
This is needed when files are added/removed external to the Impala instance.
Thanks,
Lenni
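
The refresh suggestion above can also be run non-interactively with the same -q flag used elsewhere in this thread. The snippet below is a minimal sketch, not a confirmed fix: the function name and the IMPALA_SHELL override variable are hypothetical, and the table name is whatever table you pass in.

```shell
# Sketch only. IMPALA_SHELL is a hypothetical override; it defaults to
# the impala-shell binary used in this thread.
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

refresh_and_count() {
    table="$1"
    # Ask Impala to reload the table's metadata, then count its rows.
    "$IMPALA_SHELL" -q "refresh $table" &&
    "$IMPALA_SHELL" -q "select count(*) from $table"
}
```

Usage: refresh_and_count sessions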
On Wed, Jun 5, 2013 at 4:50 AM, <t...@cloudera.com> wrote:
I have an Avro table containing two rows. When I query using Impala it
doesn't see any data:
$ impala-shell -q 'select count(*) from sessions'
Connected to localhost.localdomain:21000
Server version: impalad version 1.0 RELEASE (build d1bf0d1dac339af3692ffa17a5e3fdae0aed751f)
Query: select count(*) from sessions
Query finished, fetching results ...
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Returned 1 row(s) in 0.26s
But if I use Hive it can see the two rows:
$ hive -e 'select count(*) from sessions'
...
OK
2
Time taken: 30.629 seconds
The data in the table looks like this:
$ hadoop fs -ls -R /tmp/data/sessions
-rw-rw-rw- 3 cloudera supergroup 1057 2013-06-03 18:39 /tmp/data/sessions/part-r-00000.avro
Any idea what is going on or how to debug? I'm using Impala 1.0 (d1bf0d1,
released April 28) on the QuickStart VM. I also tried with Impala 1.0.1 and
got the same result.
Thanks,
Tom