We're using Hive and Impala with flume-ng, which is writing gzip-compressed
sequence files to HDFS. In Impala v0.1 this worked fine; in v0.3 it
doesn't, throwing "ERROR: java.lang.RuntimeException: Compressed file not
supported: ...".
Hive tables were created with:
CREATE EXTERNAL TABLE IF NOT EXISTS ....
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE LOCATION '...';
Hive accesses the data just fine, and as I said, v0.1 of Impala was fine as
well. Other than upgrading and restarting statestore and impalad, no
changes on my end from v0.1. Not running Cloudera Manager, but on CDH 4.1.2.
As far as I can tell, this was caused by a change that restricted the
available compression formats to LZO.
Backtrace from simple query:
I0107 19:11:52.957919 20015 impala-server.cc:863] query(): query=select
count(*) FROM logs_dash_unicorn
I0107 19:11:52.962441 20015 status.cc:36] java.lang.RuntimeException:
Compressed file not supported:
I was under the impression that there should be gzip support for sequence files.
Something obvious I'm missing, or might this be a regression?
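
In case it helps anyone reproduce this, here is a minimal sketch for checking which codec a sequence file actually declares in its header. It assumes the standard Hadoop SequenceFile header layout (version 5 or later), and the function names `read_text` and `read_seqfile_header` are made up for this post. It is only a diagnostic and only handles class names shorter than 128 bytes.

```python
import io
import struct


def read_text(buf):
    """Read a Hadoop Text-style string: a VInt length, then UTF-8 bytes.

    Simplification (assumption of this sketch): only single-byte VInt
    lengths (0..127) are handled, which covers typical class names.
    """
    (n,) = struct.unpack("b", buf.read(1))
    if n < 0:
        raise ValueError("multi-byte VInt lengths are not handled here")
    return buf.read(n).decode("utf-8")


def read_seqfile_header(buf):
    """Parse the start of a SequenceFile header from a binary stream."""
    if buf.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    version = buf.read(1)[0]
    if version < 5:
        raise ValueError("header versions before 5 are not handled here")
    key_class = read_text(buf)
    value_class = read_text(buf)
    compressed = buf.read(1) != b"\x00"        # record or block compression
    block_compressed = buf.read(1) != b"\x00"  # block compression only
    # The codec class name is only present when compression is enabled.
    codec = read_text(buf) if compressed else None
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }

# Usage (hypothetical local path, after copying one file out of HDFS):
#   with open("/tmp/sample.seq", "rb") as f:
#       print(read_seqfile_header(f))
```

For a gzip-compressed file I would expect the `codec` field to name `org.apache.hadoop.io.compress.GzipCodec`, which would at least confirm that the files are what I think they are.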