Despite the limitations, GZIP-compressed text is part of the roadmap, but we
don't have a timeline for it. Note that even when it is supported, we'll
recommend against it because of the CPU and network performance issues
described earlier.

On Fri, Aug 23, 2013 at 3:18 PM, John Russell wrote:

I believe there is an architectural mismatch with Hadoop generally, in
that gzipped files aren't "splittable" the same way LZO files are, so
there's less opportunity for dividing up the work in parallel.
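
To make the "splittable" point concrete, here is a minimal Python sketch (not Hadoop or Impala code). It assumes a made-up tab-separated payload and uses only the standard `gzip` module; the helper names `lines_from_offset` and `gzip_readable_from` are hypothetical. A reader dropped into the middle of a plain text file can skip to the next newline and carry on, while a reader dropped into the middle of a gzip stream has nothing it can decode. Real input formats handle split boundaries with more care than this.

```python
import gzip


def lines_from_offset(raw: bytes, offset: int) -> list:
    """Plain text: skip to the next newline after `offset`, then read whole lines."""
    start = raw.index(b"\n", offset) + 1
    return raw[start:].splitlines()


def gzip_readable_from(blob: bytes, offset: int) -> bool:
    """Gzip: try to decode the stream starting at `offset` instead of byte 0."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except Exception:
        # No gzip header at this position, so decoding cannot start here.
        return False


# Hypothetical sample data: 10,000 tab-separated rows.
raw = b"".join(b"%d\tvalue%d\n" % (i, i) for i in range(10000))
blob = gzip.compress(raw)

mid_raw = len(raw) // 2
mid_gz = len(blob) // 2

# A second reader can start halfway through the uncompressed file.
tail = lines_from_offset(raw, mid_raw)
print(len(tail), tail[-1])

# The gzip stream decodes from byte 0 but not from its midpoint.
print(gzip_readable_from(blob, 0))
print(gzip_readable_from(blob, mid_gz))
```

With the gzipped file, only one reader can make progress, and it has to start at byte 0.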

Some background:


http://stackoverflow.com/questions/11229272/hadoop-mr-better-to-have-compressed-input-files-or-raw-files

I see the idea of splittable gzip has been explored, but I don't know if
that ever went anywhere:

https://issues.apache.org/jira/browse/MAPREDUCE-491

John

On Aug 21, 2013, at 6:19 AM, Jon Bjarnason wrote:

Has there been any movement on this? We are using gzipped text files and
are very happy with it. This is a blocker for moving into Impala.

There is a lot of native support for gzipped files in hdfs and it seems
odd that Impala doesn't support it.

Thanks,

Jon

On Wednesday, April 17, 2013 11:06:06 PM UTC, Justin Erickson wrote:

Impala supports LZO-compressed and uncompressed text files. GZIP is
currently supported with splittable formats such as SequenceFiles, RCFiles,
etc.

In general, even with just MapReduce, we'd recommend against using GZIP
compressed text files for the following reasons:
* GZIP with a non-splittable file format (i.e. text files) will require
remote reads to process the entire file for files larger than an HDFS block
* GZIP is a very CPU-expensive compression codec optimized for storage
density over performance, so it will often be a performance bottleneck
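
As a rough illustration of the first point, the Python sketch below counts HDFS blocks under a deliberately simplified model: a reader is assumed to be local to at most one block of the file, and replica placement is ignored. The function names, the 1 GiB file size, and the 128 MiB block size are illustrative assumptions, not values from this thread.

```python
def blocks_spanned(file_bytes: int, block_bytes: int) -> int:
    """Number of HDFS blocks a file of this size occupies (ceiling division)."""
    return max(1, (file_bytes + block_bytes - 1) // block_bytes)


def max_parallel_readers(file_bytes: int, block_bytes: int, splittable: bool) -> int:
    """A splittable file can get one reader per block; otherwise a single reader."""
    return blocks_spanned(file_bytes, block_bytes) if splittable else 1


def blocks_possibly_remote(file_bytes: int, block_bytes: int, splittable: bool) -> int:
    """Blocks a reader may have to fetch over the network in this simplified model."""
    if splittable:
        return 0  # each reader is assumed to sit next to its own block
    return blocks_spanned(file_bytes, block_bytes) - 1


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical 1 GiB file stored in 128 MiB blocks.
for splittable in (True, False):
    print(
        splittable,
        max_parallel_readers(1 * GIB, 128 * MIB, splittable),
        blocks_possibly_remote(1 * GIB, 128 * MIB, splittable),
    )
```

Under these assumptions the file spans 8 blocks: a splittable format gets up to 8 readers with no forced remote reads, while gzipped text gets 1 reader that may pull 7 of the 8 blocks over the network.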

For better performance, we recommend using a splittable file format with
Snappy compression such as Snappy-compressed Avro or SequenceFiles. If you
need to use text files for external accessibility, LZO-compressed text is
probably your best choice.

That said, we do have GZIP compression for text files as part of our
roadmap considerations, but I don't have a timeline given its current level
of feedback relative to other higher-priority items.

Thanks,
Justin

On Wed, Apr 17, 2013 at 2:44 PM, Josh Hansen wrote:

Impala 0.7.1 fails to query an external table backed by files ending
with a .sql.gz extension. These are gzipped tab-separated value files and I
can successfully query them with Hive.

Output:

$ impala-shell
Connected to $HOST:21000
Unable to load history: [Errno 2] No such file or directory
Welcome to the Impala shell. Press TAB twice to see a list of available
commands.

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

(Build version: Impala v0.7.1 (70cfa54) built on Tue Apr 16 22:10:43 PDT
2013)

[$HOST:21000] > select * from exampletable limit 10;
Query: select * from exampletable limit 10
ERROR: AnalysisException: Failed to load metadata for table: exampletable
CAUSED BY: TableLoadingException: Failed to load metadata for table:
exampletable
CAUSED BY: RuntimeException: Compressed text files are not supported:
hdfs://$HOST:8020/path/to/**file.sql.gz
[$HOST:21000] >

Apparently there have been issues in this area before (IMPALA-14<https://issues.cloudera.org/browse/IMPALA-14>)
- is there some connection? That issue seems to imply support for gzipped
files, but apparently that is no longer the case. Regression?

Cluster is CDH4.2.0 installed using parcels from Cloudera Manager 4.5

BTW, the "Unable to load history: [Errno 2] No such file or directory"
seems to only appear on the first invocation of impala-shell. Probably
shouldn't be considered an error at all in that case, since obviously there
would be no history on the first invocation.
- Josh
To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user+unsubscribe@cloudera.org.



Discussion Overview
group: impala-user
categories: hadoop
posted: Apr 17, '13 at 9:49p
active: Jan 17, '14 at 9:45a
posts: 7
users: 7
website: cloudera.com
irc: #hadoop
