We could use snappy-compressed sequence files instead of snappy-compressed
text files, but as everything is already in production it's not possible
to change right now.
'Text files are not splittable' is not an issue for us because of the way
we have partitioned the data: after applying compression, a single file
does not cross the block size (128 MB) in 95% of cases.
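To double-check that the compressed files in a partition really do stay
under the block size, the per-file sizes can be listed from the Hive
shell (the warehouse path below is a made-up example, not our real
layout):

hive> dfs -du /user/hive/warehouse/mytable/dt=2013-09-11;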
Thanks for your suggestion.
Regards,
Nishant
On Thu, Sep 12, 2013 at 9:25 AM, Marcel Kornacker wrote:
On Wed, Sep 11, 2013 at 8:40 PM, Nishant Patel wrote:

Any timeline for supporting snappy compressed text files? We won't be
able to migrate to Impala, as all our tables have snappy-compressed
text files :(

Regards,
Nishant

What's the reason for not using snappy-compressed sequence files? The
problem with text files is that they're not splittable.
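For reference, the conversion Marcel is suggesting can be sketched in
Hive roughly like this (the table names are placeholders, not anything
from this thread):

-- Emit Snappy-compressed, block-compressed SequenceFiles.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

-- Rewrite the text-backed table into a SequenceFile-backed copy.
CREATE TABLE mytable_seq STORED AS SEQUENCEFILE
AS SELECT * FROM mytable_text;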
On Thu, Sep 12, 2013 at 12:56 AM, Justin Erickson <justin@cloudera.com>
wrote:
Despite the limitations it is part of the roadmap, but we don't have a
timeline for this. Note that even when this is supported, we'll recommend
against it for the CPU and network performance issues described earlier.

On Fri, Aug 23, 2013 at 3:18 PM, John Russell <jrussell@cloudera.com>
wrote:
I believe there is an architectural mismatch with Hadoop generally, in
that gzipped files aren't "splittable" the same way LZO files are, so
there's less opportunity for dividing up the work in parallel.

Some background:
http://stackoverflow.com/questions/11229272/hadoop-mr-better-to-have-compressed-input-files-or-raw-files

I see the idea of splittable gzip has been explored, but I don't know
if that ever went anywhere:
https://issues.apache.org/jira/browse/MAPREDUCE-491

John

On Aug 21, 2013, at 6:19 AM, Jon Bjarnason wrote:

Has there been any movement on this? We are using gzipped text files
and are very happy with it. This is a blocker for moving into Impala.
There is a lot of native support for gzipped files in HDFS and it seems
odd that Impala doesn't support it.

Thanks,
Jon
On Wednesday, April 17, 2013 11:06:06 PM UTC, Justin Erickson wrote:
Impala supports LZO-compressed and uncompressed text files. GZIP is
currently supported with splittable formats such as SequenceFiles,
RCFiles, etc.

In general, even with just MapReduce, we'd recommend against using
GZIP-compressed text files for the following reasons:

* GZIP with a non-splittable file format (i.e. text files) will require
remote reads to process the entire file for files larger than an HDFS
block.
* GZIP is a very CPU-expensive compression codec optimized for storage
density above performance, so it will often be a performance bottleneck.

For better performance, we recommend using a splittable file format with
Snappy compression, such as Snappy-compressed Avro or SequenceFiles. If
you need to use text files for external accessibility, LZO-compressed
text is probably your best choice.

That said, we do have GZIP compression for text files as part of our
roadmap considerations, but I don't have a timeline given its current
level of feedback relative to other higher priority items.
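For anyone who takes the LZO text route, the Hive table definition
usually looks something like the sketch below. It assumes the hadoop-lzo
libraries are installed and the files are LZO-indexed for splitting; the
column names are placeholders:

CREATE TABLE mytable_lzo (id INT, line STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';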
Thanks,
Justin

On Wed, Apr 17, 2013 at 2:44 PM, Josh Hansen <hansen....@gmail.com>
wrote:
Impala 0.7.1 fails to query an external table backed by files ending
with a .sql.gz extension. These are gzipped tab-separated value files
and I can successfully query them with Hive.
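For context, the table was created along these general lines (the column
names below are placeholders, not the real schema):

CREATE EXTERNAL TABLE exampletable (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://$HOST:8020/path/to/';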
Output:

$ impala-shell
Connected to $HOST:21000
Unable to load history: [Errno 2] No such file or directory
Welcome to the Impala shell. Press TAB twice to see a list of available
commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Build version: Impala v0.7.1 (70cfa54) built on Tue Apr 16 22:10:43
PDT 2013)
[$HOST:21000] > select * from exampletable limit 10;
Query: select * from exampletable limit 10
ERROR: AnalysisException: Failed to load metadata for table:
exampletable
CAUSED BY: TableLoadingException: Failed to load metadata for table:
exampletable
CAUSED BY: RuntimeException: Compressed text files are not supported:
hdfs://$HOST:8020/path/to/file.sql.gz
[$HOST:21000] >

Apparently there have been issues in this area before (IMPALA-14) - is
there some connection? That issue seems to imply support for gzipped
files, but apparently that is no longer the case. Regression?
Cluster is CDH4.2.0 installed using parcels from Cloudera Manager 4.5.

BTW, the "Unable to load history: [Errno 2] No such file or directory"
message seems to only appear on the first invocation of impala-shell. It
probably shouldn't be considered an error at all in that case, since
obviously there would be no history on the first invocation.
- Josh
--
Regards,
Nishant Patel
To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.