FAQ
We have a Parquet table for a day of activity (close to 5 billion records),
partitioned by minute. There are several minutes where the total data size
is over 1 GB, but when populating the table Impala split the data into
smaller files, and I can't find any file larger than 260 MB. Since our HDFS
block size is 1 GB and the optimal Parquet file size should match it, this
is not ideal.

Is there an Impala setting to make sure the files don't get split into
smaller sizes, so we can improve I/O performance during queries?

Thank you,
Daniel
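
[Editor's note: a minimal sketch of what such a setting might look like,
assuming the PARQUET_FILE_SIZE query option is available in the Impala
version in use; the table and column names below are hypothetical
placeholders, not taken from the original post.]

    -- In impala-shell, set the target Parquet file size (in bytes)
    -- before running the INSERT that populates the partitioned table.
    SET PARQUET_FILE_SIZE=1073741824;  -- 1 GB, to match the HDFS block size

    -- Hypothetical table and partition names, for illustration only.
    INSERT OVERWRITE TABLE activity PARTITION (minute)
    SELECT * FROM activity_staging;

The option applies per query, so it would need to be set in the same
session as the INSERT statement that writes the Parquet files.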
