Hi:

I noticed that when Impala reads a Parquet file, it ignores splits whose
offset is not 0, whereas ParquetMR reads all splits.

I understand the reason is that Impala reads the whole file instead, for
better sequential I/O performance.
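
To make the behavior I am describing concrete, here is a minimal Python sketch. It is not Impala's code: `Split`, `splits_to_scan`, and `bytes_read_per_file` are hypothetical names, and the sketch only models what I observed above (splits with a nonzero offset are skipped, and the offset-0 split ends up reading the whole file).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a scan range handed to the scanner."""
    path: str
    offset: int
    length: int


def splits_to_scan(splits):
    """Keep only the split that starts at byte 0 of each file."""
    return [s for s in splits if s.offset == 0]


def bytes_read_per_file(splits, file_sizes):
    """Model of the observed behavior: each kept split reads the whole
    file, not just its own byte range."""
    return {s.path: file_sizes[s.path] for s in splits_to_scan(splits)}


# One 300-byte file cut into three splits (sizes are made up).
splits = [
    Split("a.parq", 0, 128),
    Split("a.parq", 128, 128),
    Split("a.parq", 256, 44),
]
file_sizes = {"a.parq": 300}

print(splits_to_scan(splits))
print(bytes_read_per_file(splits, file_sizes))
```

In this model the splits at offsets 128 and 256 do no work at all, which is exactly why I am asking what purpose they serve.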

Can anyone tell me what the concept of splits is in the Impala Parquet scanner,
and why it is there? (If we read whole files, why do we still introduce splits?)

Thanks!


Discussion Overview
group: impala-user
categories: hadoop
posted: Apr 23, '14 at 9:17p
active: Apr 28, '14 at 7:20p
posts: 2
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Xiu Guo: 1 post Nong Li: 1 post
