I noticed that when Impala reads a Parquet file, it ignores splits whose
offset is not 0, whereas ParquetMR reads all splits.
I understand the reason is that Impala reads the whole file instead, for
better sequential IO performance.
Can anyone tell me what the concept of splits is in the Impala Parquet scanner,
and why it is there? (If we read whole files, why do we still introduce splits?)
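
To make the behaviour I am describing concrete, here is a rough sketch in Python. It is only my mental model, not Impala's or ParquetMR's actual code; the Split class and both function names are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a scan range: a byte range of one file."""
    path: str
    offset: int
    length: int


def splits_processed_whole_file(splits):
    # Behaviour I observed in Impala (as I understand it): only the split
    # starting at offset 0 does any work, and that one reads the whole file.
    return [s for s in splits if s.offset == 0]


def splits_processed_per_split(splits):
    # Behaviour I observed in ParquetMR (as I understand it): every split
    # is read, each covering its own part of the file.
    return list(splits)


splits = [
    Split("a.parq", 0, 128),
    Split("a.parq", 128, 128),
    Split("b.parq", 0, 64),
]

print(len(splits_processed_whole_file(splits)), "splits do work (whole-file model)")
print(len(splits_processed_per_split(splits)), "splits do work (per-split model)")
```

In the first model the split at offset 128 of a.parq is generated but never does anything, which is exactly what makes me wonder why it exists.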