I found the "-num_threads_per_disk" option in Impala, which sets the
maximum number of threads per disk. (The default value is 1.)

I am not sure, but increasing this value might allow the column chunks to
be read in parallel, which would lower the query latency.
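
To make the idea concrete, here is a rough Python sketch. It is not Impala's
scanner: the function names, the fixed per-chunk cost, and the use of sleep
in place of disk I/O are all assumptions made for illustration. It only shows
why reading column chunks one after another scales with the number of columns,
and how more reader threads could hide that cost if the disk can serve them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK_READ_SECONDS = 0.05  # hypothetical cost of reading one column chunk


def read_column_chunk(column):
    """Pretend to read one column chunk; sleep stands in for disk I/O."""
    time.sleep(CHUNK_READ_SECONDS)
    return column, f"<data for {column}>"


def scan(columns, num_threads):
    """Read every requested column chunk using at most num_threads workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        result = dict(pool.map(read_column_chunk, columns))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Compare one reader thread against two for a two-column query.
    for threads in (1, 2):
        _, elapsed = scan(["a", "b"], threads)
        print(f"threads={threads}: {elapsed:.2f}s")
```

With a single thread the simulated scan time grows with each extra column,
which is the same pattern as the 2 second and 4 second timings quoted below.
Whether a real disk gives a similar speedup with more threads depends on the
hardware and on how Impala schedules its reads.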

Am I right, experts?

Thanks


On Saturday, May 25, 2013 1:18:02 AM UTC+9, gerrard...@gmail.com wrote:

Hi,

When I run a select query on Parquet data with ~50 million rows and 10
columns, performance gets much worse as I select more columns.
Suppose the following query returns 3 rows:

select a from table where a = 12345;

This query returns in 2 seconds. Then if I query:

select a, b from table where a = 12345;

the query returns in 4 seconds, and so on. Is this expected behaviour, given
that Parquet is a columnar store? Is there a way to optimise this?

Discussion Overview
group: impala-user
categories: hadoop
posted: May 25, '13 at 5:08a
active: May 25, '13 at 5:08a
posts: 1
users: 1
website: cloudera.com
irc: #hadoop

1 user in discussion

Jung-Yup Lee: 1 post
