Will this fix also address other types of aggregate push downs? For
example, sums, max, and min?

On Wed, Mar 12, 2014 at 12:08 PM, Marcel Kornacker wrote:

Gyorgy, note that this is slated to be fixed in 1.3.

On Mon, Feb 24, 2014 at 5:51 PM, Alex Behm wrote:

Hi Gyorgy,

I'm afraid you have interpreted the explain output exactly right. All rows
are sent to the coordinator, which then executes the count(*), i.e., the
aggregation is not distributed properly. Clearly a bug.
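
To make the difference concrete, here is a toy Python model (not Impala code) of the two strategies: shipping every row to the coordinator before counting, versus counting locally in each fragment and merging the partial counts. The function names, the row counts, and the idea of exactly one partial result per fragment are illustrative assumptions, not details taken from Impala's planner.

```python
# Toy model of the two plans; no Impala code is involved.
# Each "fragment" stands for the scan of one table on some host.

def count_at_coordinator(fragments):
    """Plan from the thread: every row crosses the exchange, then COUNT(*)."""
    rows_sent = 0
    received = []
    for rows in fragments:
        received.extend(rows)        # all rows are shipped to the coordinator
        rows_sent += len(rows)
    return len(received), rows_sent


def count_with_partial_aggregation(fragments):
    """Distributed plan: each fragment pre-counts and ships a single number."""
    partial_counts = [len(rows) for rows in fragments]   # local COUNT(*)
    rows_sent = len(partial_counts)                      # one value per fragment
    return sum(partial_counts), rows_sent                # coordinator merges


# Hypothetical data: t1 is the small staging table, t2 the large one.
t1 = [("row", i) for i in range(1_000)]
t2 = [("row", i) for i in range(100_000)]

central_total, central_sent = count_at_coordinator([t1, t2])
partial_total, partial_sent = count_with_partial_aggregation([t1, t2])

print(central_total, central_sent)   # 101000 101000
print(partial_total, partial_sent)   # 101000 2
```

Both strategies return the same count; only the amount of data crossing the exchange differs, which is the cost described above.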

I've filed https://issues.cloudera.org/browse/IMPALA-831 to track progress
on this issue. Sorry for the inconvenience.

Best regards,

Alex

On Mon, Feb 24, 2014 at 7:24 AM, György Balogh wrote:

Hi,

We are streaming data to Impala with the following strategy:

- load into t1 as text or LZO text
- from time to time, move data from t1 to a Parquet table t2

To be able to query both, a view is defined as: create view v as select *
from t1 union all select * from t2;

However, it seems that this has a serious performance penalty.

For example, select count(*) from v takes orders of magnitude longer than
select count(*) from t1 and select count(*) from t2 executed one by one.
It seems that select * from t1 union all select * from t2 materializes or
transfers each record before count(*).

Is that correct? Is there any workaround? We could change our strategy to
load into one table (with a different partition for staging).

Thank you
Gyorgy

Here is the query plan:

----------------
Estimated Per-Host Requirements: Memory=1.50GB VCores=2

PLAN FRAGMENT 0
PARTITION: UNPARTITIONED

3:AGGREGATE (finalize)
output: COUNT(*)
cardinality: 0
per-host memory: unavailable
tuple ids: 4
4:EXCHANGE
cardinality: 0
per-host memory: unavailable
tuple ids: 2

PLAN FRAGMENT 1
PARTITION: RANDOM

STREAM DATA SINK
EXCHANGE ID: 4
UNPARTITIONED

6:MERGE
cardinality: 0
per-host memory: 0B
tuple ids: 2
2:SCAN HDFS
table=default.wo_loadtest1 #partitions=0/1 size=0B
table stats: unavailable
column stats: unavailable
cardinality: unavailable
per-host memory: 0B
tuple ids: 1

PLAN FRAGMENT 2
PARTITION: RANDOM

STREAM DATA SINK
EXCHANGE ID: 4
UNPARTITIONED

5:MERGE
cardinality: 0
per-host memory: 0B
tuple ids: 2
1:SCAN HDFS
table=default.loadtest1 #partitions=1/1 size=1.65GB
table stats: unavailable
column stats: unavailable
cardinality: unavailable
per-host memory: 1.50GB
tuple ids: 0
----------------
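
On the workaround question: in the plan above, the only AGGREGATE node sits in fragment 0, above the EXCHANGE. One rewrite that might sidestep this is to count each table separately and add up the results, so that the aggregation is written below the UNION ALL by hand. The sketch below only checks that the rewrite returns the same answer as the view. It uses Python's built-in sqlite3 module as a stand-in, the table contents are made up, and it has not been run against Impala, so any speedup there is an assumption.

```python
import sqlite3

# SQLite is only a stand-in so the queries can run anywhere; it does not
# reproduce Impala's planner or its performance characteristics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);   -- staging table (text / LZO in the thread)
    CREATE TABLE t2 (id INTEGER);   -- compacted table (Parquet in the thread)
    CREATE VIEW v AS SELECT * FROM t1 UNION ALL SELECT * FROM t2;
""")
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(1000)])

# Original form: aggregate on top of the UNION ALL view.
via_view = conn.execute("SELECT COUNT(*) FROM v").fetchone()[0]

# Manual rewrite: aggregate each table first, then combine the partial counts.
REWRITTEN = """
    SELECT SUM(cnt) FROM (
        SELECT COUNT(*) AS cnt FROM t1
        UNION ALL
        SELECT COUNT(*) AS cnt FROM t2
    ) AS partial_counts
"""
via_rewrite = conn.execute(REWRITTEN).fetchone()[0]

print(via_view, via_rewrite)   # 1010 1010
```

The same pattern applies to SUM, MIN, and MAX (an outer SUM, MIN, or MAX over the per-table values). AVG does not combine this way directly and would need the per-table sum and count carried separately.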

To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user+unsubscribe@cloudera.org.


Discussion Overview
group: impala-user
categories: hadoop
posted: Feb 24, '14 at 3:24p
active: Mar 12, '14 at 9:19p
posts: 5
users: 3
website: cloudera.com
irc: #hadoop