Whoops, got the flag wrong. I meant to say DISABLE_CODEGEN=true.
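
For reference, a minimal sketch of setting the option for a single impala-shell session before running one of the small queries from the original post. It assumes the table foo defined further down this thread. Whether disabling codegen helps depends on data size, so treat this as an illustration, not a recommendation.

```sql
-- Turn off LLVM code generation for subsequent queries in this session.
set DISABLE_CODEGEN=true;

select sum(c) from (select count(*) as c from foo) as blah;

-- Turn code generation back on before running queries on larger data.
set DISABLE_CODEGEN=false;
```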

On Wed, Oct 30, 2013 at 11:10 AM, Keith Simmons wrote:

The performance issues were due to the fixed overhead of LLVM code
generation. They go away if I set DISABLE_CODEGEN=false. I'll need to do
some more testing to find the inflection point regarding data size vs.
codegen overhead, but it's handy that I can set this on a per-query basis.

On Wed, Oct 30, 2013 at 10:22 AM, Alex Behm wrote:

Hi Keith,

I'd recommend having a look at the explain plan of those queries to
understand how they are executed. You can also view the query plans
through Impala's Web UI (running on each impalad on port 25000 by
default).
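
As a rough sketch, the plan can also be printed from impala-shell by prefixing a query with explain. The query below is the two-branch union from the original post; the plan text you get back depends on your Impala version and is not shown here.

```sql
-- Print the execution plan without running the query.
explain
select sum(c) from (
  (select count(*) as c from foo)
  union all
  (select count(*) as c from foo)
) as blah;
```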

If your experiments consist of exactly those queries and data you
pasted, I am doubtful they are representative of queries on more
reasonable amounts of data. What are you trying to measure?
Perhaps the time is dominated by query
parsing/analysis/planning/shipping and that's why you see such a
steady increase in time with more select blocks in the query.

If you are having performance issues with unions on larger amounts of
data, please provide the query profile (accessible from the Impala Web
UI) so we can better assist you.
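
Besides the Web UI, the profile of the most recent query can usually be printed directly in impala-shell. This is a sketch that assumes your impala-shell version supports the profile command; if it does not, use the Web UI as described above.

```sql
-- Run the query of interest first.
select sum(c) from (
  (select count(*) as c from foo)
  union all
  (select count(*) as c from foo)
) as blah;

-- Then print the runtime profile of the query that just finished.
profile;
```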

Cheers,

Alex



On Wed, Oct 23, 2013 at 3:57 PM, wrote:
I've noticed an unexpected performance issue with Impala. I expected
multiple "union all" select statements to be run by child daemons in
parallel and then combined. However, based on the timings, they seem to be
run sequentially. Is this correct? Here's an example.

Say I have the following very simple table:

create table foo(bar string) stored as parquetfile;

I then insert a single row:

insert into foo values('something');

Now if I issue the following two queries, the second takes roughly 2x as
long as the first:

select sum(c) from (select count(*) as c from foo) as blah;

select sum(c) from (
  (select count(*) as c from foo)
  union all
  (select count(*) as c from foo)
) as blah;

Likewise, this will take roughly 4x as long as the first:

select sum(c) from (
  (select count(*) as c from foo)
  union all
  (select count(*) as c from foo)
  union all
  (select count(*) as c from foo)
  union all
  (select count(*) as c from foo)
) as blah;

I have a 4-machine cluster of EC2 xlarge instances with 4 disks each. Is
this expected behavior, or am I missing something?

Keith

To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user+unsubscribe@cloudera.org.

Discussion Overview
group: impala-user
categories: hadoop
posted: Oct 23, '13 at 10:57p
active: Oct 30, '13 at 6:12p
posts: 6
users: 3
website: cloudera.com
irc: #hadoop
