FAQ
Hi,

I am getting strange results from a query which does left join of tables
with null values. Here is an example:

Data:
create table test_join_a(x integer, y integer);
create table test_join_b(x integer, y integer);
insert into test_join_a values(null, 1);
insert into test_join_a values(null, 2);
insert into test_join_b values(null, 1);
insert into test_join_b values(null, 3);

Query:
select
     *
from test_join_a a left join test_join_b b
     on (a.x = b.x or (a.x is null and b.x is null)) and a.y = b.y;

Result of query:
+------+---+------+------+
x | y | x | y |
+------+---+------+------+
NULL | 2 | NULL | NULL |
NULL | 1 | NULL | NULL |
+------+---+------+------+

I expect this as result because both tables have row (null, 1):
+------+---+------+------+
x | y | x | y |
+------+---+------+------+
NULL | 2 | NULL | NULL |
NULL | 1 | NULL | 1 |
+------+---+------+------+

Anyway, I found a workaround. I replaced (a.x = b.x or (a.x is null and b.x
is null)) with coalesce(a.x, -1) = coalesce(b.x, -1). But I would like to
know why my join from example above does not match rows with y=1 value.

Also, do you have some article about LEFT SEMI JOIN? I can't find it in
documentation, but I noticed it is different from Hive. For example, you
can't select columns from right table in Hive while it is allowed in Impala.

Thanks,
Dejan

To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.

Search Discussions

Discussion Posts

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 1 of 2 | next ›
Discussion Overview
groupimpala-user @
categorieshadoop
postedNov 28, '13 at 6:49p
activeNov 28, '13 at 10:48p
posts2
users2
websitecloudera.com
irc#hadoop

2 users in discussion

Dejan Prokić: 1 post Alex Behm: 1 post

People

Translate

site design / logo © 2022 Grokbase