In general, Hive does not offer features it cannot do well. Cross joins on
any data set where one table is not very small do not scale in MapReduce,
so there is not a big win in offering syntax for them.
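To illustrate the "one table is very small" case, here is a minimal sketch, in Python rather than Hive, of a map-side (broadcast) theta join. It assumes the small table fits in each mapper's memory. The `map_side_theta_join` and `run_job` helpers, the table contents, and the range predicate are all hypothetical. Every row of the large table is compared against every small-table row, so each mapper's work grows with the size of the small table. That is why this approach stops scaling once neither side is small.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def map_side_theta_join(
    big_split: Iterable[Row],
    small_table: List[Row],
    predicate: Callable[[Row, Row], bool],
) -> Iterator[Tuple[Row, Row]]:
    """Work done by one mapper: compare each row of its split of the big table
    against the entire in-memory small table, keeping pairs that pass the
    (arbitrary, possibly non-equality) predicate."""
    for big_row in big_split:
        for small_row in small_table:
            if predicate(big_row, small_row):
                yield big_row, small_row


def run_job(splits, small_table, predicate) -> List[Tuple[Row, Row]]:
    # Each split would be handled by its own mapper. The small table is
    # broadcast to all of them, so no shuffle or reduce phase is needed.
    results: List[Tuple[Row, Row]] = []
    for split in splits:
        results.extend(map_side_theta_join(split, small_table, predicate))
    return results


# Hypothetical data: a large "orders" table split across two mappers and a
# tiny "tiers" table joined on a range (non-equality) condition.
splits = [
    [{"id": 1, "amount": 5}, {"id": 2, "amount": 42}],
    [{"id": 3, "amount": 150}],
]
tiers = [
    {"tier": "low", "lo": 0, "hi": 10},
    {"tier": "mid", "lo": 10, "hi": 100},
    {"tier": "high", "lo": 100, "hi": 10**9},
]
pairs = run_job(splits, tiers, lambda o, t: t["lo"] <= o["amount"] < t["hi"])
print([(o["id"], t["tier"]) for o, t in pairs])
# [(1, 'low'), (2, 'mid'), (3, 'high')]
```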
I'm not talking about Pig, but one very common unnamed MapReduce framework
offers many features that do not parallelize into MapReduce. I find this
framework a total 'tease'.
On Saturday, March 17, 2012, buddhika chamith wrote:
I think Matt's solution is the way to go for now. If you need some basic
understanding of how reduce-side and map-side joins work, see whether it
Google "theta joins parallel database" and you will find some
interesting references. I am not aware of any tools that implement these
yet. You can also do it via a cross join followed by a filter, but again
you need special algorithms to do a cross in MapReduce, which Hive doesn't
implement yet. See http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html (search
for the section on Cross) for a discussion of how to do cross in
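One way to get a cross product out of MapReduce is to arrange the reducers in a grid and replicate records. Here is a minimal Python sketch under that assumption: each left record is assigned one grid row and copied to every cell in that row, and each right record is assigned one grid column and copied down it. Every (left, right) pair then meets in exactly one cell, and the theta condition is applied as a filter afterwards. This is a generic replication scheme, not necessarily the exact algorithm Pig uses. `grid_cross_join` and its round-robin assignment are hypothetical.

```python
from collections import defaultdict
from itertools import product


def grid_cross_join(left, right, rows, cols, predicate=lambda a, b: True):
    # Map phase: each cell (i, j) of the rows x cols grid stands for one reducer.
    cells = defaultdict(lambda: ([], []))
    for n, a in enumerate(left):
        i = n % rows                  # grid row for this left record
        for j in range(cols):         # replicate across the whole row
            cells[(i, j)][0].append(a)
    for n, b in enumerate(right):
        j = n % cols                  # grid column for this right record
        for i in range(rows):         # replicate down the whole column
            cells[(i, j)][1].append(b)

    # Reduce phase: each cell crosses its two lists, then applies the filter.
    # A pair (a, b) meets only in cell (row of a, column of b), so no duplicates.
    out = []
    for cell in sorted(cells):
        lefts, rights = cells[cell]
        out.extend((a, b) for a, b in product(lefts, rights) if predicate(a, b))
    return out


left, right = [1, 5, 9], [2, 6]
print(len(grid_cross_join(left, right, rows=2, cols=2)))               # 6
print(sorted(grid_cross_join(left, right, 2, 2, lambda a, b: a < b)))  # [(1, 2), (1, 6), (5, 6)]
```

The price is replication. The map phase ships len(left) * cols + len(right) * rows records through the shuffle, instead of len(left) + len(right) for an equi-join. This is why a plain cross followed by a filter is expensive on large inputs.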
On Mar 13, 2012, at 10:13 AM, Tucker, Matt wrote:
For theta joins, you’ll have to convert the query to an equi-join, and
then filter for non-equality in the WHERE clause. Depending upon the size
of each table, you might consider looking at map-side joins, which will
allow for doing non-equality filters during a join before it’s passed to
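Here is a minimal sketch of the equi-join-then-filter approach, written as a plain-Python reduce-side join. Both tables are shuffled on the equality part of the condition. The non-equality part is then applied as a filter inside each key group, which is what a WHERE clause after an equi-join amounts to. The `sessions`/`events` tables, their columns, and the `equi_join_then_filter` helper are hypothetical. The trick only helps when the theta condition has some equality component to join on.

```python
from collections import defaultdict

# Hypothetical query this mimics:
#   SELECT ... FROM sessions s JOIN events e ON (s.user_id = e.user_id)
#   WHERE e.ts >= s.start_ts AND e.ts < s.end_ts


def equi_join_then_filter(sessions, events, key, theta):
    # Map + shuffle: route rows from both tables to a group keyed on the
    # equality column, remembering which table each row came from.
    groups = defaultdict(lambda: {"s": [], "e": []})
    for s in sessions:
        groups[s[key]]["s"].append(s)
    for e in events:
        groups[e[key]]["e"].append(e)

    # Reduce: only rows sharing a key are paired. The non-equality
    # condition is then applied as a filter, like the WHERE clause.
    out = []
    for k in sorted(groups):
        for s in groups[k]["s"]:
            for e in groups[k]["e"]:
                if theta(s, e):
                    out.append((s, e))
    return out


sessions = [
    {"user_id": 1, "start_ts": 0, "end_ts": 10},
    {"user_id": 2, "start_ts": 5, "end_ts": 8},
]
events = [
    {"user_id": 1, "ts": 3},
    {"user_id": 1, "ts": 12},
    {"user_id": 2, "ts": 6},
    {"user_id": 3, "ts": 4},
]
matches = equi_join_then_filter(
    sessions, events, "user_id",
    lambda s, e: s["start_ts"] <= e["ts"] < s["end_ts"],
)
print([(s["user_id"], e["ts"]) for s, e in matches])  # [(1, 3), (2, 6)]
```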
From: mahsa mofidpoor
Sent: Tuesday, March 13, 2012 1:02 PM
Subject: Re: non-equality joins
Do you know exactly how an algorithm has to be structured in order to fit the
MapReduce framework? Could you point me to some references?
conditions as it is very difficult to express such conditions as a
I admit that isn't a very detailed answer, but it gives some
indication of the reason for the discrepancy between Hive and other
databases. Hive fundamentally operates on Hadoop, namely on MapReduce (we
all know this; I'm just reiterating the train of thought). The problem is
that certain algorithms are exceedingly difficult to wedge into the
That is as detailed as my personal insight can get. I've done a lot
of MapReduce programming in Hadoop but I'm not a database expert and I
don't really understand the steps involved in various kinds of table-joins,
so I don't understand the particular ways in which certain database
operations do or do not fit into MapReduce...but presumably nonequality
joins (whatever those are :-D ) are particularly difficult to MapReduceify.
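To make the difficulty concrete: in a reduce-side join, the shuffle routes every row with the same join key to the same reducer, so a reducer only compares rows that share a key. A condition like a.x < b.y gives the shuffle nothing to route on. Any left row could match any right row, so without a key to partition on, the straightforward approach has to consider every pair, which is a cross product. Below is a minimal Python sketch counting the comparisons in each case. The function names and data are hypothetical.

```python
from collections import Counter


def equi_join_comparisons(left_keys, right_keys):
    # After shuffling on the join key, each reducer compares only rows that
    # share a key: sum over keys of |left rows with k| * |right rows with k|.
    lc, rc = Counter(left_keys), Counter(right_keys)
    return sum(n * rc[k] for k, n in lc.items())


def theta_join_comparisons(left_keys, right_keys):
    # For a predicate such as a.x < b.y there is no key to shuffle on, so
    # every left row is checked against every right row.
    return len(left_keys) * len(right_keys)


left = list(range(1000))    # 1,000 rows, each with a distinct value
right = list(range(1000))
print(equi_join_comparisons(left, right))   # 1000
print(theta_join_comparisons(left, right))  # 1000000
```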
On Mar 13, 2012, at 09:17 , mahsa mofidpoor wrote:
Is there a reason behind not implementing non-equality joins in
Hive? In other words, would theta joins be of any use if they were implemented?
Thank you in advance for your response,
Keith Wiley firstname.lastname@example.org keithwiley.com
"It's a fine line between meticulous and obsessive-compulsive and a
rope between obsessive-compulsive and debilitatingly slow."
-- Keith Wiley