Grokbase Groups Pig user March 2010
FAQ
Alan, great comments, thanks.

Reply:
Thinking in terms of computational power, I think that it is relevant to put
all languages on the same scale (from left to right):
Not Relationally Complete -> Relationally Complete -> Turing Complete

This is how I see it:
- Relational Completeness is a mathematical notation defining relations in a
standardized algebraic syntax, including project, select, union, intersect
etc...
- Turing completeness requires looping constructs, a infinite memory model
and conditional constructs

SQL - is relationally complete. It is more powerful than relational algebra
as it contains aggregation functions, but is certainly not Turing Complete.

Pig - The link to the article you wrote in January refers more to the
approach to data pipelines or queries. I am more concerned with the apparent
computational power of the languages. I would argue that Pig Latin *is* as
powerful as SQL, as Pig Latin provides the numeric aggregation functions
(count, average etc...) that SQL does. For this reason, I am positive that
Pig Latin is strictly more powerful than relational algebra (no aggregation
in relational algebra).

Hive QL - That's an easier one. Attempts to provide all SQL functions, and
does a good job. therefore computationally as powerful as SQL, but also
definitely not Turing Complete.

JAQL - Does provide SQL-like aggregation functions
http://code.google.com/p/jaql/wiki/Builtin_functions#agg but perhaps not all
SQL operators.


MapReduce - I agree (contrary to my diagram), it is not relationally
complete. I have come across a SIGMOD 2007 paper (written by some of your
friends at Yahoo) on Map-Reduce-Merge. This paper clearly states that
MapReduce is not relationally complete, and so I have incorporated this
knowledge into my document.


thanks Alan. As a side note, you may be interested to know that I am
publishing all of my project results, hopefully within the next 24 hours. I
will be sure to post notification of this on this pig-users mailing list.

Rob



On 22 March 2010 21:32, Alan Gates wrote:

Comments

1) "Pig Latin is equivalent to the computational power of SQL" Pig Latin
is different than SQL in that it is a dataflow language, where as SQL is a
query language (for more on this see
http://feeds.developer.yahoo.net/~r/YDNHadoop/~3/i2Cdha8kedw/comparing_pig_latin_and_sql_fo.html) So in your chart I would put Pig Latin across from relational algebra.
The translation from Pig Latin to relational algebra is nearly 1-1.

2) MR should not be across from Relational Algebra. MR does not itself
present a relational algebra. It is possible to implement relational
algebras over MR, as Pig, Hive, and JAQL all do. Evaluated just as MR I
would argue that it is not relationally complete because it presents only
grouping and sorting options. It is straight forward to implement
projection and filtering via the Map function, but this is not natively
provided. Joins can be implemented via Reduce, but doing so is not trivial.
At the same time, since MR is written in Java, which is definitely Turing
complete, then MR must also be Turning complete.

3) I'm not sure where MR goes in your chart, but the bottom seems the wrong
spot. By definition, anything you can implement in JAQL, Pig, or Hive you
can implement in MR. So to say it is the least computationally powerful
seems wrong. Perhaps I don't understand what is meant by Computational
Power.

To sum up, it seems to me that you're trying to map three separate
dimensions (relational completeness, turing completeness, and SQLness) all
onto one dimension. Instead it would make more sense to me to present each
project relative to these three dimensions: JAQL (yes, yes, no), Pig (yes,
no, no), Hive (yes, no, yes), and MR (no, yes, no).

Alan.


On Mar 20, 2010, at 4:54 PM, Rob Stewart wrote:

Hi,
I have a question, in a remark that Alan Gates made a few months ago on
these mailing lists regarding the computability and expressibility of Pig,
Hive, and the MapReduce model.

In particular, it was a question regarding the Turing computability of
each.

Is anyone able to remark on my discussion of this in my forthcoming paper
(this is just a small extract). I am pretty confident as to where I have
put
Pig and Hive, quite confident with JAQL (I've checked it out with the JAQL
dev's), and far less sure of my assessment of the MapReduce model (only
relationally complete??)

Find it here:
http://www.macs.hw.ac.uk/~rs46/dropbox/computability.pdf


I would greatly appreciate any comments of pointers to any inaccuracies.


thanks,


Rob Stewart

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 3 of 3 | next ›
Discussion Overview
groupuser @
categoriespig, hadoop
postedMar 20, '10 at 11:55p
activeMar 23, '10 at 2:25a
posts3
users2
websitepig.apache.org

2 users in discussion

Rob Stewart: 2 posts Alan Gates: 1 post

People

Translate

site design / logo © 2021 Grokbase