Reply:

Thinking in terms of computational power, I think that it is relevant to put

all languages on the same scale (from left to right):

Not Relationally Complete -> Relationally Complete -> Turing Complete

This is how I see it:

- Relational Completeness is a mathematical notation defining relations in a

standardized algebraic syntax, including project, select, union, intersect

etc...

- Turing completeness requires looping constructs, a infinite memory model

and conditional constructs

SQL - is relationally complete. It is more powerful than relational algebra

as it contains aggregation functions, but is certainly not Turing Complete.

Pig - The link to the article you wrote in January refers more to the

approach to data pipelines or queries. I am more concerned with the apparent

computational power of the languages. I would argue that Pig Latin *is* as

powerful as SQL, as Pig Latin provides the numeric aggregation functions

(count, average etc...) that SQL does. For this reason, I am positive that

Pig Latin is strictly more powerful than relational algebra (no aggregation

in relational algebra).

Hive QL - That's an easier one. Attempts to provide all SQL functions, and

does a good job. therefore computationally as powerful as SQL, but also

definitely not Turing Complete.

JAQL - Does provide SQL-like aggregation functions

http://code.google.com/p/jaql/wiki/Builtin_functions#agg but perhaps not all

SQL operators.

MapReduce - I agree (contrary to my diagram), it is not relationally

complete. I have come across a SIGMOD 2007 paper (written by some of your

friends at Yahoo) on Map-Reduce-Merge. This paper clearly states that

MapReduce is not relationally complete, and so I have incorporated this

knowledge into my document.

thanks Alan. As a side note, you may be interested to know that I am

publishing all of my project results, hopefully within the next 24 hours. I

will be sure to post notification of this on this pig-users mailing list.

Rob

On 22 March 2010 21:32, Alan Gates wrote:

Comments

1) "Pig Latin is equivalent to the computational power of SQL" Pig Latin

is different than SQL in that it is a dataflow language, where as SQL is a

query language (for more on this see

http://feeds.developer.yahoo.net/~r/YDNHadoop/~3/i2Cdha8kedw/comparing_pig_latin_and_sql_fo.html) So in your chart I would put Pig Latin across from relational algebra.

The translation from Pig Latin to relational algebra is nearly 1-1.

2) MR should not be across from Relational Algebra. MR does not itself

present a relational algebra. It is possible to implement relational

algebras over MR, as Pig, Hive, and JAQL all do. Evaluated just as MR I

would argue that it is not relationally complete because it presents only

grouping and sorting options. It is straight forward to implement

projection and filtering via the Map function, but this is not natively

provided. Joins can be implemented via Reduce, but doing so is not trivial.

At the same time, since MR is written in Java, which is definitely Turing

complete, then MR must also be Turning complete.

3) I'm not sure where MR goes in your chart, but the bottom seems the wrong

spot. By definition, anything you can implement in JAQL, Pig, or Hive you

can implement in MR. So to say it is the least computationally powerful

seems wrong. Perhaps I don't understand what is meant by Computational

Power.

To sum up, it seems to me that you're trying to map three separate

dimensions (relational completeness, turing completeness, and SQLness) all

onto one dimension. Instead it would make more sense to me to present each

project relative to these three dimensions: JAQL (yes, yes, no), Pig (yes,

no, no), Hive (yes, no, yes), and MR (no, yes, no).

Alan.

On Mar 20, 2010, at 4:54 PM, Rob Stewart wrote:

Hi,

Comments

1) "Pig Latin is equivalent to the computational power of SQL" Pig Latin

is different than SQL in that it is a dataflow language, where as SQL is a

query language (for more on this see

http://feeds.developer.yahoo.net/~r/YDNHadoop/~3/i2Cdha8kedw/comparing_pig_latin_and_sql_fo.html) So in your chart I would put Pig Latin across from relational algebra.

The translation from Pig Latin to relational algebra is nearly 1-1.

2) MR should not be across from Relational Algebra. MR does not itself

present a relational algebra. It is possible to implement relational

algebras over MR, as Pig, Hive, and JAQL all do. Evaluated just as MR I

would argue that it is not relationally complete because it presents only

grouping and sorting options. It is straight forward to implement

projection and filtering via the Map function, but this is not natively

provided. Joins can be implemented via Reduce, but doing so is not trivial.

At the same time, since MR is written in Java, which is definitely Turing

complete, then MR must also be Turning complete.

3) I'm not sure where MR goes in your chart, but the bottom seems the wrong

spot. By definition, anything you can implement in JAQL, Pig, or Hive you

can implement in MR. So to say it is the least computationally powerful

seems wrong. Perhaps I don't understand what is meant by Computational

Power.

To sum up, it seems to me that you're trying to map three separate

dimensions (relational completeness, turing completeness, and SQLness) all

onto one dimension. Instead it would make more sense to me to present each

project relative to these three dimensions: JAQL (yes, yes, no), Pig (yes,

no, no), Hive (yes, no, yes), and MR (no, yes, no).

Alan.

On Mar 20, 2010, at 4:54 PM, Rob Stewart wrote:

Hi,

I have a question, in a remark that Alan Gates made a few months ago on

these mailing lists regarding the computability and expressibility of Pig,

Hive, and the MapReduce model.

In particular, it was a question regarding the Turing computability of

each.

Is anyone able to remark on my discussion of this in my forthcoming paper

(this is just a small extract). I am pretty confident as to where I have

put

Pig and Hive, quite confident with JAQL (I've checked it out with the JAQL

dev's), and far less sure of my assessment of the MapReduce model (only

relationally complete??)

Find it here:

http://www.macs.hw.ac.uk/~rs46/dropbox/computability.pdf

I would greatly appreciate any comments of pointers to any inaccuracies.

thanks,

Rob Stewart

these mailing lists regarding the computability and expressibility of Pig,

Hive, and the MapReduce model.

In particular, it was a question regarding the Turing computability of

each.

Is anyone able to remark on my discussion of this in my forthcoming paper

(this is just a small extract). I am pretty confident as to where I have

put

Pig and Hive, quite confident with JAQL (I've checked it out with the JAQL

dev's), and far less sure of my assessment of the MapReduce model (only

relationally complete??)

Find it here:

http://www.macs.hw.ac.uk/~rs46/dropbox/computability.pdf

I would greatly appreciate any comments of pointers to any inaccuracies.

thanks,

Rob Stewart