Grokbase Groups Pig user March 2010
FAQ
Hi,

I have a question, in a remark that Alan Gates made a few months ago on
these mailing lists regarding the computability and expressibility of Pig,
Hive, and the MapReduce model.

In particular, it was a question regarding the Turing computability of each.

Is anyone able to remark on my discussion of this in my forthcoming paper
(this is just a small extract). I am pretty confident as to where I have put
Pig and Hive, quite confident with JAQL (I've checked it out with the JAQL
dev's), and far less sure of my assessment of the MapReduce model (only
relationally complete??)

Find it here:
http://www.macs.hw.ac.uk/~rs46/dropbox/computability.pdf


I would greatly appreciate any comments of pointers to any inaccuracies.


thanks,


Rob Stewart

Search Discussions

  • Alan Gates at Mar 22, 2010 at 9:33 pm
    Comments

    1) "Pig Latin is equivalent to the computational power of SQL" Pig
    Latin is different than SQL in that it is a dataflow language, where
    as SQL is a query language (for more on this see http://feeds.developer.yahoo.net/~r/YDNHadoop/
    ~3/i2Cdha8kedw/comparing_pig_latin_and_sql_fo.html ) So in your chart
    I would put Pig Latin across from relational algebra. The translation
    from Pig Latin to relational algebra is nearly 1-1.

    2) MR should not be across from Relational Algebra. MR does not
    itself present a relational algebra. It is possible to implement
    relational algebras over MR, as Pig, Hive, and JAQL all do. Evaluated
    just as MR I would argue that it is not relationally complete because
    it presents only grouping and sorting options. It is straight forward
    to implement projection and filtering via the Map function, but this
    is not natively provided. Joins can be implemented via Reduce, but
    doing so is not trivial. At the same time, since MR is written in
    Java, which is definitely Turing complete, then MR must also be
    Turning complete.

    3) I'm not sure where MR goes in your chart, but the bottom seems the
    wrong spot. By definition, anything you can implement in JAQL, Pig,
    or Hive you can implement in MR. So to say it is the least
    computationally powerful seems wrong. Perhaps I don't understand what
    is meant by Computational Power.

    To sum up, it seems to me that you're trying to map three separate
    dimensions (relational completeness, turing completeness, and SQLness)
    all onto one dimension. Instead it would make more sense to me to
    present each project relative to these three dimensions: JAQL (yes,
    yes, no), Pig (yes, no, no), Hive (yes, no, yes), and MR (no, yes, no).

    Alan.
    On Mar 20, 2010, at 4:54 PM, Rob Stewart wrote:

    Hi,

    I have a question, in a remark that Alan Gates made a few months ago
    on
    these mailing lists regarding the computability and expressibility
    of Pig,
    Hive, and the MapReduce model.

    In particular, it was a question regarding the Turing computability
    of each.

    Is anyone able to remark on my discussion of this in my forthcoming
    paper
    (this is just a small extract). I am pretty confident as to where I
    have put
    Pig and Hive, quite confident with JAQL (I've checked it out with
    the JAQL
    dev's), and far less sure of my assessment of the MapReduce model
    (only
    relationally complete??)

    Find it here:
    http://www.macs.hw.ac.uk/~rs46/dropbox/computability.pdf


    I would greatly appreciate any comments of pointers to any
    inaccuracies.


    thanks,


    Rob Stewart
  • Rob Stewart at Mar 23, 2010 at 2:25 am
    Alan, great comments, thanks.

    Reply:
    Thinking in terms of computational power, I think that it is relevant to put
    all languages on the same scale (from left to right):
    Not Relationally Complete -> Relationally Complete -> Turing Complete

    This is how I see it:
    - Relational Completeness is a mathematical notation defining relations in a
    standardized algebraic syntax, including project, select, union, intersect
    etc...
    - Turing completeness requires looping constructs, a infinite memory model
    and conditional constructs

    SQL - is relationally complete. It is more powerful than relational algebra
    as it contains aggregation functions, but is certainly not Turing Complete.

    Pig - The link to the article you wrote in January refers more to the
    approach to data pipelines or queries. I am more concerned with the apparent
    computational power of the languages. I would argue that Pig Latin *is* as
    powerful as SQL, as Pig Latin provides the numeric aggregation functions
    (count, average etc...) that SQL does. For this reason, I am positive that
    Pig Latin is strictly more powerful than relational algebra (no aggregation
    in relational algebra).

    Hive QL - That's an easier one. Attempts to provide all SQL functions, and
    does a good job. therefore computationally as powerful as SQL, but also
    definitely not Turing Complete.

    JAQL - Does provide SQL-like aggregation functions
    http://code.google.com/p/jaql/wiki/Builtin_functions#agg but perhaps not all
    SQL operators.


    MapReduce - I agree (contrary to my diagram), it is not relationally
    complete. I have come across a SIGMOD 2007 paper (written by some of your
    friends at Yahoo) on Map-Reduce-Merge. This paper clearly states that
    MapReduce is not relationally complete, and so I have incorporated this
    knowledge into my document.


    thanks Alan. As a side note, you may be interested to know that I am
    publishing all of my project results, hopefully within the next 24 hours. I
    will be sure to post notification of this on this pig-users mailing list.

    Rob



    On 22 March 2010 21:32, Alan Gates wrote:

    Comments

    1) "Pig Latin is equivalent to the computational power of SQL" Pig Latin
    is different than SQL in that it is a dataflow language, where as SQL is a
    query language (for more on this see
    http://feeds.developer.yahoo.net/~r/YDNHadoop/~3/i2Cdha8kedw/comparing_pig_latin_and_sql_fo.html) So in your chart I would put Pig Latin across from relational algebra.
    The translation from Pig Latin to relational algebra is nearly 1-1.

    2) MR should not be across from Relational Algebra. MR does not itself
    present a relational algebra. It is possible to implement relational
    algebras over MR, as Pig, Hive, and JAQL all do. Evaluated just as MR I
    would argue that it is not relationally complete because it presents only
    grouping and sorting options. It is straight forward to implement
    projection and filtering via the Map function, but this is not natively
    provided. Joins can be implemented via Reduce, but doing so is not trivial.
    At the same time, since MR is written in Java, which is definitely Turing
    complete, then MR must also be Turning complete.

    3) I'm not sure where MR goes in your chart, but the bottom seems the wrong
    spot. By definition, anything you can implement in JAQL, Pig, or Hive you
    can implement in MR. So to say it is the least computationally powerful
    seems wrong. Perhaps I don't understand what is meant by Computational
    Power.

    To sum up, it seems to me that you're trying to map three separate
    dimensions (relational completeness, turing completeness, and SQLness) all
    onto one dimension. Instead it would make more sense to me to present each
    project relative to these three dimensions: JAQL (yes, yes, no), Pig (yes,
    no, no), Hive (yes, no, yes), and MR (no, yes, no).

    Alan.


    On Mar 20, 2010, at 4:54 PM, Rob Stewart wrote:

    Hi,
    I have a question, in a remark that Alan Gates made a few months ago on
    these mailing lists regarding the computability and expressibility of Pig,
    Hive, and the MapReduce model.

    In particular, it was a question regarding the Turing computability of
    each.

    Is anyone able to remark on my discussion of this in my forthcoming paper
    (this is just a small extract). I am pretty confident as to where I have
    put
    Pig and Hive, quite confident with JAQL (I've checked it out with the JAQL
    dev's), and far less sure of my assessment of the MapReduce model (only
    relationally complete??)

    Find it here:
    http://www.macs.hw.ac.uk/~rs46/dropbox/computability.pdf


    I would greatly appreciate any comments of pointers to any inaccuracies.


    thanks,


    Rob Stewart

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categoriespig, hadoop
postedMar 20, '10 at 11:55p
activeMar 23, '10 at 2:25a
posts3
users2
websitepig.apache.org

2 users in discussion

Rob Stewart: 2 posts Alan Gates: 1 post

People

Translate

site design / logo © 2021 Grokbase