Grokbase Groups Pig user March 2011
Pig Users and Developers,

We are starting to plan the work after Pig 0.9. One thing we need to decide is what name/number to give to the next release: Pig 0.10 or Pig 1.0.

I believe that we are ready to declare 1.0. Here are my reasons:

(1) We are mature enough and produce good quality releases
(2) Our interfaces no longer change in major ways
(3) We have a growing user community, and we want newcomers to know that our releases are stable
(4) If the next release is 0.10 and we then decide to switch on the following release, going from 0.10 to 1.0 will generate a lot of confusion.

I wanted to start this conversation and see what others think before deciding if it is worthwhile to call a vote.

Olga


  • Dmitriy Ryaboy at Mar 3, 2011 at 2:31 am
    I am worried that the new optimization plan work has not had a chance to
    settle in, and we are releasing a brand new parser for the language in 0.9.
    Those are pretty significant changes. If the idea behind calling something a
    "1.0" is stability, we may want to give them a release to mature a bit. Of
    course we can just release 0.9x for a while until we feel this stuff has
    been tested in a wide enough variety of installations / hadoop
    configurations / use cases.

    D
    On Wed, Mar 2, 2011 at 4:52 PM, Olga Natkovich wrote:

    Pig Users and Developers,

    We are starting to plan the work after Pig 0.9. One thing we need to decide
    is what name/number to give to the next release: Pig 0.10 or Pig 1.0.

    I believe that we are ready to declare 1.0. Here are my reasons:

    (1) We are mature enough and produce good quality releases
    (2) Our interfaces no longer change in major ways
    (3) We have a growing user community, and we want newcomers to know
    that our releases are stable
    (4) If the next release is 0.10 and we then decide to switch on
    the following release, going from 0.10 to 1.0 will generate a lot of
    confusion.

    I wanted to start this conversation and see what others think before
    deciding if it is worthwhile to call a vote.

    Olga
  • Santhosh Srinivasan at Mar 3, 2011 at 2:45 am
    I am in agreement with Dmitriy. In addition, Hadoop itself has not gone 1.0 due to the lack of stable APIs. We should probably aim for 1.0 around the same time.

    Santhosh

  • Dmitriy Ryaboy at Mar 3, 2011 at 2:57 am
    By way of crazy ideas -- I kind of feel like 0.8 + a few patches might be
    our 1.0, and 0.9 can be the 1.1 branch.

    D
  • Santhosh Srinivasan at Mar 3, 2011 at 2:59 am
    I am not in agreement with that :)

  • Alan Gates at Mar 3, 2011 at 6:44 pm
    I agree that there will probably need to be several 0.9.x releases as
    the new optimization and parser work matures. As a consequence of this,
    it may be longer between 0.9 and Pig.next than there has been between
    the last few releases. That only delays the question of what we call
    Pig.next; it does not answer it.

    To me, declaring 1.0 would mean the following things:

    1) Pig is ready for production use, at least by the brave.
    2) It is still rough around the edges, you do not get a smooth product
    until 2.0 or later.
    3) We will not make non-backward compatible changes to interfaces we
    have declared stable.

    Pig is in use in production in multiple places; I do not think anyone
    will argue that it is not rough around the edges; and because we have
    users who run tens of thousands of Pig jobs daily, non-backward
    compatible changes are impossible anyway.

    As for waiting for Hadoop to go 1.0, that is like waiting for Congress
    to fix social security. I am sure they will get there, but I may be
    retired first. In all seriousness, the Hadoop project has not been
    moving with speed or agility over the last few years, and I do not
    think waiting for them to do something is a good idea. Nor do I see
    it as necessary. Before we could go 1.0 would we insist that every
    jar we import is >= 1.0? Yes, we are bound more tightly to Hadoop than
    we are to log4j. But we are still our own project. 1.0 is a claim we
    are making about ourselves, not about the platform we run on. We
    should choose our release numbering in a way that sends a clear
    message to our users, and let those same users evaluate Hadoop
    separately.

    Also the argument that we should not go 1.0 because we are changing a
    lot of things is bogus. We are always changing a lot of things. If
    1.0 means we will not make any major changes, then we will not get
    there until we go into some kind of maintenance mode where we deem
    the majority of the work to have been done. I hope I have retired
    before we reach that state.

    My perspective on what 1.0 means obviously comes from a developer
    inside the project. I would be interested in hearing from users and
    anyone with a more marketing oriented perspective on what message 1.0
    would send to (potential) pig users.

    Alan.
  • Santhosh Srinivasan at Mar 3, 2011 at 7:53 pm
    Hilarious.

    Getting to the serious points.

    What are the user facing items? I have listed a few below. Please feel free to add if I have missed out on anything.

    1. The language syntax
    2. The language semantics
    3. UDFs (EvalFunc, Load, Store, Algebraic, Accumulator, etc.)
    4. Java APIs (PigServer, etc.)

    In the past, we have agreed that Pig will support Hadoop APIs. I think it's very important to understand when Hadoop will stabilize the APIs. It will have an impact on the APIs that we expose to our users (e.g., input and output formats).

    I strongly believe that this is an important input in the decision making process, especially wrt backward compatibility.

    Santhosh

  • Thejas M Nair at Mar 4, 2011 at 1:35 am
    The interfaces that Pig has are at different levels of maturity, and most of the interfaces have been marked as stable or evolving to indicate that.
    Most of the core interfaces, including the language and UDFs, belong to the stable category. I think this is sufficient for 1.0. There will always be some new interfaces that will be in the evolving category.

    The hadoop classes used by the load/store functions probably belong to the 'slowly evolving' category. But I don't think any change is anticipated soon. By the time it changes we might be ready for pig 2.0 !

    Regarding the impact of big changes in 0.8 and 0.9 not having had the time to settle in, I think by the time 1.0/0.10 is ready those changes will have been well tested in all sorts of setups/configurations.

    -Thejas



  • Santhosh Srinivasan at Mar 4, 2011 at 1:51 am
    > The hadoop classes used by the load/store functions probably belong to the 'slowly evolving' category. But I don't think any change is anticipated soon. By the time it changes we might be ready for pig 2.0 !

    Exactly! How do you know that no changes are anticipated? We need input from the Hadoop team because we don't know. If they promise that these APIs will not change, let's say until mid-2012, then we should be good to go. If they say that they will change in 2011, then we will be breaking backward compatibility pretty soon.

    Santhosh

  • Dmitriy Ryaboy at Mar 4, 2011 at 1:54 am
    Only if we start supporting a different version of Hadoop.
    And they did just un-deprecate the "old" interface...
    On Thu, Mar 3, 2011 at 5:50 PM, Santhosh Srinivasan wrote:

    The hadoop classes used by the load/store functions probably belong to
    the 'slowly evolving' category. But I don't think any change is anticipated
    soon. By the time it changes we might be ready for pig 2.0 !
    Exactly! How do you know that no changes are anticipated? We need inputs
    from the Hadoop team because we don't know. If they promise that these APIs
    will not change, lets say till mid-2012 then we should be good to go. If
    they say that it will change in 2011 then we will be breaking backward
    compatibility pretty soon.

    Santhosh

    ________________________________
    From: Thejas M Nair
    Sent: Thursday, March 03, 2011 5:35 PM
    To: user@pig.apache.org; Santhosh Srinivasan
    Subject: Re: [DISCUSSION] Pig.next

    The interfaces that pig have are at different levels of maturity, and most
    of the interfaces have been marked as stable or evolving to indicate that.
    Most of the core interfaces including the language, and udfs belong to the
    stable category. I think this is sufficient for 1.0. There will always be
    some new interfaces that will be in evolving category.

    The hadoop classes used by the load/store functions probably belong to the
    'slowly evolving' category. But I don't think any change is anticipated
    soon. By the time it changes we might be ready for pig 2.0 !

    Regarding the impact of big changes in 0.8 and 0.9 not having had the time
    to settle in, I think by the time 1.0/0.10 is ready those changes would have
    been well tested in all sorts of setups/configurations.

    -Thejas



    On 3/3/11 11:51 AM, "Santhosh Srinivasan" wrote:

    Hilarious.

    Getting to the serious points.

    What are the user facing items? I have listed a few below. Please feel free
    to add if I have missed out on anything.

    1. The language syntax
    2. The language semantics
    3. UDFs (EvalFunc, Load, Store, Algebraic, Accumulator, etc.)
    4. Java APIs (PigServer, etc.)

    In the past, we have agreed that Pig will support Hadoop APIs. I think its
    very important to understand when Hadoop will stabilize the APIs. It will
    have an impact on the APIs that we expose to our users (e.g., input and
    output formats).

    I strongly believe that this is an important input in the decision making
    process, especially wrt backward compatibility.

    Santhosh

    -----Original Message-----
    From: Alan Gates
    Sent: Thursday, March 03, 2011 10:44 AM
    To: user@pig.apache.org
    Subject: Re: [DISCUSSION] Pig.next

    I agree that there will probably need to be several 0.9.x releases as the
    new optimization and parser work mature. As a consequence of this it may be
    longer between 0.9 and Pig.next then there has been between the last few
    releases. That only delays the question of what we call Pig.next, it does
    not answer it.

    To me, declaring 1.0 would mean the following things:

    1) Pig is ready for production use, at least by the brave.
    2) It is still rough around the edges, you do not get a smooth product
    until 2.0 or later.
    3) We will not make non-backward compatible changes to interfaces we have
    declared stable.

    Pig is in use in production in multiple places, I do not think anyone will
    argue that it is not rough around the edges, and because we have users who
    run tens of thousands of Pig jobs daily non-backward compatible changes are
    impossible anyway.

    As for waiting for Hadoop to go 1.0, that is like waiting for Congress to
    fix social security. I am sure they will get there, but I may be retired
    first. In all seriousness, the Hadoop project has not been moving with
    speed or agility over the last few years, and I do not think waiting for
    them to do something is a good idea. Nor do I see it as necessary. Before
    we could go 1.0 would we insist that every jar we import is >= 1.0? Yes, we
    are bound more tightly to Hadoop than we are to log4j. But we are still our
    own project. 1.0 is a claim we are making about ourselves, not about the
    platform we run on. We should choose our release numbering in a way that
    sends a clear message to our users, and let those same users evaluate Hadoop
    separately.

    Also the argument that we should not go 1.0 because we are changing a lot
    of things is bogus. We are always changing a lot of things. If 1.0 means
    we will not make any major changes, then we will not get there until we go
    into some kind of maintenance mode where we deem the majority of the work
    to have been done. I hope I have retired before we reach that state.

    My perspective on what 1.0 means obviously comes from a developer inside
    the project. I would be interested in hearing from users and anyone with a
    more marketing oriented perspective on what message 1.0 would send to
    (potential) pig users.

    Alan.

  • Eric Lubow at Mar 3, 2011 at 8:04 pm
    Coming from a user's perspective, I would have the following to say:

    Anyone who is using Hadoop has an obvious understanding that 1.0 doesn't
    really mean much if it's in use (which Pig obviously is). What 1.0 has the
    potential to do for someone like me is that I may be able to go to Amazon
    and say, look, Pig is at 1.0 and you are still offering 0.6 on EMR. Having
    Pig on something like EMR is what allows more widespread adoption because it
    lowers the barrier to entry.

    I am not an expert at any of this stuff (in fact, I don't even know Java),
    but I am able to use Hadoop and then train others to write MR jobs with a
    fair amount of ease because of a query language like Pig. Tagging it with
    1.0 might make a statement to larger organizations, but most smaller
    companies and startups just want to know it's usable. And since there is no
    alpha or beta attached anywhere, that's good enough for most.

    The only caveat is that I am working off of Pig 0.6 because all my data is
    in S3 and I use Elastic Map Reduce for my jobs.

    The only other thing I would say is that if Pig goes 1.0, can it get a new
    logo? I know there are a lot of +1s for this so I figured I would throw my
    +1 here too.

    -e
    Eric Lubow e: eric.lubow@gmail.com w: eric.lubow.org
  • Corbin Hoenes at Mar 4, 2011 at 12:45 pm
    What is wrong with Porky the Pig as the logo?

    :)

    That's all folks!

    Sent from my iPhone
  • Kaluskar, Sanjay at Mar 4, 2011 at 12:54 am
    Alan,

    Here's another perspective, based on the conventions used in most of the
    products I have worked on (okay, that's not a lot but some of them are
    well regarded by customers).

    Rather than focusing on the specific number, it is the transition which
    is important and tells users something. Let me explain: most products use
    a <major>.<minor>.<patch> style 3-number release numbering externally.
    A change in each of these numbers has some significance:
    - there should be complete (binary) backward compatibility for
    interfaces across patch releases, interoperability with other products;
    product change should be primarily bug fixes
    - there should be backward compatibility for interfaces across minor
    releases, there may be some interoperability changes (e.g., requiring a
    different version of one of the dependencies); product change is
    expected to contain new features
    - there can be substantial changes across major releases (architecture,
    interfaces, interoperability); interfaces (APIs, callback interfaces,
    etc.) are still expected to be source-level compatible (i.e., you may
    ask clients to recompile); in rare cases you can break interfaces and
    ask external code to be re-written/edited.

    By this definition, 0.7.0 was probably 1.0.0 (given that UDFs were
    forced to make code changes).
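
    As a rough illustration of the convention described above, here is a
    minimal Python sketch that classifies the step between two version
    strings as a patch, minor, or major transition and looks up the
    compatibility expectation attached to it. The names parse_version,
    classify_transition and EXPECTATIONS are hypothetical, made up for this
    example; they are not part of Pig or of any library, and the wording of
    the expectations only paraphrases the three bullets above.

```python
from typing import Tuple

# Compatibility expectations per transition type, paraphrased from the
# three bullets above. Illustrative wording only, not an official policy.
EXPECTATIONS = {
    "patch": "binary-compatible interfaces; primarily bug fixes",
    "minor": "backward-compatible interfaces; new features; "
             "dependency versions may change",
    "major": "substantial changes allowed; interfaces expected to stay "
             "source-compatible, with rare breaks",
}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'major[.minor[.patch]]'; missing trailing parts default to 0."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 numeric parts, got {text!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def classify_transition(old: str, new: str) -> str:
    """Return 'major', 'minor' or 'patch' for the step from old to new."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError(f"{new!r} is not newer than {old!r}")
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


if __name__ == "__main__":
    # The two candidate names for Pig.next discussed in this thread.
    for candidate in ("0.10", "1.0"):
        kind = classify_transition("0.9", candidate)
        print(f"0.9 -> {candidate}: {kind} ({EXPECTATIONS[kind]})")
```

    Comparing the parts as integers rather than as strings matters here:
    0.10 is a minor step after 0.9, not a step backwards to 0.1.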

    -sanjay

  • Jai Krishna at Mar 3, 2011 at 4:14 am
    I tend to interpret Hadoop 0.21 and Pig 0.9 as "Hadoop has had 21 releases" and "Pig has had 9 releases" respectively.
    In keeping with that, Pig version numbers that trail Hadoop's seem logically consistent because Pig, in practice, primarily works off Hadoop (though it can do local mode, drive non-Hadoop backends, etc.).

    So, Hadoop at 0.21 and Pig at 0.10 seems right.

    Of course, I may be missing a lot of things here with regard to how Apache projects work.

    Thanks
    Jai


  • Mridul Muralidharan at Mar 4, 2011 at 1:49 pm
    IMO 1.0 for a product typically promises:

    1) Reasonable stability of interfaces.

    Typically only major version changes break interface compatibility.
    While we are at 0.x, it seems to be considered 'okish' to violate this,
    but once you are at 1.0 and higher, breaking interface contracts will
    not be desired behavior.

    We should be reasonably confident about the interfaces we expose to
    users: this includes the shell, exec envs, properties, API and SPIs.

    (This also depends on Hadoop, btw.)

    2) Reasonable stability and code quality.

    Typically a major release promises reasonable rigor in terms of code
    quality, stability and functionality.

    As mentioned, it is easier to get Amazon, etc. to move to Pig 1.0, but
    probably not so for 0.7 or 1.0.1 or 1.1, etc. Declaring something as 1.0
    typically has this expectation.

    Considering the pretty invasive changes which have happened of late,
    maybe we do need to have a cool-off period for the code to settle and
    focus on the bugs instead of features if we need a 1.0 release?

    Though as developers we always want to work on new & exciting things,
    we should balance it against user expectations for a stable product.

    3) Reasonable 'polish' in the product.

    In general, it is not very easy to use Pig, and it keeps violating the
    principle of least surprise even after having used it for 3+ years now.

    The surprises are typically related to schema, parsing, changing UDF
    contracts, property interactions, multi-query optimization effects, null
    handling/interactions and the like.

    A lot of it is probably just due to idioms and expectations which are
    not well known, bugs which should be filed, problems of trying to debug
    in a distributed cluster, constructs which are not
    adequately/well-defined, and a lot due to the mismatch between a novice
    user's and pig-dev's expectations.

    We tend to work around/avoid a lot of these issues without a second
    thought, but exposing Pig to someone new does bring out the confusion.

    Considering the scope of Pig, this is probably a pie-in-the-sky goal,
    but it definitely would be good if Pig "felt" stable and usable without
    the need for too many caveats/gotchas.

    Until these are "reasonably" well tackled, IMO, it is not a good idea to
    go 1.0.

    Regards,
    Mridul





Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 3, '11 at 12:53a
active: Mar 4, '11 at 1:49p
posts: 15
users: 10
website: pig.apache.org
