Grokbase Groups Hive dev March 2009
Hey folks,

A few of us were chatting earlier today (some Facebook and Cloudera folks) about the best approach to get to a first Hive release.

While 0.2 has been branched - it seems awkward to base the first release on it. The reason is twofold:

- new changes to trunk since 0.2 have been relatively contained AFAIK (so no added instability). As evidence - Facebook has reverted to running trunk in production for the last week or so.
- the changes that have gone into trunk since 0.2 are extremely important from a performance perspective. This includes the LazySerDe that Zheng added and the upcoming hive-232.

So one proposal is to branch 0.3 at this point and try to make that the first official release for Hive.

This does look a little haphazard - and the natural question is whether we can stick to this (or whether we end up repeating this once we throw in some more goodies). The feeling is that this may be a good time - hive-279 has major changes to the Hive compiler, and branching 0.3 before those changes are checked in gives us a good chance of producing a stable release with good performance (and the major changes will probably prevent us from repeating this trick going forward :)).

What do people think?

Joydeep


  • Johan Oskarsson at Mar 3, 2009 at 10:54 am
    To be honest I must've missed that 0.2 was branched (I found the email
    now though). Was there a feature freeze date set?

    After branching, shouldn't we have moved the non-critical issues to 0.3
    and pushed for fixing the remaining bugs in order to release?

    That aside, I don't have a strong opinion on whether the next release is
    0.2 or 0.3, since there hasn't been an Apache release yet. How about
    setting a feature freeze date now and taking it from there?

    /Johan

  • Ashish Thusoo at Mar 10, 2009 at 1:18 am
    For 0.2 we had set a feature freeze date on the 28th of Jan and, as I had mentioned in the previous email, the plan was to cut a branch on the last Wednesday of every month and then issue a vote for making it a release once it ran satisfactorily (no blocker bugs) for at least 2 weeks at Facebook. Accordingly, I was hoping that we would limit the changes that went into the branch (0.2 in this case) to blocker bugs only, but it seems that we had some feature creep, and as a result we switched to using trunk at Facebook without giving 0.2 sufficient time to stabilize. It also means that perhaps waiting a month for each release is too long at this stage, at least for FB. If others are in agreement, how about we do the following going forward..


    Cut a branch every other Wednesday, check in only the most critical blocker bugs to the branch, reserve features for trunk (where they will be picked up by the next branch), and religiously deploy only the versions of the branch at FB. We can start off a vote to make a branch an official release once we have at least 2 weeks of run on the branch without any blocker bugs (i.e. we did not have a need to upgrade the production machines at FB).

    We can start off by creating a 0.3 branch this Wednesday accordingly...

    Once we have an agreement on this we can document this procedure on the wiki and religiously follow it. Without controlling the tendency toward feature creep it would be difficult to get a stable version out...

    Thoughts?

    Ashish



  • Johan Oskarsson at Mar 10, 2009 at 10:43 am
    I'm worried that trying to create a new release every other week will be
    too often. Isn't there a risk that we're still fixing bugs in 0.3 when
    the 0.5 branch is cut if we run into something unexpected?
    It seems Hadoop is suffering from this issue a bit lately even though
    they branch quarterly: 0.19 still has lots of issues open while people
    are committing patches to 0.21 (trunk). Granted, Hadoop is a much larger
    codebase with more patches applied.

    That said, I won't oppose trying the suggested period and seeing how it
    goes. It's quite easy to change after all.

    /Johan

  • Joydeep Sen Sarma at Mar 10, 2009 at 4:37 pm
    I am also a little worried about a lot of releases and managing them. Perhaps what's clouding my judgement is that there are a lot of critical bugs yet to be fixed - so I don't see how we can stabilize the first release in a couple of weeks - or even a month (which is what killed 0.2, I think, to some extent).

    I would say that the first release is somewhat special. We are fixing a boatload of issues from a very large push of code (all of it!). In subsequent releases - there wouldn't be as many bugs - and a faster release cycle would be feasible.

    So my vote would be to branch now (before predicate push down), get the release stable as fast as possible (but potentially wait as long as it takes) - and only then start cutting more branches. Over time - we can converge to a faster release cycle - but right now this seems dubious to me.

    Can't put a newborn into kindergarten directly man .. :-)

  • Johan Oskarsson at Mar 10, 2009 at 4:52 pm
    +1, sounds like a solid plan.

  • Jeff Hammerbacher at Mar 10, 2009 at 5:47 pm
    +1. I agree that two weeks per release is quite nice from an agile kinda
    perspective (sprints), but the project could use a few longer release cycles
    with concerted efforts towards stability, QA, and documentation. Once that's
    on a roll, tightening the release cycle seems like a good idea.
  • Ashish Thusoo at Mar 10, 2009 at 6:08 pm
    I think a big reason for what killed 0.2 was the fact that we decided to deploy trunk into production because of some features that the internal users were asking for, instead of just continuing with the 0.2 branch. What I want to stress is that we cannot do that going forward. Once we branch out 0.3, we have to let 0.3 soak in production till we have at least 2 weeks of run with no blockers before we cut a release from the branch (I did not mean that we will just certify a branch to be a release after 2 weeks - what I meant was that we have at least 2 weeks of run with no blockers). Again, I must stress that we have to continue deploying the candidate branch into production and we cannot move the production machines to trunk, as that will completely kill the branch (as happened with 0.2). We have to really isolate blocker bug fixes from features, and we have to understand that we cannot roll out features overnight (as we have done so far for our users at FB), as doing that will make it absolutely hopeless to get any branch stable.

    Having said that, we could move to a model where we make a new branch (not a release) from trunk once the previous candidate branch is released, instead of having a train of branches every 2 weeks. I am fine with that too. What is perhaps more critical is that we have a firm commitment that we are not going to deploy new features into production till we stabilize 0.3, and we should set the expectations accordingly...

    Ashish

    -----Original Message-----
    From: Johan Oskarsson
    Sent: Tuesday, March 10, 2009 9:52 AM
    To: hive-dev@hadoop.apache.org
    Subject: Re: branching Hive and getting to first release

    +1, sounds like a solid plan.

    Joydeep Sen Sarma wrote:
    I am also a little worried about a lot of releases and managing them. perhaps what's clouding my judgement is that there are a lot of critical bugs yet to be fixed - so I don't see how we can stabilize the first release in a couple of weeks - or even a month (which is what killed 0.2 I think to some extent).

    I would say that the first release is somewhat special. We are fixing a boatload of issues from a very large push of code (all of it!). In subsequent releases - there wouldn't be as many bugs - and a faster release cycle would be feasible.

    So my vote would be to branch now (before predicate push down), get the release stable as fast as possible (but potentially wait as long as it takes) - and then only start cutting more branches. Over time - we can converge to a faster release cycle - but right now this seems dubious to me.

    Can't put a newborn into kindergarten directly man .. :-)

    -----Original Message-----
    From: Johan Oskarsson
    Sent: Tuesday, March 10, 2009 3:43 AM
    To: hive-dev@hadoop.apache.org
    Subject: Re: branching Hive and getting to first release

    I'm worried that trying to create a new release every other week will
    be too often. Isn't there a risk that we're still fixing bugs in 0.3
    when the 0.5 branch is cut if we run into something unexpected?
    It seems Hadoop is suffering from this issue a bit lately even though
    they branch quarterly, 0.19 still have lots of issues open when people
    are committing patches to 0.21 (trunk). Granted Hadoop is a much
    larger codebase with more patches applied.

    That said, I won't oppose trying the period suggested and see how it
    goes, it's quite easy to change after all.

    /Johan

    Ashish Thusoo wrote:
    For 0.2 we had set a feature freeze date on the 28th of Jan and as I
    had mentioned in the previous email, the plan was cut a branch on the last wednesday of every month and then issue a vote for making it a release once it ran satisfactorily (no blocker bugs) for atleast 2 weeks @ facebook. Accordingly I was hoping that we would limit the changes that would go into the branch (0.2) in this case to the blocker bugs only but it seems that we had some feature creep and as a result we switched to using trunk at facebook without giving sufficient time for 0.2 to stabilize. It also means that perhaps waiting for a month for each release is too long at this stage at least for FB. If others are in agreement, how about we do the following going forward..


    Cut a branch every other Wednesday, check in only the most critical blocker bug fixes to the branch, reserve features for trunk (to be picked up in the next branch), and religiously deploy only versions of the branch at FB. We can start a vote to make a branch an official release once we have at least 2 weeks of running the branch without any blocker bugs (i.e. we did not need to upgrade the production machines at FB).

    We can start off by creating a 0.3 branch this Wednesday accordingly...

    Once we have agreement on this we can document the procedure on the wiki and religiously follow it. Without controlling the tendency toward feature creep it will be difficult to get a stable version out...

    Thoughts?

    Ashish



  • Joydeep Sen Sarma at Mar 10, 2009 at 6:37 pm
    I am in general agreement - but the problem is that the mail below doesn't explain why trunk was deployed.

    Performance fixes are like critical bugs. We cannot run a production cluster that's hurting for performance on non-performant software. To that extent, it was a mistake for us to consider lazyserde a 'feature' (which is why we didn't back-port it to 0.2). The same goes for hive-223, for example: we just need to have it in deployment ASAP, yet by the conventional definition it certainly wasn't a regression that would go into a bug-fix branch. I suspect there may be more such jiras.

    One way of looking at this is that we either branched too early, or we need to reconsider what goes into a branch.

    The other way to look at this is that every cluster administrator (including the one at Facebook - who is just like any user of Hive) needs to have the option to pull in the latest patches that are critical to his/her deployment. The success of Hive and the happiness of its internal Facebook users should not and cannot be at odds with each other.


    -----Original Message-----
    From: Ashish Thusoo
    Sent: Tuesday, March 10, 2009 11:08 AM
    To: hive-dev@hadoop.apache.org
    Subject: RE: branching Hive and getting to first release

    I think a big reason for what killed 0.2 was the fact that we decided to deploy trunk into production because of some features that the internal users were asking for, instead of just continuing with the 0.2 branch. What I want to stress is that we cannot do that going forward. Once we branch out 0.3, we have to let 0.3 soak in production until we have at least 2 weeks of running with no blockers before we cut a release from the branch (I did not mean that we will simply certify a branch as a release after 2 weeks; what I meant was that we need at least 2 weeks of running with no blockers). Again, I must stress that we have to continue deploying the candidate branch into production and cannot move the production machines to trunk, as that will completely kill the branch (as happened with 0.2). We have to really isolate blocker bug fixes from features, and we have to understand that we cannot roll out features overnight (as we have done so far for our users at FB), as doing that will make it absolutely hopeless to get any branch stable.

    Having said that, we could move to a model where we make a new branch (not a release) from trunk once the previous candidate branch is released, instead of having a train of branches every 2 weeks. I am fine with that too. What is perhaps more critical is a firm commitment that we are not going to deploy new features into production until we stabilize 0.3, and we should set expectations accordingly...

    Ashish

    -----Original Message-----
    From: Johan Oskarsson
    Sent: Tuesday, March 10, 2009 9:52 AM
    To: hive-dev@hadoop.apache.org
    Subject: Re: branching Hive and getting to first release

    +1, sounds like a solid plan.

    Joydeep Sen Sarma wrote:
    I am also a little worried about a lot of releases and managing them. Perhaps what's clouding my judgement is that there are a lot of critical bugs yet to be fixed - so I don't see how we can stabilize the first release in a couple of weeks - or even a month (which is what killed 0.2, I think, to some extent).

    I would say that the first release is somewhat special. We are fixing a boatload of issues from a very large push of code (all of it!). In subsequent releases - there wouldn't be as many bugs - and a faster release cycle would be feasible.

    So my vote would be to branch now (before predicate push down), get the release stable as fast as possible (but potentially wait as long as it takes) - and only then start cutting more branches. Over time - we can converge to a faster release cycle - but right now this seems dubious to me.

    Can't put a newborn into kindergarten directly man .. :-)

  • Ashish Thusoo at Mar 10, 2009 at 8:09 pm
    Agreed.

    I think we moved to trunk because of lazy serde from what Zheng tells me (I was out of office when this happened)...

    Regarding performance fixes, I would rather categorize performance regressions as blocker bugs and keep performance improvements as features. By that measure I think lazy serde was fine as a feature. I think we should just have let 0.2 stabilize, deployed lazy serde once we had released 0.2 and cut a 0.3 branch, and then moved our systems to 0.3. Keeping the criteria for what gets categorized as a blocker tight is quite critical; otherwise we will always be in danger of constant feature creep, and that would totally defeat the purpose of stabilization. In any case, if we had been able to stabilize 0.2 in, say, a month's time, I do not think the users would have been too unhappy to get lazy serde a month late. So by that token I would not categorize it as a blocker as such.

    One constant problem is that the best stress testing environment that we have for Hive right now is our production workload at FB. So I am not sure whether we can give a certificate of stability to a branch if we at FB pull in patches and run a version that is different from the release. Though of course others are always free to get the patches from the JIRA and apply them as they see fit. I am not sure how to address this. Thoughts?

    Ashish

  • Zheng Shao at Mar 10, 2009 at 9:03 pm
    So far the discussion has been mainly about policy.

    However, another important thing is what we plan to work on in the next 2-3
    weeks. Let's say we are focusing on fixing bugs (both Facebook users and
    open-source community users are crying for bug fixes, from what I can see).
    After 2-3 weeks, when most bugs are fixed and Hive is more stable, we can
    come back to review this and decide the policy. With the additional
    information we will have by then, we might be able to make a better
    decision on policies.

    Also, if we are focusing on fixing bugs for the next 2-3 weeks, there is no
    point in making a 0.3 branch right now because every bug fix will go into
    both 0.3 and trunk anyway.

    Let's fix most of the important bugs first, then make the 0.3 branch; then
    we can work on 2 things at the same time: new features/perf improvements
    that go only to trunk, and other minor bug fixes that go to both 0.3 and
    trunk.

    Thoughts?

    Zheng
  • Jeff Hammerbacher at Mar 26, 2009 at 6:19 pm
    Hey,

    What's the state of the release process? We'd really, really like to see a
    Hive release. As Nigel said on the Pig list, releasing often is good for the
    community.

    Thanks,
    Jeff
    On Tue, Mar 10, 2009 at 2:03 PM, Zheng Shao wrote:

    Until now the discussion has been mainly about policy.

    However, another important thing is what we plan to work on in the next 2-3
    weeks. Let's say we are focusing on fixing bugs (both Facebook users and
    open-source community users are crying for bug fixes, from what I can see).
    After 2-3 weeks when most bugs are fixed and Hive is more stable, we can
    come back to review this and decide the policy. Given the additional
    information we will have by then, we might be able to make a better
    decision on policies.

    Also, if we are focusing on fixing bugs for the next 2-3 weeks, there is no
    point in making a 0.3 branch right now because every bug fix will go into
    both 0.3 and trunk anyway.

    Let's fix most of the important bugs first, then make the 0.3 branch, then we
    can work on 2 things at the same time: new features/perf improvements that
    go only to trunk, and other minor bug fixes that go to both 0.3 and trunk.

    Thoughts?

    Zheng
    On Tue, Mar 10, 2009 at 1:09 PM, Ashish Thusoo wrote:

    Agreed.

    I think we moved to trunk because of lazy serde from what Zheng tells me (I
    was out of office when this happened)...

    Regarding performance fixes, I would rather categorize performance
    regressions as blocker bugs and keep performance improvements as features.
    By that measure I think lazy serde was fine as a feature. I think we should
    just have let 0.2 stabilize and deployed lazy serde when we released 0.2 and
    cut out a 0.3 branch and moved our systems to 0.3. Keeping the criteria for
    what gets categorized as a blocker tight is quite critical; otherwise we will
    always be in danger of constant feature creep, and that would totally
    defeat the purpose of stabilization. In any case, if we had been able to
    stabilize in a month's time, say, for 0.2, I do not think the users would be
    too unhappy to get the lazy serde a month late. So by that token I would
    not categorize it as a blocker as such.

    One constant problem is that the best stress testing environment that we
    have for Hive right now is our production workload at FB. So I am not sure
    whether we can give a certificate of stability to a branch if we at FB pull
    in patches and run a version that is different from the release. Though of
    course others are always free to get the patches from the JIRA and apply
    them as they see fit. I am not sure how to address this. Thoughts?

    Ashish

    -----Original Message-----
    From: Joydeep Sen Sarma
    Sent: Tuesday, March 10, 2009 11:37 AM
    To: hive-dev@hadoop.apache.org
    Subject: RE: branching Hive and getting to first release

    I am in general agreement - but the problem is that the mail below doesn't
    explain why trunk was deployed.

    Performance fixes are like critical bugs. We cannot run a production
    cluster that's hurting for performance on non-performant software. To that
    extent - it was a mistake for us to consider lazyserde to be a 'feature'
    (which is why we didn't back-port it to 0.2). So is hive-223, for example -
    we just need to have it ASAP in deployment - and by conventional
    definition - it certainly wasn't a regression that would go into a bug fix
    branch. I suspect there may be more such JIRAs.

    One way of looking at this is that we either branched too early, or we need
    to reconsider what goes into a branch.

    The other way to look at this is that every cluster administrator
    (including the one at Facebook - who is just like any user of Hive) - needs
    to have the option to pull in the latest patches that are critical to his/her
    deployment. The success of Hive and the happiness of its internal Facebook
    users should not and cannot be at odds with each other.


    -----Original Message-----
    From: Ashish Thusoo
    Sent: Tuesday, March 10, 2009 11:08 AM
    To: hive-dev@hadoop.apache.org
    Subject: RE: branching Hive and getting to first release

    I think a big reason for what killed 0.2 was the fact that we decided to
    deploy trunk into production because of some features that the internal
    users were asking for, instead of just continuing with the 0.2 branch. What
    I want to stress is that we cannot do that going forward. Once we branch out
    0.3, we have to let 0.3 soak in production till we have at least 2 weeks of
    run with no blockers (I did not mean that we will just certify a branch to
    be a release after 2 weeks - what I meant was that we have at least 2 weeks
    of run with no blockers) before we cut out a release from the branch. Again
    I must stress that we have to continue deploying the candidate branch into
    production and we cannot move the production machines to trunk as that will
    completely kill the branch (as happened with 0.2). We have to really isolate
    blocker bug fixes from features and we have to understand that we cannot
    roll out features overnight (as we have done so far for our users at FB) as
    doing that will make it absolutely hopeless to get any branch stable.

    Having said that, we could move to a model where we make a new branch (not
    a release) from trunk once the previous candidate branch is released instead
    of having a train of branches every 2 weeks. I am fine with that too.
    What is perhaps more critical is that we have a firm commitment that we are
    not going to deploy new features into production till we stabilize 0.3 and
    we should set the expectations accordingly...

    Ashish

    -----Original Message-----
    From: Johan Oskarsson
    Sent: Tuesday, March 10, 2009 9:52 AM
    To: hive-dev@hadoop.apache.org
    Subject: Re: branching Hive and getting to first release

    +1, sounds like a solid plan.

    Joydeep Sen Sarma wrote:
    I am also a little worried about a lot of releases and managing them.
    Perhaps what's clouding my judgement is that there are a lot of critical
    bugs yet to be fixed - so I don't see how we can stabilize the first release
    in a couple of weeks - or even a month (which is what killed 0.2 I think to
    some extent).
    I would say that the first release is somewhat special. We are fixing a
    boatload of issues from a very large push of code (all of it!). In
    subsequent releases - there wouldn't be as many bugs - and a faster release
    cycle would be feasible.
    So my vote would be to branch now (before predicate push down), get the
    release stable as fast as possible (but potentially wait as long as it
    takes) - and only then start cutting more branches. Over time - we can
    converge to a faster release cycle - but right now this seems dubious to
    me.
    Can't put a newborn into kindergarten directly man .. :-)

    -----Original Message-----
    From: Johan Oskarsson
    Sent: Tuesday, March 10, 2009 3:43 AM
    To: hive-dev@hadoop.apache.org
    Subject: Re: branching Hive and getting to first release

    I'm worried that trying to create a new release every other week will
    be too often. Isn't there a risk that we're still fixing bugs in 0.3
    when the 0.5 branch is cut if we run into something unexpected?
    It seems Hadoop is suffering from this issue a bit lately even though
    they branch quarterly, 0.19 still has lots of issues open when people
    are committing patches to 0.21 (trunk). Granted, Hadoop is a much
    larger codebase with more patches applied.

    That said, I won't oppose trying the period suggested and see how it
    goes, it's quite easy to change after all.

    /Johan

    Ashish Thusoo wrote:
    For 0.2 we had set a feature freeze date on the 28th of Jan and as I
    had mentioned in the previous email, the plan was to cut a branch on the
    last Wednesday of every month and then issue a vote for making it a release
    once it ran satisfactorily (no blocker bugs) for at least 2 weeks at Facebook.
    Accordingly I was hoping that we would limit the changes that would go into
    the branch (0.2 in this case) to the blocker bugs only, but it seems that we
    had some feature creep and as a result we switched to using trunk at
    Facebook without giving sufficient time for 0.2 to stabilize. It also means
    that perhaps waiting for a month for each release is too long at this stage,
    at least for FB. If others are in agreement, how about we do the following
    going forward:

    Cut a branch every other Wednesday, check in only the most critical
    blocker bug fixes to the branch, reserve features for trunk (to be
    picked up in the next branch), and religiously deploy only the versions of
    the branch at FB. We can start off a vote to make a branch an official
    release once we have at least 2 weeks of run on the branch without any
    blocker bugs (i.e. we did not have a need to upgrade the production machines
    at FB).
    We can start off by creating a 0.3 branch this Wednesday
    accordingly...
    Once we have an agreement on this we can document this procedure on the
    wiki and religiously follow it. Without controlling the tendency toward
    feature creep it would be difficult to get a stable version out...
    Thoughts?

    Ashish



    -----Original Message-----
    From: Johan Oskarsson
    Sent: Tuesday, March 03, 2009 2:54 AM
    To: hive-dev@hadoop.apache.org
    Subject: Re: branching Hive and getting to first release

    To be honest I must've missed that 0.2 was branched (I found the email
    now though), was there a feature freeze date set?
    After branching shouldn't we have moved the non critical issues to 0.3
    and pushed for fixing the remaining bugs in order to release?
    That aside, I don't have a strong opinion whether the next release is
    0.2 or 0.3, since there hasn't been an Apache release yet. How about
    setting a feature freeze date now and take it from there?
    /Johan

    Joydeep Sen Sarma wrote:
    Hey folks,

    A few of us were chatting earlier today (some Facebook and Cloudera
    folks) on best approach to get to a first Hive release.
    While 0.2 has been branched - it seems awkward to base the first
    release on it. The reason is twofold:
    - new changes to trunk since 0.2 have been relatively
    contained AFAIK (so no added instability). As evidence - Facebook has
    reverted to running trunk in production for the last week or so.
    - the changes that have gone into trunk since 0.2 are
    extremely important from performance perspective. This includes the
    LazySerDe that Zheng added and upcoming hive-232.
    So one proposal is to branch 0.3 at this point and try to make that
    first official release for Hive.
    This does look a little haphazard - and the natural question is whether
    we can stick to this (or we end up repeating this once we throw in some more
    goodies). The feeling is that this may be a good time - hive-279 has major
    changes to the hive compiler and branching 0.3 before those changes are
    checked in gives us a good chance of producing a stable release with good
    performance (and the major changes will probably prevent us from repeating
    this trick going forward :)).
    What do people think?

    Joydeep

    --
    Yours,
    Zheng
  • Ashish Thusoo at Mar 26, 2009 at 6:25 pm
    We wanted to get the blockers out of the way. We got them to zero last night. As such we are ready to branch. We can do this today and then open a vote in a week's time on whether to make the branch (0.3.0) the first release. Once the first release is out we can follow the train model just as Hadoop does...

    Ashish


Discussion Overview
group: dev
categories: hive, hadoop
posted: Mar 3, '09 at 6:30a
active: Mar 26, '09 at 6:25p
posts: 13
users: 5
website: hive.apache.org
