Even with the work on hadoop-0.22 (trunk) starting in earnest it is
fairly obvious, given our past history, that it will take a while for
us to get it stable and deployable - for e.g. it took us nearly 6
months to deploy hadoop-0.20.

In the interim I'd like to propose we push a hadoop-0.20-security
release off the Yahoo! patchset (http://github.com/yahoo/hadoop-common).
This will ensure the community benefits from all the work done at
Yahoo! for over 12 months *now*, and ensures that we do not have to
wait until hadoop-0.22 which has all of these patches.

Some salient aspects:
a) Full-fledged security implementation deployed at scale (4000 nodes)
in production.
b) Lots of work on stabilizing and optimizing the NameNode and
JobTracker for over 12 months. This has been critical in deploying
Hadoop at scale i.e. clusters of 4000 nodes. For e.g. we have a 50%
improvement in CPU utilization on the JobTracker vis-a-vis the
hadoop-0.20.2 release.
c) Several new features in the scheduler (CapacityScheduler), Map-Reduce
framework, better support for multi-tenancy etc.
d) Several performance and stability improvements to the system e.g.
iterative ls, robustness against rogue clients/jobs/users etc.

Also, given the huge number of features and enhancements I'd like to
propose we create a new 0.20-security branch and commit the Yahoo
patchset there for the release.

This has been proposed earlier by Doug and did not get far due to
concerns about the effect this would have on development on trunk.
However, I believe, we have a case for demonstrable progress on trunk
now, and it would be useful to have an interim, fully-tested Apache
Hadoop release available to the community.

Conceivably, one could imagine a Hadoop Security + Append release
soon after. At this point a Hadoop Security release alone would add
tremendous value for the reasons above. Presently we would like to get
this release out quickly to focus the majority of our efforts on trunk.

Thoughts?

Arun


  • Tom White at Aug 25, 2010 at 5:16 pm
    Hi Arun,

    I think it would be good to have a shared 0.20 Apache security branch.
    Since security isn't in 0.21, and the 0.22 release is some way off
    as you mention, this would be useful for folks who want the security
    features sooner (and want to use an Apache release).

    Thanks,
    Tom
  • Hemanth Yamijala at Aug 25, 2010 at 5:47 pm
    Arun,

    How much time do you think it would take to have a version of 0.20
    with the security features in it ready? In a different thread, Owen
    has started discussing plans around 0.22. Do you think this effort
    would affect the 0.22 release?

    I do agree that this would be very useful for folks who want security
    sooner. And the fact that Yahoo! have been running it at scale for a
    good while now is also reassuring.

    Thanks
    hemanth
  • Arun C Murthy at Aug 25, 2010 at 6:00 pm

    On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:

    Arun,

    How much time do you think it would take to have a version of 0.20
    with the security features in it ready ? In a different thread, Owen
    has started discussing plans around 0.22. Do you think this effort
    would affect 0.22 release ?
    I think it should be fairly trivial to get this release out - most of
    the effort is just the mechanics of committing the patches to an
    Apache branch from the yahoo git repository, creating a release
    candidate and calling it a success! *smile*

    I think doing this quickly is critical in ensuring that we do not lose
    focus on 0.22, but I believe this will definitely help the community.
    I do agree that this would be very useful for folks who want security
    sooner. And the fact that Yahoo! have been running it at scale for a
    good while now is also assuring.
    Just to clarify - this has security and a bunch of other enhancements
    (which are either in 0.21 or 0.22 or both).

    Arun
  • Steve Loughran at Aug 26, 2010 at 2:12 pm

    On 25/08/10 18:59, Arun C Murthy wrote:
    On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:

    Arun,

    How much time do you think it would take to have a version of 0.20
    with the security features in it ready ? In a different thread, Owen
    has started discussing plans around 0.22. Do you think this effort
    would affect 0.22 release ?
    I think it should be fairly trivial to get this release out - most of
    the effort is just the mechanics of committing the patches to an Apache
    branch from the yahoo git repository, creating a release candidate and
    calling it a success! *smile*
    oh, and testing it..



    what scalability patches like HDFS-599 are in?
  • Arun C Murthy at Aug 26, 2010 at 4:10 pm

    On Aug 26, 2010, at 7:11 AM, Steve Loughran wrote:
    On 25/08/10 18:59, Arun C Murthy wrote:
    On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:

    Arun,

    How much time do you think it would take to have a version of 0.20
    with the security features in it ready ? In a different thread, Owen
    has started discussing plans around 0.22. Do you think this effort
    would affect 0.22 release ?
    I think it should be fairly trivial to get this release out - most of
    the effort is just the mechanics of committing the patches to an Apache
    branch from the yahoo git repository, creating a release candidate and
    calling it a success! *smile*
    oh, and testing it..
    Already is! *smile*
    It's running on 4k clusters in production at this point...
  • Steve Loughran at Aug 26, 2010 at 4:41 pm

    On 26/08/10 17:09, Arun C Murthy wrote:
    On Aug 26, 2010, at 7:11 AM, Steve Loughran wrote:
    On 25/08/10 18:59, Arun C Murthy wrote:
    On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:

    Arun,

    How much time do you think it would take to have a version of 0.20
    with the security features in it ready ? In a different thread, Owen
    has started discussing plans around 0.22. Do you think this effort
    would affect 0.22 release ?
    I think it should be fairly trivial to get this release out - most of
    the effort is just the mechanics of committing the patches to an Apache
    branch from the yahoo git repository, creating a release candidate and
    calling it a success! *smile*
    oh, and testing it..
    Already is! *smile*
    It's running on 4k clusters in production at this point...
    +1 then, ship it.
  • Allen Wittenauer at Aug 25, 2010 at 8:25 pm

    On Aug 25, 2010, at 10:46 AM, Hemanth Yamijala wrote:
    I do agree that this would be very useful for folks who want security
    sooner. And the fact that Yahoo! have been running it at scale for a
    good while now is also assuring.
    As has been mentioned a few times, part of the security features are dependent upon Yahoo!-type operations. Those would need to get replaced or a decision would need to be made that we are removing/regressing certain features (the cluster-wide start scripts).
  • Devaraj Das at Aug 25, 2010 at 9:14 pm
    As has been mentioned a few times, part of the security features are dependent upon Yahoo!-type operations.
    Allen, could you please list them here again (for the benefit of the community)? Or, are you referring to only the cluster-wide start scripts?




  • Stack at Aug 26, 2010 at 7:09 pm

    On Mon, Aug 23, 2010 at 5:27 PM, Arun C Murthy wrote:
    In the interim I'd like to propose we push a hadoop-0.20-security release
    off the Yahoo! patchset (http://github.com/yahoo/hadoop-common). This will
    ensure the community benefits from all the work done at Yahoo! for over 12
    months *now*, and ensures that we do not have to wait until hadoop-0.22
    which has all of these patches.
    Sounds good to me. What will this release be called? hadoop-0.20.3-security?
    Conceivably, one could imagine a Hadoop Security + Append release soon
    after.
    Well, it'd probably be better if we just did an append release first?
    A good few of us have been banging on the 0.20-append branch w/ a
    while now and it's for sure doing append better than 0.20 did (smile).

    St.Ack
  • Arun C Murthy at Aug 26, 2010 at 11:24 pm

    On Aug 26, 2010, at 12:08 PM, Stack wrote:
    On Mon, Aug 23, 2010 at 5:27 PM, Arun C Murthy wrote:
    In the interim I'd like to propose we push a hadoop-0.20-security release
    off the Yahoo! patchset (http://github.com/yahoo/hadoop-common). This will
    ensure the community benefits from all the work done at Yahoo! for over 12
    months *now*, and ensures that we do not have to wait until hadoop-0.22
    which has all of these patches.
    Sounds good to me. What will this release be called? hadoop-0.20.3-security?
    hadoop-0.20-security. I want to ensure hadoop-0.20 be a separate line,
    so as to not confuse people.

    Conceivably, one could imagine a Hadoop Security + Append release soon
    after.
    Well, it'd probably be better if we just did an append release first?
    A good few of us have been banging on the 0.20-append branch w/ a
    while now and it's for sure doing append better than 0.20 did (smile).
    I think these are orthogonal and both can run their own course.

    Arun
  • Ted Yu at Aug 26, 2010 at 11:30 pm
    This would imply hadoop-0.20-security-append or hadoop-0.20-append-security
    release be created which contains security and append features.
  • Arun C Murthy at Aug 26, 2010 at 11:42 pm

    On Aug 26, 2010, at 4:30 PM, Ted Yu wrote:

    This would imply hadoop-0.20-security-append or hadoop-0.20-append-security
    release be created which contains security and append features.
    As I mentioned in my initial proposal - it's conceivable, not imminent.
    The community might decide that it is a valuable direction and folks
    may work on integrating the two.

    At this point, I am signing up to shepherd hadoop-0.20-security. I'd
    like to do it quickly and move on to working on Hadoop trunk, others
    are welcome to take this and run further.

    Arun
  • Owen O'Malley at Aug 26, 2010 at 11:29 pm

    On Thu, Aug 26, 2010 at 12:08 PM, Stack wrote:
    Sounds good to me.  What will this release be called?  hadoop-0.20.3-security?
    It is a new branch, so the question is what is the branch name. I'd
    propose calling it 0.20-security and the releases would be
    0.20-security.0, etc.
    Well, it'd probably be better if we just did an append release first?
    I don't think the ordering matters. 0.20-security is a different branch
    that isn't comparable to 0.20-append.

    0.20 < 0.20-security < 0.22
    0.20 < 0.20-append < 0.21 < 0.22

    It does make a bit of a mess.

    -- Owen
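
The two chains in the message above form a partial order rather than a single line of releases: 0.20-security and 0.20-append both descend from 0.20 and both precede 0.22, but neither precedes the other. A minimal sketch of that relationship follows; the `EDGES` table and the helper names `precedes` and `comparable` are illustrative assumptions, not part of any Hadoop tooling.

```python
# Branch lineage as written in the thread:
#   0.20 < 0.20-security < 0.22
#   0.20 < 0.20-append < 0.21 < 0.22
# Each key maps a branch to the branches that directly follow it.
# (Hypothetical table for illustration only.)
EDGES = {
    "0.20": {"0.20-security", "0.20-append"},
    "0.20-security": {"0.22"},
    "0.20-append": {"0.21"},
    "0.21": {"0.22"},
    "0.22": set(),
}


def precedes(a, b):
    """Return True if branch a strictly precedes branch b."""
    stack = list(EDGES.get(a, ()))
    seen = set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(EDGES.get(node, ()))
    return False


def comparable(a, b):
    """Return True if the two branches are ordered relative to each other."""
    return a == b or precedes(a, b) or precedes(b, a)


if __name__ == "__main__":
    print(precedes("0.20", "0.22"))
    print(comparable("0.20-security", "0.20-append"))
```

Under these assumptions, `comparable("0.20-security", "0.20-append")` is false, which is the "bit of a mess" the message points out: neither branch can be described as simply newer than the other.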
  • Arun C Murthy at Aug 30, 2010 at 9:15 pm

    On Aug 23, 2010, at 5:27 PM, Arun C Murthy wrote:
    In the interim I'd like to propose we push a hadoop-0.20-security
    release off the Yahoo! patchset (http://github.com/yahoo/hadoop-common).
    This will ensure the community benefits from all the work done at
    Yahoo! for over 12 months *now*, and ensures that we do not have to
    wait until hadoop-0.22 which has all of these patches.
    Since most people seemed to think of it as a reasonable idea, I'm
    going to create the hadoop-0.20-security branch and start the
    necessary work.

    thanks,
    Arun
  • Doug Cutting at Oct 15, 2010 at 9:28 pm

    On 08/30/2010 02:14 PM, Arun C Murthy wrote:
    Since most people seemed to think of it as a reasonable idea, I'm going
    to create the hadoop-0.20-security branch and start the necessary work.
    I don't yet see this branch. Are you still intending to do this?

    Doug
  • Arun C Murthy at Jan 11, 2011 at 7:12 am

    On 10/15/2010 02:28 PM, Doug Cutting wrote:

    On 08/30/2010 02:14 PM, Arun C Murthy wrote:
    Since most people seemed to think of it as a reasonable idea, I'm going
    to create the hadoop-0.20-security branch and start the necessary work.

    I don't yet see this branch. Are you still intending to do this?

    Doug
    Things stalled, my apologies. Turns out having a kid is a lot of work,
    who knew! *smile*

    I'm back now and plan to start work on this. Hopefully I can get this
    over with quickly, in a couple of weeks, to focus on the next
    release(s).

    thanks,
    Arun
  • Stack at Jan 11, 2011 at 7:14 pm

    On Mon, Jan 10, 2011 at 11:11 PM, Arun C Murthy wrote:
    Things stalled, my apologies. Turns out having a kid is a lot of work, who
    knew! *smile*
    Really (smile -- congrats Arun).
    I'm back now and plan to start work on this. Hopefully I can get this over
    with quickly, in a couple of weeks, to focus on the next release(s).
    What you thinking? What'll you call it?

    Good on you,
    St.Ack
  • Arun C Murthy at Jan 12, 2011 at 5:10 am

    On Jan 11, 2011, at 11:14 AM, "Stack" wrote:

    I'm back now and plan to start work on this. Hopefully I can get this over
    with quickly, in a couple of weeks, to focus on the next release(s).
    What you thinking? What'll you call it?

    Good on you,
    St.Ack
    Thanks Stack.

    I'm open to suggestions - how about something like 20.100 to show that it's a big jump? Anything else?

    Arun
  • Mahadev Konar at Jan 12, 2011 at 5:09 pm
    +1. I like the idea of 20.100.

    Thanks
    mahadev

  • Patrick Angeles at Jan 12, 2011 at 5:27 pm
    You're gonna call your kid 20.100?

    :)

    Congratz
  • Owen O'Malley at Jan 12, 2011 at 9:11 pm

    On Jan 11, 2011, at 9:09 PM, Arun C Murthy wrote:

    I'm open to suggestions - how about something like 20.100 to show
    that it's a big jump? Anything else?

    Although I'm not wild about any of the potential release names, this
    patch set is neither a subset nor a superset of the 0.21 or 0.22
    branches. Given that, I think that a new major release number makes
    the most sense. It is also relatively likely that additional minor
    releases will be made off of this branch while 0.22 is stabilizing.
    We've talked about declaring 0.20 a 1.0 for a long time and this feels
    like backing into the decision, but technically, I believe it to be
    the right name for such a release.

    Thoughts?

    -- Owen
  • Ian Holsman at Jan 12, 2011 at 9:27 pm
    So if 0.20 becomes 1.0, what does 0.22 become?

    I'm still not sure if we shouldn't just add security to 0.22, and leave the 0.20 in maintenance mode from here on.
  • Chris Douglas at Jan 12, 2011 at 9:58 pm
    I had exactly the same reaction when this came up in the past:

    http://s.apache.org/l9
    http://s.apache.org/5Gv

    but our experience with myriad 0.20 variants has demonstrated that
    Hadoop can support both a stable branch and a development branch.
    Trying to direct effort away from 0.20 by preventing it from happening
    in Apache didn't work, and I was wrong to advocate for it. The
    interest in a more slow-moving, stable version of Hadoop will exist
    whether we give it an outlet in Apache or not, most of us work on both
    anyway, so we might as well collaborate in both fora. -C
  • Arun C Murthy at Jan 12, 2011 at 9:35 pm
    I'm willing to discuss any and all options, for a very short period.

    Technically you have a reasonable point, Doug has suggested this in
    the past too. If everyone agrees, fine; if not, I do not want to get
    hung up on a release number. I just *do not* want a controversy.

    As I mentioned, I'm looking to finish this up in a couple of weeks;
    so, I could do without a long discussion on the critical path.

    I'm happy to go with a reasonable compromise, if not, hadoop-0.20.100
    is what I'm priming for.

    Heck, if Stack wants to call the append release (not sure how far
    ahead he is) as hadoop-0.20.100, I'm willing to call this
    hadoop-0.20.200.

    All I care about is having a distinct release number from 0.20.2 (our
    last stable release). Again, I just want to get a release into the
    hands of our users. Please, let's resolve this quickly. Please.

    Arun
  • Eric Baldeschwieler at Jan 12, 2011 at 9:54 pm
    Let me second Arun here.

    This is incremental work on 0.20. We're happy to support any branch naming strategy the community likes, but sticking with 20.<minor> seems like the right default approach.

    Let's discuss 1.0 issues on another thread. Our priority is to get our work into other folks' hands.

    Thanks!
    E14
  • Nigel Daley at Jan 12, 2011 at 10:57 pm
    +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would involve more discussion.

    Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious.

    Cheers,
    Nige
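
The 0.20.x numbering discussed in this thread (x >= 100, distinct from 0.20.2) only sorts as intended when each component is compared as a number; plain string comparison would place 0.20.100 before 0.20.2. A small sketch of the difference, assuming purely numeric dotted versions and using a hypothetical `parse_version` helper:

```python
def parse_version(text):
    """Split a dotted version such as '0.20.100' into a tuple of ints.

    Assumes every component is numeric; names like '0.20-security'
    are out of scope for this sketch.
    """
    return tuple(int(part) for part in text.split("."))


# Release numbers mentioned in the thread, plus 0.21.0 for contrast.
releases = ["0.21.0", "0.20.100", "0.20.2", "0.20.200"]

numeric_order = sorted(releases, key=parse_version)
string_order = sorted(releases)

if __name__ == "__main__":
    print(numeric_order)  # ['0.20.2', '0.20.100', '0.20.200', '0.21.0']
    print(string_order)   # ['0.20.100', '0.20.2', '0.20.200', '0.21.0']
```

With numeric comparison, a 0.20.100 or 0.20.200 release falls after 0.20.2 and before 0.21.0, which matches the intent of marking a big jump within the 0.20 line.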

  • Eli Collins at Jan 12, 2011 at 11:03 pm
    +1 on 0.20.x (where x is a J > 3)

    Nigel - could we make all the patches in this branch that have not
    been committed upstream (that need to be) blockers for 22? This way
    22 is not a regression against 0.20.x.

    Thanks,
    Eli
  • Nigel Daley at Jan 12, 2011 at 11:16 pm

    Nigel - could we make all the patches in this branch that have not
    been committed up stream (that need to be) blockers for 22? This way
    22 is not a regression against 0.20.x.

    I sure hope so. May be difficult to untangle if it's a jumbo patch -- answer is in the details of how it's contributed.

    Cheers,
    Nige

    On Jan 12, 2011, at 3:02 PM, Eli Collins wrote:

    +1 on 0.20.x (where x is a J > 3)

    Nigel - could we make all the patches in this branch that have not
    been committed up stream (that need to be) blockers for 22? This way
    22 is not a regression against 0.20.x.

    Thanks,
    Eli
  • Ian Holsman at Jan 12, 2011 at 11:27 pm
    So what is the plan with 20.3 that Owen volunteered to RM?
    Should we do that, or just integrate the security code with that and call it 20.x?

    ---
    Ian Holsman - 703 879-3128

    I saw the angel in the marble and carved until I set him free -- Michelangelo
  • Arun C Murthy at Jan 13, 2011 at 7:08 am

    On Jan 12, 2011, at 2:56 PM, Nigel Daley wrote:

    +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would
    involve more discussion.

    Ok, seems like we are converging; we can continue talking. I've
    created the branch to get the ball rolling.

    Will this be a jumbo patch attached to a Jira and then committed to
    the branch? Just curious.

    I'm afraid that the svn log of the branch from the github Y! branch is
    fairly useless, since a single JIRA might have multiple commits in the
    Y! branch (bugfix on top of a bugfix). We have done that in several
    cases (but the patches committed to trunk have a single patch which is
    the result of forward porting a complete feature/bugfix). In any case,
    this branch and 0.22 have diverged so much that almost no non-trivial
    patch would apply without a significant amount of work.

    Thus, I think a jumbo patch should suffice. It will also ensure this
    can be done quickly so that the community can then concentrate on 0.22
    and beyond.

    However, I will (manually) ensure all relevant jiras are referenced in
    the CHANGES.txt and Release Notes for folks to see the contents of the
    release. This is the hardest part of the exercise. Also, this ensures
    that we can track these jiras for 0.22 as Eli suggested.

    Does that seem like a reasonable way forward? I'm happy to brainstorm.

    thanks,
    Arun
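The tracking step described in the message above (listing every relevant JIRA in CHANGES.txt so the same issues can be followed up for 0.22) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `jira_ids`, the sample CHANGES.txt fragments, and the issue numbers are made up, and the pattern assumes issue keys of the form HADOOP-n, HDFS-n, or MAPREDUCE-n.

```python
import re

# Assumed issue-key pattern for the three Hadoop JIRA projects.
JIRA_ID = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def jira_ids(changes_text):
    """Return the set of JIRA keys mentioned anywhere in a CHANGES.txt body."""
    return set(JIRA_ID.findall(changes_text))


# Hypothetical fragments; the issue numbers are placeholders, not real issues.
branch_changes = """
    HADOOP-0001. Hypothetical security fix.
    MAPREDUCE-0002. Hypothetical JobTracker limit.
"""
trunk_changes = """
    HADOOP-0001. Hypothetical security fix.
"""

# Issues listed on the 0.20 branch but absent from trunk: candidates to
# track as blockers for 0.22, as suggested earlier in the thread.
not_in_trunk = sorted(jira_ids(branch_changes) - jira_ids(trunk_changes))
```

A set difference like this only tells you which issue keys are missing from the other file; it cannot confirm that the same revision of a patch was applied on both sides.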
  • Nigel Daley at Jan 13, 2011 at 7:10 am

    On Jan 12, 2011, at 11:07 PM, Arun C Murthy wrote:

    On Jan 12, 2011, at 2:56 PM, Nigel Daley wrote:

    +1 for 0.20.x, where x >= 100. I agree that the 1.0 moniker would involve more discussion.

    Ok, seems like we are converging; we can continue talking. I've created the branch to get the ball rolling.

    Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious.

    I'm afraid that the svn log of the branch from the github Y! branch is fairly useless, since a single JIRA might have multiple commits in the Y! branch (bugfix on top of a bugfix). We have done that in several cases (but the patches committed to trunk have a single patch which is the result of forward porting a complete feature/bugfix). In any case, this branch and 0.22 have diverged so much that almost no non-trivial patch would apply without a significant amount of work.

    Thus, I think a jumbo patch should suffice. It will also ensure this can be done quickly so that the community can then concentrate on 0.22 and beyond.

    However, I will (manually) ensure all relevant jiras are referenced in the CHANGES.txt and Release Notes for folks to see the contents of the release. This is the hardest part of the exercise. Also, this ensures that we can track these jiras for 0.22 as Eli suggested.

    Does that seem like a reasonable way forward? I'm happy to brainstorm.

    +1. If it turns out to be insufficient to figure out how to apply similar changes to trunk/0.22 then we can address that as needed.

    Thanks Arun!

    Nige
  • Todd Lipcon at Jan 13, 2011 at 10:05 pm
    Hi Arun, all,

    When we merged YDH and CDH for CDH3b3, we went through the effort of
    "linearizing" all of the YDH patches and squashing multiple commits into
    single ones corresponding to a single JIRA where possible. So, we have a
    100% linear set of patches that applies on top of the 0.20.2 source tree and
    includes Yahoo 0.20.100.3 as well as almost all the patches from 0.20-append
    and a number of other backports.

    Since this could be applied as a linear set of patches instead of a big
    lump, would there be interest in using this as the 0.20.>100 Apache release?
    I can take the time to remove any patches that are cloudera specific or not
    yet applied upstream.

    Thanks
    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
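The "linearizing" idea in the message above (collapsing several commits for one JIRA into a single patch) starts with finding which commits belong to the same issue. The sketch below shows only that grouping step, under stated assumptions: `group_by_jira`, the commit identifiers, and the subjects are hypothetical, and it does not perform any actual squashing or rebasing.

```python
import re
from collections import OrderedDict

# Assumed issue-key pattern for the three Hadoop JIRA projects.
JIRA_ID = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def group_by_jira(commits):
    """Group (commit_id, subject) pairs by the first JIRA key in the subject.

    Groups keep first-seen order; commits without a key are grouped under None.
    """
    groups = OrderedDict()
    for commit_id, subject in commits:
        match = JIRA_ID.search(subject)
        key = match.group(0) if match else None
        groups.setdefault(key, []).append(commit_id)
    return groups


# Hypothetical branch history: a feature, an unrelated fix, then a
# "bugfix on top of a bugfix" for the first issue.
commits = [
    ("a1", "HDFS-0001. Hypothetical feature"),
    ("b2", "MAPREDUCE-0002. Hypothetical fix"),
    ("c3", "HDFS-0001. Bugfix on top of the feature"),
]

groups = group_by_jira(commits)

# Issues with more than one commit are the candidates for squashing.
squash_candidates = {k: v for k, v in groups.items() if k and len(v) > 1}
```

Grouping by issue key is the easy part; commits for the same JIRA that are not adjacent in history may still conflict when reordered, which is presumably where most of the manual effort goes.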
  • Arun C Murthy at Jan 13, 2011 at 11:05 pm
    Todd,

    On Jan 13, 2011, at 2:04 PM, Todd Lipcon wrote:

    Hi Arun, all,

    When we merged YDH and CDH for CDH3b3, we went through the effort of
    "linearizing" all of the YDH patches and squashing multiple commits into
    single ones corresponding to a single JIRA where possible. So, we have a
    100% linear set of patches that applies on top of the 0.20.2 source tree and
    includes Yahoo 0.20.100.3 as well as almost all the patches from 0.20-append
    and a number of other backports.

    Since this could be applied as a linear set of patches instead of a big
    lump, would there be interest in using this as the 0.20.>100 Apache release?
    I can take the time to remove any patches that are cloudera specific or not
    yet applied upstream.

    Interesting discussion, thanks.

    I'm sure it took you a fair amount of work to squash patches (which I
    tried too, btw). That, plus the fact that we would need to do a
    similar amount of work for the 10 or so releases we have done after
    0.20.100.3 scares me.

    As Nigel and I discussed here, the jumbo patch and an up-to-date
    CHANGES.txt provide almost all of the benefits we seek and allow all
    of us to get this done very quickly to focus on hadoop-0.22 and beyond.

    What do you think?

    OTOH, I could get this release done and start squashing patches for
    the sake of completeness as a background activity.

    Thoughts?

    thanks,
    Arun
  • Todd Lipcon at Jan 13, 2011 at 11:39 pm

    On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy wrote:

    Since this could be applied as a linear set of patches instead of a big
    lump, would there be interest in using this as the 0.20.>100 Apache
    release? I can take the time to remove any patches that are cloudera
    specific or not yet applied upstream.

    Interesting discussion, thanks.

    I'm sure it took you a fair amount of work to squash patches (which I tried
    too, btw).

    Yep, I had a great summer ;-)

    That, plus the fact that we would need to do a similar amount of work for
    the 10 or so releases we have done after 0.20.100.3 scares me.

    Sorry, I actually meant 0.20.104.3. Have there been many releases since
    then? That's the last version available on the Yahoo github, and that's the
    version we incorporated/linearized.

    If there is a large sequence of patches after this that you're planning on
    including, it would be good to see them in your git repo.


    As Nigel and I discussed here, the jumbo patch and an up-to-date
    CHANGES.txt provide almost all of the benefits we seek and allow all of us
    to get this done very quickly to focus on hadoop-0.22 and beyond.

    In my opinion here are the downsides to this plan:

    - a mondo "merge" patch is a big pain when trying to do debugging. It may be
    sufficient for a user to look at CHANGES.txt, but I find myself using
    blame/log/etc on individual files to understand code lineage on a daily
    basis. If all of the merge shows up as a big patch it will be very difficult
    (at least the way I work with code) to help users debug issues or understand
    which JIRA a certain regression may have come from.

    - CHANGES.txt traditionally doesn't reference which patch file from a JIRA
    was checked in. So we may know that a given JIRA has been included, but
    often there are several revisions of patches on the JIRA and it's difficult
    to be sure that we have the most up-to-date version. By looking at change
    history it's usually easy to pick this out, but if it's one giant patch
    apply, this isn't possible.

    - the proposal to use the YDH distro certainly solves the Security issue,
    but doesn't help out HBase at all. Given HBase has been asking for a long
    time to get a real release of the append branch, I think it would be better
    to have one 20-based release which has both of these features, rather than
    further fragmenting the community into 0.20.2, 0.20.2+security,
    0.20.2+append.

    I think the first two points could be addressed if you push your git tree
    either to github or an apache-hosted git, and then include in SVN as a mondo
    patch. It's not ideal, but at least when trying to debug issues and
    understand the history of this branch there will be a publicly available
    change history to reference.

    To clarify my position a bit here - I definitely appreciate your
    volunteering to do the work, and wouldn't *block* the proposal as you've put
    it forth. I just think it will have limited utility for the community by
    being opaque (if contributed as a giant patch) and by not including the sync
    feature which is critical for a large segment of users. Given those
    downsides I'd rather see the effort diverted towards making a killer 0.22
    release that we can all jump on.

    Thanks
    -Todd
    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Arun C Murthy at Jan 14, 2011 at 12:59 am

    On Jan 13, 2011, at 3:34 PM, Todd Lipcon wrote:
    On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy wrote:

    Since this could be applied as a linear set of patches instead of a big
    lump, would there be interest in using this as the 0.20.>100 Apache
    release? I can take the time to remove any patches that are cloudera
    specific or not yet applied upstream.

    Interesting discussion, thanks.

    I'm sure it took you a fair amount of work to squash patches (which I
    tried too, btw).

    Yep, I had a great summer ;-)

    That, plus the fact that we would need to do a similar amount of work for
    the 10 or so releases we have done after 0.20.100.3 scares me.

    Sorry, I actually meant 0.20.104.3. Have there been many releases since
    then? That's the last version available on the Yahoo github, and that's
    the version we incorporated/linearized.

    Yep. I had a great summer! ;-)

    As Nigel and I discussed here, the jumbo patch and an up-to-date
    CHANGES.txt provide almost all of the benefits we seek and allow all of
    us to get this done very quickly to focus on hadoop-0.22 and beyond.

    In my opinion here are the downsides to this plan:

    I agree there are downsides, I think I did point them out at the
    outset! :)

    - a mondo "merge" patch is a big pain when trying to do debugging. It may
    be sufficient for a user to look at CHANGES.txt, but I find myself using
    blame/log/etc on individual files to understand code lineage on a daily
    basis. If all of the merge shows up as a big patch it will be very
    difficult (at least the way I work with code) to help users debug issues
    or understand which JIRA a certain regression may have come from.

    Right, no question. Which is why I offered to do this as a background
    activity right after... this ensures that the source of truth is
    *always* a branch in Apache subversion.

    I feel that we could get a usable release out the door quickly for our
    users. Also, please remember that almost every patch we have committed
    is available on relevant jiras. I understand the devs have a problem
    and I feel we can bear with it for a little while. Again, I agree this
    isn't an ideal solution, I'm just trying to expedite the release for
    the users.

    To clarify my position a bit here - I definitely appreciate your
    volunteering to do the work, and wouldn't *block* the proposal as you've
    put it forth. I just think it will have limited utility for the community
    by being opaque (if contributed as a giant patch) and by not including
    the sync feature which is critical for a large segment of users. Given
    those downsides I'd rather see the effort diverted towards making a
    killer 0.22 release that we can all jump on.

    Thanks for understanding.

    Again, I completely agree this isn't an ideal situation, but I do hope
    it has a bit more than *limited utility* for our end-users. Who knows,
    I may be hopelessly deluded! *smile*

    Also, I'm trying to do exactly what you suggested - spend very little
    time on this so that everyone, including me, can focus on the future.

    thanks,
    Arun
  • Eli Collins at Jan 14, 2011 at 1:35 am

    Given that Todd has already done the work to rebase the 0.20.104.3
    patch set on 0.20.2, and in a way that doesn't require one big change,
    and his patch set includes branch20-append, which the HBase guys want
    an Apache release of, wouldn't it make sense to go this route? What do
    others think? Seems better to have one 0.20.100 release than multiple
    ones for security and append.

    Thanks,
    Eli
  • Arun C Murthy at Jan 14, 2011 at 2:13 am

    On Jan 13, 2011, at 5:35 PM, Eli Collins wrote:
    Given that Todd has already done the work to rebase the 0.20.104.3
    patch set on 0.20.2, and in a way that doesn't require one big change,
    and his patch set includes branch20-append which the HBase guys want
    an Apache release of wouldn't it make sense to go this route? What do
    others think? Seems better to have one 0.20.100 release than multiple
    ones for security and append.

    My concern around 0.20.104.3 is that it has serious security holes
    including a root exploit that we have since fixed. I'm sure you guys
    are aware of them, Todd has helped to fix some.

    The version I'm offering to push to the community has fixed all of
    them, *plus* the added benefit of several stability and performance
    fixes we have done since 20.104.3, almost 10 internal releases. This
    is a battle tested and hardened version which we have deployed on
    40,000+ nodes. It is a significant upgrade on 0.20.104.3 which we
    never deployed. I'm pretty sure *some* users will find that valuable. ;)

    Also, I've offered to push individual patches as a background activity
    on a branch - that should suffice, no? Or, do you consider this a
    blocker?

    Again, my goal in this exercise is to get a stable, improved version
    of Hadoop into the hands of our users asap, and focus on 0.22 and
    beyond.

    thanks,
    Arun
  • Eli Collins at Jan 14, 2011 at 2:50 am

    On Thu, Jan 13, 2011 at 6:12 PM, Arun C Murthy wrote:
    On Jan 13, 2011, at 5:35 PM, Eli Collins wrote:

    Given that Todd has already done the work to rebase the 0.20.104.3
    patch set on 0.20.2, and in a way that doesn't require one big change,
    and his patch set includes branch20-append which the HBase guys want
    an Apache release of wouldn't it make sense to go this route?  What do
    others think? Seems better to have one 0.20.100 release than multiple
    ones for security and append.

    My concern around 0.20.104.3 is that it has serious security holes including
    a root exploit that we have since fixed. I'm sure you guys are aware of
    them, Todd has helped to fix some.

    The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
    104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
    performance and stability fixes I think you're referring to, at least
    the ones that have been posted to Apache jira).

    Can you post a pointer to the version you're referring to, eg on
    github? If there isn't a big delta between it and the cdh3 patch set
    (which should have the 20-based patches from jira) perhaps you and
    Todd could easily merge in the delta to create 0.20.x?

    The version I'm offering to push to the community has fixed all of them,
    *plus* the added benefit of several stability and performance fixes we have
    done since 20.104.3, almost 10 internal releases. This is a battle tested
    and hardened version which we have deployed on 40,000+ nodes. It is a
    significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
    *some* users will find that valuable. ;)

    Definitely, but better to hit two birds with one stone right? Instead
    of a security + enhancements release and an append release we could
    have a single security + append + enhancements release and users don't
    have to choose.

    Also, I've offered to push individual patches as a background activity on a
    branch - that should suffice, no? Or, do you consider this a blocker?

    Definitely not a blocker.

    Again, my goal in this exercise is to get a stable, improved version of
    Hadoop into the hands of our users asap, and focus on 0.22 and beyond.

    Agree, that's everyone's goal. My point is that a release that's
    already been re-based on 20.2, doesn't require a separate HBase
    release, and doesn't require you spend time on a background task to
    break up the big change into smaller ones seems like a faster way
    forward.

    Thanks,
    Eli
  • Arun C Murthy at Jan 14, 2011 at 4:23 am

    On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:

    The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
    104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
    performance and stability fixes I think you're referring to, at least
    the ones that have been posted to Apache jira).

    Can you post a pointer to the version you're referring to, eg on
    github? If there isn't a big delta between it and the cdh3 patch set
    (which should have the 20-based patches from jira) perhaps you and
    Todd could easily merge in the delta to create 0.20.x?

    I can guarantee it will need work to merge the enhancements since
    20.104.3; it's over 6 months of development. The enhancements include
    work on stability such as iterative ls, limits on the JT to prevent single
    jobs/users from taking it down, etc., and lots of bug-fixes to security.
    So, unfortunately the delta is pretty large.

    I'm working on a CHANGES.txt which should reflect all the changes i.e.
    bug-fixes and enhancements.

    The version I'm offering to push to the community has fixed all of them,
    *plus* the added benefit of several stability and performance fixes we have
    done since 20.104.3, almost 10 internal releases. This is a battle tested
    and hardened version which we have deployed on 40,000+ nodes. It is a
    significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
    *some* users will find that valuable. ;)

    Definitely, but better to hit two birds with one stone right? Instead
    of a security + enhancements release and an append release we could
    have a single security + append + enhancements release and users don't
    have to choose.

    We are discussing two options:
    20 + security + enhancements
    20 + security + append

    I think the value we provide via 20+security+enhancements release is
    that it's stable, tested and deployed at scale. Doing any more work
    merging 6 months of work at Yahoo (again, I guarantee it's a lot of
    work) will need a lot of cycles to validate, test and stabilize.

    I feel the alternative is a distraction for me, I'd rather work on 0.22.

    I can get 20+security+enhancements done very, very, quickly precisely
    because I don't have to spend cycles testing it.

    Does that make sense? Thanks for being patient and bearing with me...

    Arun
  • Tsz Wo (Nicholas), Sze at Jan 14, 2011 at 4:59 am
    The following is copied from http://httpd.apache.org/dev/release.html. Not sure if it
    helps.

    What power does the RM yield?
    Regarding what makes it into a release, the RM is the unquestioned authority. No
    one can contest what makes it into the release. The community will judge the
    release's quality after it has been issued, but the community can not force the
    RM to include a feature that they feel uncomfortable adding. Remember that this
    document is only a guideline to the community and future RMs - each RM may run a
    release in a different way. If you don't like what an RM is doing, start
    preparing for your own competing release.

    Nicholas
  • Eric Baldeschwieler at Jan 14, 2011 at 5:12 am
    Hi Eli,

    Thanks for the suggestion.

    +1 to nigel and arun's proposal.

    I completely support the idea of creating a version of 20 with append for HBASE. However, the append issue is very complicated and there does not exist any version of append that is certified against a workload as diverse as what this branch has been tested against. I think you are trying to cross too many streams here. If you have resources to help integrate any version of Hadoop 0.20 with append, package and test it, I fully support you doing so. But that effort is not aligned with the goal of this branch, which is to share a substantial amount of fully integrated and tested work. Members of the community have expressed interest in seeing this tested work get checked into Apache and I would like to share it. Mashing it up with other patches would invalidate months of testing, defeating the purpose of the exercise.

    If you are interested in integrating Append with this branch, why not create a 20.200 branch and do so?

    Unless you are vetoing the sharing of work as is on a branch (the purpose of the branch), I suggest we move on.

    Thanks,

    E14

    On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote:

    On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:

    The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
    104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
    performance and stability fixes I think you're referring to, at least
    the ones that have been posted to Apache jira).

    Can you post a pointer to the version you're referring to, eg on
    github? If there isn't a big delta between it and the cdh3 patch set
    (which should have the 20-based patches from jira) perhaps you and
    Todd could easily merge in the delta to create 0.20.x?
    I can guarantee it will need work to merge the enhancements since
    20.104.3, it's over 6 months of development. The enhancements includes
    work on stability such as iterative ls, limits on JT to prevent single
    jobs/users from taking it down etc. and lots of bug-fixes to security.
    So, unfortunately the delta is pretty large.

    I'm working on a CHANGES.txt which should reflect all the changes i.e.
    bug-fixes and enhancements.
    The version I'm offering to push to the community has fixed all of them,
    *plus* the added benefit of several stability and performance fixes we have
    done since 20.104.3, almost 10 internal releases. This is a battle tested
    and hardened version which we have deployed on 40,000+ nodes. It is a
    significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
    *some* users will find that valuable. ;)
    Definitely, but better to hit two birds with one stone right? Instead
    of a security + enhancements release and an append release we could
    have a single security + append + enhancements release and users don't
    have to choose.

    We are discussing two options:
    20 + security + enhancements
    20 + security + append

    I think the value we provide via 20+security+enhancements release is
    that it's stable, tested and deployed at scale. Doing any more work
    merging 6 months of work at Yahoo (again, I guarantee it's a lot of
    work) will need a lot of cycles to validate, test and stabilize.

    I feel the alternative is a distraction for me, I'd rather work on 0.22.

    I can get 20+security+enhancements done very, very, quickly precisely
    because I don't have to spend cycles testing it.

    Does that make sense? Thanks for being patient and bearing with me...

    Arun
  • Nigel Daley at Jan 14, 2011 at 6:07 am
    I say just do it. Eli said it wasn't a blocker. Sure it ain't perfect, but it's good enough.

    Let's move on to 0.22 and beyond.

    Nige
  • Arun C Murthy at Jan 14, 2011 at 6:22 am
    *nod* Ok.

    Arun
  • Eli Collins at Jan 14, 2011 at 6:29 am
    Sorry for rattling you guys, definitely wasn't discussing a veto. I'm
    absolutely not opposed, just thought the alternative Todd raised was
    worth a couple emails since users have requested both security and
    append, and such a branch that includes both of those plus
    enhancements and substantial testing exists.

    Arun - I appreciate all the info, looking forward to the release.

    Thanks,
    Eli
  • Todd Lipcon at Jan 14, 2011 at 6:37 am

    On Thu, Jan 13, 2011 at 10:29 PM, Eli Collins wrote:

    Sorry for rattling you guys, definitely wasn't discussing a veto. I'm
    absolutely not opposed, just thought the alternative Todd raised was
    worth a couple emails since users have requested both security and
    append, and such a branch that includes both of those plus
    enhancements and substantial testing exists.

    Arun - I appreciate all the info, looking forward to the release.
    Same here.

    Back to the patch queue for me! 0.22 here we come.

    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Arun C Murthy at Jan 14, 2011 at 6:37 am
    No worries. Thanks to both Eli & Todd for the discussion.

    I look forward to getting this done and moving ahead to 0.22 and beyond.

    thanks,
    Arun
  • Stack at Jan 14, 2011 at 7:00 am
    (Man, it was looking good there for a second when 0.20.100 was about
    security+append!)

    Good luck w/ the release Arun.

    We might be following your 0.20.100 with a 0.20.200 append.

    St.Ack
  • Eric Baldeschwieler at Jan 14, 2011 at 7:17 am
    I'd love to see that!
  • Arun C Murthy at Jan 14, 2011 at 7:21 am

    On Jan 13, 2011, at 10:59 PM, Stack wrote:

    (Man, it was looking good there for a second when 0.20.100 was about
    security+append!)

    Good luck w/ the release Arun.
    Thanks!
    We might be following your 0.20.100 with a 0.20.200 append.
    Super!

    Arun
  • Ian Holsman at Jan 14, 2011 at 2:15 pm
    (with my Apache hat on)
    I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches).

    for the following reasons:

    1. It encourages bad behavior. We want discussion (and development) to happen on the lists, not in some office. By allowing these large code-dumps it condones this behavior, and we will likely see it again and again. Like it or not, this is not the apache model of open source governance.

    2. There is a risk that some code that is not in a JIRA or separate patch creeps in unwittingly. This isn't a major deal per se, but we don't really have the proper paper trail, or the documentation on what bug it fixed etc etc.

    3. Other groups (Facebook for example) are running with their own set of patches. They currently have the luxury of examining each individual patch to decide if they want to integrate it (and test it) in their environment. We are forcing them to do the work of finding the bits they want in this huge patch.

    4. By not including the append patch, we are making this release unusable for a large portion of our community who run hbase.

    5. It makes it very hard to test. While it makes me comfortable that it has gone through Yahoo!'s QA and is running in their environments, it doesn't mean that it will work in other organizations that have different workload mixes and software running on them. With one huge patch it is all or nothing: either they take the code-drop and perform a large QA-integration effort, or they forgo the whole patch altogether.


    **BUT** we have both the Yahoo! & Cloudera guys happy to do it, and to spend their time doing it, so I think having the code-drop will put us in a better place than where we are.


    BTW, I'd like to point out a discrepancy here:

    On another thread discussing hadoop-0.20-append as a separate branch, most people agreed that new features shouldn't be added to 0.20, now we have a major feature and we are all gung ho for it..

    --Ian
