Even with the work on hadoop-0.22 (trunk) starting in earnest, it is
fairly obvious, given our past history, that it will take a while for
us to get it stable and deployable. For example, it took us nearly 6
months to deploy hadoop-0.20.
In the interim I'd like to propose we push a hadoop-0.20-security
release off the Yahoo! patchset (http://github.com/yahoo/hadoop-common).
This will ensure the community benefits *now* from all the work
done at Yahoo! for over 12 months, and that we do not
have to wait until hadoop-0.22, which has all of these patches.
Some salient aspects:
a) Full-fledged security implementation deployed at scale (4000 nodes)
b) Lots of work on stabilizing and optimizing the NameNode and
JobTracker for over 12 months. This has been critical in deploying
Hadoop at scale, i.e. clusters of 4000 nodes. For example, we have a 50%
improvement in CPU utilization on the JobTracker vis-a-vis the
c) Several new features in the scheduler (CapacityScheduler) and the
Map-Reduce framework, better support for multi-tenancy etc.
d) Several performance and stability improvements to the system e.g.
iterative ls, robustness against rogue clients/jobs/users etc.
Also, given the huge number of features and enhancements, I'd like to
propose we create a new 0.20-security branch and commit the Yahoo!
patchset there for the release.
This was proposed earlier by Doug and did not get far due to
concerns about the effect this would have on development on trunk.
However, I believe we have a case for demonstrable progress on trunk
now, and it would be useful to have an interim, fully-tested Apache
Hadoop release available to the community.
One could imagine a Hadoop Security + Append release
soon after. At this point, a Hadoop Security release alone would add
tremendous value for the reasons above. For now, we would like to get
this release out quickly so we can focus the majority of our efforts on trunk.