
On 26 November 2012 21:25, Radim Kolar wrote:
The main "feature" is that when you get the +1 vote, you yourself get to
deal with the grunge work of applying patches to one or more svn branches,
and resyncing that with the git branches you inevitably do your own work on.
No, the main feature is a major speed advantage. It takes forever to get
something committed. I was annoyed with Apache Nutch last year and forked
it; here is a snapshot from the forked codebase:
http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439
It is now 160k LOC on top of Apache Nutch 1.4. If I worked with these guys,
it would never have been done, because it took them four months to get a
200-line patch reviewed.
I'm sorry you missed the bit in my slides where I emphasised that
review-then-commit is the same rule even if you are a committer. It's not
like you can suddenly put changes in without having gone through the JIRA
circuit. I also tried to explain why the project is so rigorous:

the value of Hadoop is the data stored in HDFS.

Imagine someone could put some minor bit of tuning in there that sped up
their cluster slightly, but increased the risk of data loss. Or something
in the MR layer that introduced enough of a performance overhead that
someone like Facebook would have to buy an extra rack of machines. That's
why there's a review process. Try getting a patch into ext4 or the Linux
kernel scheduler and see if it's any easier.


Hadoop has a huge backlog of patches; you need far more committers than you
have today. I simply could not assign a person to work on Hadoop full-time,
because if he submits a mere 5 patches per day, you will never be able to
process them.
The bottleneck is not the number of committers, it is the number of people
who understand Hadoop well enough to provide adequate reviews, and who have
the time to review patches thoroughly, especially the big ones. I think that
is a real problem.

Your current development process fails to scale. What are your plans for
making development move faster?
I don't disagree; again, in my slides I tried to make some proposals.


1. Even if the source stays in SVN, we could use a git-style workflow of
pull requests and Gerrit/GitHub code review.
2. Better distributed development events, where a group of people can go
online via a Google+ hangout and work together on a specific problem in
real time.
3. More rigorous "review Sundays" or similar, where we go through the
review queue on a free weekend day and see what can be done about it.
4. Some kind of mentorship process to work with people on larger
projects. Again, time is the constraint here.
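As an illustration of the first proposal, a git-based contribution flow that coexists with an svn-hosted tree could look roughly like the sketch below. This is only a hedged example: the JIRA issue number, branch name, and file are made up, and a throwaway local repo stands in for the project's read-only git mirror.

```shell
#!/bin/sh
# Sketch of a pull-request-style patch flow (HADOOP-1234 and the
# branch name are hypothetical; a temp repo stands in for a mirror).
set -e

# Create a throwaway repo for the demo instead of cloning a real mirror.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b trunk .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "baseline"

# Do the work on a topic branch named after the JIRA issue.
git checkout -q -b HADOOP-1234-example
echo "fix" > fix.txt
git add fix.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "HADOOP-1234. Example fix"

# Export the change as a mailable patch that can be attached to the
# JIRA issue, or pushed to a review tool such as Gerrit.
git format-patch trunk --stdout > HADOOP-1234.patch
grep "Subject" HADOOP-1234.patch
```

The point of the sketch is that the review artifact (the patch, or a pull request built from the same branch) is produced from git, while committers remain free to apply the reviewed change to svn.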

If you've got some other ideas, it'd be good to know them.

Discussion Overview
group: common-dev
categories: hadoop
posted: Nov 21, '12 at 3:04p
active: Nov 28, '12 at 10:13a
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion
Radim Kolar: 2 posts; Steve Loughran: 2 posts
