On Sat, Jun 4, 2011 at 1:46 AM, Andrew Purtell wrote:
This is not discouraging. :-)
HBasers patch CDH because trunk -- anything > 0.20 actually -- is not
trusted by consensus if you look at all of the production deployments. Does
ANYONE run trunk under anything approaching "production"? And trunk/upstream
has a history of ignoring any HBase specific concern. So the use of and
trading of patches will probably continue for a while, maybe forever.
Right - I wasn't suggesting that you run trunk in production as of yet. But
there has been very little activity in terms of HBase people running trunk
in dev/test clusters in the past. Stack has done some awesome work here in
the last few weeks, so that should open it up for some more people to jump in.
I agree that HBase has been treated as a second-class citizen in recent
years from HDFS's perspective, but I think that has changed. All of the
major HDFS contributors now have serious stakes in HBase, and so long as
there are patches with sufficient testing that apply against trunk, I don't
see a reason they wouldn't be included.
Part of the problem is the expectation that any patch provided against
trunk may generate months of back and forth, as we have seen, which presents
difficulties to a potential contributor who does not work on e.g. HDFS
matters full time. Alternatively it may pick up a committer as sponsor and
then be vetoed by Yahoo because they're mad at Cloudera over some unrelated
issue and a patch appears to have a Cloudera sponsor, and/or vice versa.
Now, that situation I describe _is_ discouraging. It's not enough to say
that we must contribute through trunk. Trunk needs to earn back our trust.
Yes, there have been some unfortunate things in the past. There have also
been some half-finished or untested patches proposed, and you can't blame
HDFS folks for not taking a big patch that doesn't have a lot of confidence
I've been thinking about this this afternoon, and have an idea. It may prove
to be an awful one, but maybe it's a good one, only time will tell :) I'll
create a branch off of HDFS trunk specifically for HBase performance work.
We can commit these "90% done" patches there, which will make it easier for
others to test and gain confidence. Branches also can make it easier to
maintain patches over time with a changing trunk.
How does this sound to the HBase community? If it seems like a good idea,
*and* there are some people who would be willing to set it up on some small
dev clusters and run load tests, I'll move forward with it.
I believe I recently saw discussion that append should be removed or
disabled by default on 0.22 or trunk. Did you see anything like this? If I
am mistaken, fine. If not, this is going in the wrong direction, for
Not sure what you're referring to - I don't remember any discussion like that.
Software Engineer, Cloudera