On 23/03/11 15:32, Michael Segel wrote:
It sounds like you're only using Hadoop and have no intention of really getting into the internals.
I'm like most admins/developers/IT guys and I'm pretty lazy.
I find it easier to set up the yum repository and then issue the yum install hadoop command.
The thing about Cloudera is that they do backport patches, so while their release is 'heavily patched', it is usually in some sort of sync with the Apache release.
Since you're only working with HDFS, and it's pretty stable, I'd say go with the Cloudera release.
To be fair, the Y! version of 0.20.x has all the backports to do with
scale; on a large cluster I'd pick that one, with the understanding
that if you have support problems, you can't pay Cloudera to hold your hand.
If you have any plans to get involved in the Hadoop & friends code, to
move from a user to a contributor, you should go with the official
releases. Similarly, if you have some problem and want to file a bug,
you should get the latest official release and test with that, as
 - that will be the first question on the bug report: "is it still there?"
 - you'll need to help debug it.
Going forward, there are plans to do RPM and ideally deb artifacts of
0.22 and later versions of Hadoop, making them easier to install. This
still leaves the question of who supports it, the answer being you, or
anyone you pay to do so; that is the way open source works.