I have some purely subjective experience. I invite anyone with empirical
evidence to pipe up if possible.
It can be used, but there are a couple of important caveats at the moment:
1] If your maps produce a tremendous amount of output, the TaskTrackers will
start throwing OutOfMemory exceptions (and, depending on which version
you're running, may subsequently hang).
2] In our experience, you MUST compile the native compression libraries and
include them in your distribution. If you fall back on Java's built-in
compression, you will get wildly unpredictable performance, ranging from
slow to "why do we even bother with computers!?"
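
For reference, map-output compression is turned on through the job
configuration. A minimal sketch, assuming a Hadoop 0.x-era hadoop-site.xml
and the LZO codec (property names have changed between releases, so check
the default configuration shipped with your version):

    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>org.apache.hadoop.io.compress.LzoCodec</value>
    </property>

Note that LzoCodec requires the native libraries mentioned above; even the
default zlib codec runs much faster when the native build is present.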
On 8/2/07 08:53, "Emmanuel" wrote:
I notice that the reduce > copy phase is very slow.
I would like to configure Hadoop to compress the map output.
I'm wondering if someone has already used it, or if you have some
statistics about it.
Any advice or feedback is welcome.
Marco Nicosia - Kryptonite Grid
Systems, Tools, and Services Group