Java's features such as garbage collection, run-time array index checking,
and cleaner syntax (no pointers) make it a good language for Hadoop. One can
develop MapReduce apps faster and maintain the code more easily than with
C/C++, which lets clients focus on their business logic and use cases.
For a fairly high-level implementation of MapReduce that uses clusters of
COTS hardware as compute nodes, the main bottleneck in most applications
will be network I/O. In such cases, the speed advantage of C/C++ over
Java seems less attractive. You will be doing more work shuffling packets
C/C++ applications are difficult to port and tend to be system specific.
Say you are trying to optimize a certain portion of your mapper code with
pointer manipulation. Such operations are inherently error prone because
they work so close to the hardware. The JVM alleviates most of these
issues: you don't have to think about how many bytes a double takes, and
your code will be portable across 32-bit and 64-bit architectures, across
big- and little-endian systems, etc.
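To make the portability point concrete, here is a minimal sketch. The class
name PortabilityDemo and its two helper methods are made up for this
illustration and are not part of Hadoop. It relies only on two guarantees of
the Java platform: a double is always 8 bytes, and a ByteBuffer starts out
big-endian no matter what the host CPU uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class, not part of Hadoop.
class PortabilityDemo {

    // Serialize a double. A new ByteBuffer is big-endian by default,
    // independent of the byte order of the machine running the JVM.
    static byte[] encode(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    // Read the double back using the same default (big-endian) order.
    static double decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }

    public static void main(String[] args) {
        byte[] wire = encode(1.5);
        System.out.println("bytes per double: " + Double.BYTES);
        System.out.println("encoded: " + Arrays.toString(wire));
        System.out.println("decoded: " + decode(wire));
        // Only this line depends on the machine the JVM is running on.
        System.out.println("native order here: " + ByteOrder.nativeOrder());
    }
}
```

The encoded bytes come out the same on a 32-bit little-endian box and a
64-bit big-endian one, so a mapper on one node and a reducer on another
agree on the format without any #ifdef-style handling. In C/C++ you would
have to take care of that yourself.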
Even with Java's safety and comfort, debugging distributed Hadoop MapReduce
apps is a pain in the butt. Just imagine what would happen if you had C/C++,
where you would be buried in segfaults.
I would say that you can use C/C++ to implement MapReduce if you are using
multicore CPUs or GPUs as your underlying platform, where you know the
hardware intimately and are free from network I/O latency.
On Tue, Aug 16, 2011 at 12:05 PM, Bill Graham wrote:
There was a fairly long discussion on this topic at the beginning of the
On Mon, Aug 15, 2011 at 9:00 PM, Chris Song wrote:
Why should Hadoop be built in Java?
For integrity and stability, it is good for hadoop to be implemented in
But when it comes to speed, I have a question...
How would it be if Hadoop were implemented in C or Python?