I am working on implementing some machine learning algorithms using MapReduce.
Suppose I have data that takes 5-6 hours to train on a single normal machine.
Will adding 2-3 more nodes have a noticeable effect? I ask because the Yahoo!
Hadoop tutorial says:
"Executing Hadoop on a limited amount of data on a small number of nodes may
not demonstrate particularly stellar performance as the overhead involved in
starting Hadoop programs is relatively high. Other parallel/distributed
programming paradigms such as MPI (Message Passing Interface) may perform
much better on two, four, or perhaps a dozen machines."
I have 3 laptops at my disposal, each with 4 GB of RAM and 150 GB of hard disk
space, and 600 MB of training data.
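
To make the trade-off in that quote concrete, here is a rough back-of-envelope
sketch in Python. It is not a measurement of anything. The function name, the
fraction of the work that parallelizes, and the fixed job overhead are all
made-up assumptions chosen only for illustration. It simply combines an
Amdahl's-law style split with a constant startup cost per run.

```python
def estimated_hours(single_node_hours, nodes, parallel_fraction, overhead_hours):
    """Rough runtime estimate: serial part + parallel part / nodes + fixed overhead.

    All inputs other than single_node_hours and nodes are guesses, not
    measured values.
    """
    serial = single_node_hours * (1.0 - parallel_fraction)
    parallel = single_node_hours * parallel_fraction / nodes
    return serial + parallel + overhead_hours


if __name__ == "__main__":
    single_node_hours = 5.5    # midpoint of the 5-6 hours mentioned above
    parallel_fraction = 0.9    # ASSUMPTION: share of the work that parallelizes
    overhead_hours = 0.25      # ASSUMPTION: job startup and shuffle cost per run

    for nodes in (1, 2, 3, 4):
        hours = estimated_hours(single_node_hours, nodes,
                                parallel_fraction, overhead_hours)
        print(f"{nodes} node(s): about {hours:.2f} hours")
```

If the training algorithm is iterative and each iteration is a separate job,
the overhead term would be paid once per iteration, which is the case where a
small cluster is least likely to pay off.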
--
View this message in context: http://www.nabble.com/How-many-nodes-does-one-man-want--tp22733399p22733399.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.