Hi everyone,
I currently have a MapReduce program that sorts input records and map-reduces them into output records, attaching priority information to each one. So far the program runs on 1 master node and 3 datanodes.
Here is the data I got from one run:
--------------------------------------
number of records:   1,000,000
time to process:     100 seconds
input bytes:         20 MB
number of datanodes: 3
--------------------------------------
I am wondering whether I can make an assumption like: given 2,000,000 records, the program would finish in roughly 200 seconds, i.e. runtime scales linearly with input size.
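For what it's worth, the linear extrapolation I have in mind is just a throughput calculation. Here is a naive sketch (using the numbers from the run above); it deliberately ignores fixed job-startup overhead, shuffle/sort costs, and data skew, which is exactly what I am unsure about:

```python
# Naive linear-scaling estimate from one measured run.
measured_records = 1_000_000
measured_seconds = 100.0

throughput = measured_records / measured_seconds  # records per second

def estimated_time(records):
    """Linear estimate; ignores startup, shuffle, and skew overheads."""
    return records / throughput

print(estimated_time(2_000_000))  # linear estimate: 200.0 seconds
```

In practice I would expect the real time to be somewhat off from this because of the constant per-job overhead, so the question is really whether that overhead is small enough to neglect.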
Any kind of feasibility argument about scalability would be helpful, as it is important to the analysis in my master's thesis.
Any idea is well appreciated!
Thanks,
-Kun