We recently ran some experiments on MapReduce job scheduling and found that
sometimes two jobs were running on the same machine at once, each of them
very slowly. We had assumed that the second job would wait for the first to
free the slave machine before starting to run, but that seems to be wrong.
Our questions are:
(1) How does this scenario happen? Is it because there is a workload
threshold, so that a slave machine that has not reached the threshold
accepts new tasks even though another task is already running on it?
(2) If (1) is true, how can we avoid it? If (1) is not true, then what is
the cause of this scenario, and how can we avoid it?
Thanks very much in advance.
One is never too old to learn. ^^