All,
I have been running terasort on a 480-node Hadoop cluster, and I collected CPU, memory, disk, and network statistics during the run. The system stats are quite interesting, and I can post them once I have put them into a presentable format (if there is interest). However, while looking at the data, I noticed something unexpected.
I thought, intuitively, that all the systems in the cluster would show more or less similar behaviour (possibly shifted in time), and that the overall graph for each node would look the same.
Just to confirm, I took 5 random nodes and looked at their CPU, disk, network, etc. activity while the sort was running. Strangely enough, that was not the case. Two of the 5 systems were seriously busy, with heavy I/O and lots of disk and network activity. On the other three systems, the CPU was more or less 100% idle, with only slight network and I/O activity.
Is that normal and/or expected? Shouldn't all the nodes be utilized in more or less the same manner over the length of the run?
I generated the data for the sort using teragen (128MB block size, replication = 3).
I would also be interested in other people's sort timings. Is there some place where people can post sort numbers (not just the record)?
I will post the actual graphs of the 5 nodes tomorrow, if there is interest. (There are some logistical issues with posting them tonight.)
I am using CDH3B3, though I don't think this is specific to CDH3B3.
Sorry for the cross post.
Raj