a surprise about Pipes!
Thanks for the data
- Aaron
Owen O'Malley wrote:
I set up a little benchmark on a 39-node cluster to sort 40 GB of random
text data (generated by RandomTextWriter using key length: 1-10 words
and value length: 0-200 words, data uncompressed). The runtimes
(minutes:seconds) are:
Java: 4:22
C++ (Pipes): 3:50
Streaming: 4:44
I was surprised to find that Pipes outperformed Java, even with the
extra process. I suspect it was because of the buffering between the
input and output of Pipes.
-- Owen