Thanks for your help!

I had a look at the gridmix_config.xml file in the gridmix2 directory. However, I'm having difficulty mapping the descriptions of the simulated jobs from the README file
1) Three stage map/reduce job
2) Large sort of variable key/value size
3) Reference select
4) API text sort (java, streaming)
5) Jobs with combiner (word count jobs)

to the job names in gridmix_config.xml.

I would really appreciate any help getting the right configuration! Which job do I have to enable to simulate a pipelined execution as described in "1) Three stage map/reduce job"?


On 23.02.2011 at 04:01, Shrinivas Joshi wrote:
I am not sure about this but you might want to take a look at the GridMix config file. FWIU, it lets you define the # of jobs for different workloads and categories.
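[For reference: GridMix2 controls how many instances of each workload run via per-workload job-count properties in gridmix_config.xml. A rough sketch of enabling only one workload is below. The property names follow the pattern seen in typical GridMix2 config files, but they are an assumption here — verify them against the actual gridmix_config.xml shipped with your Hadoop version.]

```xml
<!-- Sketch only: run a single workload by zeroing the job counts of the
     others. Property names are assumed from the usual GridMix2 naming
     pattern (workloadName.sizeCategory.numOfJobs); check your own file. -->
<property>
  <name>monsterQuery.smallJobs.numOfJobs</name>
  <value>1</value>  <!-- multi-stage query job; likely the pipelined one -->
</property>
<property>
  <name>streamSort.smallJobs.numOfJobs</name>
  <value>0</value>  <!-- disabled -->
</property>
<property>
  <name>javaSort.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>combiner.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>webdataScan.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>webdataSort.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
```

[If the monsterQuery workload is indeed the "Three stage map/reduce job" from the README — it chains several map/reduce stages, which matches the description — then zeroing the other workloads should leave only that job running. This mapping is worth double-checking against the GridMix2 sources before relying on the results.]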


On Tue, Feb 22, 2011 at 10:46 AM, David Saile wrote:
Hello everybody,

I am trying to benchmark a Hadoop-cluster with regards to throughput of pipelined MapReduce jobs.
Looking for benchmarks, I found the "Gridmix" benchmark that is supplied with Hadoop. In its README-file it says that part of this benchmark is a "Three stage map/reduce job".

As this seems to match my needs, I was wondering if it is possible to configure "Gridmix" to run only this job (without the rest of the "Gridmix" benchmark)?
Or do I have to build my own benchmark? If this is the case, which classes are used by this "Three stage map/reduce job"?

Thanks for any help!


Discussion Overview
group: common-user
posted: Feb 22, '11 at 4:47p
active: Feb 24, '11 at 11:28a

2 users in discussion: David Saile (2 posts), Shrinivas Joshi (1 post)
