Which workloads are used for serious benchmarking of Hadoop clusters? Do you
care about any of the following workloads:
TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench, or the
sample apps shipped with the Hadoop distro, such as PiEstimator and dbcount?

Thanks,
-Shrinivas


  • Ted Dunning at Feb 18, 2011 at 10:01 pm
    MalStone looks like a very narrow benchmark.

    TeraSort is also a very narrow and somewhat idiosyncratic benchmark, but it
    has the characteristic that lots of people use it.

    You should add PigMix to your list. There are Java versions of the problems in
    PigMix that make a pretty good set of benchmarks independent of Pig itself.
  • Jim Falgout at Feb 18, 2011 at 10:27 pm
    We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the data and the queries, if not the query generator. There is a Jira issue in Hive that discusses the TPC-H "benchmark" if you're interested. Sorry, I don't remember the issue number offhand.

  • Shrinivas Joshi at Feb 18, 2011 at 10:35 pm
    Thanks, Jim. The MRBench described in this paper,
    http://dcslab.snu.ac.kr/~khjeon/papers/2008/icpads_mrbench.pdf, looks like a
    map/reduce port of the TPC-H workload. BTW, the MRBench in that paper and the
    one in mapred/src/test/mapred/org/apache/hadoop/mapred/MRBench.java
    look different to me. Is that a fair statement?

    -Shrinivas
  • Ted Dunning at Feb 18, 2011 at 10:36 pm
    I just read the MalStone report. They report times for a Java version that
    is many times (5x) slower than for a streaming implementation. That single
    fact indicates that the Java code is so appallingly bad that this is a very
    bad benchmark.
  • Konstantin Boudnik at Feb 18, 2011 at 10:51 pm

    On Fri, Feb 18, 2011 at 14:35, Ted Dunning wrote:
    > I just read the MalStone report. They report times for a Java version that
    > is many (5x) times slower than for a streaming implementation. That single
    > fact indicates that the Java code is so appallingly bad that this is a very
    > bad benchmark.

    Slow Java code? That's funny ;) Running with Hotspot on by any chance?
  • Shrinivas Joshi at Feb 21, 2011 at 8:40 pm
    I wonder what companies like Amazon, Cloudera, RackSpace, Facebook, Yahoo,
    etc. look at for the purpose of benchmarking. I guess GridMix v3 might be of
    more interest to Yahoo.

    I would appreciate it if someone could comment more on this.

    Thanks,
    -Shrinivas
  • Konstantin Boudnik at Feb 22, 2011 at 1:30 pm
    Adding Roman Shaposhnik, who's "tasked" with benchmarking at Cloudera, to the list.

Discussion Overview
group: common-user @
categories: hadoop
posted: Feb 18, '11 at 9:32p
active: Feb 22, '11 at 1:30p
posts: 8
users: 4
website: hadoop.apache.org...
irc: #hadoop
