Grokbase Groups Pig user October 2008
I propose that Pig develop a standard set of benchmark queries that can
be run from release to release to measure Pig's (hopefully improving)
performance over time. This would be similar in nature to Hadoop's
GridMix. This set should be relatively small (probably under 10
queries), but it should cover the range of operations that Pig users
actually run.

So, if you have queries that you think would be good candidates and that
you can share (or obfuscate and then share), please do so. In addition
to the query, please give some idea of the type of data it runs over.
In particular, we need to know how much data there is, how many fields
are in your data, and the cardinality and distribution of any fields
used as a group, cogroup, or sort key.
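
For anyone unsure how to gather those numbers, here is a minimal sketch in Python of one way to summarize a key field from a sample of the data. It assumes tab-delimited text records; the function name profile_key, its parameters, and the sample rows are all hypothetical and are not part of Pig or of any existing benchmark tool.

```python
from collections import Counter


def profile_key(lines, key_index, delimiter="\t", top_n=3):
    """Summarize one key column of delimited text records.

    Hypothetical helper: reports record count, fields per record,
    the number of distinct key values (cardinality), and the most
    frequent key values (a rough view of the distribution).
    """
    key_counts = Counter()
    field_counts = Counter()
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        field_counts[len(fields)] += 1
        # Skip the key tally for records too short to contain the key.
        if key_index < len(fields):
            key_counts[fields[key_index]] += 1
        total += 1
    return {
        "records": total,
        "fields_per_record": dict(field_counts),
        "cardinality": len(key_counts),
        "top_values": key_counts.most_common(top_n),
    }


# Made-up sample rows: (user, country, count)
sample = [
    "alice\tus\t3\n",
    "bob\tus\t5\n",
    "carol\tde\t1\n",
    "alice\tus\t7\n",
]
summary = profile_key(sample, key_index=0)
print(summary)
```

Running something like this over a sample of the real input, once per group, cogroup, or sort key, would give roughly the description asked for above, without having to share the data itself.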


Posted Oct 2, '08 at 7:59p by Alan Gates


