PigStats comes with Pig 0.8 (just released), which is probably why there is little material about it :)
You can use PigStats in two ways:
First, in a Java program, you can invoke your Pig script through the new PigRunner API, which takes the same arguments as the Main class but returns a PigStats object. Some of the interesting stats you get are the input/output record counts, the MapReduce job graph, and the aliases, features, and counters associated with each job.
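As a rough sketch of the first approach (not an official example), something like the following should work against the Pig 0.8 jar. The script name myscript.pig and the class name RunWithStats are made up for illustration, and the accessor names are the ones I would expect on PigStats and JobStats in 0.8, so check them against the javadoc of your release.

```java
import java.util.Iterator;

import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.InputStats;
import org.apache.pig.tools.pigstats.JobStats;
import org.apache.pig.tools.pigstats.OutputStats;
import org.apache.pig.tools.pigstats.PigStats;

// Hypothetical driver class: runs a Pig script and prints a few statistics.
public class RunWithStats {
    public static void main(String[] args) {
        // Same arguments you would pass to the pig command line.
        // "myscript.pig" is a placeholder script name.
        String[] pigArgs = { "-x", "local", "myscript.pig" };

        // The second argument is an optional progress listener; null means none.
        PigStats stats = PigRunner.run(pigArgs, null);

        System.out.println("successful: " + stats.isSuccessful());
        System.out.println("return code: " + stats.getReturnCode());
        System.out.println("jobs: " + stats.getNumberJobs());
        System.out.println("records written: " + stats.getRecordWritten());

        // Per-input and per-output record counts.
        for (InputStats in : stats.getInputStats()) {
            System.out.println("input " + in.getLocation()
                    + ": " + in.getNumberRecords() + " records");
        }
        for (OutputStats out : stats.getOutputStats()) {
            System.out.println("output " + out.getLocation()
                    + ": " + out.getNumberRecords() + " records");
        }

        // Walk the MapReduce job graph and show what each job computed.
        Iterator<JobStats> jobs = stats.getJobGraph().iterator();
        while (jobs.hasNext()) {
            JobStats job = jobs.next();
            System.out.println(job.getJobId()
                    + " aliases=" + job.getAlias()
                    + " features=" + job.getFeature());
        }
    }
}
```

Compile and run it with the Pig jar (and your Hadoop configuration, if not in local mode) on the classpath.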
Second, Pig 0.8 added several interesting properties to the Hadoop job XML file (in particular, a script ID and parent job IDs).
This makes it easier for Pig users to correlate their Pig scripts with Hadoop JobTracker files. In addition, piggybank has a new loader (HadoopJobHistoryLoader) that loads Hadoop job history files. Once the files are loaded, users can use the power of Pig Latin to collect and analyze Pig usage on their clusters.
PigStats is a new feature. Please let us know if you have any issues or suggestions.
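To give an idea of the second approach, here is a sketch that drives the history loader from Java through PigServer and lists scripts that spawned more than three MapReduce jobs. The class name JobHistoryReport and the two command-line arguments (path to piggybank.jar, job history directory) are made up for illustration. The three-map schema and the key names PIG_SCRIPT_ID, USER, and JOBNAME are what I recall from the Pig 0.8 documentation, so treat them as assumptions and verify them against your own history files.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

// Hypothetical report: scripts that generated more than three MapReduce jobs.
public class JobHistoryReport {
    public static void main(String[] args) throws IOException {
        String piggybankJar = args[0]; // e.g. path to piggybank.jar
        String historyDir = args[1];   // directory holding the job history files

        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        pig.registerJar(piggybankJar);

        // Assumed schema: one map each for job, map-task, and reduce-task data.
        pig.registerQuery("a = LOAD '" + historyDir + "' USING "
                + "org.apache.pig.piggybank.storage.HadoopJobHistoryLoader() "
                + "AS (j:map[], m:map[], r:map[]);");
        // Group jobs by the script that launched them (key names are assumptions).
        pig.registerQuery("b = GROUP a BY "
                + "(j#'PIG_SCRIPT_ID', j#'USER', j#'JOBNAME');");
        pig.registerQuery("c = FOREACH b GENERATE "
                + "group.$0, group.$1, group.$2, COUNT(a);");
        pig.registerQuery("d = FILTER c BY $3 > 3;");

        // Each tuple is (script id, user, job name, number of jobs).
        Iterator<Tuple> rows = pig.openIterator("d");
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
        pig.shutdown();
    }
}
```

The same four statements can of course be put in a plain Pig script instead; the Java wrapper is only there to keep the example in one language.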
On 12/16/10 1:20 PM, "felix gao" wrote:
My company uses Pig a lot and I have been looking for some examples of how to use
PigStats, and there seems to be very little material about it. Can someone
point me to some useful references on how to use it and what are some of
the interesting stats that can be obtained from it?