I've seen a few threads about counters, PigStats, Elephant-Bird's stats
utility class, etc.:
http://www.mail-archive.com/pig-user@hadoop.apache.org/msg00900.html
http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
Has any progress been made on this, or on providing a comprehensive
stats/counter mechanism?
What I'm looking to do is three-fold:
1) Get stats on the number of records filtered out by a FILTER
operation
2) Get stats on the number of records dropped/not loaded by a LOAD function
(plus actual copies of those records/rows from the file, for later evaluation)
3) Output my own stats from a Pig job (without resorting to writing my own
UDF and pushing things into PigStats using the Elephant-Bird utility)
If any of this is possible, it would be great to see some examples or
documentation. I would hate to drop down to raw Hadoop MapReduce code just
to get at counters.
Thanks,
Josh