Grokbase Groups Pig dev April 2008
In PigMapReduce.run(RecordReader input, OutputCollector output, Reporter
reporter), as far as I can see, Pig creates its own OutputCollector and
writes output to its own files (using PigWriter).
How does the shuffle work if those files aren't created through the
OutputCollector supplied to run(RecordReader input, OutputCollector output,
Reporter reporter)? Do we just put the output files in the location where
the shuffle expects them?

Thanks in advance for the explanation,
Pi


  • Benjamin Reed at Apr 3, 2008 at 2:40 pm
    The PigWriter is only used if we are doing a map-only job. In setupMapPipe,
    the writer is created only if we aren't grouping. (If we aren't grouping,
    there will not be a reduce. No reduce -> no shuffle.) If we are grouping, we
    don't create a PigWriter, and the output of map goes through the normal
    collector, where it can be sorted/combined/shuffled/sorted/reduced.

    ben
    On Thursday 03 April 2008 06:18:53 pi song wrote:
    In PigMapReduce.run(RecordReader input, OutputCollector output, Reporter
    reporter), as I can see, Pig does create its own OutputCollector and write
    output to its own files (using PigWriter).
    How does the shuffle process work if the files aren't created from the
    outputCollector supplied in run(RecordReader input, OutputCollector output,
    Reporter reporter)? Do we just put the output files to the location where
    shuffle expects?

    Thanks for explanation in advance,
    Pi
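
The branching Benjamin describes can be sketched roughly as below. This is not Pig's code: `Collector`, `DirectWriter`, and `chooseSink` are hypothetical stand-ins (for Hadoop's OutputCollector, for PigWriter, and for the decision made in setupMapPipe), and the sketch only illustrates the routing rule: no grouping means the map output is written directly, grouping means it goes to the collector the framework supplied so it can be shuffled.

```java
import java.util.ArrayList;
import java.util.List;

public class MapOutputSketch {
    /** Hypothetical stand-in for Hadoop's OutputCollector; not the real interface. */
    public interface Collector {
        void collect(String key, String value);
    }

    /** Hypothetical stand-in for PigWriter: records go straight to final output. */
    public static class DirectWriter implements Collector {
        public final List<String> written = new ArrayList<>();

        @Override
        public void collect(String key, String value) {
            written.add(key + "\t" + value);
        }
    }

    /**
     * Picks where map output goes (assumed rule, taken from the reply above).
     * Grouping: use the framework's collector so records reach the shuffle.
     * No grouping: map-only job, so write directly and skip the shuffle.
     */
    public static Collector chooseSink(boolean grouping,
                                       Collector frameworkCollector,
                                       DirectWriter writer) {
        return grouping ? frameworkCollector : writer;
    }

    public static void main(String[] args) {
        List<String> sentToShuffle = new ArrayList<>();
        Collector framework = (k, v) -> sentToShuffle.add(k + "\t" + v);
        DirectWriter writer = new DirectWriter();

        // Map-only job: the record lands in the direct writer.
        chooseSink(false, framework, writer).collect("a", "1");
        // Grouping job: the record goes to the framework's collector.
        chooseSink(true, framework, writer).collect("b", "2");

        System.out.println("direct: " + writer.written);
        System.out.println("shuffle: " + sentToShuffle);
    }
}
```

In the grouping case nothing is written by the direct writer at all, which is why there are no separately created files for the shuffle to pick up.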

