I understand that, in the normal case, a single input file is split across
'n' instances of the same mapper.
The scenario I have is:
1. Two files which are not totally identical in terms of number of columns
(but have data that is similar in a few columns) need to be processed and
after computation a single output file has to be generated.
Note: CV = computed value
File1 belonging to one dataset has data for :
File2 belonging to another dataset has data for :
Computation to be carried out on these two files is :
And the final emitted output file should have data in the sequence:
The idea is to have two different mappers (not two instances of one mapper),
one running on each file, and a single reducer that emits the final result
file.
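
To make the idea concrete, here is a minimal sketch of the pattern in plain Java, with no Hadoop dependency. Everything specific in it is an assumption for illustration only: the column layouts ("id,valueA" for File1 and "id,other,valueB" for File2), the "A:"/"B:" tags, and the sum used as a stand-in for CV are all hypothetical, since the real columns and computation are not spelled out here. In Hadoop itself, binding a different mapper class to each input path is usually done with MultipleInputs.addInputPath(...), and a single reducer is requested by setting the number of reduce tasks to 1.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of "two mappers, one reducer". Not Hadoop code.
class TwoMapperJoin {

    // Mapper for File1. Hypothetical layout: "id,valueA".
    // Emits (id, "A:" + valueA) so the reducer knows where the value came from.
    static Map.Entry<String, String> mapFile1(String line) {
        String[] cols = line.split(",");
        return new SimpleEntry<>(cols[0], "A:" + cols[1]);
    }

    // Mapper for File2. Hypothetical layout: "id,other,valueB".
    // Only the shared key and the needed column are emitted.
    static Map.Entry<String, String> mapFile2(String line) {
        String[] cols = line.split(",");
        return new SimpleEntry<>(cols[0], "B:" + cols[2]);
    }

    // Single reducer: receives every tagged value for one key, from both mappers.
    // The sum below is a placeholder for the real CV computation.
    static String reduce(String key, List<String> values) {
        double fromFile1 = 0;
        double fromFile2 = 0;
        for (String v : values) {
            double x = Double.parseDouble(v.substring(2));
            if (v.startsWith("A:")) {
                fromFile1 += x;
            } else {
                fromFile2 += x;
            }
        }
        return key + "," + (fromFile1 + fromFile2);
    }

    // Stand-in for the shuffle phase: group mapper output by key, then reduce.
    static List<String> run(List<String> file1, List<String> file2) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : file1) {
            Map.Entry<String, String> kv = mapFile1(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        for (String line : file2) {
            Map.Entry<String, String> kv = mapFile2(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            output.add(reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("k1,10", "k2,5");
        List<String> file2 = Arrays.asList("k1,x,20", "k3,y,7");
        for (String line : run(file1, file2)) {
            System.out.println(line);
        }
    }
}
```

The point of the tag is that the reducer cannot otherwise tell which file a value came from once both mappers emit under the same key. Keys that appear in only one file still reach the reducer, so it has to decide how to handle a missing side.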
On Wed, Sep 7, 2011 at 2:40 PM, Harsh J wrote:
Yes. But, isn't that how it is normally? What makes you question this
On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat wrote:
Is it possible to have multiple mappers, where each mapper operates on a
different input file, and have their results (key-value pairs from the
different mappers) processed by a single reducer?