I have a program that analyzes text from a CSV file. It has 9 operators
(functions), and in my plain Java program the main class calls these
functions serially (the second one starts only when the first finishes,
and so on). My current solution is to put all of these functions inside a
single map function, something like the following:
static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    // declaration of the operator objects
    Operator1 op1 = new Operator1();
    Operator2 op2 = new Operator2();
    // ... operators 3 through 9 declared the same way ...

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // convert the value (one row of the CSV) to a string
        String line = value.toString();

        // run the operators in sequence, each consuming the previous output
        op1.process(line);
        String[] sentences = op1.getSentences();
        op2.process(sentences);
        String[][] tokens = op2.getTokens();
        // op3, op4, ... op9 are applied the same way ...

        // the final result ends up in a matrix; write it out cell by cell
        for (int k = 0; k < matrix.length; k++) {
            for (int j = 0; j < matrix[k].length; j++) {
                context.write(x, y); // key/value built from matrix[k][j]
            }
        }
    }
}
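For reference, the driver is just the standard job setup; this is a
simplified version, with TextAnalysisJob and the paths as placeholders
(I write the final results straight from the map, so there is no custom
reducer):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TextAnalysisJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "csv text analysis");
        job.setJarByClass(TextAnalysisJob.class);
        job.setMapperClass(Map.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(0); // map-only: all the work happens in map()
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}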
When I tested it against the plain Java program it did reduce the time
(Java program: 30 minutes, Hadoop: 6 minutes). But when I compared it on a
much bigger CSV against a Cascading implementation, the Cascading time was
much better (28 minutes vs. 1 hour and 30 minutes!).
My question is: is it fine to put all these functions (9 operators) in a
single map?
Is there a better way to do it, for example splitting the operators into
chained mappers, as in the sketch below?
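To make the question concrete, this is roughly what I imagine a chained
version would look like with Hadoop's ChainMapper (untested sketch; same
imports as above plus org.apache.hadoop.mapreduce.lib.chain.ChainMapper,
and ChainedDriver, Op1Mapper, Op2Mapper are hypothetical wrappers around
my operators, with the intermediate key/value types just guesses):

public class ChainedDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "chained operators");
        job.setJarByClass(ChainedDriver.class);

        // op1: CSV line -> sentences
        ChainMapper.addMapper(job, Op1Mapper.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                new Configuration(false));

        // op2: sentences -> tokens
        ChainMapper.addMapper(job, Op2Mapper.class,
                Text.class, Text.class, Text.class, Text.class,
                new Configuration(false));

        // ... operators 3 through 9 added the same way ...

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}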
Thanks
--
*Cornelio*