Hi,
I am currently using Hadoop 0.21.0.
As far as I know, this version supports using the MultipleOutputs class
to write reduce output to several files.
In my case, however, the named output files contain nothing (they are just
empty files).
Here is my code.
main class)
....
    MultipleOutputs.addNamedOutput(job, FeederConfig.INSERT_OUTPUT_NAME,
            TextOutputFormat.class, Text.class, Text.class);
    MultipleOutputs.addNamedOutput(job, FeederConfig.DELETE_OUTPUT_NAME,
            TextOutputFormat.class, Text.class, Text.class);
    MultipleOutputs.addNamedOutput(job, FeederConfig.UPDATE_OUTPUT_NAME,
            TextOutputFormat.class, Text.class, Text.class);
    MultipleOutputs.addNamedOutput(job, FeederConfig.NOTCHANGE_OUTPUT_NAME,
            TextOutputFormat.class, Text.class, Text.class);
....
mapper)
The mapper does nothing special for this job.
It just writes out the keys and values.
reducer)
...
    multipleOutputs.write(getOutputFileName(code), new Text(key), new Text(value));
    context.write(new Text(key), new Text(value));
...
    private String getOutputFileName(String code) {
        String retFileName = "";
        if (code.equals(EPComparedResult.INSERT.getCode())) {
            retFileName = FeederConfig.INSERT_OUTPUT_NAME;
        } else if (code.equals(EPComparedResult.DELETE.getCode())) {
            retFileName = FeederConfig.DELETE_OUTPUT_NAME;
        } else if (code.equals(EPComparedResult.UPDATE.getCode())) {
            retFileName = FeederConfig.UPDATE_OUTPUT_NAME;
        } else {
            retFileName = FeederConfig.NOTCHANGE_OUTPUT_NAME;
        }
        return retFileName;
    }
...
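For anyone who wants to check the name-selection logic on its own, without
a cluster, here is a minimal standalone sketch. It is not my real code: the
enum ComparedResult, its one-letter codes ("I", "D", "U", "N"), and the
constant values are hypothetical stand-ins for EPComparedResult and
FeederConfig. The values "INSERT" and "DELETE" are only guessed from the
file names in the listing, and "UPDATE" and "NOTCHANGE" are pure
assumptions.

```java
public class OutputNameSketch {

    // Hypothetical stand-ins for the FeederConfig constants. "INSERT" and
    // "DELETE" are guessed from the output file names; the other two are assumed.
    static final String INSERT_OUTPUT_NAME = "INSERT";
    static final String DELETE_OUTPUT_NAME = "DELETE";
    static final String UPDATE_OUTPUT_NAME = "UPDATE";
    static final String NOTCHANGE_OUTPUT_NAME = "NOTCHANGE";

    // Hypothetical stand-in for EPComparedResult. The codes are invented
    // for illustration; each constant carries the named output it maps to.
    enum ComparedResult {
        INSERT("I", INSERT_OUTPUT_NAME),
        DELETE("D", DELETE_OUTPUT_NAME),
        UPDATE("U", UPDATE_OUTPUT_NAME),
        NOTCHANGE("N", NOTCHANGE_OUTPUT_NAME);

        private final String code;
        private final String outputName;

        ComparedResult(String code, String outputName) {
            this.code = code;
            this.outputName = outputName;
        }

        String getCode() {
            return code;
        }

        String getOutputName() {
            return outputName;
        }
    }

    // Same decision as the if/else chain: a known code selects its named
    // output, and anything else falls back to the NOTCHANGE output.
    static String getOutputFileName(String code) {
        for (ComparedResult result : ComparedResult.values()) {
            if (result.getCode().equals(code)) {
                return result.getOutputName();
            }
        }
        return NOTCHANGE_OUTPUT_NAME;
    }

    public static void main(String[] args) {
        System.out.println(getOutputFileName("I")); // prints INSERT
        System.out.println(getOutputFileName("D")); // prints DELETE
        System.out.println(getOutputFileName("U")); // prints UPDATE
        System.out.println(getOutputFileName("?")); // prints NOTCHANGE
    }
}
```

Keeping the output name next to each code means a new result type only needs
one new enum constant. One difference from the if/else version: a null code
falls back to NOTCHANGE here instead of throwing a NullPointerException.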
result)
$ hadoop fs -ls output
11/02/07 13:09:13 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/02/07 13:09:13 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 4 items
-rw-r--r--   2 irteam supergroup          0 2011-01-31 19:59 /user/test/output/DELETE-r-00000
-rw-r--r--   2 irteam supergroup          0 2011-01-31 19:59 /user/test/output/INSERT-r-00000
-rw-r--r--   2 irteam supergroup          0 2011-01-31 18:53 /user/test/output/_SUCCESS
-rw-r--r--   2 irteam supergroup     649622 2011-01-31 18:53 /user/test/output/part-r-00000