I have some simple questions that I would like answered to get a
better understanding of what Hadoop/MapReduce is.
I noticed in the code of the WordCount example:
conf.setInputPath(new Path((String) other_args.get(0)));
conf.setOutputPath(new Path((String) other_args.get(1)));
Does working with Hadoop always involve having a set of files in one
directory as input and resulting in a set of files in one directory as
output? Are the names of the files in input and output directory
How do you handle the end result of a set of MapReduce tasks? If the
result is a set of files, do you have to use another MapReduce task
that doesn't write to a file (on the DFS, for example) but to a simple
String, for example to display something on a webpage? Or do you have
to read the resulting files directly?
If my gigantic set of input files keeps growing, do I have to
re-run MapReduce over the whole input set to get a single result set?
Or can I just MapReduce the incremental part and use another MapReduce
task to combine x result sets into a single result?
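To make the second option concrete, here is a minimal sketch of what I
have in mind, in plain Java with no Hadoop APIs. It assumes the result
is a set of word counts as in WordCount, so that two partial results can
be combined by summing the counts per word. The class and method names
(MergeCounts, merge) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: combines an earlier result set with the
// result of processing just the new input, assuming counts can be summed.
class MergeCounts {

    // Returns a new map holding, for every word, the sum of its counts
    // in the previous result and in the incremental result.
    static Map<String, Long> merge(Map<String, Long> previous,
                                   Map<String, Long> increment) {
        Map<String, Long> merged = new HashMap<>(previous);
        for (Map.Entry<String, Long> e : increment.entrySet()) {
            merged.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> previous = new HashMap<>();
        previous.put("hadoop", 3L);
        previous.put("map", 1L);

        Map<String, Long> increment = new HashMap<>();
        increment.put("hadoop", 2L);
        increment.put("reduce", 4L);

        // Print each word with its combined count.
        merge(previous, increment)
            .forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

This only works when the per-key results can be combined without looking
at the original input again (sums, counts, maxima); I am not sure what
the recommended approach is when that is not the case.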
Thanks for any help!