FAQ
Hi all,

I have some simple questions that I would like answered to get a
better understanding of what Hadoop/Mapreduce is.

I noticed in the code of the WordCount example:

conf.setInputPath(new Path((String) other_args.get(0)));
conf.setOutputPath(new Path((String) other_args.get(1)));

Does working with Hadoop always involve having a set of files in one
directory as input and producing a set of files in one directory as
output? Are the names of the files in the input and output directories
insignificant?

How do you handle the end result of a set of MapReduce tasks? If the
result is a set of files, do you have to use another MapReduce task
that writes not to a file (on the DFS, for example) but to a simple
String, so you can display something on a webpage? Or do you have
to read the resulting files directly?

If my gigantic set of input files keeps growing, do I have to
re-MapReduce the whole input set to get a single result set? Or can I
just MapReduce the incremental part and use another MapReduce task to
combine x number of result sets into a single result?

Thanks for any help!

--

regards,

Jeroen


  • Peter W. at Jun 27, 2007 at 8:17 pm
    Jeroen,

    I'm also a noob but making slight progress.

    JobConf will always send MapReduce output to a specified Path,
    but I think if you setOutputValueClass(Text.class) it may be possible
    to later change the destination from a file to a stream?

    Or, run a separate task with only one reduce, which should
    write simplified output to a single file, then open that as a stream.
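    The "open as a stream" step can be illustrated with plain java.nio, nothing Hadoop-specific (the helper class below is hypothetical; only the part-NNNNN file-naming convention comes from MapReduce, which writes one such file per reduce task):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical helper: read every part-NNNNN file a job left in an
// output directory and join them into a single String.
public class PartFileReader {
    public static String readParts(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> ds =
                Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : ds) parts.add(p);
        }
        // Sort by file name so pieces come back in a deterministic order.
        Collections.sort(parts);
        StringBuilder sb = new StringBuilder();
        for (Path p : parts) {
            sb.append(new String(Files.readAllBytes(p)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a job's output directory with two reduce outputs.
        Path dir = Files.createTempDirectory("job-output");
        Files.write(dir.resolve("part-00000"), "apple\t3\n".getBytes());
        Files.write(dir.resolve("part-00001"), "banana\t5\n".getBytes());
        System.out.print(readParts(dir));
        // prints the two files' contents concatenated
    }
}
```

    With a single reduce there is only one part-00000, so the String is that file's contents directly.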

    Recent threads mention there is no chaining of tasks, so
    shell scripting is another way to combine file results.
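    A minimal sketch of that shell-scripting route, assuming the job output has been copied to a local directory named output/ (on the DFS itself, something like `bin/hadoop dfs -cat output/part-*` would be the analogue):

```shell
# Simulate a job's output directory with two reduce outputs.
mkdir -p output
printf 'apple\t3\n'  > output/part-00000
printf 'banana\t5\n' > output/part-00001

# Glue every reduce output into a single local file.
cat output/part-* > combined.txt
cat combined.txt
```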

    Hope that helps,

    Peter W.

    On Jun 27, 2007, at 6:16 AM, Jeroen Verhagen wrote:

    Does working with Hadoop always involve having a set of files in one
    directory as input and producing a set of files in one directory as
    output? Are the names of the files in the input and output directories
    insignificant?

    How do you handle the end result of a set of MapReduce tasks? If the
    result is a set of files, do you have to use another MapReduce task
    that writes not to a file (on the DFS, for example) but to a simple
    String, so you can display something on a webpage? Or do you have
    to read the resulting files directly?

Discussion Overview
group: common-user
categories: hadoop
posted: Jun 27, '07 at 1:16p
active: Jun 27, '07 at 8:17p
posts: 2
users: 2 (Peter W.: 1 post, Jeroen Verhagen: 1 post)
website: hadoop.apache.org...
irc: #hadoop
