Hadoop uses an InputFormat class to parse input files and generate the
(key, value) pairs passed to your Mapper. An InputFormat is any class
that extends the abstract base class:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/InputFormat.html

The default InputFormat, TextInputFormat, parses text files, generating
keys that are byte offsets into the file and values that are complete
lines of text:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html
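
To make the (byte offset, line) pairing concrete, here is a minimal
plain-Java sketch that mimics the records a line-oriented text format
would hand to a Mapper. It is an illustration only, not Hadoop code:
LineRecords, Record and read() are hypothetical names, and the sketch
assumes lines end with '\n' only and that the whole input fits in memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names are Hadoop classes.
class LineRecords {
    // One (key, value) pair: byte offset of the line start, and the line's text.
    static final class Record {
        final long offset;
        final String line;
        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits the bytes on '\n' (assumption: no "\r\n" handling) and
    // records the byte position at which each line begins.
    static List<Record> read(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            if (atEnd || data[i] == '\n') {
                // Skip the empty tail that follows a final newline.
                if (!atEnd || i > start) {
                    String line = new String(data, start, i - start, StandardCharsets.UTF_8);
                    records.add(new Record(start, line));
                }
                start = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "hello world\nfoo\nbar baz\n".getBytes(StandardCharsets.UTF_8);
        for (Record r : read(data)) {
            // Prints the key (offset) and the value (line), tab-separated.
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

For the sample input, the keys come out as 0, 12 and 16: each key is the
position of the line's first byte, not a line number.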

You can write your own InputFormat and configure your job to use it by
calling setInputFormatClass() on your Job before submitting it:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/Job.html#setInputFormatClass(java.lang.Class)
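
As a rough idea of what a custom format could decide, the sketch below
models one possible parsing rule in plain Java: text before the first
tab becomes the key and the rest becomes the value. It deliberately uses
no Hadoop classes, and TabSeparatedRule and toKeyValue() are hypothetical
names. In a real job, logic like this would sit in the RecordReader that
your InputFormat returns from createRecordReader().

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java illustration only; TabSeparatedRule is not a Hadoop class.
class TabSeparatedRule {
    // Text before the first tab becomes the key; the rest becomes the value.
    // A line with no tab becomes a key with an empty value (an assumption
    // made for this sketch).
    static Map.Entry<String, String> toKeyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = toKeyValue("user42\tclicked checkout");
        System.out.println(kv.getKey() + " => " + kv.getValue());
    }
}
```

With a rule like this, the Mapper would be declared with Text keys
instead of LongWritable offsets. Before writing your own, check whether
a bundled format already fits; the older org.apache.hadoop.mapred API,
for instance, includes a KeyValueTextInputFormat that splits lines on a
tab in a similar way.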

Hope that helps.

-Joey

P.S. I moved this over to the mapreduce-user alias since it's
MapReduce specific.
On Thu, May 5, 2011 at 7:31 AM, praveenesh kumar wrote:
Hi,

As we know, a Hadoop mapper takes (Key,Value) pairs as input and generates
intermediate (Key,Value) pairs, and usually we give input to our Mapper as a
text file.
How does Hadoop understand this and parse our input text file into
(Key,Value) pairs?

Usually our mapper looks like:

public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> outputCollector, Reporter reporter)
        throws IOException {

    String word = value.toString();

    // Some lines of code

}

So if I pass any text file as input, it takes every line as the VALUE for
the Mapper, on which I do some processing and write the result to the
OutputCollector. But how does Hadoop parse my text file into (Key,Value)
pairs, and how can we tell Hadoop which (key,value) pairs it should give
to the mapper?

Thanks.


--
Joseph Echeverria
Cloudera, Inc.
443.305.9434

group: mapreduce-user @
categories: hadoop
posted: May 5, '11 at 11:40a