Returning false from isSplitable() only guarantees that all the data
from that file will go to the same Mapper. But it doesn't say -how- the
data goes from the file to the mapper. That's controlled by the
RecordReader instance returned by TextInputFormat. You'll need to write
a RecordReader that slurps the entire file in at once.
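Something along these lines should work (an untested sketch against the
old org.apache.hadoop.mapred API; the WholeFileInputFormat and
WholeFileRecordReader names are mine, not Hadoop's, and it assumes each
file fits in memory):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // one split per file
  }

  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  // Emits exactly one record per split: the file's entire contents.
  static class WholeFileRecordReader
      implements RecordReader<NullWritable, BytesWritable> {

    private final FileSplit split;
    private final JobConf job;
    private boolean done = false;

    WholeFileRecordReader(FileSplit split, JobConf job) {
      this.split = split;
      this.job = job;
    }

    public boolean next(NullWritable key, BytesWritable value)
        throws IOException {
      if (done) return false;
      // Assumes the file length fits in an int (i.e., in memory).
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FSDataInputStream in = file.getFileSystem(job).open(file);
      try {
        in.readFully(0, contents);  // slurp the whole file at once
      } finally {
        in.close();
      }
      value.set(contents, 0, contents.length);
      done = true;
      return true;
    }

    public NullWritable createKey() { return NullWritable.get(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return done ? split.getLength() : 0; }
    public float getProgress() { return done ? 1.0f : 0.0f; }
    public void close() { }
  }
}

Set it with conf.setInputFormat(WholeFileInputFormat.class) and your
mapper will see one (key, value) pair per input file.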
- Aaron
Ming Yang wrote:
I just did a test by simply extending TextInputFormat and overriding
isSplitable(FileSystem fs, Path file) to always return false. However,
in my mapper I still see the input file get split into lines. I did set
the input format in the job configuration, and isSplitable(...) -> false
did get called during job execution. Did I do anything wrong, or is this
the behavior I should be expecting?
Thanks,
Ming
2007/10/15, Ted Dunning <tdunning@veoh.com>:
That doesn't quite do what the poster requested. They wanted to pass the
entire file to the mapper.
That requires a custom input format or an indirect input approach (list of
file names in input).
On 10/15/07 9:57 AM, "Rick Cox" wrote:
You can also gzip each input file. Hadoop will not split a compressed
input file (but will automatically decompress it before feeding it to
your mapper).
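For example (a minimal sketch; the job class and input path are made
up, and the exact helper names vary a bit across Hadoop versions),
nothing gzip-specific is needed in the job setup:

JobConf conf = new JobConf(MyJob.class);  // MyJob is a placeholder
conf.setInputFormat(TextInputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("/input/gzipped"));
// Each foo.gz under /input/gzipped becomes one unsplit map input and
// is decompressed transparently before the records reach your mapper.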
rick
On 10/15/07, Ted Dunning wrote:
Use a list of file names as your map input. Then your mapper can read a
line and use it to open and read the named file for processing.
This is similar to web crawling, where the input is a list of URLs.
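In code that ends up looking roughly like this (a sketch against the
old mapred API; FileNameMapper and the processing loop are made-up
placeholders):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf job;

  public void configure(JobConf job) { this.job = job; }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Each input record is one file name; open that file ourselves.
    Path file = new Path(value.toString());
    FileSystem fs = file.getFileSystem(job);
    BufferedReader in =
        new BufferedReader(new InputStreamReader(fs.open(file)));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        // ... whole-file, cross-line processing goes here ...
      }
    } finally {
      in.close();
    }
  }
}

Each call to map() then sees one complete file, regardless of how the
list file itself gets split.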
On 10/15/07 6:57 AM, "Ming Yang" wrote:
I was writing a test mapreduce program and noticed that the
input file was always broken down into separate lines and fed
to the mapper. However, in my case I need to process the whole
file in the mapper, since there are dependencies between
lines in the input file. Is there any way I can achieve this --
process the whole input file, either text or binary, in the mapper?