I stopped a job that was running very slowly while it was in its
reduce phase. However, I still want its output and I cannot run the
job again, so I have to work with the intermediate files.

I have a 30 GB file, map_0.out (found in the reducer's jobcache), and I
want to read its contents using an InputFormat. It is not a SequenceFile;
I already tried that. How do I read this file? I presume it is some
sort of sorted map of Writable keys with corresponding Writable values
(after all, this file was being fed directly to the reduce function).

Any help will be greatly appreciated.


  • Owen O'Malley at Jan 10, 2011 at 5:52 pm
    The intermediate files are called IFiles. The format is trivial and you can
    read the code to see it. The only tricky bit is that you effectively have N
    IFiles concatenated together (one per reduce).

    -- Owen
  • Ferdy Galema at Jan 11, 2011 at 11:58 am
    Thanks. I successfully created an InputFormat that uses an IFile.Reader.
    The fact that the files are concatenated did not seem to matter much; I
    could use a single IFile.Reader to read the entire map_0.out file.

    Ferdy.

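To make "the format is trivial" concrete, here is a minimal, self-contained sketch of an IFile-style record stream. It rests on an assumption about the layout that should be checked against the IFile source of your Hadoop version: each record is a variable-length key length, a variable-length value length, then the raw key and value bytes, and a segment ends with a pair of -1 lengths. The class IFileSketch and all of its methods are hypothetical illustration code, not Hadoop's API. The sketch also ignores compression and any trailing checksum that a real intermediate file may carry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an IFile-like record stream; not Hadoop's classes. */
class IFileSketch {
    // Assumption: a segment ends with key length -1 and value length -1.
    static final int EOF_MARKER = -1;

    /** Variable-length integer, modelled on Hadoop's WritableUtils encoding. */
    static void writeVInt(DataOutputStream out, long i) throws IOException {
        if (i >= -112 && i <= 127) {          // small values fit in one byte
            out.writeByte((int) i);
            return;
        }
        int len = -112;
        if (i < 0) {                          // store the complement of negatives
            i ^= -1L;
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;                            // one step per payload byte
        }
        out.writeByte(len);                   // first byte encodes sign and size
        len = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = len; idx != 0; idx--) {
            out.writeByte((int) ((i >> ((idx - 1) * 8)) & 0xFF));
        }
    }

    static long readVInt(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                     // single-byte value
        }
        boolean negative = first < -120;
        int n = negative ? -(first + 120) : -(first + 112);
        long i = 0;
        for (int k = 0; k < n; k++) {
            i = (i << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~i : i;
    }

    /** Appends one record: key length, value length, key bytes, value bytes. */
    static void append(DataOutputStream out, byte[] key, byte[] value) throws IOException {
        writeVInt(out, key.length);
        writeVInt(out, value.length);
        out.write(key);
        out.write(value);
    }

    /** Writes the assumed end-of-segment marker. */
    static void endSegment(DataOutputStream out) throws IOException {
        writeVInt(out, EOF_MARKER);
        writeVInt(out, EOF_MARKER);
    }

    /** Reads {key, value} pairs until the end-of-segment marker. */
    static List<byte[][]> readSegment(DataInputStream in) throws IOException {
        List<byte[][]> records = new ArrayList<>();
        while (true) {
            int keyLen = (int) readVInt(in);
            int valLen = (int) readVInt(in);
            if (keyLen == EOF_MARKER && valLen == EOF_MARKER) {
                return records;
            }
            if (keyLen < 0 || valLen < 0) {
                throw new IOException("Corrupt record lengths: " + keyLen + ", " + valLen);
            }
            byte[] key = new byte[keyLen];
            byte[] value = new byte[valLen];
            in.readFully(key);
            in.readFully(value);
            records.add(new byte[][] {key, value});
        }
    }

    /** Round trip: write two records to memory, read them back as text. */
    static String demo() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        append(out, "apple".getBytes(StandardCharsets.UTF_8), "1".getBytes(StandardCharsets.UTF_8));
        append(out, "banana".getBytes(StandardCharsets.UTF_8), "22".getBytes(StandardCharsets.UTF_8));
        endSegment(out);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        StringBuilder sb = new StringBuilder();
        for (byte[][] kv : readSegment(in)) {
            if (sb.length() > 0) sb.append(',');
            sb.append(new String(kv[0], StandardCharsets.UTF_8)).append('=')
              .append(new String(kv[1], StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

In a real job the key and value bytes are serialized Writables, so each pair would still have to be deserialized with the job's key and value classes. For an actual map_0.out file, wrapping Hadoop's own IFile.Reader in a custom InputFormat, as described above, is likely the safer route.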

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Jan 10, '11 at 4:38p
active: Jan 11, '11 at 11:58a
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
