This is puzzling me ...

With a mapper producing about 400 MB of output, which of these is supposed
to be faster?

1) Use the output collector, which will write to a local file and then copy
to HDFS, since I don't have reducers.

2) Open a unique local file inside "mapred.local.dir" for each mapper.

I expected (2) to be faster, but (1) actually was. Can someone explain?

Thanks,
Mark

  • Harsh J at May 20, 2011 at 11:24 am
    Mark,
    On Fri, May 20, 2011 at 10:17 AM, Mark question wrote:
    > This is puzzling me ...
    >
    > With a mapper producing about 400 MB of output, which of these is supposed
    > to be faster?
    >
    > 1) Use the output collector, which will write to a local file and then copy
    > to HDFS, since I don't have reducers.

    A regular map-only job does not write to the local FS; it writes
    directly to HDFS (to a local DataNode, if one is found).

    --
    Harsh J
  • Mark question at May 20, 2011 at 4:07 pm
    I thought it did write to the local FS, because of the FileBytesWritten
    counter. Thanks for the clarification.
    Mark
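
For readers comparing the two approaches, here is a minimal sketch of what option (2) from the question might look like: each mapper writes to its own file under a local directory. It uses only the Java standard library. The class and method names (LocalMapOutput, uniqueOutputPath, writeLocal) and the file-name pattern are hypothetical, and the task attempt ID is assumed to be unique per mapper and supplied by the caller. In a real job the directory would be one of the "mapred.local.dir" entries. This is an illustration under those assumptions, not the code used in the thread.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class LocalMapOutput {

    // Hypothetical naming scheme: one file per task attempt, so that
    // concurrent mappers on the same node never share a file.
    public static Path uniqueOutputPath(Path localDir, String taskAttemptId) {
        return localDir.resolve("map-output-" + taskAttemptId + ".dat");
    }

    // Writes all records to this task's own local file and returns its path.
    // The data stays on the node's local disk; nothing is copied to HDFS.
    public static Path writeLocal(Path localDir, String taskAttemptId,
                                  Iterable<byte[]> records) throws IOException {
        Files.createDirectories(localDir);
        Path out = uniqueOutputPath(localDir, taskAttemptId);
        // Buffer the stream so many small records do not each hit the disk.
        try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(out))) {
            for (byte[] record : records) {
                os.write(record);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for a "mapred.local.dir" entry.
        Path localDir = Files.createTempDirectory("mapred-local");
        Path written = writeLocal(localDir, "attempt_example_m_000000_0",
                Arrays.asList("first record\n".getBytes(StandardCharsets.UTF_8),
                              "second record\n".getBytes(StandardCharsets.UTF_8)));
        System.out.println(written + " (" + Files.size(written) + " bytes)");
    }
}
```

Note that output written this way exists only on the local disk of the node that ran the mapper, whereas the output collector in a map-only job writes to HDFS, so the two options do not leave the data in the same place.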

Discussion Overview
group: common-user @
category: hadoop
posted: May 20, '11 at 4:48a
active: May 20, '11 at 4:07p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
