Hi,

Is a single thread allocated to each output file when a job is
trying to write multiple output files?

If there are 10,000 output files, does Hadoop try to create a
thread for each output file?

--
Junyoung Kim (juneng603@gmail.com)


  • Harsh J at Mar 13, 2011 at 5:12 am
    Hello,

    On Sat, Mar 12, 2011 at 3:51 PM, Jun Young Kim wrote:
    > hi,
    >
    > is a single thread allocated to a single output file when a job is trying to
    > write multiple output files?

    At the lower levels, a data streaming thread is indeed run for every
    OutputStream created for writing on the DFS.

    The map task is generally single-threaded unless you multi-thread the
    calls yourself (in which case the record writers are still accessed in a
    synchronized fashion).

    > if counts of output files are 10,000, does a hadoop try to create threads
    > for each output file?

    Yes, there should be 10,000 threads 'started' for streaming writes
    (though not all of them actively working at the same time, given how the
    tasks access their record writers).

    Please correct me if I'm wrong.

    --
    Harsh J
    www.harshj.com
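
Editor's sketch: the behaviour described in the reply above (one background streaming thread per open output stream, with the task itself writing from a single thread) can be modelled in plain Java. This is not Hadoop code. The names StreamThreadSketch, SketchStream and openAndWrite are hypothetical, and the in-memory "file" stands in for a DFS output stream. It only illustrates why N open outputs means N started threads, most of them idle at any moment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Stdlib-only model; all names here are hypothetical, not Hadoop APIs.
class StreamThreadSketch {

    // Stand-in for one output stream: owns one background "streamer" thread.
    static class SketchStream implements AutoCloseable {
        static final AtomicInteger STARTED = new AtomicInteger(); // streamer threads started so far
        private static final String EOF = new String("EOF");      // unique sentinel, compared by identity

        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final StringBuilder sink = new StringBuilder();   // pretend file contents
        private final Thread streamer;

        SketchStream(String name) {
            streamer = new Thread(() -> {
                try {
                    // The thread sits blocked in take() whenever nothing is queued for it.
                    for (String rec = queue.take(); rec != EOF; rec = queue.take()) {
                        sink.append(rec).append('\n');
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "streamer-" + name);
            streamer.start();
            STARTED.incrementAndGet();
        }

        // Called by the (single) task thread; only hands the record to the streamer.
        void write(String record) {
            queue.add(record);
        }

        @Override
        public void close() {
            queue.add(EOF);
            try {
                streamer.join(); // wait until everything queued has been "streamed"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        String contents() {
            return sink.toString();
        }
    }

    // Opens n streams, writes one record to each from the calling thread,
    // closes them, and returns how many streamer threads were started.
    static int openAndWrite(int n) {
        int before = SketchStream.STARTED.get();
        List<SketchStream> streams = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            streams.add(new SketchStream("part-" + i));
        }
        for (int i = 0; i < n; i++) {
            streams.get(i).write("record-" + i);
        }
        for (SketchStream s : streams) {
            s.close();
        }
        return SketchStream.STARTED.get() - before;
    }

    public static void main(String[] args) {
        System.out.println(openAndWrite(100) + " streamer threads started");
    }
}
```

In this model every open stream costs a live thread even though the single writer touches only one stream at a time, which is the situation the 10,000-file question is asking about.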
  • Maha at Mar 15, 2011 at 9:18 pm
    By the way, how do I know whether my map task is single-threaded (i.e. one thread processing the records one after another)? And how do I change it to run multi-threaded?

    Thank you,
    Maha
  • Maha at Mar 15, 2011 at 9:50 pm
    I found it :)

    http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html

    Maha
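
Editor's sketch: the API page linked above documents MultithreadedMapper, which is set as the job's mapper class and told which real mapper to run and how many threads to use through its static setMapperClass and setNumberOfThreads methods. The plain-Java model below is not the Hadoop implementation; MultithreadedMapSketch and run are hypothetical names. It only illustrates the pattern mentioned earlier in the thread: several threads do the map work in parallel, while access to the shared record reader and record writer stays synchronized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Stdlib-only model of a multithreaded map task; names are hypothetical.
class MultithreadedMapSketch {

    // Runs `threads` workers that pull records from one shared iterator and
    // append results to one shared list, locking around each shared access.
    static List<String> run(List<String> input, int threads, Function<String, String> map) {
        Iterator<String> reader = input.iterator(); // shared "record reader"
        List<String> writer = new ArrayList<>();    // shared "record writer"
        List<Thread> workers = new ArrayList<>();

        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                while (true) {
                    String record;
                    synchronized (reader) {         // one thread reads at a time
                        if (!reader.hasNext()) {
                            return;
                        }
                        record = reader.next();
                    }
                    String out = map.apply(record); // the map work itself runs in parallel
                    synchronized (writer) {         // one thread writes at a time
                        writer.add(out);
                    }
                }
            }, "map-worker-" + t);
            workers.add(worker);
            worker.start();
        }

        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return writer; // output order depends on thread scheduling
    }

    public static void main(String[] args) {
        List<String> out = run(Arrays.asList("a", "b", "c", "d"), 2, String::toUpperCase);
        System.out.println(out.size() + " records mapped");
    }
}
```

Because reads and writes are serialized, extra threads only help when the map function itself is slow (for example, when it waits on an external service); the output order is also no longer the input order.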

Discussion Overview
group: common-user @
category: hadoop
posted: Mar 12, '11 at 10:25a
active: Mar 15, '11 at 9:50p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop