FAQ
hi, all.

I have 60 reducers, which generate 60 output files,

from output-r-00001 to output-r-00059.

In this situation, I want to control the number of output files.

For example, is it possible to concatenate all the output files down to 10,

from output-r-00001 to output-r-00010?

thanks

--
Junyoung Kim (juneng603@gmail.com)

  • Harsh J at May 12, 2011 at 5:18 am
    Short, blind answer: You could run 10 reducers.

    Otherwise, you'll have to run another job whose mappers each pick up
    a few files and merge them. But having 60 files shouldn't really be a
    problem if they are sufficiently large (at least 80% of a block size,
    perhaps -- you can tune the number of reducers to achieve this).
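    The sizing heuristic above (aim for output files of at least ~80% of
    a block) can be sketched as a quick calculation. This is a plain-Java
    sketch, not Hadoop API code; the 64 MB block size and 3 GB output
    size are illustrative assumptions:

    ```java
    public class ReducerCount {
        // Pick a reducer count so each output file is at least ~80% of an
        // HDFS block. Each reduce task writes one part file, so the file
        // count equals the reducer count.
        static int reducersFor(long totalOutputBytes, long blockSize) {
            long targetFileSize = (long) (blockSize * 0.8);           // ~80% of a block
            int reducers = (int) (totalOutputBytes / targetFileSize); // files >= target size
            return Math.max(reducers, 1);                             // at least one reducer
        }

        public static void main(String[] args) {
            long block = 64L * 1024 * 1024;       // 64 MB block size (illustrative)
            long total = 3L * 1024 * 1024 * 1024; // 3 GB of reducer output (illustrative)
            System.out.println(reducersFor(total, block)); // 60 with these numbers
        }
    }
    ```

    The resulting count would then be passed to the job configuration;
    with these example numbers, each of the 60 files ends up around 51 MB.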


    --
    Harsh J
  • Jun Young Kim at May 13, 2011 at 1:38 am
    Yes, that is the general way to control the number of output files.

    However, how could you control the number of output files dynamically?

    For example: if an output is named 'A', it needs 5 output files;
    if an output is named 'B', it needs 10 output files.

    Is that possible under Hadoop?
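    One way to get per-name file counts in a single job is a custom
    partitioner that gives each logical output a contiguous range of
    reduce partitions. The sketch below is plain Java showing only the
    routing arithmetic; in a real job this logic would live in a
    `org.apache.hadoop.mapreduce.Partitioner` subclass, with the total
    reducer count set to the sum of the ranges (15 here). The names 'A'
    and 'B' and the counts 5 and 10 come from the question; everything
    else is an illustrative assumption:

    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class NamedOutputRouter {
        // Desired file counts per logical output: 'A' -> 5, 'B' -> 10.
        static final Map<String, Integer> COUNTS = new LinkedHashMap<>();
        static {
            COUNTS.put("A", 5);
            COUNTS.put("B", 10);
        }

        // Map (outputName, key) to a reduce partition: each output gets a
        // contiguous partition range, and keys spread over that range by hash.
        static int partitionFor(String outputName, Object key) {
            int offset = 0;
            for (Map.Entry<String, Integer> e : COUNTS.entrySet()) {
                if (e.getKey().equals(outputName)) {
                    int slot = (key.hashCode() & Integer.MAX_VALUE) % e.getValue();
                    return offset + slot;
                }
                offset += e.getValue();
            }
            throw new IllegalArgumentException("unknown output: " + outputName);
        }

        public static void main(String[] args) {
            System.out.println(partitionFor("A", "some-key")); // lands in partitions 0..4
            System.out.println(partitionFor("B", "some-key")); // lands in partitions 5..14
        }
    }
    ```

    Records for 'A' then end up spread across 5 part files and records
    for 'B' across the other 10, within one job.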

    Junyoung Kim (juneng603@gmail.com)

  • Joey Echeverria at May 13, 2011 at 1:58 am
    You can control the number of reducers by calling
    job.setNumReduceTasks() before you launch it.
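    For context on why this works: each reduce task writes exactly one
    part file, and Hadoop's default HashPartitioner assigns a key to
    partition `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`.
    A minimal plain-Java sketch of that formula (no Hadoop dependency;
    the sample keys are illustrative):

    ```java
    public class DefaultPartitioning {
        // Same arithmetic as Hadoop's default HashPartitioner.getPartition():
        // mask off the sign bit, then take the remainder modulo the reducer count.
        static int getPartition(Object key, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }

        public static void main(String[] args) {
            int reducers = 10; // e.g. after job.setNumReduceTasks(10)
            for (String key : new String[] {"apple", "banana", "cherry"}) {
                int p = getPartition(key, reducers);
                System.out.println(key + " -> output-r-" + String.format("%05d", p));
            }
        }
    }
    ```

    So with 10 reducers you get exactly 10 part files, regardless of how
    many distinct keys there are.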

    -Joey


    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434

Discussion Overview
group: common-user @ hadoop
posted: May 12, 2011 at 12:49 AM
active: May 13, 2011 at 1:58 AM
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
