No. of Map and reduce tasks

How can I tell how the map and reduce tasks were spread across the
cluster? I looked at the jobtracker web page but can't find that info.

Also, can I specify how many map or reduce tasks I want to be launched?
From what I understand, it's based on the number of input files passed
to Hadoop. So if I have 4 files, there will be 4 map tasks launched,
and the reducer count depends on the HashPartitioner.


  • Jagaran das at May 26, 2011 at 10:09 pm
    Hi Mohit,

    No. of maps - it depends on the total file size / block size.
    No. of reducers - you can specify that.

    Regards,
    Jagaran



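    For what it's worth, this is how the reducer count is usually pinned
    down in a plain MapReduce driver - a minimal sketch, assuming the
    Hadoop 0.20+ "new" API; the MyJobDriver class name and the
    command-line paths are hypothetical, and the mapper/reducer classes
    are left at their identity defaults:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MyJobDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "my job");
            job.setJarByClass(MyJobDriver.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // Reducers: you pick the number; 4 here is arbitrary.
            job.setNumReduceTasks(4);

            // Maps: NOT settable this way - the count falls out of the
            // input splits (roughly total input size / block size).
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    With a ToolRunner-style driver the same thing can be passed on the
    command line as -D mapred.reduce.tasks=4 (the pre-0.21 property name).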
  • Mohit Anchlia at May 26, 2011 at 10:30 pm
    I ran a simple pig script on this file:

    -rw-r--r-- 1 root root 208348 May 26 13:43 excite-small.log

    that orders the contents by name, but it only created one mapper. How
    can I change this to distribute the work across multiple machines?
  • James Seigel at May 26, 2011 at 10:42 pm
    Have more data for it to process :)

  • Mohit Anchlia at May 26, 2011 at 10:56 pm
    I think I understood that from the last two replies :) But my
    question is: can I change the configuration to split the file at,
    say, 250K so that multiple mappers are invoked?
  • James Seigel at May 27, 2011 at 1:04 am
    Set the input split size really low; you might get something.

    I'd rather you fire up some *nix commands, pack that file onto
    itself a bunch of times, then put it back into HDFS and let 'er rip.

    Sent from my mobile. Please excuse the typos.
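    If you really want to force more splits for an experiment, the
    maximum split size can be capped in the driver - a minimal sketch,
    assuming a Hadoop version whose new-API FileInputFormat provides
    setMaxInputSplitSize (it writes the mapred.max.split.size property;
    on older versions set that property directly). The SplitDemo name,
    paths, and the 250 KB figure (taken from the question above) are
    illustrative only:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SplitDemo {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "split demo");
            job.setJarByClass(SplitDemo.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // Cut a new split every ~250 KB instead of every HDFS block,
            // so even a small file yields several map tasks.
            FileInputFormat.setMaxInputSplitSize(job, 250L * 1024);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    From a Pig script the same property can usually be passed through
    with Pig's set command or with -D on the pig command line, though as
    the next reply notes, tiny splits defeat the purpose of big blocks.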
  • Jagaran das at May 27, 2011 at 2:21 am
    If you feed it really small files or splits, the benefit of Hadoop's
    big block size goes away.
    Instead, try merging the files.

    Hope that helps



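    For merging, one era-appropriate option is FileUtil.copyMerge, which
    concatenates every file under a directory into a single target file -
    a minimal sketch (copyMerge exists in Hadoop 1.x/2.x but was removed
    in 3.x; the paths here are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeSmallFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Concatenate all files under the source directory into one
            // target file, appending a newline after each; 'false' keeps
            // the originals instead of deleting them.
            FileUtil.copyMerge(fs, new Path("/user/mohit/small-logs"),
                               fs, new Path("/user/mohit/excite-all.log"),
                               false, conf, "\n");
        }
    }

    The shell equivalent, hadoop fs -getmerge <src> <localdst>, pulls
    the concatenation down to the local filesystem instead.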
  • Mohit Anchlia at May 31, 2011 at 7:18 pm
    What if I had multiple files in the input directory? Hadoop should
    then fire off parallel map tasks, right?

