Grokbase Groups Pig user March 2011
Hi,

We've seen a strange problem where some Pig jobs run far fewer concurrent
mappers than the cluster's mapper capacity. Specifically, we have a 10-node
cluster and each node is configured for 12 mappers, so normally we have 120
mappers running. But some Pig jobs run only 10 mappers (while nothing else
is running), which appears to be 1 mapper per node.

We have not noticed the same problem with other, non-Pig Hadoop jobs. Has
anyone experienced the same thing, and do you have an explanation or remedy?

Thanks!
Dexin


  • Dexin Wang at Mar 24, 2011 at 12:45 am
    And the nodes are pretty lightly loaded (load average ~1.0) and there's
    plenty of free memory. Now I'm seeing 2 mappers per node; very much
    under-utilized.
  • Alan Gates at Mar 24, 2011 at 12:56 am
    What version of Pig are you using? Starting in 0.8, Pig will combine
    small blocks into a single map. This prevents jobs that read only small
    amounts of data from taking up a lot of slots on the cluster. You can
    turn this off by adding -Dpig.noSplitCombination=true to your command
    line.

    Alan.
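    To make Alan's suggestion concrete, disabling split combination would
    look like this on the command line (a sketch; `myscript.pig` is a
    hypothetical script name):

    ```
    pig -Dpig.noSplitCombination=true myscript.pig
    ```

    Conversely, leaving combination on and tuning the target split size via
    the pig.maxCombinedSplitSize property (bytes) controls how much data
    each combined map task reads.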
  • Dexin Wang at Mar 24, 2011 at 12:59 am
    Thanks Alan!

    We are using 0.79. We also got an answer from the #hadoop channel, along
    with this Quora answer:

    http://www.quora.com/Where-does-Hadoop-latency-come-from-e-g-it-takes-15-25-seconds-for-an-empty-job?q=hadoop+latency

    We will look into combining more work into each mapper and/or using Pig 0.8.

    Thanks again for your help.

    Dexin
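    The idea of "combining more work into each mapper" can be sketched as
    greedily packing small input files into splits up to a maximum size, in
    the spirit of Pig 0.8's split combination. This is a toy illustration,
    not Pig's actual algorithm:

    ```python
    # Toy sketch (NOT Pig's actual implementation) of combining small input
    # files into larger splits, capped at a maximum combined size.

    def combine_splits(file_sizes, max_combined=128 * 1024 * 1024):
        """Greedily pack file sizes into splits no larger than max_combined."""
        splits, current, current_size = [], [], 0
        for size in sorted(file_sizes, reverse=True):
            if current and current_size + size > max_combined:
                splits.append(current)
                current, current_size = [], 0
            current.append(size)
            current_size += size
        if current:
            splits.append(current)
        return splits

    # 120 small 10 MB files collapse into far fewer map tasks:
    MB = 1024 * 1024
    splits = combine_splits([10 * MB] * 120, max_combined=128 * MB)
    print(len(splits))  # 10 splits of 12 files each
    ```

    With combination on, a job over many tiny files launches a handful of
    well-fed map tasks instead of one near-empty task per file, which is
    exactly the trade-off Alan describes: fewer slots used, at the cost of
    less parallelism when slots are idle.
    
    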

Discussion Overview
group: user @ pig.apache.org
categories: pig, hadoop
posted: Mar 23, '11 at 8:40p
active: Mar 24, '11 at 12:59a
posts: 4
users: 2 (Dexin Wang: 3 posts, Alan Gates: 1 post)
