Grokbase Groups Pig user January 2011
Pig 0.8 executes my script by running six jobs. One of them is identified as
"MAP_ONLY" and it always fails; the innermost error I can find says either
"GC overhead limit exceeded" or "Java heap space". I suspect I have a
piece that is too large. How can I get my hands on the actual data it was
processing, so I can ascertain the cause? The task log says "Input records
from tmp1872359169". Can I view that data?

Thanks,

Greg Langmead | Senior Research Scientist | SDL Language Weaver | (t) +1 310 437 7300


  • Dmitriy Ryaboy at Jan 26, 2011 at 11:24 pm
    Greg,
    Pig 0.8 tells you which job is responsible for which set of operators, so
    you can save all the inputs to the map-only job by inserting intermediate
    stores, and then debug just the map-only job.

    D
  • Greg Langmead at Jan 27, 2011 at 4:47 pm
    Thank you, Dmitriy. I do see which relations my map-only job was working on,
    but how do I see which subset of the data a given piece of that map job was
    working on, e.g. attempt_201101201235_0064_m_000243_0?

    If I save the input data by storing it before the map job runs, I will still
    have the conundrum of identifying which subset of it went to piece 243,
    unless I'm misunderstanding.

    Greg
  • Dmitriy Ryaboy at Jan 27, 2011 at 6:37 pm
    How about passing the input through a logging UDF (a basic UDF that just
    echoes its input to stderr)?
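
A minimal sketch of such a logging UDF, written as a Jython-style Python UDF (Pig 0.8 added support for these). The function name `log_and_pass` and the schema `value:chararray` are hypothetical, and the fallback definition of `outputSchema` is an assumption added only so the file also runs outside Pig. It has not been tested against a Pig 0.8 cluster.

```python
import sys

# Pig is expected to provide the `outputSchema` decorator when it registers a
# Jython script. This fallback is an assumption for running the file standalone.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("value:chararray")
def log_and_pass(value):
    """Echo the input to stderr, then return it unchanged."""
    # %r keeps None, empty strings, and odd characters visible in the log.
    sys.stderr.write("log_and_pass input: %r\n" % (value,))
    return value
```

Registered in the script (for example with `REGISTER 'log_udf.py' USING jython AS dbg;`, where the file name and namespace are made up) and applied in a FOREACH that feeds the failing job, this should make each task attempt's stderr log show the records that attempt read, so the log for attempt 243 would reveal its subset. Expect very large logs on a full data set.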

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jan 26, '11 at 10:49p
active: Jan 27, '11 at 6:37p
posts: 4
users: 2
website: pig.apache.org