Grokbase Groups Pig dev January 2013
James created PIG-3113:
--------------------------

Summary: Shell command execution hangs job
Key: PIG-3113
URL: https://issues.apache.org/jira/browse/PIG-3113
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.1
Reporter: James


Executing a shell command inside a Pig script has the potential to deadlock the job. For example, the following statement will block when somebigfile.txt is sufficiently large:

{code}
%declare input `cat /path/to/somebigfile.txt`
{code}

This happens because PreprocessorContext.executeShellCommand(String) uses Runtime.exec() incorrectly: the sub-process's stderr and stdout streams must be drained in separate threads to keep p.waitFor() from hanging once the sub-process's output exceeds the pipe buffer.

Per the Java Process class javadoc: "Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock".

See http://www.javaworld.com/jw-12-2000/jw-1229-traps.html for a correct solution.
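The pattern that article recommends can be sketched as follows. This is illustrative only, not Pig's actual PreprocessorContext code; the class and method names are hypothetical. The key idea is that each of the child's output pipes is drained on its own thread, so waitFor() cannot block on a full pipe buffer:

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helper illustrating the "stream gobbler" pattern;
// not Pig's implementation.
public class ShellRunner {

    // Drain an InputStream on its own thread so the child process
    // never blocks writing to a full stdout/stderr pipe buffer.
    static Thread drain(InputStream in, StringBuilder sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = r.readLine()) != null) {
                    synchronized (sink) { sink.append(line).append('\n'); }
                }
            } catch (Exception e) { /* stream closed; nothing left to read */ }
        });
        t.start();
        return t;
    }

    public static String run(String... cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).start();
        StringBuilder out = new StringBuilder();
        StringBuilder err = new StringBuilder();
        Thread outThread = drain(p.getInputStream(), out);
        Thread errThread = drain(p.getErrorStream(), err);
        p.waitFor();       // safe: both pipes are being emptied concurrently
        outThread.join();  // make sure all output has been captured
        errThread.join();
        return out.toString();
    }
}
```

With this structure the captured output can be returned to the caller (e.g. for substitution into the %declare value) regardless of its size; the example assumes a POSIX-style shell command is available.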

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • James (JIRA) at Jan 3, 2013 at 9:52 pm
    [ https://issues.apache.org/jira/browse/PIG-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543322#comment-13543322 ]

    James commented on PIG-3113:
    ----------------------------

    Note that this bug appears in trunk as well.
  • Daniel Dai (JIRA) at Jan 9, 2013 at 5:46 am
    [ https://issues.apache.org/jira/browse/PIG-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547669#comment-13547669 ]

    Daniel Dai commented on PIG-3113:
    ---------------------------------
    From the post, it seems we could process the input and output streams before calling waitFor()?
  • James (JIRA) at Jan 15, 2013 at 7:40 pm
    [ https://issues.apache.org/jira/browse/PIG-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554210#comment-13554210 ]

    James commented on PIG-3113:
    ----------------------------

    Sorta. The input and output streams should be read in separate threads before calling waitFor(). Once waitFor() returns and the I/O threads are joined, their buffered contents can be returned for evaluation by executeShellCommand(). I'll see if I can put together a patch, time permitting.
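    A minimal sketch of why the answer is only "sorta": reading the streams sequentially on the calling thread (as the question suggests) works for small outputs, but can still stall for large ones. Names here are illustrative, not Pig's code:

    ```java
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;

    // Illustrative sketch of the sequential approach and its hazard.
    // It only works while each stream's output fits in the OS pipe
    // buffer: if the child fills stderr while we are still blocked
    // reading stdout, both sides stall. Hence the reads really belong
    // on separate threads.
    public class SequentialRead {
        static String readAll(InputStream in) throws Exception {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString();
        }

        public static String run(String... cmd) throws Exception {
            Process p = new ProcessBuilder(cmd).start();
            String out = readAll(p.getInputStream()); // blocks until stdout closes
            String err = readAll(p.getErrorStream()); // stderr pipe may already be full
            p.waitFor();  // safe here only because both pipes were drained first
            return out;
        }
    }
    ```

    The example assumes a POSIX-style command is available; the point is structural, not Pig-specific.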

Discussion Overview
group: dev @ pig.apache.org
categories: pig, hadoop
posted: Jan 3, '13 at 9:50 pm
active: Jan 15, '13 at 7:40 pm
posts: 4
