pig produces errors after a job is said to be 100% done
-------------------------------------------------------

Key: PIG-457
URL: https://issues.apache.org/jira/browse/PIG-457
Project: Pig
Issue Type: Bug
Affects Versions: types_branch
Reporter: Olga Natkovich
Fix For: types_branch


It is possible that we get errors for all tasks, even the ones we retried. We need to look at the code that detects the end of processing and produces errors.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Olga Natkovich (JIRA) at Sep 24, 2008 at 9:06 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich reassigned PIG-457:
    ----------------------------------

    Assignee: Shravan Matthur Narayanamurthy
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 26, 2008 at 10:32 am
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-457:
    -----------------------------------------------

    Status: Patch Available (was: Open)

    There are two issues that this patch tries to address:
    1) Exceptions and traces even after a successful completion:
    Currently, the same code path fetches and prints error messages in both the success case and the failure case. This fix splits that path: failures that occur during a successful run and are resolved by retries are logged at debug level, while failures in an unsuccessful run are logged at error level.

    2) Shows 100% even if there are failures
    This is a direct result of what Hadoop does. It marks the map and reduce tasks as 100% complete irrespective of their success or failure. In some sense these are unrelated dimensions. Since it's better to relate the two, we need to make sure that we don't report 100% complete for a failed execution. This is a hack: I check whether the progress has reached 100% and postpone displaying it until I am sure that the job has completed successfully.
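
Editorial illustration, not code from the patch: a minimal Java sketch of the "postpone 100%" idea described above. The class name `DeferredCompletion`, the method `toDisplay`, and the integer-percentage interface are all hypothetical; the real launcher code may look quite different.

```java
import java.util.OptionalInt;

/** Hypothetical helper, not taken from the PIG-457 patch. */
final class DeferredCompletion {
    private DeferredCompletion() {}

    /**
     * Decides what percentage, if any, may be shown to the user.
     *
     * @param rawPercent   progress as reported by the cluster, 0..100
     * @param jobSucceeded true only once the job is confirmed successful
     * @return the percentage to display, or empty to hold the display back
     */
    static OptionalInt toDisplay(int rawPercent, boolean jobSucceeded) {
        if (rawPercent >= 100) {
            // Tasks can read 100% whether they succeeded or failed, so a
            // full reading is shown only after success is confirmed.
            return jobSucceeded ? OptionalInt.of(100) : OptionalInt.empty();
        }
        return OptionalInt.of(Math.max(rawPercent, 0));
    }
}
```

A caller would print the value only when the returned `OptionalInt` is present, so a failed job never shows 100%.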

    There are some other fixes to the logic that displays the completion percentage. In this code we are chasing a moving target: when we assume that the job is in a particular state and do some processing based on that assumption, we might get spurious results. For example, we get the list of running jobs and try to get the progress for each job. While we are doing this, the state of a job might change from running to something else, and it's not easy to cover all the possible scenarios in the code. Thus, when we try to fetch the progress of a previously running job that has changed state, we get spurious results. To mitigate this, we make the simple assumption that a job can't regress; if we see such a condition, we ignore it, as we know it's temporary.
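
Editorial illustration, not code from the patch: the "a job can't regress" assumption can be sketched as a small filter that ignores any reading lower than the last accepted one. The class `MonotonicProgress` and its `accept` method are hypothetical names; only the `prog` / `lastProg` naming echoes the thread.

```java
/** Hypothetical filter, not taken from the PIG-457 patch. */
final class MonotonicProgress {
    // Highest progress value accepted so far, as a fraction in 0.0..1.0.
    private double lastProg = 0.0;

    /**
     * Feeds one progress reading through the filter.
     *
     * @param prog the latest reading
     * @return the value to report: the new reading, or the previous one
     *         if the new reading went backwards
     */
    double accept(double prog) {
        if (prog < lastProg) {
            // Assumed to be a transient artifact of a job changing state
            // between listing and querying, so the reading is ignored.
            return lastProg;
        }
        lastProg = prog;
        return prog;
    }
}
```

With this filter, a sequence of readings 0.5, 0.0, 0.7 would be reported as 0.5, 0.5, 0.7.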

    Another thing that has been introduced into the logic is an exponential delay scheme, which is useful when a job is not progressing, maybe due to bag spilling or some UDF running. In this case the reported progress stays the same for some time. During this time, we can either (1) impose a hard limit, so that if we don't see progress we don't report it, or (2) just keep reporting the same progress. There are cons with both approaches: with (1), if we don't display anything, it might seem like the job is stuck, with no processing happening; with (2), it is surely going to fill the screen with output that adds no information. So we introduce delays between batches of identical progress displays, and the delay increases exponentially with each completed batch. Currently the batch size is half the number of retries, which is 6 since the sleep time is now 5 sec; the idea is to have progress reported every 30 sec, while delaying further displays of the same progress using the exponential delay scheme.
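
Editorial illustration, not code from the patch: a simplified Java sketch of exponentially spaced repeat reports. It deliberately differs from the description above in one respect: it doubles the gap after every single repeated report rather than after each batch of reports. The class `StalledProgressThrottle`, the method `shouldReport`, and the cap of 64 polls are hypothetical choices made for the sketch.

```java
/** Hypothetical throttle, not taken from the PIG-457 patch. */
final class StalledProgressThrottle {
    private static final long MAX_POLLS_TO_WAIT = 64; // arbitrary cap (assumption)

    private double lastProg = -1.0;    // no reading seen yet
    private long pollsToWait = 1;      // current gap between repeats, in polls
    private long pollsSinceReport = 0; // polls since the last printed line

    /**
     * Called once per polling interval.
     *
     * @param prog the latest progress reading
     * @return true if this reading should be printed
     */
    boolean shouldReport(double prog) {
        if (prog != lastProg) {
            // Progress changed: print it and reset all counters.
            lastProg = prog;
            pollsToWait = 1;
            pollsSinceReport = 0;
            return true;
        }
        pollsSinceReport++;
        if (pollsSinceReport >= pollsToWait) {
            // Same progress again: print it as a sign of life, then
            // double the gap before the next repeat.
            pollsSinceReport = 0;
            pollsToWait = Math.min(pollsToWait * 2, MAX_POLLS_TO_WAIT);
            return true;
        }
        return false;
    }
}
```

For a job stuck at one value, this prints on polls 0, 1, 3, 7, and so on, and returns to printing every change as soon as the value moves.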
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 26, 2008 at 10:32 am
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-457:
    -----------------------------------------------

    Attachment: 457-2.patch
  • Alan Gates (JIRA) at Sep 26, 2008 at 7:04 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-457:
    ---------------------------

    Attachment: 457-3.patch

    I propose we split this patch into two. The first part I think we all agree on and we want to get it soon. The second part I think we want to review further. I've tried to split out the stuff for the first part and attached it in this 457-3.patch.
  • Olga Natkovich (JIRA) at Sep 26, 2008 at 7:10 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634971#action_12634971 ]

    Olga Natkovich commented on PIG-457:
    ------------------------------------

    +1
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 26, 2008 at 7:58 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634984#action_12634984 ]

    Shravan Matthur Narayanamurthy commented on PIG-457:
    ----------------------------------------------------

    The patch also removes the fix for issue 2, where 100% is shown even after failures. Also, please call jc.stop before returning false from the failure case. Also, LocalJobRunner.compile is missing the StreamVisitor part, which I think causes the stream tests in local mode to fail. If that's actually the case, I would keep that in too.

    Optionally, we might want to keep the check that discards decreasing progress, because a decrease causes the prog<lastProg condition we have there to fail. The progress is, say, x and lastProg is also x. This goes on for a while and then suddenly, probably during a failure, prog becomes zero, which causes the condition to hold in the next iteration and makes us print the same progress again and again.

    I agree that the exponential backoff needs some review.
  • Alan Gates (JIRA) at Sep 26, 2008 at 10:56 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-457:
    ---------------------------

    Attachment: 457-4.patch

    Here's another whack at splitting the patch. I addressed Shravan's request to add in the MRStreaming stuff into LocalLauncher, and put in the jc.stop() calls after failed jobs in both Local and MapReduceLauncher.
  • Olga Natkovich (JIRA) at Sep 26, 2008 at 11:34 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635066#action_12635066 ]

    Olga Natkovich commented on PIG-457:
    ------------------------------------

    +1
  • Alan Gates (JIRA) at Sep 27, 2008 at 12:04 am
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635074#action_12635074 ]

    Alan Gates commented on PIG-457:
    --------------------------------

    PIG-457-4 has been checked in. I'm not closing the bug because this doesn't address the issue of reporting 100% progress and then reporting failures, which is also part of the bug.
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 30, 2008 at 5:28 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-457:
    -----------------------------------------------

    Attachment: 457-part2.patch

    Separated the 2nd part
  • Olga Natkovich (JIRA) at Oct 1, 2008 at 10:55 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636230#action_12636230 ]

    Olga Natkovich commented on PIG-457:
    ------------------------------------

    I reviewed the current code and the proposed changes.

    The current code looks fine and should be doing the right thing. So it seems like the patch is adding quite a bit of complexity to hide some sort of issues with hadoop progress.

    I wonder if the easiest change for now would be to check the progress and, if it adds up to 100%, not report it until we have checked that the job finished successfully.
  • Shravan Matthur Narayanamurthy (JIRA) at Oct 2, 2008 at 1:05 am
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636261#action_12636261 ]

    Shravan Matthur Narayanamurthy commented on PIG-457:
    ----------------------------------------------------

    Actually, the final progress check is exactly that. It doesn't report 100% until it is also sure that the job completed successfully. However, there are 2 other changes: 1) masking some issue with Hadoop progress; 2) once the issue in (1) is masked, we would not see any progress output if the job stops showing progress after a certain point. So instead of appearing frozen, there is some logic which displays the same percentage with an exponentially increasing delay between each report of the same progress, just to say that we are alive. Once we get a different progress value, all counters are reset and normal progress reporting resumes until the progress is blocked again. (Not exactly right, but it simplifies understanding: it actually adds an exponential delay between batches of identical progress reports.)
  • Olga Natkovich (JIRA) at Oct 2, 2008 at 1:15 am
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636263#action_12636263 ]

    Olga Natkovich commented on PIG-457:
    ------------------------------------

    Yes, I realize that, but I question whether we need to add this complexity. My preference is to extract just the part of the patch that delays 100% reporting.
  • Shravan Matthur Narayanamurthy (JIRA) at Oct 3, 2008 at 6:50 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-457:
    -----------------------------------------------

    Attachment: 457-prog.patch

    just delaying the progress part.
  • Olga Natkovich (JIRA) at Oct 3, 2008 at 7:00 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636710#action_12636710 ]

    Olga Natkovich commented on PIG-457:
    ------------------------------------

    Shravan, we do want to display 100% once the job successfully finishes. I don't think the latest version of the patch does that.
  • Shravan Matthur Narayanamurthy (JIRA) at Oct 6, 2008 at 6:14 am
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-457:
    -----------------------------------------------

    Attachment: 457-prog.patch

    Sorry, missed that one. Here is the new patch with the changes for the progress.
  • Shravan Matthur Narayanamurthy (JIRA) at Oct 6, 2008 at 6:14 am
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-457:
    -----------------------------------------------

    Attachment: (was: 457-prog.patch)
  • Olga Natkovich (JIRA) at Oct 6, 2008 at 6:08 pm
    [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-457:
    -------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    patch committed; thanks, shravan

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Sep 24, '08 at 6:16p
active: Oct 6, '08 at 6:08p
posts: 19
users: 1
website: pig.apache.org
