Grokbase Groups Pig dev August 2008
progress reported on every tuple
--------------------------------

Key: PIG-357
URL: https://issues.apache.org/jira/browse/PIG-357
Project: Pig
Issue Type: Improvement
Affects Versions: types_branch
Reporter: Olga Natkovich
Fix For: types_branch


Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.

We might want to go to a similar model.
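The "every 1000th tuple" model described above can be sketched as a small wrapper around the reporter. This is a hedged illustration, not Pig's code: `ProgressReporter` and `CountingProgressReporter` are hypothetical names, the reporter is allowed to be null to mirror "if the reporter is set", and the interval of 1000 only echoes the figure quoted in the description.

```java
public class CountedProgress {

    /** Hypothetical one-method reporter (an assumption, not Pig's or Hadoop's API). */
    interface ProgressReporter {
        void progress();
    }

    /** Forwards only every interval-th tuple's progress call to the wrapped reporter. */
    static final class CountingProgressReporter {
        private final ProgressReporter reporter; // may be null: "if the reporter is set"
        private final long interval;
        private long tuplesSeen = 0;

        CountingProgressReporter(ProgressReporter reporter, long interval) {
            if (interval <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            this.reporter = reporter;
            this.interval = interval;
        }

        /** Call once per tuple; reports on tuple interval, 2 * interval, and so on. */
        void onTuple() {
            tuplesSeen++;
            if (reporter != null && tuplesSeen % interval == 0) {
                reporter.progress();
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CountingProgressReporter counter =
                new CountingProgressReporter(() -> calls[0]++, 1000);
        for (int i = 0; i < 2500; i++) {
            counter.onTuple();
        }
        // Tuples 1000 and 2000 trigger a report; the remaining 500 do not.
        System.out.println("progress() calls: " + calls[0]); // prints "progress() calls: 2"
    }
}
```

The per-tuple cost is one increment and one modulo check. The weakness, raised in the first comment below, is that a fixed count says nothing about wall-clock time when individual tuples are slow to process.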

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Sam Pullara (JIRA) at Aug 4, 2008 at 6:35 pm
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619634#action_12619634 ]

    Sam Pullara commented on PIG-357:
    ---------------------------------

    I would do this using a timer rather than an absolute number of tuples, due to the vagaries of how long processing might take. Maybe every 10s? You could check a boolean on every tuple to see whether the timer went off; if so, report and reset the timer.
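The timer suggestion above can be sketched in Java roughly as follows, assuming a single thread processes tuples. The names `ProgressReporter` and `TimedProgressReporter` are hypothetical, and the timer is built from `java.util.concurrent` rather than anything Pig-specific. A background thread sets an `AtomicBoolean` when the interval expires; the per-tuple path only does a `compareAndSet` on that flag, and after each report a fresh one-shot timer is scheduled, which is one way to read "report and reset the timer".

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimedProgress {

    /** Hypothetical one-method reporter (an assumption, not Pig's or Hadoop's API). */
    interface ProgressReporter {
        void progress();
    }

    static final class TimedProgressReporter implements AutoCloseable {
        private final ProgressReporter reporter; // may be null: "if the reporter is set"
        private final long intervalMillis;
        // Set to true by the timer thread, cleared by the tuple-processing thread.
        private final AtomicBoolean due = new AtomicBoolean(false);
        private final ScheduledExecutorService scheduler;

        TimedProgressReporter(ProgressReporter reporter, long intervalMillis) {
            this.reporter = reporter;
            this.intervalMillis = intervalMillis;
            // Daemon thread so a forgotten close() does not keep the JVM alive.
            this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "progress-timer");
                t.setDaemon(true);
                return t;
            });
            armTimer();
        }

        /** One-shot timer: after intervalMillis, mark a report as due. */
        private void armTimer() {
            if (!scheduler.isShutdown()) {
                scheduler.schedule(() -> due.set(true), intervalMillis, TimeUnit.MILLISECONDS);
            }
        }

        /** Called once per tuple; the common case is a single flag check. */
        void onTuple() {
            if (due.compareAndSet(true, false)) {
                if (reporter != null) {
                    reporter.progress();
                }
                armTimer(); // report, then reset the timer
            }
        }

        @Override
        public void close() {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        long start = System.currentTimeMillis();
        try (TimedProgressReporter timed = new TimedProgressReporter(() -> calls[0]++, 100)) {
            while (System.currentTimeMillis() - start < 550) {
                timed.onTuple();
                Thread.sleep(1);
            }
        }
        // Roughly one report per 100 ms of processing; the exact count depends on scheduling.
        System.out.println("progress() calls: " + calls[0]);
    }
}
```

The extra complexity mentioned in the reply below is visible here: unlike a plain counter, this version owns a background thread that has to be shut down with `close()`.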
  • Olga Natkovich (JIRA) at Aug 4, 2008 at 6:39 pm
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619635#action_12619635 ]

    Olga Natkovich commented on PIG-357:
    ------------------------------------

    I agree that it is a better solution, but it adds complexity to the system. We might choose to do this later. The count, while not optimal, worked reasonably well in the past.
  • Alan Gates (JIRA) at Aug 11, 2008 at 4:27 pm
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates reassigned PIG-357:
    ------------------------------

    Assignee: Alan Gates
  • Olga Natkovich (JIRA) at Aug 11, 2008 at 6:15 pm
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-357:
    -------------------------------

    Priority: Critical (was: Major)
  • Alan Gates (JIRA) at Aug 12, 2008 at 4:41 pm
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-357:
    ---------------------------

    Priority: Major (was: Critical)

    I tried commenting out all reporting, and in runs of 200 million rows on 25 machines it made absolutely no measurable difference. We may still need to fix this, but it isn't an immediate performance issue.
  • Olga Natkovich (JIRA) at Aug 18, 2008 at 11:26 pm
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-357:
    -------------------------------

    Priority: Minor (was: Major)

    Lowering priority since it is not causing performance issues.
  • Alan Gates (JIRA) at Sep 4, 2008 at 11:34 pm
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-357:
    ---------------------------

    Summary: PERFORMANCE: progress reported on every tuple (was: progress reported on every tuple)

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 4, '08 at 5:39p
active: Sep 4, '08 at 11:34p
posts: 8
users: 1
website: pig.apache.org