Optimize scalar to use distributed cache
-----------------------------------------

Key: PIG-1548
URL: https://issues.apache.org/jira/browse/PIG-1548
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Daniel Dai
Assignee: Thejas M Nair
Fix For: 0.8.0


The current scalar implementation writes a scalar file to DFS. When Pig needs the scalar, it opens the DFS file directly. This puts a huge load on the namenode. We should use the distributed cache for the scalar file.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Daniel Dai (JIRA) at Aug 18, 2010 at 12:52 am
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai updated PIG-1548:
    ----------------------------

    Summary: Optimize scalar to consolidate the part file (was: Optimize scalar to use distributed cache)
    Description: The current scalar implementation writes a scalar file to DFS. When Pig needs the scalar, it opens the DFS file directly. Each scalar file consists of more than one part file even though it contains only one record. This puts a huge load on the namenode. We should consolidate the part files before opening them. An optional further step is to put the consolidated file into the distributed cache, which brings the namenode load down further. (was: The current scalar implementation writes a scalar file to DFS. When Pig needs the scalar, it opens the DFS file directly. This puts a huge load on the namenode. We should use the distributed cache for the scalar file.)
    Optimize scalar to consolidate the part file
    --------------------------------------------

    Key: PIG-1548
    URL: https://issues.apache.org/jira/browse/PIG-1548
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Reporter: Daniel Dai
    Assignee: Thejas M Nair
    Fix For: 0.8.0


    The current scalar implementation writes a scalar file to DFS. When Pig needs the scalar, it opens the DFS file directly. Each scalar file consists of more than one part file even though it contains only one record. This puts a huge load on the namenode. We should consolidate the part files before opening them. An optional further step is to put the consolidated file into the distributed cache, which brings the namenode load down further.
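
As a rough illustration of the consolidation step described above, the following Java sketch merges the part files of an output directory into a single file, so that a reader opens one file instead of several. It is not the code from the PIG-1548 patch. It uses the local file system through java.nio as a stand-in for DFS, and the class name PartFileConsolidator, the method consolidate, and the "part-" name prefix filter are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Pig: merges part files on the local file system.
class PartFileConsolidator {

    /**
     * Concatenates every file in outputDir whose name starts with "part-"
     * into target, in sorted file-name order, and returns target.
     */
    static Path consolidate(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (Stream<Path> entries = Files.list(outputDir)) {
            entries.filter(p -> p.getFileName().toString().startsWith("part-"))
                   .forEach(parts::add);
        }
        Collections.sort(parts); // Path is Comparable: part-00000, part-00001, ...

        // With no options given, newOutputStream creates or truncates the target.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // writes the part's bytes to the open stream
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("scalar-output");
        // Simulate a job with three tasks where only one emitted the scalar record.
        Files.write(dir.resolve("part-00000"), new byte[0]);
        Files.write(dir.resolve("part-00001"), "42\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00002"), new byte[0]);

        Path merged = consolidate(dir, dir.resolve("scalar-consolidated"));
        System.out.print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8));
    }
}
```

In this sketch the reader touches one consolidated file rather than three part files. On a real cluster the same reduction in opened files is what would lower the number of namenode requests per task.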
  • Thejas M Nair (JIRA) at Sep 2, 2010 at 5:30 pm
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Thejas M Nair reassigned PIG-1548:
    ----------------------------------

    Assignee: Richard Ding (was: Thejas M Nair)
  • Richard Ding (JIRA) at Sep 3, 2010 at 12:32 am
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Richard Ding updated PIG-1548:
    ------------------------------

    Attachment: PIG-1458.patch


    Results of test-patch:

    {code}
    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 3 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

    {code}
  • Richard Ding (JIRA) at Sep 3, 2010 at 12:32 am
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Richard Ding updated PIG-1548:
    ------------------------------

    Status: Patch Available (was: Open)
  • Richard Ding (JIRA) at Sep 3, 2010 at 6:08 pm
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Richard Ding updated PIG-1548:
    ------------------------------

    Attachment: PIG-1548.patch
  • Richard Ding (JIRA) at Sep 3, 2010 at 6:08 pm
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Richard Ding updated PIG-1548:
    ------------------------------

    Attachment: (was: PIG-1458.patch)
  • Richard Ding (JIRA) at Sep 3, 2010 at 10:42 pm
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Richard Ding updated PIG-1548:
    ------------------------------

    Attachment: PIG-1548_1.patch

    The patch excludes some multiquery cases where more information is needed to correlate and determine the files to consolidate. We'll consider those cases in a separate jira.
  • Thejas M Nair (JIRA) at Sep 3, 2010 at 10:45 pm
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906152#action_12906152 ]

    Thejas M Nair commented on PIG-1548:
    ------------------------------------

    Looks good. +1
  • Richard Ding (JIRA) at Sep 3, 2010 at 11:33 pm
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Richard Ding updated PIG-1548:
    ------------------------------

    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

    Patch committed to both trunk and the 0.8 branch.
  • Daniel Dai (JIRA) at Sep 5, 2010 at 5:30 am
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906321#action_12906321 ]

    Daniel Dai commented on PIG-1548:
    ---------------------------------

    The patch breaks TestFRJoin2.testConcatenateJobForScalar3. Commenting out TestFRJoin2.testConcatenateJobForScalar3 temporarily.
  • Thejas M Nair (JIRA) at Sep 9, 2010 at 7:41 pm
    [ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907750#action_12907750 ]

    Thejas M Nair commented on PIG-1548:
    ------------------------------------

    bq. Patch break TestFRJoin2.testConcatenateJobForScalar3. Comment out TestFRJoin2.testConcatenateJobForScalar3 temporarily.

    Created PIG-1603 to address this.
