Grokbase Groups Pig dev August 2010
[ https://issues.apache.org/jira/browse/PIG-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

niraj rai updated PIG-103:
--------------------------

Attachment: conf_tmp_dir.patch

This patch makes the Pig temporary directory used for intermediate data configurable.
Shared Job /tmp location should be configurable
-----------------------------------------------

Key: PIG-103
URL: https://issues.apache.org/jira/browse/PIG-103
Project: Pig
Issue Type: Improvement
Components: impl
Environment: Partially shared file:// filesystem (e.g. NFS)
Reporter: Craig Macdonald
Assignee: niraj rai
Fix For: 0.8.0

Attachments: conf_tmp_dir.patch


Hello,
I'm investigating running Pig in an environment where various parts of the file:// filesystem are available on all nodes. I can tell Hadoop to use a file:// location as its default file system by setting fs.default.name=file://path/to/shared/folder.
However, this creates issues for Pig, because Pig writes its job information to a folder that it assumes is on a shared file system (e.g. DFS). In this scenario, /tmp is local to each machine rather than shared.
So either /tmp should be configurable, or Hadoop should expose the actual full location set in fs.default.name.
A straightforward solution is to make "/tmp/" a property in init(PigContext) in src/org/apache/pig/impl/io/FileLocalizer.java.
Any suggestions for property names?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
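The proposal in the issue (replace the hard-coded "/tmp/" used by FileLocalizer.init(PigContext) with a configurable property) can be sketched in plain Java as below. This is not the committed patch, which is not reproduced in this thread. The class TempDirResolver, its methods, the job name, and the property name pig.temp.dir are all assumptions for illustration; later Pig releases document a pig.temp.dir property, but nothing here confirms that this patch chose that name.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: resolve the base directory for intermediate data
 * from a configuration property, falling back to the old "/tmp" default.
 * Not Pig's actual FileLocalizer code.
 */
public class TempDirResolver {
    // Assumed property name; the thread does not show the name the patch used.
    public static final String TEMP_DIR_KEY = "pig.temp.dir";
    // The previously hard-coded location described in the issue.
    public static final String DEFAULT_TEMP_DIR = "/tmp";

    /** Returns the configured temp directory, or "/tmp" if unset or blank. */
    public static String resolveTempDir(Properties props) {
        String dir = props.getProperty(TEMP_DIR_KEY);
        if (dir == null || dir.trim().isEmpty()) {
            return DEFAULT_TEMP_DIR;
        }
        dir = dir.trim();
        // Drop trailing slashes so joined child paths never contain "//",
        // but keep a bare "/" intact.
        while (dir.length() > 1 && dir.endsWith("/")) {
            dir = dir.substring(0, dir.length() - 1);
        }
        return dir;
    }

    /** Builds a path for one job's intermediate data under the temp directory. */
    public static String jobTempPath(Properties props, String jobName) {
        String base = resolveTempDir(props);
        return base.equals("/") ? "/" + jobName : base + "/" + jobName;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // No property set: falls back to the node-local default.
        System.out.println(jobTempPath(props, "job_0001")); // prints /tmp/job_0001
        // Point intermediate data at a directory under the shared mount.
        props.setProperty(TEMP_DIR_KEY, "/path/to/shared/folder/tmp/");
        System.out.println(jobTempPath(props, "job_0001")); // prints /path/to/shared/folder/tmp/job_0001
    }
}
```

Keeping the old "/tmp" as the fallback means existing HDFS deployments behave as before, while a user on the partially shared NFS setup from the issue only has to point the property at a directory that every node can see.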


  • niraj rai (JIRA) at Aug 5, 2010 at 6:11 pm
    [ https://issues.apache.org/jira/browse/PIG-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    niraj rai updated PIG-103:
    --------------------------

    Status: Patch Available (was: Open)
  • niraj rai (JIRA) at Aug 9, 2010 at 5:31 pm
    [ https://issues.apache.org/jira/browse/PIG-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    niraj rai updated PIG-103:
    --------------------------

    Status: Open (was: Patch Available)
  • niraj rai (JIRA) at Aug 9, 2010 at 5:33 pm
    [ https://issues.apache.org/jira/browse/PIG-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    niraj rai updated PIG-103:
    --------------------------

    Attachment: conf_tmp_dir_2.patch

    Implemented the review recommendations.
  • niraj rai (JIRA) at Aug 9, 2010 at 5:33 pm
    [ https://issues.apache.org/jira/browse/PIG-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    niraj rai updated PIG-103:
    --------------------------

    Status: Patch Available (was: Open)
  • Richard Ding (JIRA) at Aug 9, 2010 at 10:29 pm
    [ https://issues.apache.org/jira/browse/PIG-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Richard Ding updated PIG-103:
    -----------------------------

    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

    The patch has been committed to trunk. Thanks, Niraj.
  • Richard Ding (JIRA) at Aug 11, 2010 at 8:58 pm
    [ https://issues.apache.org/jira/browse/PIG-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Richard Ding updated PIG-103:
    -----------------------------

    Tags: documentation
