Grokbase Groups Pig dev August 2010
add support for multiple filesystems
------------------------------------

Key: PIG-1564
URL: https://issues.apache.org/jira/browse/PIG-1564
Project: Pig
Issue Type: Improvement
Reporter: Andrew Hitchcock


Currently you can't run Pig scripts that read data from one file system and write it to another. Grunt also doesn't support using cd to move between directories on different file systems.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Andrew Hitchcock (JIRA) at Aug 25, 2010 at 1:52 am
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Hitchcock updated PIG-1564:
    ----------------------------------

    Attachment: PIG-1564-1.patch

    At the moment you cannot, say, read from S3N and write to HDFS in a single job (or even read from one S3N bucket and write to another).

    The essence of this patch is a change to how HDataStorage works. Previously it was bound to a single Hadoop FileSystem object, which effectively limited a job to one file system. Now it is a wrapper around all Hadoop FileSystems and returns the correct one based on the prefix of the requested path.
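The dispatch described above (one wrapper, many filesystem objects, chosen by path prefix) can be sketched in plain Java. This is a hypothetical illustration, not code from PIG-1564-1.patch: MultiFsStorage and FsHandle are invented names, FsHandle stands in for org.apache.hadoop.fs.FileSystem, and the factory function plays the role of Hadoop's FileSystem.get(URI, Configuration), which returns a filesystem for the URI's scheme and authority rather than the default one.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not code from PIG-1564-1.patch: one storage object that
// hands out a separate filesystem handle per scheme + authority.
class MultiFsStorage {

    // Stand-in for org.apache.hadoop.fs.FileSystem (assumption for illustration).
    static final class FsHandle {
        final String scheme;
        final String authority;

        FsHandle(String scheme, String authority) {
            this.scheme = scheme;
            this.authority = authority;
        }
    }

    private final URI defaultFs;
    // Plays the role of FileSystem.get(URI, Configuration) in real Hadoop code.
    private final Function<URI, FsHandle> factory;
    private final Map<String, FsHandle> cache = new HashMap<>();

    MultiFsStorage(URI defaultFs, Function<URI, FsHandle> factory) {
        this.defaultFs = defaultFs;
        this.factory = factory;
    }

    // Returns the handle for the filesystem that owns the given path.
    FsHandle fileSystemFor(String path) {
        URI parsed = URI.create(path);
        // Scheme-less paths such as "/user/hadoop" belong to the default filesystem.
        final URI target = (parsed.getScheme() == null) ? defaultFs : parsed;
        String authority = (target.getAuthority() == null) ? "" : target.getAuthority();
        String key = target.getScheme() + "://" + authority;
        // Create each filesystem handle once and reuse it for later paths.
        return cache.computeIfAbsent(key, k -> factory.apply(target));
    }
}
```

Because the cache key includes the authority, reading from one S3N bucket and writing to another resolves to two separate handles, which is the case a single bound FileSystem object cannot cover.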

    Another small change: Pig previously assumed the default home directory was '/user/<username>' on the default file system. That directory does not always exist, so I made it configurable with a new property, "pig.initial.fs.name".
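A minimal sketch of the fallback described above, for choosing Grunt's starting directory. Only the property name "pig.initial.fs.name" comes from the comment; treating its value as a full URI of the start location, and the InitialDirectory class itself, are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical sketch: choose Grunt's starting directory. The property name comes
// from the patch description; treating its value as a full URI is an assumption.
class InitialDirectory {
    static final String PROPERTY = "pig.initial.fs.name";

    static String resolve(Properties props, String defaultFs, String userName) {
        String configured = props.getProperty(PROPERTY);
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        // Previous behavior: /user/<username> on the default filesystem.
        return defaultFs + "/user/" + userName;
    }
}
```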
  • Andrew Hitchcock (JIRA) at Aug 25, 2010 at 1:52 am
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Hitchcock updated PIG-1564:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Jeff Zhang (JIRA) at Aug 25, 2010 at 2:00 am
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902276#action_12902276 ]

    Jeff Zhang commented on PIG-1564:
    ---------------------------------

    Andrew, could you add unit test for your patch ?
  • Andrew Hitchcock (JIRA) at Aug 26, 2010 at 12:14 am
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902720#action_12902720 ]

    Andrew Hitchcock commented on PIG-1564:
    ---------------------------------------

    Hi Jeff,

    Before I add a unit test, I'd like confirmation that I'm going about this the right way. In my previous patch (PIG-1505) it was mentioned that HDataStorage was deprecated, and this patch changes HDataStorage and related classes. When is HDataStorage scheduled for deprecation, and does anything besides HDataStorage need to be modified for this to work?

    I'd also note that this is a trunk rebase of a patch that we currently have in production with Pig 0.3 and Pig 0.6.

    Thanks,
    Andrew
  • Richard Ding (JIRA) at Aug 26, 2010 at 5:44 pm
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902952#action_12902952 ]

    Richard Ding commented on PIG-1564:
    -----------------------------------

    Hi Andrew,

    HDataStorage is a thin layer on top of the Hadoop FileSystem. Since Pig's local mode moved to Hadoop local mode, Pig no longer needs this layer. We intend to remove it in the future.

    As for Pig reading data from one file system and writing it to another, that has been supported since Pig 0.7.

    -Richard
  • Alan Gates (JIRA) at Aug 26, 2010 at 10:56 pm
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903128#action_12903128 ]

    Alan Gates commented on PIG-1564:
    ---------------------------------

    We do intend to remove it, though at the moment there is no other way to access HDFS for UDFs. So before we can officially deprecate it we need to come up with a replacement.

    Andrew, as Richard points out, as of Pig 0.7 load and store functions no longer use HDataStorage. Do you still see this patch as being useful just for UDFs? Or are load and store functions the only use cases for it?
  • Andrew Hitchcock (JIRA) at Aug 27, 2010 at 11:56 pm
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903721#action_12903721 ]

    Andrew Hitchcock commented on PIG-1564:
    ---------------------------------------

    Hi all,

    I think this patch is still useful. With current Pig trunk you can't cd between different filesystems. Example:

    grunt> pwd
    hdfs://ip-10-218-57-248.ec2.internal:9000/user/hadoop
    grunt> cd s3://anhi-test-data/
    2010-08-27 23:53:10,522 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. This file system object (hdfs://ip-10-218-57-248.ec2.internal:9000) does not support access to the request path 's3://anhi-test-data/' You possibly called FileSystem.get(conf) when you should of called FileSystem.get(uri, conf) to obtain a file system supporting your path.
    Details at logfile: /home/hadoop/pig_1282952081120.log

    This patch fixes that issue.

    Andrew
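The failure in the transcript comes from handing every path to the default filesystem. A minimal sketch of cd resolution that lets a full URI switch filesystems, using only java.net.URI; CdResolver is a hypothetical name (not Grunt's actual implementation), and the host and bucket are taken from the transcript.

```java
import java.net.URI;

// Hypothetical sketch, not Grunt's actual cd implementation: resolve a cd target
// against the current directory so a full URI can switch filesystems.
class CdResolver {

    static URI resolve(URI cwd, String target) {
        // URI.resolve treats the base as a directory only if its path ends in "/".
        String base = cwd.toString();
        URI dir = base.endsWith("/") ? cwd : URI.create(base + "/");
        // An absolute target (with a scheme) replaces the base; a relative one is appended.
        return dir.resolve(target).normalize();
    }
}
```

Once the new directory is known, the filesystem would be chosen from its scheme and authority (in Hadoop, FileSystem.get(URI, Configuration)) instead of FileSystem.get(Configuration), which is the call the error message in the transcript points at.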
  • Dmitriy V. Ryaboy (JIRA) at Aug 28, 2010 at 12:16 am
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903728#action_12903728 ]

    Dmitriy V. Ryaboy commented on PIG-1564:
    ----------------------------------------

    Andrew, does 'fs -cd s3://anhi-test-data/' work?

    The cd command is also deprecated (though not marked as such) :)
  • Andrew Hitchcock (JIRA) at Aug 28, 2010 at 12:28 am
    [ https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903733#action_12903733 ]

    Andrew Hitchcock commented on PIG-1564:
    ---------------------------------------

    Nope:

    grunt> fs -cd s3://anhi-test-data/
    cd: Unknown command


    Does that require a specific version of Hadoop to work (since it appears to be sending the call to Hadoop code)?

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 25, '10 at 1:48a
active: Aug 28, '10 at 12:28a
posts: 10
users: 1
website: pig.apache.org

1 user in discussion

Andrew Hitchcock (JIRA): 10 posts
