Create symbolic links in HDFS
-----------------------------

Key: HADOOP-4044
URL: https://issues.apache.org/jira/browse/HADOOP-4044
Project: Hadoop Core
Issue Type: New Feature
Components: dfs
Reporter: dhruba borthakur
Assignee: dhruba borthakur


HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Joydeep Sen Sarma (JIRA) at Sep 3, 2008 at 5:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628072#action_12628072 ]

    Joydeep Sen Sarma commented on HADOOP-4044:
    -------------------------------------------

    +1. good primitive to have.
  • dhruba borthakur (JIRA) at Sep 4, 2008 at 6:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628272#action_12628272 ]

    dhruba borthakur commented on HADOOP-4044:
    ------------------------------------------

    The FileStatus object will indicate whether a path is a symbolic link. The NameNode will have a new INode type called INodeSymbolicLink. It will store the contents of the symbolic link in the INodeSymbolicLink object. This information will also be stored in the fsimage.

    The FileSystem.open() and FileSystem.create() calls will detect if the pathname is a symbolic link. If so, they will transparently open the path pointed to by the symbolic link.

    Symbolic links can be either relative or absolute. If the symbolic link is a complete URI, then nothing needs to be done. If it is a relative pathname, the dfs-client side code will invoke makeAbsolute to make it a full pathname.
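The relative-versus-absolute rule above can be sketched with the JDK's java.net.URI alone. This is only an illustration, not the patch: the class LinkTargets is hypothetical, and it assumes a relative target is resolved against the directory that contains the link (as POSIX does), which the comment above does not spell out.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class LinkTargets {
    private LinkTargets() {}

    /**
     * Returns the absolute target of a symbolic link.
     *
     * @param linkPath absolute URI of the link itself, e.g. hdfs://nn1/user/a/link
     * @param target   stored contents of the link: a complete URI or a relative path
     */
    static URI makeAbsolute(URI linkPath, String target) {
        URI t = URI.create(target);
        if (t.isAbsolute()) {
            // The target has a scheme, so it is already a complete URI.
            return t;
        }
        // Assumption: a relative target is interpreted against the link's
        // parent directory. URI.resolve drops the last segment of linkPath.
        return linkPath.resolve(t).normalize();
    }
}
```

With a link at hdfs://nn1/user/a/link, the target ../b/data.txt resolves to hdfs://nn1/user/b/data.txt, while hdfs://nn2/foo.txt is returned unchanged.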




  • Doug Cutting (JIRA) at Sep 4, 2008 at 10:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628486#action_12628486 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------

    A great use of symbolic links might be to enhance archives to link to their content.
  • dhruba borthakur (JIRA) at Sep 5, 2008 at 6:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628691#action_12628691 ]

    dhruba borthakur commented on HADOOP-4044:
    ------------------------------------------

    After a discussion with Joydeep, we came up with the idea that FileSystem.open() will transparently resolve a symbolic link if the symbolic link is relative or if it is of the form hdfs://bar/foo.txt. This means that a HDFS symbolic link can point only to other hdfs pathnames.

    In future, if the symbolic link is of the form file://dir/foo.txt, then the FileSystem API can return it to higher layers for interpretation. This will allow an HDFS symbolic link to refer to a pathname that is not a HDFS pathname. I do not plan to implement this feature in the short term.
  • Doug Cutting (JIRA) at Sep 5, 2008 at 8:09 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628717#action_12628717 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------
    > This means that a HDFS symbolic link can point only to other hdfs pathnames.

    What's the point of this restriction? Why not permit symbolic links to arbitrary URIs?
  • Raghu Angadi (JIRA) at Sep 5, 2008 at 8:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628735#action_12628735 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------

    I think it will result in a cleaner design/implementation to handle this at FileSystem level (thus handling arbitrary URIs).
  • dhruba borthakur (JIRA) at Sep 7, 2008 at 8:00 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628943#action_12628943 ]

    dhruba borthakur commented on HADOOP-4044:
    ------------------------------------------
    > This means that a HDFS symbolic link can point only to other hdfs pathnames.

    @Doug: I meant to say that "A HDFS symbolic link that does not have a full URI (absolute with scheme, etc.) will point to a path in the same HDFS instance".

    @Raghu: Yes, it should be handled at the FileSystem level.
  • dhruba borthakur (JIRA) at Sep 7, 2008 at 9:42 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-4044:
    -------------------------------------

    Attachment: symLink1.patch

    This patch demonstrates the proposed changes to the FileSystem API. I would like early comments on the changes to the public API before I implement the support for symlinks in HDFS.
  • Doug Cutting (JIRA) at Sep 8, 2008 at 6:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629246#action_12629246 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------

    I think a better API might be:

    {code}
    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
      FileSystem fs = this;
      FileStatus stat = fs.getFileStatus(f);
      while (stat.isSymLink()) {            // follow links until a non-link is reached
        f = stat.getSymLink();
        fs = f.getFileSystem(getConf());    // the target may live in another FileSystem
        stat = fs.getFileStatus(f);
      }
      return fs.openData(stat, bufferSize);
    }

    public abstract FSDataInputStream openData(FileStatus stat, int bufferSize) throws IOException;
    {code}

    We could, for back-compatibility, have openData() default to calling open() for one release, but I think most if not all FileSystem implementations are included in Hadoop, so we're mostly concerned about client back-compatibility here, not implementation back-compatibility.

    Also, FileStatus#getSymLink() should return a Path, not a String, no?
  • Raghu Angadi (JIRA) at Sep 8, 2008 at 6:44 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629255#action_12629255 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------

    Both have extra RPC (or system call in case of LocalFS) for open of a normal file (common case). How about something like :

    {code}
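    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
      FileSystem fs = this;
      for (int r = 0; r < maxRecursion; r++) {
        try {
          return fs.open(f, bufferSize, ..);
        } catch (SymLinkNotFollowedException e) {
          // the implementation did not follow the link; retry against its target
          f = e.getSymLink();
          fs = f.getFileSystem(getConf());
        }
      }
      throw new IOException("Too many levels of symbolic links: " + f);
    }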
    {code}
  • Doug Cutting (JIRA) at Sep 8, 2008 at 7:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629263#action_12629263 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------
    > Both have extra RPC (or system call in case of LocalFS) for open of a normal file (common case).

    Exceptions should not be used for normal program control. If the extra RPC is a problem, then a FileSystem can implement open() directly itself as an optimization. LocatedBlocks could be altered to optionally contain a symbolic link. LocalFileSystem can also override open(), since the native implementation already does the right thing. Would the code above do something reasonable on S3 & KFS, or would we need to override open() for those too? If we end up overriding it everywhere then it's probably not worth having a default implementation and having openData(), but we should rather just require that everyone implement open() to handle links.
  • Raghu Angadi (JIRA) at Sep 8, 2008 at 7:28 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629266#action_12629266 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------
    > If the extra RPC is a problem, then a FileSystem can implement open() directly itself as an optimization.

    That imposes restriction on what SymLink can point to.

    > Exceptions should not be used for normal program control.

    Agreed. We could of course propose another FS.open() that returns something more than an FSInputStream.

  • Doug Cutting (JIRA) at Sep 8, 2008 at 8:12 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629271#action_12629271 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------
    > That imposes restriction on what SymLink can point to.

    I don't follow what you mean here.

    > we could of course propose another FS.open() that returns something more than an FSInputStream.

    Yes, I drafted this and deleted it, since it's rather hairy and not clearly an advantage to anything but HDFS. Here it is:

    {code}
    public class FileOpening {
      public FileOpening(boolean isLink, Path link) { ... }
      public boolean isLink() { ... }
      public Path getLink() { ... }
    }

    public FileOpening getFileOpening(Path f) throws IOException {
      FileStatus stat = getFileStatus(f);
      return new FileOpening(stat.isLink(), stat.getLink());
    }

    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
      FileSystem fs = this;
      FileOpening opening = fs.getFileOpening(f);
      while (opening.isLink()) {            // follow links, possibly across file systems
        f = opening.getLink();
        fs = f.getFileSystem(getConf());
        opening = fs.getFileOpening(f);
      }
      return fs.openData(f, opening, bufferSize);
    }

    public abstract FSDataInputStream openData(Path f, FileOpening opening, int bufferSize)
        throws IOException;
    {code}

    HDFS would override getFileOpening() to return a subclass that also contains block locations. Phew! Would anyone leverage this but HDFS?
  • Raghu Angadi (JIRA) at Sep 8, 2008 at 8:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629272#action_12629272 ]

    rangadi edited comment on HADOOP-4044 at 9/8/08 1:13 PM:
    --------------------------------------------------------------
    > That imposes restriction on what SymLink can point to.

    Or maybe not, if an implementation (e.g. DistributedFileSystem) has all the information to instantiate another file system.

    One alternative is to have open() return {{'SymLinkFSInputStream extends FSInputStream'}} when a link is not followed by the implementation. A read on it will return bytes representing the link (and it can have a getSymLink() method).
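The stream-based alternative can be illustrated without Hadoop on the classpath. In this sketch the class extends the JDK's ByteArrayInputStream rather than FSInputStream, and the names SymLinkInputStream and LinkAwareCaller are made up for the example; it only shows the shape of the idea, in which a read yields the link bytes and callers that care can ask for the target.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stream returned when open() does not follow a link. */
final class SymLinkInputStream extends ByteArrayInputStream {
    private final String symLink;

    SymLinkInputStream(String symLink) {
        // Reading the stream yields the bytes of the link target itself.
        super(symLink.getBytes(StandardCharsets.UTF_8));
        this.symLink = symLink;
    }

    String getSymLink() {
        return symLink;
    }
}

/** Hypothetical caller-side check. */
final class LinkAwareCaller {
    private LinkAwareCaller() {}

    /** Returns the link target if the stream is an unfollowed link, else null. */
    static String unfollowedLink(InputStream in) {
        return (in instanceof SymLinkInputStream)
                ? ((SymLinkInputStream) in).getSymLink()
                : null;
    }
}
```

A caller that ignores the type simply reads the link text as file data, which is the main risk of this design compared with an explicit result object.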

  • Raghu Angadi (JIRA) at Sep 8, 2008 at 8:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629284#action_12629284 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------

    We can sort of consider FileSystem to be at a similar level to a VFS in Linux. In order to handle similar extensions in the future as well, I think this might be a good time for an implementation to take an object (wrapping the arguments) and return an object that contains the relevant info. The main goal is not to require actual implementation API changes when something like this is added.


    Symlinks need to be handled for other file system operations as well: create(), getListing, etc.
  • Pete Wyckoff (JIRA) at Sep 8, 2008 at 9:08 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629291#action_12629291 ]

    Pete Wyckoff commented on HADOOP-4044:
    --------------------------------------

    Keep in mind that we will want to implement an API for this in libhdfs that is as close to POSIX as possible: https://issues.apache.org/jira/browse/HADOOP-4118. We may not expose all the functionality of general URIs, but for the case of HDFS only, I think the API should be as simple as possible.

  • Sanjay Radia (JIRA) at Sep 9, 2008 at 4:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629524#action_12629524 ]

    Sanjay Radia commented on HADOOP-4044:
    --------------------------------------

    There are the following 4 kinds of symbolic links:

    * dot relative - relative to the directory in which the symbolic link exists
    ** these symbolic links can be processed in the NN without kicking them back to the client
    * volume root relative - relative to the NN's root
    ** these symbolic links can be processed in the NN without kicking them back to the client
    * relative to another file system
    ** these are essentially a symbolic mount. Good use cases are remote NNs or Hadoop Archives
    ** these symbolic links need to be kicked back to the client to be processed on the client side
    * relative to the client's root (i.e. to where the client's / points)
    ** the main use case for these is a symbolic link to a generic location (say /tmp) that is best picked from the client's environment
    ** these symbolic links need to be kicked back to the client to be processed on the client side
    ** I believe we can avoid this last one for the first implementation since I don't think the use case is strong.
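The first three kinds in the list can be told apart from the link text alone, as the rough sketch below shows. The enum LinkKind and its syntactic rules are assumptions made for this example: a target with a scheme is treated as another file system even if it names the same NameNode, and the fourth kind (relative to the client's root) is left out because the thread does not define a syntax for it.

```java
import java.net.URI;

/** Hypothetical classification of link targets; names are illustrative only. */
enum LinkKind {
    DOT_RELATIVE(true),
    VOLUME_ROOT_RELATIVE(true),
    OTHER_FILE_SYSTEM(false);

    /** True if the NameNode could resolve the link without a kickback. */
    final boolean resolvableOnNameNode;

    LinkKind(boolean resolvableOnNameNode) {
        this.resolvableOnNameNode = resolvableOnNameNode;
    }

    static LinkKind classify(String target) {
        // Assumption: any target carrying a scheme names another file system.
        if (URI.create(target).getScheme() != null) {
            return OTHER_FILE_SYSTEM;
        }
        // No scheme: a leading slash means "relative to the volume root".
        return target.startsWith("/") ? VOLUME_ROOT_RELATIVE : DOT_RELATIVE;
    }
}
```

For example, ../logs is classified as DOT_RELATIVE, /user/shared as VOLUME_ROOT_RELATIVE, and har://archive/foo as OTHER_FILE_SYSTEM, the only one of the three that needs client-side handling.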



  • Doug Cutting (JIRA) at Sep 9, 2008 at 7:08 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629575#action_12629575 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------

    Right, we should optimize dot-relative and volume-root-relative links so that they're resolved with a single RPC.

    In all cases, with HDFS links into the same filesystem, we should make just a single RPC to its namenode on open. We used to have an open() RPC, but that was replaced with getBlockLocations() when we noticed that open was no different than getBlockLocations(). With symlinks, we should probably add an open() call again.

    Open() could return a struct with 3 fields: isLink, linkTarget and blockLocations. Dot-relative and volume-root-relative links can be resolved on the NN, as follows:
    ||file type||isLink||linkTarget||blockLocations||
    |regular|false|null|non-null|
    |dot relative link|true|resolved path|non-null|
    |volume-root relative link|true|resolved path|non-null|
    |link to other filesys|true|foreign path|null|
    Does that make sense?
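The three-field return value described above could look roughly like the sketch below. `OpenResult`, its factory methods, and `needsClientRedirect()` are hypothetical names chosen for illustration and are not part of any Hadoop API; block locations are reduced to plain strings to keep the example self-contained.

```java
import java.util.List;

/** Hypothetical return value for the proposed open() RPC (illustration only, not a Hadoop class). */
final class OpenResult {
    final boolean isLink;
    final String linkTarget;            // null for regular files
    final List<String> blockLocations;  // null when the link points to another filesystem

    private OpenResult(boolean isLink, String linkTarget, List<String> blockLocations) {
        this.isLink = isLink;
        this.linkTarget = linkTarget;
        this.blockLocations = blockLocations;
    }

    /** Row "regular": not a link, blocks are returned directly. */
    static OpenResult regular(List<String> blocks) {
        return new OpenResult(false, null, blocks);
    }

    /** Rows "dot relative" and "volume-root relative": resolved on the namenode in the same RPC. */
    static OpenResult localLink(String resolvedPath, List<String> blocks) {
        return new OpenResult(true, resolvedPath, blocks);
    }

    /** Row "link to other filesys": the namenode cannot supply blocks. */
    static OpenResult foreignLink(String foreignPath) {
        return new OpenResult(true, foreignPath, null);
    }

    /** True when the client has to contact another filesystem itself. */
    boolean needsClientRedirect() {
        return isLink && blockLocations == null;
    }
}
```

With this shape, only the last row of the table costs the client a second call; the first three are answered in a single RPC.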


  • Sanjay Radia (JIRA) at Sep 9, 2008 at 7:50 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629584#action_12629584 ]

    Sanjay Radia commented on HADOOP-4044:
    --------------------------------------

    Volume relative symbolic links (e.g. dot relative links) can be handled _within_ the NN. However, whenever a symbolic link with a remote target is crossed the client side needs to handle it - I am calling this a kickback.
    Note that a symbolic link can occur during an open, create, chmod or any other operation that supplies a path name. Further it can occur _anywhere_ in the path.
    Hence:
    open("/foo/bar/zoo")
    bar may be a symbolic link to a remote volume.
    A getFileStatus("/foo/bar/zoo") cannot return a status of "is symbolic link": zoo is not a symbolic link; bar is the symbolic link.

    Any operation that involves a path name, can issue a kickback which says: "I crossed a symbolic link whose target is another file system. I processed a part of the path and the remaining path is xxx; please resolve the remaining path xxx at the target of the symbolic link."
    In the above example the xxx (rest of path to process) is "zoo".
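A minimal sketch of the kickback, assuming it is carried by an exception. `RemoteLinkException` and `ToyNameNode` are hypothetical names invented for this illustration; the toy namenode only records which paths are links to a remote volume and walks the path one component at a time.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical kickback: thrown when resolution crosses a link into another filesystem. */
class RemoteLinkException extends Exception {
    final String linkTarget;     // target of the link that was crossed
    final String remainingPath;  // unprocessed rest of the path; empty if the link was the last component

    RemoteLinkException(String linkTarget, String remainingPath) {
        super("crossed remote link to " + linkTarget + ", remaining path: " + remainingPath);
        this.linkTarget = linkTarget;
        this.remainingPath = remainingPath;
    }
}

/** Toy stand-in for a namenode namespace (illustration only). */
class ToyNameNode {
    private final Map<String, String> remoteLinks = new HashMap<>();

    void addRemoteLink(String path, String target) {
        remoteLinks.put(path, target);
    }

    /** Returns the path unchanged if no remote link is crossed, otherwise kicks back. */
    String resolve(String path) throws RemoteLinkException {
        String[] parts = path.split("/");  // parts[0] is "" for an absolute path
        StringBuilder prefix = new StringBuilder();
        for (int i = 1; i < parts.length; i++) {
            prefix.append('/').append(parts[i]);
            String target = remoteLinks.get(prefix.toString());
            if (target != null) {
                String remaining = String.join("/", Arrays.copyOfRange(parts, i + 1, parts.length));
                throw new RemoteLinkException(target, remaining);
            }
        }
        return path;
    }
}
```

If /foo/bar is registered as a remote link, resolving /foo/bar/zoo kicks back with the remaining path "zoo", and the client is expected to continue at the link target.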

    One way to provide this kickback is to use an exception. (I consider an alternate below)
    _The issue raised in the other comments in this Jira is whether or not an exception is suitable here - i.e. is this normal or abnormal operation of the NN?_
    Following a *remote* symbolic link is not normal for a NN; it is not normal for a NN to recursively call another *remote* NN hence the exception is a reasonable way to deal with this situation.
    The FileSystem interface should clearly NOT throw such an exception because following symbolic links is a normal part of that interface; in this case I am not suggesting that the FileSystem throws an exception, merely that the NN throws that exception.

    Many NameServices handle remote mounts as exceptions or kickbacks. For example, the Spring name service at Sun has an optional trust relationship between name servers. If a symbolic link pointed to a name server with which it had a trust relationship, then it followed the symbolic link recursively (using DNS terminology); otherwise it did a kickback via an exception and the client side followed the links iteratively (using DNS terminology again).

    An alternative (really Owen's suggestion) is to have open return an "FD"-style object that contains the kickback information. The problem is that we will need a similar object for all operations that involve a pathname: open, stat, chmod, rename, etc. (Owen, please comment on this.)
    Hence I feel the use of the exception is a better way to do this.

    BTW I have a document on the above that I started on a few months ago at Yahoo but it is not completed. Dhruba suggested that I attach it above; will do so over the next week or so.
  • Doug Cutting (JIRA) at Sep 9, 2008 at 8:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629609#action_12629609 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------
    bq. a symbolic link can occur during an open, create, chmod [ ... ] anywhere in the path.

    Thanks for clarifying. I had not considered all of these cases.

    bq. I am not suggesting that the FileSystem throws an exception, merely the NN throws that exception.

    That's less of an issue, indeed.

    bq. The problem is that we will need a similar object for all operations that involve a pathname: open, stat, chmod, rename, etc.

    Could we use a common base class for the return value of all of these RPC methods? The base class could have a "symLink" flag and fields that the client could check. Is that better than exceptions?
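One way to picture the common base class, as a sketch only. `PathResult` and `StatResult` are hypothetical names, and the choice of fields (link target plus remaining path) is an assumption taken from the kickback description earlier in the thread.

```java
/** Hypothetical base class for the return value of every path-taking RPC (illustration only). */
abstract class PathResult {
    private final String linkTarget;     // non-null only when a remote link was crossed
    private final String remainingPath;  // part of the path the namenode did not process

    protected PathResult(String linkTarget, String remainingPath) {
        this.linkTarget = linkTarget;
        this.remainingPath = remainingPath;
    }

    /** The "symLink" flag the client checks before using the result. */
    boolean isSymLink() {
        return linkTarget != null;
    }

    String getLinkTarget() {
        return linkTarget;
    }

    String getRemainingPath() {
        return remainingPath;
    }
}

/** Example subclass: a stat-like result that carries either a length or a redirect. */
final class StatResult extends PathResult {
    private final long length;

    private StatResult(long length, String linkTarget, String remainingPath) {
        super(linkTarget, remainingPath);
        this.length = length;
    }

    static StatResult of(long length) {
        return new StatResult(length, null, null);
    }

    static StatResult redirect(String linkTarget, String remainingPath) {
        return new StatResult(-1L, linkTarget, remainingPath);
    }

    long getLength() {
        if (isSymLink()) {
            throw new IllegalStateException("result is a redirect, not a status");
        }
        return length;
    }
}
```

The trade-off against exceptions is that every RPC return type has to extend the base class, but the redirect then travels as ordinary data rather than inside an exception.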
  • Robert Chansler (JIRA) at Sep 9, 2008 at 11:28 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629656#action_12629656 ]

    Robert Chansler commented on HADOOP-4044:
    -----------------------------------------

    Is there a presumption that if (as in Sanjay's example) /foo/bar is a link to "X" that /foo/bar/zoo is exactly "X/zoo"? This might be problematic if X is a reference to another system, introducing a requirement for interfaces everywhere to have a "root" and "remaining" parameters. Or maybe as a practical matter this is unimportant. In any case the rules need to be clear. Is X="/a/b" different from X="/a/b/" when resolving /foo/bar/zoo? Are there cases where "/foo/bar/zoo" is not permitted as a path on the local system, requiring that path syntax checking be done late just in case /foo/bar is a link?
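To make the X="/a/b" versus X="/a/b/" question concrete, here is one possible rule, sketched under the assumption that trailing slashes on the link target are not significant. The thread does not settle this; `LinkPaths.join` is a hypothetical helper, not an existing API.

```java
/** Hypothetical helper showing one possible joining rule (illustration only). */
final class LinkPaths {
    private LinkPaths() {}

    /**
     * Joins a link target with the unprocessed remainder of the path.
     * Assumption: "/a/b" and "/a/b/" are treated as the same target.
     */
    static String join(String linkTarget, String remaining) {
        String base = linkTarget;
        // Strip trailing slashes, but keep a bare "/" root intact.
        while (base.length() > 1 && base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        if (remaining.isEmpty()) {
            return base;
        }
        return base.equals("/") ? "/" + remaining : base + "/" + remaining;
    }
}
```

Under this rule both spellings of X resolve /foo/bar/zoo to /a/b/zoo. A target filesystem with different path syntax would need its own rule.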


  • Konstantin Shvachko (JIRA) at Sep 10, 2008 at 2:20 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629680#action_12629680 ]

    Konstantin Shvachko commented on HADOOP-4044:
    ---------------------------------------------

    sanjay> merely the NN throws that exception.

    Maybe a symlink exception from the NN is not bad, but wrapping the SymLink data inside the exception seems to be a bad way of returning data to the client. I mean, this part of the code proposed above does not look right:
    {code}
    catch (SymLinkNotFollowedException e) {
    f = e.getSymLink();
    }
    {code}
    I would definitely prefer to have a common base class returned as Doug proposes.
  • Konstantin Shvachko (JIRA) at Sep 10, 2008 at 2:32 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629681#action_12629681 ]

    Konstantin Shvachko commented on HADOOP-4044:
    ---------------------------------------------

    Another observation is that symbolic links are usually local (volume relative), but they can point to mounted namespaces. So maybe we should support 2 extra inode types:
    - symlink: volume relative only
    - mount points.
    That of course does not eliminate problems discussed, but introduces a clear distinction between volume local and remote file system symlinks.
  • Raghu Angadi (JIRA) at Sep 10, 2008 at 5:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629714#action_12629714 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------
    bq. catch (SymLinkNotFollowedException e) { [...]

    Just to clarify: the above was only mentioned as a way for _DFSClient_ to inform _FileSystem_, to overcome the limitation of the return type of open(). It was not meant as an exception thrown by the NameNode to the client.

    Of course, the exception is not necessary if open and friends are modified to return a class/struct etc.

  • dhruba borthakur (JIRA) at Sep 10, 2008 at 10:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630015#action_12630015 ]

    dhruba borthakur commented on HADOOP-4044:
    ------------------------------------------

    I would like to propose that the symbolic links point only to files (and not to directories). Why this restriction:

    1. Symbolic links that point to directories are more like mount-points.
    2. My current use-case (HADOOP-4058) does not require symbolic links to point to directories.
    3. Directories in HDFS are more like object-references to file(s); they do not yet have all POSIX semantics associated with them. The clients do not parse components of the directory path; instead they send the entire pathname to the server for a lookup.
    4. Avoiding symbolic links that point to directories keeps the code simple (till we really need it). An open(/foo/bar/zoo) where bar is a symbolic link does not arise.

    In this proposal, a symbolic link can still be relative or absolute. No kickback (via exception or otherwise) is needed to implement this proposal. Also, this proposal does not preclude implementing the "symbolic links to directories" in the future.

  • Raghu Angadi (JIRA) at Sep 10, 2008 at 10:57 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630023#action_12630023 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------
    bq. In this proposal, a symbolic link can still be relative or absolute. No kickback (via exception or otherwise) is needed to implement this proposal.

    Does it mean extra RPC as in the patch? If not, how does open() look?
  • Doug Cutting (JIRA) at Sep 10, 2008 at 10:59 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630026#action_12630026 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------
    bq. symbolic links point only to files (and not to directories).

    Isn't that hard to control? How would we stop someone from replacing the target of a symlink with a directory?

  • Allen Wittenauer (JIRA) at Sep 10, 2008 at 11:07 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630028#action_12630028 ]

    Allen Wittenauer commented on HADOOP-4044:
    ------------------------------------------

    Symbolic links to directories allow admins to move content around without disrupting user processes. Without symbolic links to directories, we'd end up creating a symbolic link for each file.

    That seems extremely counter-productive and more likely to cause the name node a significant amount of grief.
  • dhruba borthakur (JIRA) at Sep 11, 2008 at 8:12 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630123#action_12630123 ]

    dhruba borthakur commented on HADOOP-4044:
    ------------------------------------------

    @Raghu: The FSDataInputStream will implement another interface, StreamType, that will store the information on whether this is a symbolic link. No extra RPC is needed.

    @Doug: By definition, the filesystem has no control over the target of a symbolic link. The user could make it point to a non-existent path too. There will be no enforcement of whether the target is a directory or not.

    @Allen: This proposal should not introduce additional grief for the namenode. If you replace a file with a symbolic link, it still occupies one inode. No additional inodes are necessary. I agree that admins might like symbolic links to directories, but maybe we can do that at a later date when the use case becomes more concrete. Does that sound acceptable?

    The simplicity for the filesystem is that no "kickbacks" are needed while traversing paths.

    {noformat}

    /**
     * Opens an FSDataInputStream at the indicated Path.
     * This does not follow symbolic links.
     * @param f the file name to open
     * @param bufferSize the size of the buffer to be used.
     */
    public abstract FSDataInputStream openfs(Path f, int bufferSize)
        throws IOException;

    /**
     * Opens an FSDataInputStream at the indicated Path.
     * If the specified path is a symbolic link, then open the
     * target of the symbolic link.
     * @param f the file to open
     */
    public FSDataInputStream open(Path f) throws IOException {
      return open(f, getConf().getInt("io.file.buffer.size", 4096));
    }

    /**
     * Opens an FSDataInputStream at the indicated Path.
     * @param f the file to open
     * @param bufferSize the size of the buffer to be used.
     */
    public FSDataInputStream open(Path f, int bufferSize)
        throws IOException {
      FileSystem fs = this;
      FSDataInputStream in = fs.openfs(f, bufferSize);
      while (in.isSymlink()) {
        // construct new path pointed to by symlink
        Path newpath = new Path(in.getSymlink());
        if (!newpath.isAbsolute()) {
          newpath = new Path(f.getParent(), newpath);
        }
        in.close();
        f = newpath;
        fs = f.getFileSystem(getConf());
        LOG.warn("XXX Opening symlink at " + f);
        in = fs.openfs(f, bufferSize);
      }
      return in;
    }


    public class FSDataInputStream extends DataInputStream
        implements Seekable, PositionedReadable, StreamType {
      ....
    }


    /** Types of streams */
    interface StreamType {
      /**
       * Is this stream a symbolic link?
       */
      public boolean isSymlink() throws IOException;

      /**
       * Return the contents of the symlink
       */
      public String getSymlink() throws IOException;
    }

    {noformat}
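The resolution loop in the proposal above can be tried out against a toy in-memory namespace. This is a sketch under stated assumptions: `ToyFs` is a hypothetical class, paths are plain strings, and the hop limit `MAX_HOPS` is an addition for illustration that does not appear in the proposed code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory filesystem: a path is either a file or a file-level symlink (illustration only). */
class ToyFs {
    private static final int MAX_HOPS = 8;  // assumption: guards against link cycles, not in the proposal
    private final Map<String, String> files = new HashMap<>();
    private final Map<String, String> links = new HashMap<>();

    void putFile(String path, String contents) {
        files.put(path, contents);
    }

    void putLink(String path, String target) {
        links.put(path, target);
    }

    /** Follows symlinks on the full path only, mirroring the proposed open() loop. */
    String open(String path) throws IOException {
        String current = path;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            String target = links.get(current);
            if (target == null) {
                String contents = files.get(current);
                if (contents == null) {
                    throw new FileNotFoundException(current);
                }
                return contents;
            }
            // Relative targets are resolved against the parent of the link itself.
            current = target.startsWith("/") ? target : parent(current) + "/" + target;
        }
        throw new IOException("too many levels of symbolic links: " + path);
    }

    private static String parent(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "" : path.substring(0, slash);
    }
}
```

Because only the complete path is looked up, a path that merely passes through a link (for example /foo/bar/baz where /foo/bar is a link) is reported as not found, which is the limitation questioned later in the thread.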
  • dhruba borthakur (JIRA) at Sep 11, 2008 at 8:16 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-4044:
    -------------------------------------

    Attachment: symLink1.patch

    This patch implements support for symbolic links to files (not directories). It is attached to discuss and demonstrate the code simplicity of this proposal.
  • Doug Cutting (JIRA) at Sep 11, 2008 at 4:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630239#action_12630239 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------
    bq. There will be no enforcement of whether the target is a directory or not.

    To be clear, under your proposal, if one does link hdfs:///foo/bar to s3:///bar, and s3:///bar/baz exists, if one tries to list hdfs:///foo/bar/baz one will get a FileNotFound exception. Is that right? If so, it seems like a major loss in functionality.


  • Raghu Angadi (JIRA) at Sep 11, 2008 at 4:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630242#action_12630242 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------

    bq. @Raghu: The FSDataInputStream will implement another interface StreamType that will store the information on whether this is a symbolic link. No extra RPC is needed.

    Ok, this is one of the options mentioned earlier that works just for open(). As the earlier discussion in the Jira shows, symlinks are a fundamental change to the filesystem and affect various parts. Just like open(), once the various options for general handling of links are discussed here, I don't think the patch will be much more complicated than what you have attached.

    Symlinks have been used for many decades and I don't think there is any lack of use cases. I don't think HADOOP-4058 alone should be the criterion for this Jira.

  • Raghu Angadi (JIRA) at Sep 11, 2008 at 4:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630242#action_12630242 ]

    rangadi edited comment on HADOOP-4044 at 9/11/08 9:23 AM:
    ---------------------------------------------------------------

    bq. @Raghu: The FSDataInputStream will implement another interface StreamType that will store the information on whether this is a symbolic link. No extra RPC is needed.

    Ok, this is one of the options mentioned earlier that works just for open(). As the discussion in the Jira shows, symlinks are a fundamental change to the filesystem and affect various parts. Just like open(), once the various options for general handling of links are discussed here, I don't think the patch will be much more complicated than what you have attached.

    Symlinks have been used for many decades and I don't think there is any lack of use cases. I don't think HADOOP-4058 alone should be the criterion for this Jira.



  • dhruba borthakur (JIRA) at Sep 11, 2008 at 4:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630246#action_12630246 ]

    dhruba borthakur commented on HADOOP-4044:
    ------------------------------------------

    @Doug: Your example scenario is correct. It could be a major loss in functionality. But maybe we can put in the additional complexity and reap the additional benefits at a later date. I am assuming that the need for the additional functionality can wait for some time.

    @Raghu: I agree that symlinks have many general purpose use-cases. But there are also file systems (very similar to Hadoop :-)) that do not support symlinks/hardlinks at all.

    For symlinks supported by various OSes: the OS calls the file system with a "lookup" call for every path component. The filesystem resolves only one path component at a time. In HDFS, the FileSystem is passed the entire path. Thus, the technique to resolve partial paths becomes messy and does not have any precedent. That's why I am proposing that we delay this part of the functionality.

    Comments/feedback welcome.
  • Raghu Angadi (JIRA) at Sep 11, 2008 at 6:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630279#action_12630279 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------
    > I agree that symlinks have many general purpose use-cases. But there are also file systems (very similar to Hadoop) that do not support symlinks/hardlinks at all.
    Yes. They probably don't have the symlinks proposed in this Jira either. I mainly wanted to say that once the interface is there, implementing real symlinks is not much more complicated than the attached patch. I could be wrong...
  • Joydeep Sen Sarma (JIRA) at Sep 11, 2008 at 6:32 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630292#action_12630292 ]

    Joydeep Sen Sarma commented on HADOOP-4044:
    -------------------------------------------

    wrt Allen's comment - i think it's not without merit. he's probably thinking about moving things within the same hdfs instance - in which case having to have a symlink per leaf is going to lead to doubling of inodes (for the subtree being moved).

    Couple of orthogonal ideas:
    1. would it not be easy to write a new file system client that does component by component lookup? (instead of DFSClient behavior). this would not seem to require any namenode changes.
    2. support symlinks only within the same hdfs instance. the namenode can resolve things internally - no changes to the client.

    then we can combine these two approaches. The namenode can return a specific error if it finds that the path to be resolved has a reference to an external fs (using work done in #2). The regular DFSClient can revert to a special client file system that does component by component lookup (work done in #1) in case of this error. this will prevent having to incur the overhead of component by component lookup unless one hits a symlink that crosses an fs boundary.
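
    A rough sketch of the combined approach, as a toy in-memory model rather than real HDFS code. ToyNameNode, ExternalLinkException and HybridClient are hypothetical names, not Hadoop classes, and each method call on the toy namenode stands in for one RPC.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical error: the path runs through a link that leaves this filesystem. */
class ExternalLinkException extends Exception {
    ExternalLinkException(String path) {
        super("external link on path: " + path);
    }
}

/** Toy namenode; every method call stands in for one RPC. */
class ToyNameNode {
    final Map<String, String> links = new HashMap<>(); // link path -> target
    int rpcCount = 0;

    /** Idea 2: resolve the whole path in one call, following internal links only. */
    String resolveWhole(String path) throws ExternalLinkException {
        rpcCount++;
        String current = "";
        for (String part : path.substring(1).split("/")) {
            current = current + "/" + part;
            String target = links.get(current);
            if (target == null) continue;
            if (target.contains("://")) throw new ExternalLinkException(path);
            current = target; // a target that is itself a link is not re-resolved in this toy
        }
        return current;
    }

    /** Idea 1: look up one path prefix; returns its link target, or null. */
    String lookup(String prefix) {
        rpcCount++;
        return links.get(prefix);
    }
}

class HybridClient {
    static String resolve(ToyNameNode nn, String path) {
        try {
            return nn.resolveWhole(path); // common case: a single RPC
        } catch (ExternalLinkException e) {
            // Fallback: walk component by component to find the external link.
            String[] parts = path.substring(1).split("/");
            String current = "";
            for (int i = 0; i < parts.length; i++) {
                current = current + "/" + parts[i];
                String target = nn.lookup(current);
                if (target == null) continue;
                if (!target.contains("://")) { current = target; continue; }
                StringBuilder rest = new StringBuilder(target);
                for (int j = i + 1; j < parts.length; j++) rest.append('/').append(parts[j]);
                return rest.toString(); // URI to retry on the other filesystem
            }
            return current;
        }
    }
}
```

    In this toy, a path with only internal links costs one call, while a path that crosses to another fs costs the failed call plus one lookup per component up to the link.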

  • Doug Cutting (JIRA) at Sep 11, 2008 at 8:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630354#action_12630354 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------
    > would it not be easy to write a new file system client that does component by component lookup?
    Sure, that's possible, but it would result in lots more RPCs per file operation. We don't want to multiply the load on the namenode in this way. So this should only involve one RPC per foreign link traversed rather than per component.

  • Joydeep Sen Sarma (JIRA) at Sep 11, 2008 at 9:44 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630390#action_12630390 ]

    Joydeep Sen Sarma commented on HADOOP-4044:
    -------------------------------------------

    Doug - that's what i was proposing. doing it in a way where the component by component lookup is only done when sym links are encountered.
  • Raghu Angadi (JIRA) at Sep 11, 2008 at 9:58 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630395#action_12630395 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------
    > doing it in a way where the component by component lookup is only done when sym links are encountered.
    +1. this was implicit in most of the proposals I think.
  • Doug Cutting (JIRA) at Sep 11, 2008 at 10:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630409#action_12630409 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------
    > component by component lookup is only done when sym links are encountered.
    It depends on what you mean by "component-by-component". If, in hdfs://nn/foo/bar/baz/boo, baz is a symbolic link to a different filesystem, then we should only make a single RPC to nn to resolve this link, not one for /foo, one for /foo/bar, etc.
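
    To make the single-RPC case concrete, here is a minimal sketch under my own assumptions: the namenode walks the prefixes in memory and answers once, returning the link target together with the part of the path it did not resolve. LinkReply and SingleRpcNameNode are hypothetical names, not existing Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reply: the link that was hit (if any) plus the unresolved rest of the path. */
class LinkReply {
    final String linkTarget; // null when no link was crossed
    final String remainder;  // path after the link, e.g. "/boo"

    LinkReply(String linkTarget, String remainder) {
        this.linkTarget = linkTarget;
        this.remainder = remainder;
    }
}

class SingleRpcNameNode {
    final Map<String, String> links = new HashMap<>(); // link path -> target
    int rpcCount = 0;

    /** One RPC: the walk over /foo, /foo/bar, ... happens inside the namenode. */
    LinkReply resolve(String path) {
        rpcCount++;
        String prefix = "";
        for (String part : path.substring(1).split("/")) {
            prefix = prefix + "/" + part;
            String target = links.get(prefix);
            if (target != null) {
                return new LinkReply(target, path.substring(prefix.length()));
            }
        }
        return new LinkReply(null, path); // no link on the path
    }
}
```

    With /foo/bar/baz registered as a link, resolving /foo/bar/baz/boo takes one call and leaves the client with the target plus "/boo" to retry on the other filesystem.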
  • dhruba borthakur (JIRA) at Sep 11, 2008 at 11:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630421#action_12630421 ]

    dhruba borthakur commented on HADOOP-4044:
    ------------------------------------------

    So folks,

    Proposal 1: Do you think that it is a good idea to put in a subset of the full symlink functionality (without support for symlinks to directories) as a first cut? And then, if a use case arises, we can do the full symlink functionality at a later date.

    Proposal 2: Proposal 1 + if the symlink is for a pathname in the same namenode, then the namenode will resolve the symlink transparently. This will allow symbolic links to directories that point to paths inside the same namenode.

    Please respond by putting +1/-1/0 for the two proposals. The reason I am not implementing the full set of functionality right now is that the 0.19 cutoff date is only a few days away!
  • Raghu Angadi (JIRA) at Sep 12, 2008 at 12:08 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630428#action_12630428 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------

    -1 for the reasons given for not doing the full symlinks: that it will miss 0.19 and/or that it is too complex.

    -0 for Proposal 1
    -1 for Proposal 2
  • Doug Cutting (JIRA) at Sep 12, 2008 at 12:32 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630437#action_12630437 ]

    Doug Cutting commented on HADOOP-4044:
    --------------------------------------

    What use cases are there for filesystem-internal symlinks? The really interesting use cases are cross-filesystem, since that could make archives transparent, and also permit a volume-mounting-like style, permitting a single namespace to span multiple clusters, potentially making the namenode less of a bottleneck.

    Without some strong use cases I'm -1 for any implementation that does not support cross-filesystem links.

  • Bryan Duxbury (JIRA) at Sep 12, 2008 at 12:38 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630440#action_12630440 ]

    Bryan Duxbury commented on HADOOP-4044:
    ---------------------------------------

    We would use filesystem-internal symlinks as a sort of fast-copy in situations where parts of our dataset are actually processed and others are merely copied. Technically, a hard link or copy-on-write scheme (à la GFS) would be better suited, but we could make do with symlinks.
  • Allen Wittenauer (JIRA) at Sep 12, 2008 at 12:52 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630444#action_12630444 ]

    Allen Wittenauer commented on HADOOP-4044:
    ------------------------------------------

    -1 to both, because I think we're rushing this without a significant amount of thought as to what a multi-filesystem directory structure should look like.

    Cross-filesystem "symlinks" really don't seem like "real" symlinks to me, but a sort of hacked version of autofs support. How do you determine if a file is "real" or not? How do I "undo" a symlink? What does this look like from an audit log perspective? Is this really just a "short cut" to providing the equivalent of mount?

    If we ignore what I said above :) , I'm trying to think of a use case where I wouldn't want to symlink an entire dir. A single file? Really?

    I think it is much more interesting to provide a symlink to a directory. I can have "data/v1.423" and then have a symlink called LATEST that points to the latest version of a given versioned data set.
  • Konstantin Shvachko (JIRA) at Sep 12, 2008 at 1:08 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630447#action_12630447 ]

    Konstantin Shvachko commented on HADOOP-4044:
    ---------------------------------------------

    I am -1 on partial implementations, because your case of file-only symlinks can be handled on the application level.

    In fact path resolution as Joydeep proposed can be handled outside the file system without checking each path component.
    We just need to throw FileNotFoundException with the exact path prefix that has not been found.
    E.g. you want to open a file {{path = /dir0/link/dir1/file}} where {{link}} is a file containing a symlink {{/dir3/dir4/}}.
    If you call {{open(path)}} a {{FileNotFoundException}} will be thrown with the message "File is not found: {{/dir0/link/dir1/}}".
    Now the application reads contents of file {{/dir0/link}}, creates a new {{path = /dir3/dir4/dir1/file}}, and calls {{open(path)}} again.
    An optimization to this could be that the name-node throws a new {{DirectoryNotFoundException("Directory is not found: /dir0/link")}};
    then the client knows for sure that "link" does exist, but is a file rather than a directory, and therefore can be a symlink.
    Seems to me
    - this suits your case,
    - requires minimal changes to the fs internals and
    - does not affect existing hdfs behavior when links are not used.
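
    The retry loop described above can be sketched roughly as follows, against a toy in-memory file system rather than HDFS. ToyFs, LinkFollowingClient and MAX_HOPS are hypothetical, and parsing the exception message is only an illustration of the idea; a dedicated exception carrying the prefix would be less fragile.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy file system: directories plus files with string contents. */
class ToyFs {
    final Set<String> dirs = new HashSet<>();
    final Map<String, String> files = new HashMap<>();

    /** Returns file contents, or reports the first path prefix that does not exist. */
    String open(String path) throws FileNotFoundException {
        String prefix = "";
        for (String part : path.substring(1).split("/")) {
            prefix = prefix + "/" + part;
            if (!dirs.contains(prefix) && !files.containsKey(prefix)) {
                throw new FileNotFoundException("File is not found: " + prefix + "/");
            }
        }
        String contents = files.get(path);
        if (contents == null) throw new FileNotFoundException("File is not found: " + path);
        return contents;
    }
}

class LinkFollowingClient {
    static final String PREFIX = "File is not found: ";
    static final int MAX_HOPS = 8; // arbitrary guard against link cycles

    static String open(ToyFs fs, String path) throws FileNotFoundException {
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            try {
                return fs.open(path);
            } catch (FileNotFoundException e) {
                // The parent of the missing prefix may be a file holding a link target.
                String missing = stripSlash(e.getMessage().substring(PREFIX.length()));
                String linkFile = missing.substring(0, missing.lastIndexOf('/'));
                if (!fs.files.containsKey(linkFile)) throw e;
                String target = stripSlash(fs.open(linkFile));
                if (!target.startsWith("/")) throw e; // contents do not look like a path
                path = target + path.substring(linkFile.length());
            }
        }
        throw new FileNotFoundException("Too many links: " + path);
    }

    static String stripSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }
}
```

    Note that this toy cannot tell a link file from an ordinary file whose contents happen to start with "/", which is one of the open questions with a purely application-level scheme.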
  • Raghu Angadi (JIRA) at Sep 12, 2008 at 3:02 am
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630470#action_12630470 ]

    Raghu Angadi commented on HADOOP-4044:
    --------------------------------------

    Also, this is a useful and important enough feature to deserve an exception for a 0.19.x release if 0.20.0 is too late or has too much other stuff.
    > In my mind, cross-filesystem "symlinks" really don't seem like "real" symlinks to me, but a sort of hacked version of autofs support.
    If we think of Hadoop FileSystem as a VFS, and HDFS, S3, LocalFS etc. as various filesystems, then cross-FS symlinks look more like real symlinks (like a link in ext3 pointing to a directory in NTFS). Yes, they are still not very real, since a symlink in LocalFS cannot point to "hdfs://users" (actually it is possible to handle those to a certain extent).
  • Sanjay Radia (JIRA) at Sep 15, 2008 at 3:06 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631045#action_12631045 ]

    Sanjay Radia commented on HADOOP-4044:
    --------------------------------------

    Symlink vs mount distinction.
    -------------------------------------------
    The reason the two concepts merge in Hadoop (while they are distinct in other systems) is that Hadoop has file URIs that can point to arbitrary files/dirs in arbitrary volumes. This allows a mount to be specified merely by a URI of the target (as opposed to server name, protocol etc.).

    I think Allen has a valid point that remote symlinks to dirs are more useful than symlinks to files, especially for moving load (storage load or operational load) off to other name servers.

    @JoyDeep's component by component look up file system.
    He proposes a two-step approach: if an error occurs in following a symlink, then do a component-by-component lookup. If one is willing to return an error on a symlink, then it is equally easy to send more information with the error; the error information can denote the remaining path, and one can easily implement symlinks efficiently.


    Implementing a subset
    ------------------------------
    We should do a design that works for symlinks to dirs or files; then implementing a subset for files is fine.

    The use case for symlinks to directories is strong:
    * HADOOP-4058 is made stronger using symlinks to dirs, as Allen points out. Offloading entire dirs to a remote volume is as useful as, if not more useful than, offloading individual files.
    * Symlinks to archive file systems (har://...) are useful and help make the archive transparent.
    * Filesystems that have symlinks to files also have symlinks to dirs. (My own use of symlinks in Unix is mostly to dirs rather than files.)
    * Most file systems have a mount operation and its use case is well established.
    A mount, as pointed out above, is best achieved in Hadoop as a symlink to a dir, thanks to Hadoop's URI names. A separate API for mounts is not needed.

    I am a little puzzled about why folks think kickbacks are hard. It will take a few discussions to get the internal interfaces right, but it is not that complicated. Hence it is hard to do this for 0.19, but we can do it for 0.20.
    Dhruba, it looks like Facebook wants this subset feature in 0.19; otherwise it is best to wait for 0.20 and do it right.

    My vote is that if we can determine that the API for symlinks to files is a strict subset, then we can go ahead and do the subset in 0.19. Otherwise, wait until 0.20 for the full solution.
    (I think we can do the dot-relative symlinks later, since Facebook's use case does not need them in 0.19.)

    We also need to determine the semantics of other file system operations besides open.
    A good way to proceed is to create a table with a column for the full solution and another for the symlinks-to-files subset.
    Q: If the symlink target happens to be a directory (or later gets changed to a dir), do we throw an exception?
  • dhruba borthakur (JIRA) at Sep 15, 2008 at 5:06 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631071#action_12631071 ]

    dhruba borthakur commented on HADOOP-4044:
    ------------------------------------------

    +1 to Sanjay's proposal. I am all for symlinks to any arbitrary path (dirs or files). My initial proposal was meant to gauge the feeling of the community on whether it is better to get symlinks to files into 0.19 and then continue to work on symlinks to any arbitrary path in 0.20. The code is not difficult, but there will be some changes to existing interfaces for DistributedFileSystem.java.

    "Kickbacks" are hard, not from a design point of view but from a coding point of view, especially because we do not want to throw UnresolvedPathException all the way back to the client. This means the NameNode has to send back the "kickback" information as part of the result code for almost every RPC. This, in turn, means that almost all APIs in ClientProtocol will have to be modified.

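    One possible shape for carrying the kickback in the return value instead of an exception, as a sketch only. RpcResult, ToyProtocol and KickbackClient are hypothetical names; this is not the actual ClientProtocol, and getLength is an invented stand-in for any path-taking call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical return wrapper: either a value, or a kickback naming the path to retry. */
final class RpcResult<T> {
    final T value;          // set when the call completed
    final String retryPath; // set when a symlink was hit

    private RpcResult(T value, String retryPath) {
        this.value = value;
        this.retryPath = retryPath;
    }

    static <T> RpcResult<T> ok(T value) { return new RpcResult<>(value, null); }
    static <T> RpcResult<T> kickback(String retryPath) { return new RpcResult<>(null, retryPath); }
    boolean isKickback() { return retryPath != null; }
}

/** Toy stand-in for a namenode protocol with one path-taking call. */
class ToyProtocol {
    final Map<String, String> links = new HashMap<>(); // link path -> target
    final Map<String, Long> lengths = new HashMap<>(); // file path -> length

    RpcResult<Long> getLength(String path) {
        for (Map.Entry<String, String> link : links.entrySet()) {
            String from = link.getKey();
            if (path.equals(from) || path.startsWith(from + "/")) {
                return RpcResult.kickback(link.getValue() + path.substring(from.length()));
            }
        }
        return RpcResult.ok(lengths.get(path));
    }
}

class KickbackClient {
    /** Re-issues the same call on the kicked-back path until it completes. */
    static <T> T call(Function<String, RpcResult<T>> rpc, String path, int maxHops) {
        for (int hop = 0; hop <= maxHops; hop++) {
            RpcResult<T> result = rpc.apply(path);
            if (!result.isKickback()) return result.value;
            path = result.retryPath;
        }
        throw new IllegalStateException("too many symlink hops: " + path);
    }
}
```

    The cost shows up in the signatures: every call that takes a path would need to return the wrapper (or an equivalent field), which is why nearly all of the protocol is affected.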

Discussion Overview
group: common-dev @
categories: hadoop
posted: Aug 29, '08 at 6:29a
active: Jun 16, '09 at 8:09a
posts: 192
users: 3
website: hadoop.apache.org...
irc: #hadoop
