Path should use URI syntax
--------------------------

Key: HADOOP-571
URL: http://issues.apache.org/jira/browse/HADOOP-571
Project: Hadoop
Issue Type: Improvement
Components: fs
Reporter: Doug Cutting


The following changes are proposed:
1. Add a factory/registry of FileSystem implementations. Given a protocol, hostname and port, it should be possible to get a FileSystem implementation.
2. Path's constructor should accept URI-formatted strings & a configuration.
3. A new Path method should be added: FileSystem getFileSystem(). This returns the filesystem named in the path, or the default configured filesystem if the path names none.
4. Most methods which currently take FileSystem and Path parameters can be changed to take only Path.
5. Many FileSystem operations (create, open, delete, list, etc.) can become convenience methods on Path.
6. A URLStreamHandler can be defined in terms of the FileSystem API, so that URLs for any protocol with a registered FileSystem implementation can be accessed with a java.net.URL, permitting FileSystem implementations to be used on the classpath, etc.

It is tempting to try to replace Path with java.net.URL, but URL's methods are insufficient for mapreduce. We require directory listings, random access, location hints, etc., which are not supported by existing URLStreamHandler implementations. But we can expose all FileSystem implementations for access with java.net.URL.
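Item 6 above can be illustrated with a small sketch. It is not the handler proposed in this issue: the class name StoreUrlHandlerSketch, the "mem" scheme, and the in-memory STORE map are all hypothetical stand-ins for a registered FileSystem implementation. Only the java.net and java.io classes are real APIs. The handler is passed directly to the URL constructor, so no global handler registration is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical handler: serves URL reads from a map that stands in for a FileSystem. */
final class StoreUrlHandlerSketch extends URLStreamHandler {
    /** Assumption: file contents keyed by the URL's path component. */
    static final Map<String, byte[]> STORE = new ConcurrentHashMap<>();

    @Override
    protected URLConnection openConnection(URL target) {
        return new URLConnection(target) {
            @Override
            public void connect() {
                connected = true; // nothing to open for an in-memory store
            }

            @Override
            public InputStream getInputStream() throws IOException {
                byte[] data = STORE.get(getURL().getPath());
                if (data == null) {
                    throw new FileNotFoundException(getURL().toString());
                }
                return new ByteArrayInputStream(data);
            }
        };
    }

    /** Reads a whole resource through java.net.URL using this handler. */
    static String readAll(String spec) {
        try {
            URL url = new URL(null, spec, new StoreUrlHandlerSketch());
            try (InputStream in = url.openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        STORE.put("/data/part-0", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll("mem://store1/data/part-0"));
    }
}
```

Note that the sketch only supports sequential reads. Directory listings, random access, and location hints have no counterpart in URLConnection, which matches the reason given above for keeping Path rather than replacing it with java.net.URL.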

(From a brainstorm with Owen.)

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • Doug Cutting (JIRA) at Nov 13, 2006 at 7:51 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting reassigned HADOOP-571:
    -----------------------------------

    Assignee: Doug Cutting
  • Doug Cutting (JIRA) at Nov 15, 2006 at 11:05 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Work on HADOOP-571 started by Doug Cutting.
  • Doug Cutting (JIRA) at Nov 20, 2006 at 9:43 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Attachment: uri.patch

    Here's a first version of this. It passes all unit tests except TestCopyFiles. I haven't yet figured out why that one fails.

    A few notes:

    Path is now a wrapper on a URI. For back-compatibility, Path differs from URI in a number of ways. Path.toString() returns an unescaped string, and 'new Path(String)' does not expect an escaped string, unlike the corresponding URI methods. Thus, one can easily construct Paths containing characters like question marks (used in globbing). A Path's URI never has a query or fragment part. Path directories are always normalized as follows: // and \\ are replaced with /, and terminal slashes are removed.
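The escaping and normalization rules described above can be sketched with java.net.URI alone. This is not the code from uri.patch: the class PathSyntaxSketch and its methods are hypothetical, and treating every backslash as a separator is an assumption made for the illustration. The sketch relies on the multi-argument URI constructor, which quotes characters that are illegal in a path, so an unescaped string containing a question mark ends up in the path rather than being parsed as a query.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical illustration of Path-style normalization on top of java.net.URI. */
final class PathSyntaxSketch {
    /** Replaces backslashes and repeated slashes with "/" and drops a terminal slash. */
    static String normalizePath(String path) {
        String p = path.replace('\\', '/').replaceAll("/{2,}", "/");
        if (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1); // keep the root "/" intact
        }
        return p;
    }

    /** Builds a URI from an unescaped path, with no query or fragment part. */
    static URI toUri(String scheme, String authority, String unescapedPath) {
        try {
            return new URI(scheme, authority, normalizePath(unescapedPath), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = toUri("hdfs", "namenode:50002", "/logs//2006/part-?/");
        System.out.println(uri.getRawPath()); // escaped form held by the URI
        System.out.println(uri.getPath());    // unescaped form, as a Path.toString() would show
        System.out.println(uri.getQuery());   // no query part
    }
}
```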

    A FileSystem is now named by a URI containing only a scheme and authority. The local filesystem is thus now named "file:///", and an HDFS filesystem is named something like "hdfs://namenode:50002".

    Configuration properties are used to map from URI scheme to FileSystem implementation. A FileSystem is named by fs.<scheme>.impl. A FileSystem instance is cached for each unique scheme and authority string. Instances are constructed using a no-arg constructor, then the initialize(URI,Configuration) method is called. FileSystem implementations typically check that a Path provided to them indeed belongs to them, and thereafter typically ignore the scheme and authority of the Path's URI, using the URI's path to determine the file.
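The lookup and caching described above might look roughly like the following sketch. It is written against stand-in types so that it compiles on its own: SketchFileSystem, InMemoryFileSystem, FileSystemRegistrySketch, and the use of a plain Map in place of Configuration are all hypothetical. Only the fs.<scheme>.impl property pattern, the no-arg construction followed by initialize(), and the cache keyed on scheme and authority are taken from the description.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the FileSystem base class. */
abstract class SketchFileSystem {
    private URI name;

    /** Called once, right after no-arg construction. */
    void initialize(URI name, Map<String, String> conf) {
        this.name = name;
    }

    URI getName() {
        return name;
    }
}

/** Hypothetical implementation, registered under the "mem" scheme in the example. */
class InMemoryFileSystem extends SketchFileSystem {
}

final class FileSystemRegistrySketch {
    /** One cached instance per unique scheme-and-authority string. */
    private static final Map<String, SketchFileSystem> CACHE = new HashMap<>();

    static synchronized SketchFileSystem get(URI uri, Map<String, String> conf) {
        String scheme = uri.getScheme();
        String authority = uri.getAuthority();
        // The filesystem name keeps only scheme and authority, e.g. "file:///".
        String name = authority == null ? scheme + ":///" : scheme + "://" + authority;
        SketchFileSystem fs = CACHE.get(name);
        if (fs == null) {
            String className = conf.get("fs." + scheme + ".impl");
            if (className == null) {
                throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
            }
            try {
                fs = (SketchFileSystem) Class.forName(className)
                        .getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + className, e);
            }
            fs.initialize(URI.create(name), conf);
            CACHE.put(name, fs);
        }
        return fs;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.mem.impl", InMemoryFileSystem.class.getName());
        SketchFileSystem a = get(URI.create("mem://store1/a/b"), conf);
        SketchFileSystem b = get(URI.create("mem://store1/other"), conf);
        System.out.println(a.getName() + " cached: " + (a == b));
    }
}
```

Two paths that differ only in their path component share one instance, while a different authority yields a separate instance.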

    In general, a file is now identified by a Path and a Configuration (to determine the default FileSystem). Thus, to operate on a Path, one typically does something like 'path.getFileSystem(conf).open(path)'. We could add convenience methods for things like this to Path, but I have not yet done that.

    In MapReduce, input formats are modified to generate and consume fully-qualified paths. This makes the FileSystem parameter in most InputFormat and OutputFormat methods redundant.
  • Owen O'Malley (JIRA) at Nov 21, 2006 at 9:09 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=comments#action_12451774 ]

    Owen O'Malley commented on HADOOP-571:
    --------------------------------------

    +1

    after removing the broken tests in TestPath.
  • Doug Cutting (JIRA) at Nov 21, 2006 at 11:20 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Attachment: uri2.patch

    Here's a slightly improved version, incorporating Owen's comments. All unit tests now pass on Linux, but TestPath fails on Windows.
  • Doug Cutting (JIRA) at Nov 22, 2006 at 10:14 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Attachment: uri3.patch

    Another update. This now works correctly on Windows. However, the contrib/streaming unit tests now fail. My suspicion is that streaming uses path.toString() on paths, then processes the strings to create new paths. Such code will likely break, since path syntax has changed. It's better to manipulate Paths with Path methods...
  • Doug Cutting (JIRA) at Dec 1, 2006 at 10:38 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Fix Version/s: 0.9.0
  • Doug Cutting (JIRA) at Dec 1, 2006 at 10:40 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Fix Version/s: 0.10.0
  • Tom White (JIRA) at Dec 5, 2006 at 9:38 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=comments#action_12455763 ]

    Tom White commented on HADOOP-571:
    ----------------------------------

    While working on HADOOP-574 I came across what looks like a small bug in uri3.patch.

    Here's a failing testcase (add to TestPath#testDots):

    assertEquals(new Path("/foo/bar", ".").toString(), "/foo/bar");

    The actual value is "/foo/bar/" (note trailing slash, and the irony of HADOOP-778!).

    I think it is easy to fix. Change the Path(Path parent, Path child) constructor to use the private initialize method as follows:

    URI resolvedUri = parentUri.resolve(child.uri);
    initialize(resolvedUri.getScheme(), resolvedUri.getAuthority(), normalizePath(resolvedUri.getPath()));

    Cheers,
    Tom
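The trailing slash in the report above can be reproduced with java.net.URI alone. The sketch below is not the Path code from uri3.patch: the class ResolveDotSketch and its helpers are hypothetical, and it assumes the parent is treated as a directory by appending a slash before resolution. Resolving "." against such a parent yields the directory itself, slash included, so the resolved path needs the same terminal-slash normalization as any other path.

```java
import java.net.URI;

/** Hypothetical reproduction of resolving a child of "." against a parent path. */
final class ResolveDotSketch {
    /** Drops a terminal slash, leaving the root "/" untouched. */
    static String stripTrailingSlash(String path) {
        if (path.length() > 1 && path.endsWith("/")) {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }

    /** Resolves child against parent without any further normalization. */
    static String resolveRaw(String parent, String child) {
        // Assumption: the parent is made a directory so that the child lands under it.
        URI parentUri = URI.create(parent.endsWith("/") ? parent : parent + "/");
        return parentUri.resolve(URI.create(child)).getPath();
    }

    /** Resolves child against parent, then normalizes the resolved path. */
    static String resolveNormalized(String parent, String child) {
        return stripTrailingSlash(resolveRaw(parent, child));
    }

    public static void main(String[] args) {
        System.out.println(resolveRaw("/foo/bar", "."));        // keeps the terminal slash
        System.out.println(resolveNormalized("/foo/bar", ".")); // terminal slash removed
    }
}
```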
  • Doug Cutting (JIRA) at Dec 8, 2006 at 10:21 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Attachment: uri4.patch

    Here's a version that applies to the current trunk, fixing the problem that Tom mentioned, but that still fails the streaming unit tests.
  • Doug Cutting (JIRA) at Dec 12, 2006 at 8:32 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Attachment: uri5.patch

    Here (finally) is a version that passes all unit tests.
  • Doug Cutting (JIRA) at Dec 12, 2006 at 8:32 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Status: Patch Available (was: In Progress)

    Unless there are objections, I'll commit this soon.
  • Hadoop QA (JIRA) at Dec 12, 2006 at 8:46 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=comments#action_12457873 ]

    Hadoop QA commented on HADOOP-571:
    ----------------------------------

    +1, http://issues.apache.org/jira/secure/attachment/12347043/uri5.patch applied and successfully tested against trunk revision r486289
  • Doug Cutting (JIRA) at Dec 12, 2006 at 11:02 pm
    [ http://issues.apache.org/jira/browse/HADOOP-571?page=all ]

    Doug Cutting updated HADOOP-571:
    --------------------------------

    Status: Resolved (was: Patch Available)
    Resolution: Fixed

    I just committed this.

Discussion Overview
group: common-dev
categories: hadoop
posted: Oct 2, '06 at 9:51p
active: Dec 12, '06 at 11:02p
posts: 15
users: 1
website: hadoop.apache.org...
irc: #hadoop