S3 listSubPaths bug
-------------------

Key: HADOOP-1061
URL: https://issues.apache.org/jira/browse/HADOOP-1061
Project: Hadoop
Issue Type: Bug
Components: fs
Affects Versions: 0.12.0, 0.11.2
Reporter: Mike Smith
Priority: Critical


I had a problem with the -ls command in the S3 file system. It was returning an inconsistent number of "Found Items" when rerun at different times and, more importantly, it returned recursive results (depth 1) for some folders.

I looked into the code; the problem is caused by the JetS3t library. The inconsistency problem is solved if we use:

S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);

instead of

S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER, 0);

in listSubPaths of the Jets3tFileSystemStore class (line 227). This change lets the REST GET request use the "max-keys" parameter with its default value of 1000; the S3 GET request seems to be sensitive to this parameter.

But the recursive problem is because the GET request doesn't apply the delimiter constraint correctly. The response contains all the keys with the given prefix, but they don't stop at the PATH_DELIMITER. You can test this simply by making a couple of folders on the Hadoop S3 filesystem and running -ls. I followed the generated GET request and it looks fine, but it is not executed correctly on the S3 server side. I still don't know why the response doesn't stop at the PATH_DELIMITER.

Possible cause: the JetS3t library already does URL encoding, so why do we need to do URL encoding in the Jets3tFileSystemStore class?

Example:

The original path /user/root/folder is encoded to %2Fuser%2Froot%2Ffolder in the Jets3tFileSystemStore class. Then JetS3t re-encodes it when building the REST request, so it is sent as %252Fuser%252Froot%252Ffolder, and the key created on S3 ends up as %2Fuser%2Froot%2Ffolder after decoding on the Amazon side. Wouldn't it be better to skip the encoding on the Hadoop side? This strange structure might be the reason that S3 doesn't stop at the PATH_DELIMITER.
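
For illustration, the double encoding described above can be reproduced with java.net.URLEncoder. This is a standalone sketch of the effect only, not code taken from Hadoop or JetS3t:

    import java.net.URLEncoder;

    public class DoubleEncodingDemo {
      public static void main(String[] args) throws Exception {
        String path = "/user/root/folder";
        // First encoding, as done in Jets3tFileSystemStore
        String once = URLEncoder.encode(path, "UTF-8");   // %2Fuser%2Froot%2Ffolder
        // Second encoding, applied again when the REST request is built
        String twice = URLEncoder.encode(once, "UTF-8");  // %252Fuser%252Froot%252Ffolder
        System.out.println(once);
        System.out.println(twice);
      }
    }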



  • Mike Smith (JIRA) at Mar 4, 2007 at 10:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477874 ]

    Mike Smith commented on HADOOP-1061:
    ------------------------------------

    As a temporary solution I replaced the listSubPaths method of the Jets3tFileSystemStore class with this new method, which simply checks whether there is any tail after the delimiter and also does not pass zero to s3Service.listObjects:

    public Set<Path> listSubPaths(Path path) throws IOException {
      try {
        String prefix = pathToKey(path);
        if (!prefix.endsWith(PATH_DELIMITER)) {
          prefix += PATH_DELIMITER;
        }
        // List with the default max-keys behaviour instead of passing 0.
        S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
        Set<Path> prefixes = new TreeSet<Path>();
        for (int i = 0; i < objects.length; i++) {
          // Keep only direct children: keys with no further delimiter after the
          // prefix. (Note that replaceAll treats the prefix as a regex.)
          if (objects[i].getKey().replaceAll(prefix, "").indexOf(PATH_DELIMITER) < 0) {
            prefixes.add(keyToPath(objects[i].getKey()));
          }
        }
        prefixes.remove(path);
        return prefixes;
      } catch (S3ServiceException e) {
        if (e.getCause() instanceof IOException) {
          throw (IOException) e.getCause();
        }
        throw new S3Exception(e);
      }
    }
  • Tom White (JIRA) at Mar 5, 2007 at 1:19 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478031 ]

    Tom White commented on HADOOP-1061:
    -----------------------------------

    Mike,

    I agree we should use the listObjects call that doesn't take a maxListingLength. JetS3t retrieves them in batches of 1000 behind the scenes for us. (Note that listDeepSubPaths uses the correct form of the method.)

    Does dropping the URL encoding in Jets3tFileSystemStore cause any problems with root ("/")? I seem to remember it may have. However, it's wrong to double encode, so I would suggest removing the encoding from Jets3tFileSystemStore and seeing if the tests pass. If they don't, we should go from there (rather than relying on double encoding because it happens to work).

    Would you be able to create a patch for this? Thanks!
  • Mike Smith (JIRA) at Mar 5, 2007 at 8:27 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Mike Smith updated HADOOP-1061:
    -------------------------------

    Attachment: 1061-hadoop.patch

    Tom,

    I made the patch that fixes listSubPaths. I also removed the encoding from Jets3tFileSystemStore; it worked fine, with no problem with root '/'. But I didn't include that change in the patch because it is not backward compatible: if someone removes the encoding, previously written files are no longer readable. It would be nice to provide a tool to migrate the double-encoded files to single '/' files when we make a patch to remove the encoding from Jets3tFileSystemStore.

  • Tom White (JIRA) at Mar 5, 2007 at 10:01 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478174 ]

    Tom White commented on HADOOP-1061:
    -----------------------------------

    S3 files need to be written with a version number of the client library that wrote them (as suggested in HADOOP-930). If we did this now, then we could detect when there is a mismatch and fail fast (and informatively). While it would be possible (but tricky) to support both versions, I don't feel that we should, since there is a workaround for data migration: copy your S3 data from the old file system to local files or HDFS (on EC2 preferably, but this isn't necessary) using the old version of Hadoop, then copy it back to a new S3 file system using a new version of Hadoop. I'd be happy to write this.

    (I'm not saying that version-aware code will never be needed, just that it isn't yet since not that many people are using this feature yet.)

    Thoughts?
  • Doug Cutting (JIRA) at Mar 5, 2007 at 10:13 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478181 ]

    Doug Cutting commented on HADOOP-1061:
    --------------------------------------

    We could start adding the version number property with this patch, so that one might write an automatic migration tool later. Adding the version number ASAP is a good idea anyway.
  • Tom White (JIRA) at Mar 6, 2007 at 9:50 am
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478330 ]

    Tom White commented on HADOOP-1061:
    -----------------------------------

    Yes, the patch should include:

    * Fixing the listObjects call.
    * Removing the URL encoding.
    * Adding version-numbering metadata to files. This should use the S3 metadata feature so we can detect mismatches (and throw an exception) before reading the data.
    * Adding filesystem-type metadata to files, again using S3 metadata. This is easy to do at the same time as the version and is a prerequisite for HADOOP-930.

    I'll have a go at this.
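
    As a rough illustration of the last two items above, version and type metadata could be attached to each object via JetS3t's user metadata and checked before reads. The sketch below is only an assumption about how this might look; the class, key names, and values are illustrative, not necessarily what the committed patch does:

        import java.io.IOException;
        import org.jets3t.service.model.S3Object;

        // Illustrative sketch only; metadata key names and values are assumptions.
        class S3FileSystemMetadata {
          static final String FS_NAME_KEY = "fs";
          static final String FS_TYPE_KEY = "fs-type";
          static final String FS_VERSION_KEY = "fs-version";
          static final String FS_VERSION = "1";

          // Called before writing an object: record who wrote it and in what format.
          static void tag(S3Object object) {
            object.addMetadata(FS_NAME_KEY, "Hadoop");
            object.addMetadata(FS_TYPE_KEY, "block");
            object.addMetadata(FS_VERSION_KEY, FS_VERSION);
          }

          // Called after fetching object details: fail fast (and informatively) on a mismatch.
          static void checkVersion(S3Object object) throws IOException {
            Object version = object.getMetadata(FS_VERSION_KEY);
            if (version == null) {
              throw new IOException("Object has no " + FS_VERSION_KEY
                  + " metadata; was it written by an incompatible S3FileSystem?");
            }
            if (!FS_VERSION.equals(version.toString())) {
              throw new IOException("Unsupported " + FS_VERSION_KEY + ": " + version);
            }
          }
        }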
  • Tom White (JIRA) at Mar 6, 2007 at 9:07 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-1061:
    ------------------------------

    Attachment: hadoop-1061-v2.patch

    Patch for review.
  • Doug Cutting (JIRA) at Mar 6, 2007 at 9:27 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478586 ]

    Doug Cutting commented on HADOOP-1061:
    --------------------------------------

    Should we check not just the FS version, but also the name and type?

    Also, to be clear, this is not back-compatible, right? Is that what we want? Perhaps instead, if something has none of the metadata properties set, we could interpret it as the prior version, for back-compatibility, so folks can still read their S3 data. We might later drop that since, long-term, we might make it so that objects with none of these properties set are treated as simple HTTP resources.
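
    A minimal sketch of the fallback suggested here (not what was ultimately committed), assuming a hypothetical "fs-version" metadata key: objects carrying no metadata would simply be treated as the prior version rather than rejected.

        import org.jets3t.service.model.S3Object;

        class VersionFallback {
          // Hypothetical: treat objects with no version metadata as the prior version (0).
          static int versionOf(S3Object object) {
            Object version = object.getMetadata("fs-version");  // key name is an assumption
            return version == null ? 0 : Integer.parseInt(version.toString());
          }
        }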
  • Tom White (JIRA) at Mar 6, 2007 at 9:43 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478590 ]

    Tom White commented on HADOOP-1061:
    -----------------------------------

    No, this is not back-compatible. It's not actually that hard to make it back-compatible, so I'll produce another patch. I'll check for name and type too.

    (I was thinking a similar thing about the long-term support for HTTP.)
  • Mike Smith (JIRA) at Mar 6, 2007 at 11:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478625 ]

    Mike Smith commented on HADOOP-1061:
    ------------------------------------

    Tom,

    I checked the second patch; it works fine and detects the version mismatch. Also, it seems the "not stopping at the path_delimiter" problem is fixed after removing the double encoding.


  • Tom White (JIRA) at Mar 11, 2007 at 9:56 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-1061:
    ------------------------------

    Attachment: hadoop-1061-v3.patch
    Regarding my earlier comment that "it's not actually that hard to make it back-compatible": it's a bit harder than I thought when I wrote that, since we have changed the key format. You would have to check for both forms of the key before saying the file or directory doesn't exist, and I feel this would complicate the code somewhat.

    I would suggest going with this new patch (v3), which additionally checks name and type. People will have to migrate old data as I originally described above. If this proved problematic, or there was demand, we could write a migration script. (The benefit of this approach is that it keeps the core S3FileSystem code unencumbered by version-migration logic.)
  • Mike Smith (JIRA) at Mar 15, 2007 at 10:50 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481359 ]

    Mike Smith commented on HADOOP-1061:
    ------------------------------------

    Thanks Tom,

    I was wondering if you've had a chance to write a migration script. I have some data in the old scheme that I cannot afford to lose.

  • Tom White (JIRA) at Mar 16, 2007 at 9:21 am
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481537 ]

    Tom White commented on HADOOP-1061:
    -----------------------------------

    I haven't written a migration script yet, but I'll try to do so before 0.13.
  • Tom White (JIRA) at Mar 16, 2007 at 9:21 am
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-1061:
    ------------------------------

    Fix Version/s: 0.13.0
  • Tom White (JIRA) at May 1, 2007 at 6:52 am
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-1061:
    ------------------------------

    Attachment: HADOOP-1061-v4.patch

    This patch (HADOOP-1061-v4.patch) includes a MigrationTool to migrate data written with an older version of S3FileSystem to the latest version. I have successfully tested it on an old filesystem.
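
    For readers without the patch, a very rough sketch of the kind of key rewrite such a migration involves is below. This is not the actual MigrationTool from the attachment; it only assumes that old keys are the URL-encoded form (e.g. %2Fuser%2Froot%2Ffolder) and new keys are the plain paths, and it ignores details such as metadata tagging and error handling:

        import java.net.URLDecoder;
        import org.jets3t.service.S3Service;
        import org.jets3t.service.model.S3Bucket;
        import org.jets3t.service.model.S3Object;

        // Hypothetical sketch, not the MigrationTool shipped in HADOOP-1061-v4.patch.
        class MigrationSketch {
          static void migrate(S3Service s3Service, S3Bucket bucket) throws Exception {
            S3Object[] objects = s3Service.listObjects(bucket);  // fetched in batches behind the scenes
            for (int i = 0; i < objects.length; i++) {
              String oldKey = objects[i].getKey();
              String newKey = URLDecoder.decode(oldKey, "UTF-8");  // "%2Fuser%2F..." -> "/user/..."
              if (newKey.equals(oldKey)) {
                continue;  // already in the new format
              }
              // Copy the data to the new key, then delete the old one.
              S3Object data = s3Service.getObject(bucket, oldKey);
              S3Object copy = new S3Object(newKey);
              copy.setDataInputStream(data.getDataInputStream());
              copy.setContentLength(data.getContentLength());
              s3Service.putObject(bucket, copy);
              s3Service.deleteObject(bucket, oldKey);
            }
          }
        }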
  • Tom White (JIRA) at May 1, 2007 at 6:54 am
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-1061:
    ------------------------------

    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at May 1, 2007 at 7:56 am
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492843 ]

    Hadoop QA commented on HADOOP-1061:
    -----------------------------------

    +1

    http://issues.apache.org/jira/secure/attachment/12356557/HADOOP-1061-v4.patch applied and successfully tested against trunk revision r533233.

    Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/97/testReport/
    Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/97/console
  • Doug Cutting (JIRA) at May 1, 2007 at 4:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492904 ]

    Doug Cutting commented on HADOOP-1061:
    --------------------------------------

    +1 This looks good to me.
  • Tom White (JIRA) at May 2, 2007 at 7:18 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-1061:
    ------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I've just committed this.
  • Hadoop QA (JIRA) at May 3, 2007 at 11:26 am
    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493344 ]

    Hadoop QA commented on HADOOP-1061:
    -----------------------------------

    Integrated in Hadoop-Nightly #77 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/77/)

Discussion Overview
group: common-dev
categories: hadoop
posted: Mar 4, '07 at 9:48p
active: May 3, '07 at 11:26a
posts: 23
users: 3
website: hadoop.apache.org...
irc: #hadoop
