Files written to S3 but never closed can't be deleted
-----------------------------------------------------

Key: HADOOP-865
URL: https://issues.apache.org/jira/browse/HADOOP-865
Project: Hadoop
Issue Type: Bug
Components: fs
Reporter: Bryan Pendleton


I've been playing with the S3 integration. My first attempts to use it are actually as a drop-in replacement for a backup job, streaming data offsite by piping the backup job output to a "hadoop dfs -put - targetfile".

If enough errors occur while posting to S3 (this happened easily last Thursday, during an S3 growth issue), the write can eventually fail. At that point, both blocks and a partial INode have been written to S3. Running "hadoop dfs -ls filename" shows the file, with a non-zero size, etc. However, running "hadoop dfs -rm filename" on a file whose write failed results in the response "rm: No such file or directory."

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • Tom White (JIRA) at Jan 8, 2007 at 9:01 pm
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463128 ]

    Tom White commented on HADOOP-865:
    ----------------------------------

    Do you know if the files (inodes or blocks) got corrupted, or if a block didn't get written? If you still have the files on S3 then it would be really helpful if you could send an S3 directory listing using a regular S3 tool (e.g. http://www.hanzoarchives.com/development-projects/s3-tools/).

    Thanks.

    BTW nice use of S3FileSystem as an infinite disk!

    Tom

  • Bryan Pendleton (JIRA) at Jan 8, 2007 at 9:34 pm
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463135 ]

    Bryan Pendleton commented on HADOOP-865:
    ----------------------------------------

    Looks like it might actually be .crc-related... but I thought this file hadn't even been closed at the time.

    Note that an -ls /backups/fon1 shows:
    /backups/fon1/backup.010507-1739.cpio.bz2.gpg <r 1> 1048576
    Yet some .crc files have been left behind by previous -rm operations, so there are probably some other middling problems around.

    %2F
    %2Fbackups
    %2Fbackups%2Ffon1
    %2Fbackups%2Ffon1%2F.backup.010507-1736.cpio.bz2.gpg.crc
    %2Fbackups%2Ffon1%2F.backup.010807-1303.cpio.bz2.gpg.crc
    %2Fbackups%2Ffon1%2Fbackup.010507-1739.cpio.bz2.gpg
    block_-3795133870143584439
    block_-8360567787439934597
    block_8856210385271099486

    I'll keep this data around for a little while, in case you think there are any patches that you'd like me to test.
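The object keys in Bryan's listing appear to follow a simple layout, inferred here only from that listing: inodes are stored under URL-encoded paths (%2F decodes to "/"), data blocks under block_<id>, and checksums as hidden ".<name>.crc" companions of the data file. The following Python sketch assumes that layout, decodes the keys, and flags checksum files whose data file no longer exists. It also shows that the unclosed file has no .crc of its own. The helper names (classify, checksum_path, orphan_checksums) are hypothetical and not part of Hadoop.

```python
from urllib.parse import unquote

# Raw S3 keys copied from the listing in the comment above.
KEYS = [
    "%2F",
    "%2Fbackups",
    "%2Fbackups%2Ffon1",
    "%2Fbackups%2Ffon1%2F.backup.010507-1736.cpio.bz2.gpg.crc",
    "%2Fbackups%2Ffon1%2F.backup.010807-1303.cpio.bz2.gpg.crc",
    "%2Fbackups%2Ffon1%2Fbackup.010507-1739.cpio.bz2.gpg",
    "block_-3795133870143584439",
    "block_-8360567787439934597",
    "block_8856210385271099486",
]


def classify(keys):
    """Split keys into decoded inode paths and block keys (assumed layout)."""
    inodes, blocks = [], []
    for key in keys:
        if key.startswith("block_"):
            blocks.append(key)
        else:
            inodes.append(unquote(key))  # "%2Fbackups" -> "/backups"
    return inodes, blocks


def checksum_path(path):
    """Companion checksum path, e.g. /d/f -> /d/.f.crc (convention seen in the listing)."""
    directory, _, name = path.rpartition("/")
    return f"{directory}/.{name}.crc"


def orphan_checksums(inode_paths):
    """Return .crc inodes that no existing path would own."""
    expected = {checksum_path(p) for p in inode_paths}
    return [
        p for p in inode_paths
        if p.rpartition("/")[2].startswith(".") and p.endswith(".crc")
        and p not in expected
    ]


if __name__ == "__main__":
    inodes, blocks = classify(KEYS)
    print(orphan_checksums(inodes))  # leftovers from earlier -rm runs
    unclosed = "/backups/fon1/backup.010507-1739.cpio.bz2.gpg"
    print(checksum_path(unclosed) in inodes)  # False: the unclosed file has no .crc
```

Under these assumptions, the two orphaned .crc files match Bryan's "left from previous -rm operations" remark, and the missing .crc for the 1739 file is consistent with the file never having been closed.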

  • Tom White (JIRA) at Jan 8, 2007 at 10:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463140 ]

    Tom White commented on HADOOP-865:
    ----------------------------------

    I think I've spotted the problem: the deleteRaw method throws an IOException if the inode doesn't exist, unlike the DFS and Local implementations. I'll produce a patch; thanks for the offer to test it.

    Tom
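Tom's diagnosis can be illustrated with a small Python model. It is a sketch under stated assumptions, not the S3FileSystem code: the inode store is a plain set of paths, every function name is hypothetical, and it assumes (in line with Bryan's .crc remark) that deleting a file first deletes its hidden .crc companion. Because an unclosed file never received a .crc inode, a deleteRaw that throws on a missing inode aborts the whole delete before the data file is touched, whereas one that simply returns False (our assumption about the fixed behaviour) lets the data file be removed.

```python
def delete_raw_before(inodes, path):
    """Reported behaviour: raise if there is no inode at `path`."""
    if path not in inodes:
        raise IOError(f"{path}: No such file or directory")
    inodes.remove(path)
    return True


def delete_raw_after(inodes, path):
    """Assumed fixed behaviour: signal a missing inode by returning False."""
    if path not in inodes:
        return False
    inodes.remove(path)
    return True


def checksum_path(path):
    """Hidden checksum companion, e.g. /d/f -> /d/.f.crc."""
    directory, _, name = path.rpartition("/")
    return f"{directory}/.{name}.crc"


def delete(inodes, path, delete_raw):
    """Hypothetical file delete: remove the .crc companion, then the file.

    The companion's result is ignored; only the data file's result is returned.
    """
    delete_raw(inodes, checksum_path(path))
    return delete_raw(inodes, path)


if __name__ == "__main__":
    unclosed = "/backups/fon1/backup.010507-1739.cpio.bz2.gpg"

    store = {unclosed}  # the write never finished, so there is no .crc inode
    try:
        delete(store, unclosed, delete_raw_before)
    except IOError as err:
        print("before:", err, "| still present:", unclosed in store)

    store = {unclosed}
    print("after:", delete(store, unclosed, delete_raw_after), store)
```

In the "before" case the exception comes from the .crc lookup, which would explain why -ls still lists the file after a failed -rm.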
  • Tom White (JIRA) at Jan 9, 2007 at 8:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-865:
    -----------------------------

    Attachment: hadoop-865.patch
  • Tom White (JIRA) at Jan 9, 2007 at 8:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463394 ]

    Tom White commented on HADOOP-865:
    ----------------------------------

    Bryan,

    The patch should be a simple fix for the problem. If you try "hadoop dfs -rm filename" it should now work.

    Note that -rmr doesn't work yet (I will create another patch for this).

    Thanks,

    Tom
  • Doug Cutting (JIRA) at Jan 9, 2007 at 9:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doug Cutting resolved HADOOP-865.
    ---------------------------------

    Resolution: Fixed
    Fix Version/s: 0.10.1
    Assignee: Tom White

    I just committed this. Thanks, Tom!

Discussion Overview
group: common-dev @
categories: hadoop
posted: Jan 8, '07 at 7:12p
active: Jan 9, '07 at 9:47p
posts: 7
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Doug Cutting (JIRA): 7 posts
