File length not reported correctly after application crash
----------------------------------------------------------

Key: HADOOP-5157
URL: https://issues.apache.org/jira/browse/HADOOP-5157
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.19.0
Reporter: Doug Judd
Fix For: 0.20.0


Our application (Hypertable) creates a transaction log in HDFS. This log is written with the following pattern:

out_stream.write(header, 0, 7);
out_stream.sync();
out_stream.write(data, 0, amount);
out_stream.sync();
[...]
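
For concreteness, the snippet below is a minimal, self-contained sketch of that write pattern against the Hadoop FileSystem API; the log path, header size, and payload are placeholders rather than values taken from Hypertable.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TxnLogWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path; the real log name comes from the application.
        Path logPath = new Path("/hypertable/txn.log");
        FSDataOutputStream out = fs.create(logPath);

        byte[] header = new byte[7];      // 7-byte header, as in the report
        byte[] data = new byte[1024];     // example payload

        out.write(header, 0, header.length);
        out.sync();                       // push the bytes out to the datanodes
        out.write(data, 0, data.length);
        out.sync();
        // ... further records follow the same write/sync pattern
        out.close();
    }
}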

However, if the application crashes and then comes back up again, the following statement

length = mFilesystem.getFileStatus(new Path(fileName)).getLen();

returns the wrong length. Apparently this is because the method fetches the length from the NameNode, whose information is stale. Ideally, a call to getFileStatus() would return the accurate file length by fetching the size of the last block from the primary datanode.
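
A minimal sketch of that post-restart check is shown below (the file name is a placeholder); on an affected version the printed value reflects the NameNode's stale metadata rather than the bytes that were actually written and sync()'d before the crash.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckLogLength {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Placeholder log name; in practice this comes from the recovery code.
        String fileName = "/hypertable/txn.log";

        // On an affected version this returns the NameNode's last known length,
        // which can be shorter than what was written before the crash.
        long length = fs.getFileStatus(new Path(fileName)).getLen();
        System.out.println("reported length = " + length);
    }
}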



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • dhruba borthakur (JIRA) at Feb 4, 2009 at 10:20 am
    [ https://issues.apache.org/jira/browse/HADOOP-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670298#action_12670298 ]

    dhruba borthakur commented on HADOOP-5157:
    ------------------------------------------

    The staleness corrects itself if either another writer opens the file for appending to it, or the hard limit of one hour (the lease recovery period) expires. But I agree that your proposal is better. +1
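
    To illustrate the first of those two recovery paths: re-opening the file for append and closing it immediately forces lease recovery, after which the NameNode reports the recovered length. This is only a sketch of the workaround described above, not the proposed fix; it assumes dfs.support.append is enabled and uses a placeholder path, and in practice the append() call may need to be retried while recovery is still in progress.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RecoverLogLength {
        public static long recoverAndGetLength(String fileName) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path log = new Path(fileName);

            // Re-opening for append triggers lease recovery on the crashed
            // writer's file; closing immediately leaves the contents untouched.
            fs.append(log).close();

            // The NameNode's length should now include the bytes that reached
            // the datanodes before the crash.
            return fs.getFileStatus(log).getLen();
        }
    }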

    It introduces additional latency for the getFileStatus() call, but if we do this only for files that have a lease on them (i.e., a writer was writing to the file), then it should be ok.

    Additionally, the current getFileStatus() call does not retrieve block location information from the namenode. This has to be enhanced to return the location of at least the last block of a file.
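
    For reference, a sketch of the separate block-location call that does exist today is below; the offsets and lengths it reports still come from NameNode metadata, so by itself it does not solve the stale-length problem, which is why getFileStatus() would need the enhancement described above. The class and method names outside the Hadoop API are placeholders.

    import java.util.Arrays;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LastBlockInfo {
        public static void printLastBlock(FileSystem fs, Path p) throws Exception {
            FileStatus stat = fs.getFileStatus(p);

            // Ask for the location covering the final byte the NameNode knows about.
            long len = stat.getLen();
            BlockLocation[] locs =
                fs.getFileBlockLocations(stat, Math.max(len - 1, 0), 1);

            if (locs.length > 0) {
                BlockLocation last = locs[locs.length - 1];
                System.out.println("last block offset=" + last.getOffset()
                    + " length=" + last.getLength()
                    + " hosts=" + Arrays.toString(last.getHosts()));
            }
        }
    }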

Discussion Overview
group: common-dev @
categories: hadoop
posted: Feb 2, '09 at 10:14p
active: Feb 4, '09 at 10:20a
posts: 2
users: 1
website: hadoop.apache.org...
irc: #hadoop
