Hi.

I have a process that writes to a file on DFS from time to time, using an
OutputStream.
After some time of writing, I start getting the exception below, and the
write fails. The DFSClient retries several times, and then fails.

Copying the file from local disk to DFS via CopyLocalFile() works fine.

Can anyone advise on the matter?

I'm using Hadoop 0.18.3.

Thanks in advance.

09/05/25 15:35:35 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.LeaseExpiredException: No lease on /test/test.bin File does not exist. Holder DFSClient_-951664265 does not have any open files.
        at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1172)
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1103)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

        at org.apache.hadoop.ipc.Client.call(Client.java:716)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)
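
For reference, the write pattern in question is essentially the one sketched below (a minimal, illustrative sketch only - the path matches the exception above, but the buffer sizes, timing and error handling are made up, not the real code):

// Minimal sketch of the periodic-write pattern described above (illustrative only).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PeriodicDfsWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // picks up the cluster config
        FileSystem fs = FileSystem.get(conf);            // DFS when fs.default.name points at the namenode
        Path path = new Path("/test/test.bin");          // same path as in the exception above

        FSDataOutputStream out = fs.create(path, true);  // open once; the client holds a lease on the file
        try {
            for (int i = 0; i < 100; i++) {
                out.write(new byte[64 * 1024]);          // write a chunk "from time to time"
                out.flush();
                Thread.sleep(60 * 1000L);                // long idle gaps between writes
            }
        } finally {
            out.close();                                 // the lease is released on close
        }
    }
}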


  • Ken Krugler at Dec 8, 2009 at 4:33 pm
    Hi all,

    In searching the mail/web archives, I occasionally see questions from
    people (like me) who run into the LeaseExpiredException (in my case,
    on 0.18.3 while running a 50-server cluster in EMR).

    Unfortunately I don't see any responses, other than Dennis Kubes
    saying that he thought some work had been done in this area of Hadoop
    "a while back". And that was in 2007, so it hopefully doesn't apply to
    my situation.

    I see these LeaseExpiredException errors showing up in the logs around
    the same time as IOException errors, e.g.:

    java.io.IOException: Stream closed.
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.isClosed(DFSClient.java:2245)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2481)
        at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
        at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
        at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
        at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1260)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1277)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1295)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:73)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:276)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:238)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)

    This issue seemed related, but it should already have been fixed in the
    0.18.3 release:

    http://issues.apache.org/jira/browse/HADOOP-3760

    I saw a similar HBase issue - https://issues.apache.org/jira/browse/HBASE-529
    - but they "fixed" it by retrying a failure case.
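
    A retry wrapper in that spirit would look roughly like the sketch below
    (the interface and retry policy are invented purely for illustration);
    obviously it's a workaround rather than a root-cause fix.

    // Illustrative only: retry an idempotent HDFS operation a few times before giving up.
    import java.io.IOException;

    public final class RetryUtil {
        public interface IoOperation {
            void run() throws IOException;
        }

        public static void withRetries(IoOperation op, int maxAttempts, long sleepMs)
                throws IOException, InterruptedException {
            IOException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    op.run();
                    return;                          // success
                } catch (IOException e) {
                    last = e;                        // e.g. a LeaseExpiredException wrapped in a RemoteException
                    Thread.sleep(sleepMs * attempt); // simple linear backoff
                }
            }
            throw last;                              // out of attempts
        }
    }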

    These exceptions occur during "write storms", where lots of files are
    being written out - though "lots" is relative, e.g. 10-20M.

    It's repeatable, in that it fails on the same step of a series of
    chained MR jobs.

    Is it possible I need to be running a bigger box for my namenode
    server? Any other ideas?

    Thanks,

    -- Ken
    On May 25, 2009, at 7:37am, Stas Oskin wrote:

    --------------------------------------------
    Ken Krugler
    +1 530-210-6378
    http://bixolabs.com
    e l a s t i c w e b m i n i n g
  • Jason Venner at Dec 8, 2009 at 5:35 pm
    Is it possible that this is occurring in a task that is being killed by
    the framework?
    Sometimes there is a little lag between the time the tracker 'kills a
    task' and the time the task fully dies, so you could be getting into a
    situation where the task is in the process of dying but the last write
    is still in progress.
    I see this situation happen when the task tracker machine is heavily
    loaded. In one case there was a 15 minute lag between the timestamp in
    the tracker for killing task XYZ and the task actually going away.

    It took me a while to work this out, as I had to merge the tracker and
    task logs by time to actually see the pattern.
    The host machines were under very heavy I/O pressure, and may have been
    paging as well. The code and configuration issues that triggered this
    have been resolved, so I don't see it anymore.
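
    A quick-and-dirty way to do that kind of merge, assuming every
    interesting log line starts with a sortable timestamp (as the default
    log4j pattern does), is something like this hypothetical utility:

    // Hypothetical helper: interleave tracker and task logs by timestamp so the
    // kill/death sequence is visible. Assumes lines start with "yyyy-MM-dd HH:mm:ss,SSS".
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class MergeLogsByTime {
        public static void main(String[] args) throws IOException {
            List<String> lines = new ArrayList<String>();
            for (String file : args) {                           // e.g. a tasktracker log and a task's syslog
                BufferedReader in = new BufferedReader(new FileReader(file));
                String line;
                while ((line = in.readLine()) != null) {
                    if (!line.isEmpty() && Character.isDigit(line.charAt(0))) {
                        lines.add(line + "  [" + file + "]");    // keep only timestamped lines, tagged with their source
                    }
                }
                in.close();
            }
            Collections.sort(lines);                             // ISO-style timestamps sort lexically
            for (String line : lines) {
                System.out.println(line);
            }
        }
    }
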
    On Tue, Dec 8, 2009 at 8:32 AM, Ken Krugler wrote:

    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
  • Ken Krugler at Dec 8, 2009 at 7:44 pm
    Hi Jason,

    Thanks for the info - it's good to hear from somebody else who's run
    into this :)

    I tried again with a bigger box for the master, and wound up with the
    same results.

    I guess the framework could be killing it - but I have no idea why. This
    is during a very simple "write out the results" phase, so it's very high
    I/O but not much computation, and nothing should be hung.

    Were there any particular configuration values you had to tweak? I'm
    running this in Elastic MapReduce (EMR), so most settings are whatever
    they provide by default. I override a few things in my JobConf, but (for
    example) anything related to the HDFS/MR framework will be locked &
    loaded by the time my job is executing.
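
    (The JobConf overrides I mean are per-job settings along these lines -
    the keys and values below are just generic examples, not my actual
    settings:)

    // Generic examples of per-job JobConf overrides (illustrative values only);
    // cluster/daemon-level HDFS settings are already fixed by EMR at this point.
    import org.apache.hadoop.mapred.JobConf;

    public class JobSetup {
        public static JobConf configure(Class<?> jarClass) {
            JobConf conf = new JobConf(jarClass);
            conf.setNumReduceTasks(50);                      // per-job parallelism
            conf.set("mapred.child.java.opts", "-Xmx512m");  // task JVM heap
            conf.setBoolean("mapred.output.compress", true); // compress job output
            return conf;
        }
    }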

    Thanks!

    -- Ken
    On Dec 8, 2009, at 9:34am, Jason Venner wrote:

    --------------------------------------------
    Ken Krugler
    +1 530-210-6378
    http://bixolabs.com
    e l a s t i c w e b m i n i n g
  • Mehul Sutariya at Dec 12, 2009 at 1:46 am
    Hey Jason,

    I use Hadoop 0.20.1, and I have seen the lease-expired exception when the
    RecordWriter was closed manually - I had a customized OutputFormat. After
    I closed the writer myself, the framework tried to close the writer as
    well, and that close failed.
    My best guess here is that somewhere in your job, you are closing the
    writer yourself rather than allowing the framework to do so.
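
    To make that concrete, a hypothetical custom RecordWriter that guards
    against the double-close looks something like this (invented purely for
    illustration - the safest fix is simply not to call close() yourself):

    // Hypothetical custom writer (old mapred API): make close() idempotent so a
    // second close from the framework can't trigger "Stream closed" / lease errors.
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.RecordWriter;
    import org.apache.hadoop.mapred.Reporter;

    public class GuardedTextWriter implements RecordWriter<Text, Text> {
        private final FSDataOutputStream out;
        private boolean closed = false;

        public GuardedTextWriter(FSDataOutputStream out) {
            this.out = out;
        }

        public void write(Text key, Text value) throws IOException {
            out.writeBytes(key + "\t" + value + "\n");
        }

        public void close(Reporter reporter) throws IOException {
            if (closed) {
                return;      // already closed (e.g. by user code); don't close the stream twice
            }
            closed = true;
            out.close();     // releases the underlying HDFS lease exactly once
        }
    }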

    Mehul.
    On Tue, Dec 8, 2009 at 11:43 AM, Ken Krugler wrote:




Discussion Overview
group: common-user
categories: hadoop
posted: May 25, '09 at 4:27p
active: Dec 12, '09 at 1:46a
posts: 5
users: 4
website: hadoop.apache.org...
irc: #hadoop
