Anyone have insight on the following message from a near-TRUNK namenode log?

2007-11-26 01:16:23,282 WARN dfs.StateChange - DIR*
NameSystem.startFile: failed to create file
/hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on
client 38.99.77.80 because current leaseholder is trying to recreate file.

It starts for no apparent reason. There is no exception or warning
preceding it in the log, and creation of files in dfs by pertinent code
has been working fine for a good while before this. After it starts,
it repeats every minute until the client-side goes away. While the
client is up, it retries creating the file every 5 minutes or so.

From the client-side, the exception looks like this:

2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
    at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)

The code is not exotic:

SequenceFile.Writer w = SequenceFile.createWriter(fs,
    conf, logfile, HLogKey.class, HLogEdit.class);

(See HLog in hbase, around line #180, if you want more context.)

I see the comments in the still-open HADOOP-2050. They imply that a create
attempt should be wrapped in a try/catch that retries on
AlreadyBeingCreatedException. Should I be doing this (it looks like it's
already done in DFSClient)?
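
For what it's worth, here is the kind of wrapper I have in mind. It is only a
rough sketch: createLogWriter, MAX_ATTEMPTS and RETRY_SLEEP_MS are made-up
names, and as the trace above shows, the exception may arrive wrapped in a
RemoteException that needs RemoteExceptionHandler.decodeRemoteException before
a catch of AlreadyBeingCreatedException would match:

    // Hypothetical retry wrapper around the create; not DFSClient's own logic.
    SequenceFile.Writer createLogWriter(FileSystem fs, Configuration conf, Path logfile)
        throws IOException {
      final int MAX_ATTEMPTS = 5;
      final long RETRY_SLEEP_MS = 10 * 1000;
      for (int attempt = 1; ; attempt++) {
        try {
          return SequenceFile.createWriter(fs, conf, logfile,
              HLogKey.class, HLogEdit.class);
        } catch (AlreadyBeingCreatedException e) {
          if (attempt >= MAX_ATTEMPTS) {
            throw e;                       // give up: the old lease never expired
          }
          try {
            Thread.sleep(RETRY_SLEEP_MS);  // wait out the old lease and try again
          } catch (InterruptedException ie) {
            throw new IOException("interrupted retrying create of " + logfile);
          }
        }
      }
    }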

Thanks for any insight,
St.Ack

P.S. HADOOP-2283 is the issue I've opened to cover this particular topic.


  • Stack at Nov 27, 2007 at 6:37 pm
    Here's more:

    2007-11-27 05:00:26,331 DEBUG fs.DFSClient - Failed to connect to /38.99.76.30:50010:java.io.IOException: Got error in response to OP_READ_BLOCK
    at org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:753)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:979)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1075)
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1360)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
    at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263)
    at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
    at org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:820)
    at org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:932)
    at org.apache.hadoop.hbase.HStore.compactHStoreFiles(HStore.java:1084)
    at org.apache.hadoop.hbase.HStore.compact(HStore.java:1017)
    at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:745)
    at org.apache.hadoop.hbase.HRegion.compactIfNeeded(HRegion.java:704)
    at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:378)


    Am I supposed to retry?

    Will that even make a difference? Every ten minutes or so, the client
    retries getting a reader on either the same file or another one and gets
    the same error.
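
    If the answer is to retry, I suppose it would look something like the
    sketch below (a hypothetical helper; the names and the sleep are made up),
    though I am not sure it helps if the block itself is bad:

        // Hypothetical retry around opening a store-file reader; made-up names.
        MapFile.Reader openReaderWithRetries(FileSystem fs, String dirName, Configuration conf)
            throws IOException {
          final int MAX_ATTEMPTS = 3;
          for (int attempt = 1; ; attempt++) {
            try {
              return new MapFile.Reader(fs, dirName, conf);
            } catch (IOException e) {
              if (attempt >= MAX_ATTEMPTS) {
                throw e;
              }
              // Hope the datanode recovers, or that the read fails over to
              // another replica on the next attempt.
              try {
                Thread.sleep(10 * 1000);
              } catch (InterruptedException ie) {
                throw new IOException("interrupted retrying open of " + dirName);
              }
            }
          }
        }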

    Thanks for any insight,
    St.Ack



  • Stack at Nov 27, 2007 at 10:28 pm
    Reading down through the dfs code, the AlreadyBeingCreatedException, at
    least, should already be getting retried. Why, then, am I seeing it in my
    code? (And what should I do about the OP_READ_BLOCK IOE in my previous
    message, which has no matching complaint in the namenode?)
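
    For reference, the wiring I am reading looks roughly like the sketch below.
    It is paraphrased from memory rather than copied verbatim out of DFSClient,
    so the method name, retry count and sleep are placeholders:

        // Paraphrase of the namenode-proxy retry wiring inside DFSClient. The
        // RetryPolicy/RetryPolicies/RetryProxy classes are from
        // org.apache.hadoop.io.retry; ClientProtocol and
        // AlreadyBeingCreatedException are in org.apache.hadoop.dfs.
        static ClientProtocol wrapWithRetries(ClientProtocol rpcNamenode) {
          RetryPolicy createPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
              5, 1000, TimeUnit.MILLISECONDS);

          Map<Class<? extends Exception>, RetryPolicy> remoteExceptionToPolicyMap =
              new HashMap<Class<? extends Exception>, RetryPolicy>();
          remoteExceptionToPolicyMap.put(AlreadyBeingCreatedException.class, createPolicy);

          // Only create() gets the AlreadyBeingCreatedException policy; other
          // calls fail on the first error.
          Map<String, RetryPolicy> methodNameToPolicyMap = new HashMap<String, RetryPolicy>();
          methodNameToPolicyMap.put("create",
              RetryPolicies.retryByRemoteException(
                  RetryPolicies.TRY_ONCE_THEN_FAIL, remoteExceptionToPolicyMap));

          return (ClientProtocol) RetryProxy.create(
              ClientProtocol.class, rpcNamenode, methodNameToPolicyMap);
        }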

    Thanks,
    St.Ack


  • Stack at Nov 29, 2007 at 7:25 pm
    The OP_READ_BLOCK IOE from earlier in this thread does have a matching
    complaint in the datanode after all. Here are the matching client and
    datanode exceptions:


    CLIENT:

    2007-11-28 21:38:39,391 DEBUG hbase.HStore - started compaction of 3 files in /hbase/compaction.dir/hregion_-1314266377/cookie
    2007-11-28 21:38:39,398 DEBUG fs.DFSClient - Failed to connect to /XX.XX.XX.17:50010:java.io.IOException: Got error in response to OP_READ_BLOCK
    at org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:753)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:979)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1075)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1027)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at java.io.DataInputStream.readByte(DataInputStream.java:248)
    at org.apache.hadoop.hbase.HStoreFile.loadInfo(HStoreFile.java:590)
    at org.apache.hadoop.hbase.HStore.compact(HStore.java:1007)
    at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:745)
    at org.apache.hadoop.hbase.HRegion.compactIfNeeded(HRegion.java:704)
    at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:378)



    DATANODE:

    2007-11-28 21:38:39,397 WARN dfs.DataNode - XX.XX.XX.17:50010:Got exception while serving blk_5964910456479636527 to /XX.XX.XX.17:
    java.io.IOException: Block blk_5964910456479636527 is not valid.
    at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:549)
    at org.apache.hadoop.dfs.FSDataset.getMetaFile(FSDataset.java:466)
    at org.apache.hadoop.dfs.FSDataset.metaFileExists(FSDataset.java:470)
    at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:897)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    at java.lang.Thread.run(Thread.java:619)

    2007-11-28 21:38:39,397 ERROR dfs.DataNode - XX.XX.XX.17:50010:DataXceiver: java.io.IOException: Block blk_5964910456479636527 is not valid.
    at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:549)
    at org.apache.hadoop.dfs.FSDataset.getMetaFile(FSDataset.java:466)
    at org.apache.hadoop.dfs.FSDataset.metaFileExists(FSDataset.java:470)
    at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:897)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    at java.lang.Thread.run(Thread.java:619)


    How should I recover from this (and from the AlreadyBeingCreatedException
    with no matching namenode complaint, discussed earlier in the thread)?

    Thanks,
    St.Ack




Discussion Overview
group: common-user
categories: hadoop
posted: Nov 27, '07 at 5:58a
active: Nov 29, '07 at 7:25p
posts: 4
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Stack: 4 posts
