Grokbase Groups HBase user June 2011
(As a note, this is with CDH3u0 which is based on HBase 0.90.1)

We've been seeing intermittent failures of calls to
LoadIncrementalHFiles. When this happens the node that made the call
will see a FileNotFoundException such as this:

2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020]
2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist:
/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602)
2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593)

Over on the regionserver that was loading this we see that it attempted
to load and hit a 60 second timeout:

2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store:
Validating hfile at
hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
for inclusion in store handling region
domainsranked,368449:2011/05/03/23:category:ffffffff:com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
...
2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /67.215.90.38:50010, add to deadNodes and continue
java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/67.215.90.38:42199
remote=/67.215.90.38:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readShort(DataInputStream.java:295)

We suspect this particular problem is a resource contention issue on our
side. However, the loading process proceeds to rename the file despite
the failure:

2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming bulk load file
hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
to
hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533

And then LoadIncrementalHFiles tries to load the hfile again:

2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store:
Validating hfile at
hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
for inclusion in store handling region
domainsranked,368449:2011/05/03/23:category:ffffffff:com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.

2011-06-23 15:46:55,685 DEBUG org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 147 on 60020, call
bulkLoadHFile(hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256,
[B@4224508b, [B@5e23f799) from 67.215.90.51:51856: error:
java.io.FileNotFoundException: File does not exist:
/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256

This eventually leads to the load command failing. It feels to me like a bug
that HBase goes ahead with renaming the hfile despite the failure, and that
LoadIncrementalHFiles then tries again. I figured I'd ask here before opening
a ticket for this.
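
(As a rough caller-side stopgap, one could check whether the source hfile still exists before retrying the bulk load for an item. The following is only a minimal sketch, not part of LoadIncrementalHFiles; the class name is hypothetical and the path is copied from the log lines above.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckHFileBeforeRetry {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Path taken from the logs above; in practice it would come from
    // whatever bulk load item is about to be retried.
    Path hfile = new Path("/hfiles/2011/06/23/14/domainsranked/"
        + "TopDomainsRank.r3v5PRvK/handling/3557032074765091256");

    if (fs.exists(hfile)) {
      System.out.println("Source hfile still present, retry looks safe: " + hfile);
    } else {
      // If the region server already renamed the file into the region's store
      // directory, retrying bulkLoadHFile can only hit FileNotFoundException.
      System.err.println("Source hfile is gone, not retrying: " + hfile);
    }
  }
}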

- Adam


  • Ted Yu at Jun 24, 2011 at 12:00 am
    This is due to the handling of HFile.Reader being wrapped in a try-finally
    block: there is no check of whether the reader operation encountered an
    exception, which should determine what happens next.
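
    A minimal sketch of that pattern, assuming the validation failure gets
    swallowed before the rename (this is illustrative only, not the actual
    0.90.1 Store code; validateHFile is a hypothetical stand-in for the
    HFile.Reader work, and fs.rename stands in for the bulk-load rename):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BulkLoadPatternSketch {

      // Hypothetical stand-in for the HFile.Reader work (loadFileInfo,
      // key-range checks); may throw IOException, e.g. on a socket timeout.
      static void validateHFile(FileSystem fs, Path src) throws IOException {
        // open reader, read file info, verify key range ...
      }

      // Shape of the problem: a validation failure never influences what
      // happens next, so the rename runs unconditionally.
      static void loadWithoutCheck(FileSystem fs, Path src, Path dst) throws IOException {
        try {
          validateHFile(fs, src);
        } catch (IOException ioe) {
          // logged and ignored -- nothing records that validation failed
        } finally {
          // reader cleanup would go here
        }
        fs.rename(src, dst); // still runs even when validation failed
      }

      // One possible check: only move the file once validation is known to
      // have succeeded, and surface the failure to the caller otherwise.
      static void loadWithCheck(FileSystem fs, Path src, Path dst) throws IOException {
        boolean validated = false;
        try {
          validateHFile(fs, src);
          validated = true;
        } catch (IOException ioe) {
          // log it; validated stays false
        } finally {
          // reader cleanup would go here
        }
        if (!validated) {
          throw new IOException("hfile " + src + " failed validation, not renaming");
        }
        if (!fs.rename(src, dst)) {
          throw new IOException("failed to rename " + src + " to " + dst);
        }
      }
    }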

    Please file a JIRA.

    Thanks Adam.
  • Adam Phelps at Jun 24, 2011 at 5:00 pm
    Ok, HBASE-4030 has been opened against this.

    - Adam

Discussion Overview
group: user
categories: hbase, hadoop
posted: Jun 23, '11 at 11:41p
active: Jun 24, '11 at 5:00p
posts: 3
users: 2 (Adam Phelps: 2 posts, Ted Yu: 1 post)
website: hbase.apache.org
