Hi Devs



We ran into an issue while splitting HLogs, caused by an EOFException
(version 0.90.6).



For some reason (a network fluctuation) the DNs were not able to connect to
the NN while the master was splitting the logs.

While parsing the HLog we get its length, and we expect that it might be 0
(no problem here).



But in this scenario the DFSClient throws an EOFException while reading,
because none of the DNs are able to connect to the NN.



In this specific case we just return and the master considers the split to
be successful. This leads to data loss.



Maybe this can be fixed on the HDFS side. I would like to know whether we can
throw an IOException in this case, so that the log split is retried, since we
now have retry logic.
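
A minimal sketch of the idea above, not the actual HLogSplitter code. All names here (SplitLogSketch, LogOpener, openOrSkip, splitWithRetry, and the lengthTrusted flag) are hypothetical, and how HBase would actually tell a genuinely empty log from an unreadable one is left as an assumption. The sketch only shows the difference between swallowing the EOFException (the split looks successful) and rethrowing it as an IOException (the retry logic sees a failure).

```java
import java.io.EOFException;
import java.io.IOException;

final class SplitLogSketch {

    /** Hypothetical stand-in for whatever opens the HLog reader. */
    interface LogOpener {
        void open() throws IOException;
    }

    /**
     * Returns true if the log opened, false if it may be skipped as empty.
     * An EOFException is treated as "empty file" only when the caller can
     * vouch for the reported length (an assumed flag); otherwise it is
     * rethrown as a plain IOException so the caller sees a failure.
     */
    static boolean openOrSkip(long reportedLength, boolean lengthTrusted, LogOpener opener)
            throws IOException {
        try {
            opener.open();
            return true;
        } catch (EOFException e) {
            if (reportedLength <= 0 && lengthTrusted) {
                return false; // genuinely empty log, nothing to split
            }
            throw new IOException("EOF while opening log of reported length "
                    + reportedLength + "; treating as a failure so the split is retried", e);
        }
    }

    /** Tries up to maxAttempts times; returns the result of the first attempt that does not throw. */
    static boolean splitWithRetry(int maxAttempts, long reportedLength,
                                  boolean lengthTrusted, LogOpener opener) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return openOrSkip(reportedLength, lengthTrusted, opener);
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last != null ? last : new IOException("no attempts made");
    }

    public static void main(String[] args) throws IOException {
        LogOpener alwaysEof = () -> { throw new EOFException(); };
        System.out.println("trusted empty file skipped: " + !openOrSkip(0L, true, alwaysEof));
        try {
            splitWithRetry(3, 0L, false, alwaysEof);
        } catch (IOException e) {
            System.out.println("split reported as failed: " + e.getMessage());
        }
    }
}
```

With this shape, a length of 0 that cannot be trusted (for example because the DNs could not reach the NN) never results in a silent return, so the split is not reported as successful.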



I was going through HBASE-2643, as part of which EOFException was handled,
and didn't find this scenario covered there.

Please provide your suggestions.



Regards

Ram


  • Yuzhihong at Feb 21, 2012 at 3:09 pm
    Can you provide the stack trace for this issue?

    Thanks


  • Ramkrishna.S.Vasudevan at Feb 21, 2012 at 3:45 pm
    Please find the logs from the HRegionServer below. Is this what you meant, Ted?

    2012-02-18 00:44:38,808 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://158-1-130-13:9000/hbase/.logs/linux1,20020,1329487401169/linux1%3A20020.1329492399793
    2012-02-18 00:44:38,808 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File hdfs://158-1-130-13:9000/hbase/.logs/linux1,20020,1329487401169/linux1%3A20020.1329492399793 might be still open, length is 0
    2012-02-18 00:44:38,811 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Could not open hdfs://158-1-130-13:9000/hbase/.logs/linux1,20020,1329487401169/linux1%3A20020.1329492399793 for reading. File is emptyjava.io.EOFException


    Regards
    Ram

    -----Original Message-----
    From: yuzhihong@gmail.com
    Sent: Tuesday, February 21, 2012 8:39 PM
    To: dev@hbase.apache.org
    Cc: dev@hbase.apache.org; rama krishna
    Subject: Re: Handling EOFexception while splitlog

    Can you provide the stack trace for this issue?

    Thanks



