Hello, it seems the HDFS in my cluster is corrupt. This is the output
from hadoop fsck:
Total size: 9196815693 B
Total dirs: 17
Total files: 157
Total blocks: 157 (avg. block size 58578443 B)
********************************
CORRUPT FILES: 157
MISSING BLOCKS: 157
MISSING SIZE: 9196815693 B
********************************
Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 0.0
Missing replicas: 0
Number of data-nodes: 1
Number of racks: 1

It seems to say that there is 1 block missing from every file that was
in the cluster.

I'm not sure how to proceed, so any guidance would be much appreciated.
My primary concern is recovering the data.

thanks

  • Lohit at Mar 10, 2009 at 12:35 am
    How many datanodes do you have?
    From the output, it looks like you had only one datanode connected to your NameNode at the point when you ran fsck. Did you have others?
    Also, I see that your default replication is set to 1. Can you check whether your datanodes are up and running?
    Lohit



    ----- Original Message ----
    From: Mayuran Yogarajah <[email protected]>
    To: [email protected]
    Sent: Monday, March 9, 2009 5:20:37 PM
    Subject: HDFS is corrupt, need to salvage the data.
  • Mayuran Yogarajah at Mar 10, 2009 at 5:15 pm

    There is only one data node at the moment. Does this mean the data is
    not recoverable?
    The HD on the machine seems fine, so I'm a little confused as to what
    caused the HDFS to become corrupted.

    M
  • Raghu Angadi at Mar 10, 2009 at 6:45 pm

    The block files usually don't disappear easily. Check on the datanode
    whether you can find any files starting with "blk". Also check the
    datanode log to see what happened there... maybe it was started on a
    different directory or something like that.

    Raghu.
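
The check for files starting with "blk" can be scripted. What follows is only a rough sketch, not a Hadoop tool: the function name `find_block_ids` is made up for illustration, and it assumes block data files are named `blk_<id>` while checksum companions end in `.meta`. Under that assumption a plain `blk*` pattern would count both kinds of file, so the sketch records distinct block IDs and the directory each one was found in.

```python
import os
import re

# Assumption: a block data file is named "blk_" plus an integer ID
# (possibly negative). Names ending in ".meta" do not match and are skipped.
BLOCK_FILE = re.compile(r"^blk_(-?\d+)$")


def find_block_ids(data_dir):
    """Return {block_id: directory} for block data files under data_dir."""
    found = {}
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            match = BLOCK_FILE.match(name)
            if match:
                found[int(match.group(1))] = root
    return found
```

Calling `find_block_ids` on the datanode's data directory and taking `len()` of the result gives the number of distinct blocks on disk, which is the figure to compare against the "Total blocks" line in the fsck report.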
  • Mayuran Yogarajah at Mar 10, 2009 at 7:20 pm

    There are indeed blk files:
    find -name 'blk*' | wc -l
    158

    I didn't see anything out of the ordinary in the datanode log.

    At this point, is there anything I can do to recover the files? Or do I
    need to reformat the data node and load the data in again?

    thanks
  • Mayuran Yogarajah at Mar 11, 2009 at 6:52 pm

    Sorry to resend this, but I didn't receive a response and wanted to know
    how to proceed.
    Is it possible to recover the data at this stage? Or is it gone?

    thanks
  • Raghu Angadi at Mar 11, 2009 at 7:10 pm
    Mayuran,

    It will take a lot of iterations, and a long time, if we have to go
    through each debugging step one at a time. Maybe a JIRA is a good place.

    - Run fsck with the -blocks option.

    - Check whether those block IDs match the IDs in the file names found
    by 'find'.

    - Check which directory these files are in, and verify that it matches
    the datanode's configured data directory.
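
The three checks above can be sketched in a few lines. This is an illustration under stated assumptions rather than anything Hadoop ships: the helper names (`ids_in_text`, `compare`, `is_under`) are invented here, block names are assumed to look like `blk_<id>` wherever they appear in the fsck output, and `on_disk` is a mapping from block ID to directory that you build yourself from the 'find' results. The configured directories are whatever your datanode's data directory setting lists (`dfs.data.dir` in releases of that period, as far as I know).

```python
import os
import re

# Assumption: any token of the form "blk_<integer>" names a block.
BLOCK_ID = re.compile(r"blk_(-?\d+)")


def ids_in_text(text):
    """Collect every block ID mentioned anywhere in a chunk of text."""
    return {int(value) for value in BLOCK_ID.findall(text)}


def compare(fsck_text, on_disk):
    """Compare block IDs reported by fsck with block IDs found on disk.

    on_disk maps block ID -> directory where its data file was found.
    """
    expected = ids_in_text(fsck_text)
    disk_ids = set(on_disk)
    return {
        "matched": sorted(expected & disk_ids),
        "only_in_fsck": sorted(expected - disk_ids),
        "only_on_disk": sorted(disk_ids - expected),
    }


def is_under(path, configured_dirs):
    """True if path lies inside (or equals) one of the configured directories."""
    real = os.path.realpath(path)
    for parent in configured_dirs:
        base = os.path.realpath(parent)
        if os.path.commonpath([real, base]) == base:
            return True
    return False
```

If `matched` is large but `is_under` returns False for the directories involved, the blocks exist but sit outside the directory the datanode is reading, which would fit the "started on a different directory" theory. If `matched` is empty, the files on disk belong to some other set of blocks.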

    You are saying there is nothing wrong in the log files, but does that
    imply that the datanode sees those 157 missing blocks? Maybe you should
    post the log or verify that yourself. If the DN were working correctly,
    as you say, you would not have 100% of blocks missing.

    There are many possibilities; it is not easy for me to pick the right
    one for your case without more info, or to list all possible conditions.

    Raghu.

Discussion Overview
group: common-user @
category: hadoop
posted: Mar 10, '09 at 12:21a
active: Mar 11, '09 at 7:10p
posts: 7
users: 3
website: hadoop.apache.org...
irc: #hadoop

site design / logo © 2023 Grokbase