How to recover a very large RAID6

RAID6 keeps two sets of redundancy, so it is supposed to survive the simultaneous failure of two drives, and it can start rebuilding automatically if you have hot spares.

However, you probably know that the RAID controller and the backplane still hold absolute authority over a RAID group. A malfunctioning controller, a bad backplane, or an unstable hard drive spitting erroneous signals onto the bus can fool the controller into thinking that multiple drives have gone bad all at once.

If this happens, most of the time you will be lucky enough that, when you reboot the system, the RAID shows up as carrying a “foreign” configuration. In that case, try importing the “foreign” configuration first. If it succeeds, all your stuff should be back.

If the import fails, the RAID controller vendor might tell you that you have to seek a professional data recovery service, or to try some very high-risk tricks that may actually cause you to lose all your data.

DO NOT LISTEN TO THEM.

At this point, none of your data is missing.

This can happen when a disk genuinely fails, the RAID card drops that drive and starts reconstructing onto a hot spare, and then more drives fail and leave the RAID configuration inconsistent.

In the worst case, it happens while you are expanding your RAID group, which leaves part of the drives in one configuration and the rest in another.

This does not mean that your data is gone. It is still there. Even if you have no idea of your RAID's drive order, stripe size, etc., that is still fine.

Here is what you should do:

Step 1
Write down the disk numbers that form your original RAID and your new RAID, just in case you forget.
Step 2
Have the RAID controller clear the “foreign” configuration. Do not worry. The RAID settings are only saved in the first or last sectors of your drives. Erasing them won’t alter your data.
Step 3
Turn each drive from the melted RAID into a single-drive RAID0 or a standalone virtual drive. REMEMBER to create them READONLY, in case you accidentally write over them. Export them to your computer.
Step 4
Mount the drives READONLY on your operating system, preferably Linux, according to their numbers.
Step 5
Use a tool that can read the raw bytes directly, for example hexedit on Linux, to browse the drives and find out how far the reconstruction got. New drives are usually wiped to all zeros. So if you jump to a point that your data should certainly have reached but the reconstruction has not, you will see continuous non-zero pages on the disks that formed the original RAID, and many pages of zeros on the new drives. Move the viewing point up or down by half each time, and you will quickly tell the original disks from the disks being expanded onto, and pin down exactly where the reconstruction stopped. Suppose you have a 7x 4TB RAID6 being expanded to a 12x 4TB RAID6, and you find that the new drives were written up to 700GB. A 12-drive RAID6 holds 10 drives' worth of data, so 7000GB of data had been reconstructed. A 7-drive RAID6 holds 5 drives' worth, so the same location is at about 1400GB on the old drives. Do the math and go to that location on the old drives; you should see the same page content, which means you are correct. Take notes of the precise location and the disk numbers.
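The halving search and the arithmetic in Step 5 can be sketched in Python. This is only an illustration, not a tool from the procedure: the function names are made up, the 256 KiB probe size is an assumption, and the search relies on the idea that everything before the stopping point is non-zero and everything after it is zero. Real data contains zero pages too, so still confirm the result by eye in hexedit.

```python
CHUNK = 256 * 1024  # probe size in bytes (an assumption, not a controller value)


def is_zero_chunk(f, offset, size=CHUNK):
    """Return True if the `size` bytes at `offset` are all zero."""
    f.seek(offset)
    return not any(f.read(size))


def find_rebuild_frontier(path, total_size, chunk=CHUNK):
    """Bisect for the first all-zero chunk on a drive that was being expanded onto.

    `total_size` is passed in by hand because os.path.getsize() does not
    report the capacity of a block device.
    """
    lo, hi = 0, total_size // chunk  # chunk indices, answer is in [lo, hi]
    with open(path, "rb") as f:      # read-only, never writes to the drive
        while lo < hi:
            mid = (lo + hi) // 2
            if is_zero_chunk(f, mid * chunk, chunk):
                hi = mid             # frontier is at or before mid
            else:
                lo = mid + 1         # frontier is after mid
    return lo * chunk


def old_drive_offset(new_offset, new_disks, old_disks, parity_disks=2):
    """Map a per-drive offset in the new layout to the per-drive offset in the old one."""
    logical = new_offset * (new_disks - parity_disks)   # amount of data rebuilt
    return logical // (old_disks - parity_disks)


# The example from Step 5: 700GB written on each drive of the 12-drive set.
print(old_drive_offset(700, 12, 7))  # 1400
```

The result of `old_drive_offset` is in whatever unit you feed it, so passing bytes gives you a byte offset you can jump to directly.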
Step 6
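Using a block size of 256kB, dd a few tens of GB out of each drive into image files, named in order. Take one set from a region that has certainly been rebuilt (the after-rebuild part) and one set from a region that certainly has not (the pre-rebuild part).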
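If you would rather script Step 6 than type dd by hand, the copy can be sketched as below. This is a stand-in for dd, not something from the original procedure. The function name and the file names in the usage comment are hypothetical, and the offsets are whatever you noted in Step 5.

```python
def copy_region(src_path, dst_path, start, length, chunk=256 * 1024):
    """Copy `length` bytes starting at byte `start` of src into dst, chunk by chunk.

    The source is opened read-only, so the drive cannot be modified from here.
    Returns the number of bytes copied (less than `length` if the source ends early).
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(start)
        while copied < length:
            buf = src.read(min(chunk, length - copied))
            if not buf:          # ran off the end of the source
                break
            dst.write(buf)
            copied += len(buf)
    return copied


# Hypothetical usage: 20 GiB from the start of the third drive into an ordered image name.
# copy_region("/dev/sdd", "disk03_after.img", start=0, length=20 * 2**30)
```

Keeping the disk number in the image name makes it easier to feed the files to the recovery tool in a known order.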
Step 7
Install VirtualBox on your Linux system. Install Windows 7 in VirtualBox.
Step 8
Download “ReclaiMe Free RAID Recovery” and install it in your Windows 7 virtual machine.
Step 9
Export the volume that contains your image files to the virtual machine.
Step 10
Run ReclaiMe, open the network location that contains the exported volume, and read in all the pre-rebuild disk images. Select Other RAIDS->Start RAID6.
It will run for a few minutes to a few hours, and come up with a map like this:

Block size is 256.0 KB , same as 512 sectors.
The data starts at sector (LBA) 0 (this is often called “offset” or “start offset”).

Block map is as follows:

1 2 3 4 5 6 7 8 9 10 11 12 P ?
14 15 16 17 18 19 20 21 22 23 24 P ? 13
27 28 29 30 31 32 33 34 35 36 P ? 25 26
40 41 42 43 44 45 46 47 48 P ? 37 38 39
53 54 55 56 57 58 59 60 P ? 49 50 51 52
66 67 68 69 70 71 72 P ? 61 62 63 64 65
79 80 81 82 83 84 P ? 73 74 75 76 77 78
92 93 94 95 96 P ? 85 86 87 88 89 90 91
105 106 107 108 P ? 97 98 99 100 101 102 103 104
118 119 120 P ? 109 110 111 112 113 114 115 116 117
131 132 P ? 121 122 123 124 125 126 127 128 129 130
144 P ? 133 134 135 136 137 138 139 140 141 142 143
P ? 145 146 147 148 149 150 151 152 153 154 155 156
? 157 158 159 160 161 162 163 164 165 166 167 168 P

This tells you the RAID block size, the disk order, and the parity rotation.
Save this into a file.
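To make sure you have read the rotation correctly, you can regenerate the map from a rule and compare it with what ReclaiMe printed. The sketch below encodes only the pattern visible in the sample map above: P moves one column to the left on each row, the second parity (shown as "?") sits right after it, and the data blocks continue after the second parity, wrapping around. Your own array may rotate differently, and the function name is made up for this example.

```python
def raid6_block_map(n_disks, rows=None):
    """Rebuild a block map with the rotation seen in the sample output above."""
    rows = n_disks if rows is None else rows
    data_per_row = n_disks - 2
    table = []
    for r in range(rows):
        row = [None] * n_disks
        p = (n_disks - 2 - r) % n_disks   # column of the P block on this row
        q = (p + 1) % n_disks             # second parity follows P
        row[p], row[q] = "P", "?"
        for k in range(data_per_row):
            # data continues after the second parity and wraps to column 0
            row[(q + 1 + k) % n_disks] = r * data_per_row + k + 1
        table.append(row)
    return table


# 14 columns, as in the sample map.
for row in raid6_block_map(14):
    print(" ".join(str(cell) for cell in row))
```

If the printed rows match the saved file line for line, the same rule can be reused later to work out which disk and which stripe row hold any given data block.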
Step 11
Do the same thing with the after-rebuild disk images, and save the calculated RAID configuration to a file.
Step 12
Now you need to prepare a hard drive or RAID volume big enough to hold TWICE all the information from the “after-rebuild” part. Mount it on your Linux system.
Step 13
Use dd to dump the ENTIRE after-rebuild part of each drive into image files.
Step 14
Use software RAID to assemble the disk images dumped above into a read-only RAID6, according to the after-rebuild configuration obtained in Step 11.
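Step 14 itself is done with software RAID. Purely to show what the assembly has to do with the block map, here is a minimal read-only de-striper in Python. It is not the soft RAID method and not a replacement for it: it assumes every member image is present and intact (so parity is simply skipped, never used for reconstruction), that data starts at sector 0, and that the rotation is the one in the sample map from Step 10. The function name is hypothetical.

```python
def destripe(image_paths, out_path, block_size=256 * 1024, rows=None):
    """Write the logical data stream of a RAID6 set to out_path, skipping parity.

    image_paths must be in the disk order reported by the recovery tool.
    Returns the number of complete stripe rows written.
    """
    n = len(image_paths)
    files = [open(p, "rb") for p in image_paths]   # read-only handles
    row = 0
    try:
        with open(out_path, "wb") as out:
            while rows is None or row < rows:
                # P is in column (n - 2 - row) mod n, second parity right after it
                q = (n - 1 - row) % n
                blocks = []
                for k in range(n - 2):
                    f = files[(q + 1 + k) % n]     # data follows the second parity
                    f.seek(row * block_size)
                    blocks.append(f.read(block_size))
                if any(len(b) < block_size for b in blocks):
                    break                          # ran off the end of the images
                out.writelines(blocks)
                row += 1
    finally:
        for f in files:
            f.close()
    return row
```

Comparing the first few megabytes of this output with the assembled soft RAID volume is a cheap way to confirm that the disk order and rotation you gave the soft RAID are right.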
Step 15
The RAID assembled above should include the partition information and the filesystem information for at least the first partition. Use dd to dump the partition table and save it.
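As a quick sanity check while saving the partition table in Step 15, you can look for the usual signatures in the bytes you dump. The sketch below is an assumption-laden example: it assumes 512-byte sectors and saves the first 34 sectors, which covers an MBR and, on a GPT disk with the usual 128 entries, the primary GPT header and entry array. The function name and return keys are made up.

```python
SECTOR = 512  # assumes 512-byte logical sectors


def save_partition_area(volume_path, out_path, sectors=34):
    """Copy the first `sectors` sectors of the volume and report which signatures appear."""
    with open(volume_path, "rb") as src:       # read-only
        head = src.read(sectors * SECTOR)
    with open(out_path, "wb") as dst:
        dst.write(head)
    return {
        # boot signature 0x55 0xAA in the last two bytes of sector 0
        "mbr_signature": head[510:512] == b"\x55\xaa",
        # GPT header starts with "EFI PART" at the beginning of sector 1
        "gpt_header": head[SECTOR:SECTOR + 8] == b"EFI PART",
    }
```

If neither signature shows up, the disk order or rotation from Step 11 is probably wrong, and it is better to find that out before dumping the whole RAID.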
Step 16
Do not try to activate the filesystem unless your calculation shows that there is at least one complete file system. Now dd the entire RAID to a giant dump file.
Step 17
Use hexedit to browse to the very end of the giant dump file, and trim the end to eliminate any part known not to be rebuilt data.
Step 18
Select the drives that form the pre-rebuild RAID, umount them, delete the RAID0 volumes, and construct a RAID6 from them in READONLY mode. Be sure to use the drive order obtained in Step 10. Do not attempt to mount the file system, since the big leading chunk of data has already been destroyed by the rebuilding process.
Step 19
Use hexedit to find, on the READONLY RAID6, the last few sectors of the big dump file obtained in Step 17, starting from the calculated location. Verify the match by observation.
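The search in Step 19 can also be done with a short script before you confirm it by eye. This sketch takes the tail of the trimmed dump file and looks for it in the READONLY RAID6 around the calculated location. The function name, the 4096-byte tail and the 16 MiB search window are all assumptions for the example. If the tail happens to be all zeros or some repeated pattern, the first match may be the wrong one, so pick a longer tail or check in hexedit.

```python
import os


def locate_tail(dump_path, raid_path, expected_offset=None,
                tail_len=4096, window=16 * 1024 * 1024):
    """Find where the last `tail_len` bytes of dump_path occur in raid_path.

    Searches `window` bytes on each side of expected_offset and returns the
    byte offset in raid_path where the match starts, or -1 if not found.
    By default the expected offset is the same logical position as in the dump.
    """
    with open(dump_path, "rb") as d:
        d.seek(0, os.SEEK_END)
        size = d.tell()
        d.seek(max(0, size - tail_len))
        tail = d.read()
    if expected_offset is None:
        expected_offset = size - len(tail)
    start = max(0, expected_offset - window)
    with open(raid_path, "rb") as r:               # read-only
        r.seek(start)
        haystack = r.read(2 * window + len(tail))
    pos = haystack.find(tail)
    return -1 if pos < 0 else start + pos
```

The overlapping spot for Step 20 is then the returned offset plus `tail_len`, which is where the second dump should begin.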
Step 20
Use dd to read the data on the READONLY RAID6, from the overlapping spot onward, into another giant dump file.
Step 21
Use hexedit to check that the two dump files are continuous and that the total size matches the entire size of the original RAID6. Then join the two giant dump files into one.
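The size check and the join in Step 21 can be sketched as follows. It is just one way to do it. The function name is hypothetical, and `expected_total` is the size in bytes of the original RAID6 volume, which you have to work out yourself. The script only checks the total size; whether the two files are really continuous at the seam is still something to confirm in hexedit.

```python
import os
import shutil


def join_dumps(first_path, second_path, out_path, expected_total=None):
    """Concatenate two dump files into out_path after checking the combined size."""
    total = os.path.getsize(first_path) + os.path.getsize(second_path)
    if expected_total is not None and total != expected_total:
        raise ValueError(
            "combined size %d does not match expected %d" % (total, expected_total)
        )
    with open(out_path, "wb") as out:
        for path in (first_path, second_path):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out, 1024 * 1024)  # copy in 1 MiB pieces
    return total
```

Writing to a third file keeps both originals untouched, at the cost of needing room for one more full copy.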
Step 22
Now you need a third set of hard drives, the same in number and specs, to construct a new RAID6 in RW mode.
Step 23
dd the giant joined dump file obtained in Step 21 onto the new RAID6. It is OK if the RAID is attempting automatic initialization.
Step 24
Now you can use gparted to check whether you have got the proper partition table back. You should. If not, you must have done something wrong in the previous steps.
Step 25
Once you are sure that the partition information is correct, you can test mounting the partitions. Mounting will attempt to fix/flush the metadata. If you are lucky enough, the file system will mount and you will be able to see your data. Then umount it. If it does not mount, your file system must have been corrupted.
Step 26
Whether or not you were able to mount the volume in Step 25, you should now umount it and attempt a repair of the file system. If you are fortunate enough, you will get back most of your data. If not, you may end up with nothing, or with much of your data in /lost+found.
Step 27
Once you are done and satisfied with what you have got back, you can remove the drive arrays you have been using to hold the giant dump files.
