RAID System Diagnosis.
Our assessment will determine the fault conditions that have caused your system failure and prevented a rebuild. Duplicate copies of each storage volume are created, and access issues and faults are resolved.
RAID Systems Needing Config Rebuild.
Once the data descriptions that constitute the RAID logical volume have been identified, critical data will then be extracted to alternative media. During the extraction process any damaged data structures are corrected, data integrity is checked and a file listing is produced for review.
RAID File Recovery and System Restore.
Once your data and file listings have been recovered, your system will require a rebuild, recommissioning and system restore. This may require cooperation between our technical support team and your IT support team to ensure data is accessible to the end user. Once you have a full system restore, the Datlabs technical team will undertake a system review to ensure you have the necessary contingency management and disaster recovery systems in place.
Typical RAID Problems:
RAID Single Drive Failure.
Do you have a single RAID hard drive failure, or a multiple hard drive failure? A single hard drive failure is entirely self-repairable; however, the RAID may remain operating in a degraded mode until the drive is replaced.
Most RAID manufacturers provide a utility that will quickly add a new hard drive to the disk array and restore your RAID configuration to its original state so that it functions as normal.
Multiple Drive Failure.
In cases where a RAID array has more than one physical hard drive failure, it is almost impossible to perform an effective RAID recovery without the proper professional level data recovery tools. In essence, in order to repair a multi-drive RAID failure, you will typically have to rebuild at least one of the hard drives from scratch, make it functional, and then re-add it to the array while ensuring the data is absolutely intact.
Datlabs Professional RAID Recovery.
When more than one hard disk in your array fails physically, it is almost always necessary to rebuild those hard disks from scratch in order to begin piecing back together the total disk set. This is rarely a job for IT support. It is a job that must be performed under at least Class 1000 clean room conditions, to ensure that the hard drives are not damaged during the recovery process.
RAID Hard Disk Drive Failure
A RAID array is essentially a number of hard drives across which data is stored or replicated for the purpose of improving system performance, security or a combination of both. RAID arrays are usually configured and managed as part of a maintenance regime, either automatic or manual, that militates against the possibility of data loss. The greatest risks to the data stored on a RAID array are hard disk drive failure, malware attack and poor maintenance procedures.
RAID Hard Disk Drive Replacement
For many arrays a drive that is accumulating errors will be forced out of service and its data reconstructed across the remaining good drives available to the array controller.
In this case the data on the failing hard disk drive must be rebuilt from the parity data on the remaining active drives and written to a hot spare. Post-failure replacement takes considerably longer because of the calculations needed to rebuild the data. To guard against the risk of drive failures, always ensure first that your RAID array is actually capable of performing a rebuild, and second that it has compatible hot swap hard drives or replacements to rebuild to. A rigorous data back-up regime is also a must for any server system.
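The parity rebuild described above can be illustrated with a minimal Python sketch. It assumes a simple XOR parity scheme of the kind used by RAID 5; the helper name xor_blocks and the four-byte sample blocks are purely illustrative and do not represent any particular controller's implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks from one stripe, plus the parity block stored alongside them.
d0, d1, d2 = b"ACCT", b"MAIL", b"DOCS"
parity = xor_blocks([d0, d1, d2])

# Suppose the drive holding d1 fails. Its block is the XOR of every
# surviving block in the stripe, which is what gets written to the hot spare.
rebuilt_d1 = xor_blocks([d0, d2, parity])
print(rebuilt_d1)  # b'MAIL'
```

Every block on the failed drive has to be recalculated this way from all of the surviving drives, which is why a post-failure rebuild is slow and IO intensive.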
Scheduled system rebuilds are normally better undertaken when system downtime can be tolerated as they can take a considerable time where the stored data volumes are relatively large. On most RAID configured systems, rebuilds can be prioritised against other system related activities such that the rebuild will occur in preference to operational demands.
RAID Single Hard Disk Drive failure
Following the failure of a hard drive within the RAID array, the system may still be accessible; however, its subsequent operation without fault tolerance or redundancy leaves it vulnerable to a catastrophic system failure. In this case all current data should be backed up before any rebuild is attempted. It is also probable that the remaining hard drives in the RAID volume, being of the same age, are now at increased risk of failure.
You should also be aware that a RAID rebuild process is generally IO intensive and can put a greater workload on potentially problematic hard disks within the volume(s). Under these circumstances a re-configuration of applications may not be wise: if your rebuild fails you may end up with more failed hard disk drives than you bargained for.
RAID Controller Problems
RAID controllers manage data storage, access and the maintenance of your multi disk system. Implementations of RAID controllers include Mylex, Adaptec, Compaq, HP and IBM. These implementations can rebuild a failed data volume from a hot standby drive or a replacement drive through a hot swap. A rebuild will however fail if two disk volumes fail simultaneously or if part of the native configuration is actually stored on a single failed volume. RAIDs can also fail as a result of the following situations, and frequently of a combination of them:
- Controller malfunction
- RAID rebuild error or volume reconstruction problem
- Missing RAID partition
- Multiple disk failure in off-line state resulting in loss of RAID volume
- Wrong replacement of good disk element belonging to a working RAID volume
- Power Surge
- Data Deletion or reformat
- Virus Attack
- Loss of RAID configuration settings or system registry
- Inadvertent reconfiguration of RAID volume
- Loss of RAID disk access after system or application upgrade.
RAID 5 Bungled Rebuild
Datlabs technicians are frequently engaged by customers who have inadvertently bungled the RAID 5 rebuild process. Once the mistake has been made, it is not obvious that there is no longer any simple means of rebuilding the RAID and restoring the stored data and operating system. The damage occurs if one removes several disks from the RAID 5 array, then plugs them back in a different order, and then performs a RAID 5 rebuild. The RAID 5 rebuild, sometimes called a re-synch, re-calculates and rewrites the XOR parity blocks of the array. A rebuild is executed automatically once a drive is removed and re-inserted, or after a power failure. The damage caused can be explained as follows:
If you have a problem with your RAID server, some of the processes listed below may help you to minimize further loss of data or at least increase your chance of a successful recovery with the right expert.
(Block layout tables not reproduced: "With drives swapped" and "After the rebuild".)
P is the original parity and X the new parity.
Recovering RAID Configuration tables
In the above case a software recovery will not be possible. A manual recovery can be accomplished, but only by very experienced and capable technicians such as those at the Datlabs workshops and laboratories, and only if there is sufficient relevant data available to rebuild the contents of the array.
In the example a recovery requires knowledge of the original and current block sizes and disk order. Datlabs engineers are able to reverse engineer the configuration by iterative means without compromising the stored data.
Further RAID Problems
By default the RAID controller will instigate the rebuild automatically, and in doing so will exacerbate the problem. The rebuild in progress will destroy areas of striped data, and by the time the effects are apparent it can be too late for remedial work to be effective.
Restoring Drives in the array:
Without detailed knowledge of the disk drive order it is easy to mistakenly pull out a disk that is not the failed one. When this occurs the failed array will in fact be missing two disks and not just one. A two disk failure situation is beyond the auto recovery capabilities of the RAID 5 configuration.
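Why a two disk failure is beyond RAID 5 can be seen in a short Python sketch. It assumes plain XOR parity; the helper xor_blocks and the sample blocks are illustrative only.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a four-disk RAID 5 array: three data blocks and their parity.
d0, d1, d2 = b"ACCT", b"MAIL", b"DOCS"
parity = xor_blocks(d0, d1, d2)

# The disks holding d1 and d2 are both missing. The survivors only reveal
# the XOR of the two lost blocks, not the blocks themselves.
known = xor_blocks(d0, parity)
assert known == xor_blocks(d1, d2)

# Any other pair of blocks with the same XOR is equally consistent with the
# survivors, so the controller cannot tell which pair was really stored.
fake_d1 = b"JUNK"
fake_d2 = xor_blocks(fake_d1, known)
assert xor_blocks(d0, fake_d1, fake_d2) == parity
```

With one block missing per stripe there is a single unknown and a single parity equation; with two missing there are two unknowns and still only one equation.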
In the majority of cases it is possible to bring the array back to life by re-inserting drives in a specific order. However, it is essential that the drives are labelled according to their original port, and that the faulty drive is identified, removed and labelled, in order to avoid further cock-ups.
Be aware, however, that Datlabs recommends that in ALL cases you submit a failed RAID 5 array for data recovery and rebuild. IF YOU MESS UP the order in which you insert the disks you will get an enormous number of zeros added and mixed into the data. This sort of damage is generally fatal to subsequent rebuild and recovery attempts.
Frequently Asked Questions about Faulty RAIDs
Here are a few general questions and answers that you may find of interest.
Some answers depend on the capability of your RAID and controller; however, you will get the general theme, which is:
“if you haven’t done this before and don’t understand how a RAID system works, then don’t do it!”
Datlabs recommends that any actions with a RAID system are only undertaken by fully trained and competent technicians and with caution.
Can I delete my RAID array and create a new one without data loss?
Do not delete an array unless it is absolutely necessary. If for whatever crazy reason you are contemplating deleting an array, then back up the data in the array and also verify that this data back-up can be restored before deleting the array.
Datlabs advice: don’t even think about it!
How can I find out what RAID levels are configured on my system?
You can generally identify your configuration using the System Manager. Typically, right-click an array (shown as a “virtual disk” in Array Manager) and select Properties to see what RAID level the array is. You can make RAID arrays easier to identify by naming them based on the RAID level and the physical disks they contain.
Do all drives in a RAID array need to be the same size?
It is recommended for continuity and safety purposes that all drives are of the same capacity and manufacture. Strictly, the drives in an array do not have to be the same size, because every drive is treated as if it had the capacity of the smallest drive in the array. You can, however, see the dangers of having drives of different capacity installed when a fault occurs.
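As a rough illustration of the smallest-drive rule, the following Python sketch estimates usable capacity. The function name usable_capacity_gb is hypothetical, and the formulas are the usual textbook ones for each RAID level; real controllers reserve some additional space for their own metadata.

```python
def usable_capacity_gb(drive_sizes_gb, raid_level):
    """Estimate usable capacity when every drive is truncated to the smallest."""
    count = len(drive_sizes_gb)
    smallest = min(drive_sizes_gb)
    if raid_level == 0 and count >= 2:
        return count * smallest          # striping only, no redundancy
    if raid_level == 1 and count >= 2:
        return smallest                  # every drive holds the same copy
    if raid_level == 5 and count >= 3:
        return (count - 1) * smallest    # one drive's worth used for parity
    if raid_level == 10 and count >= 4 and count % 2 == 0:
        return (count // 2) * smallest   # striped set of mirrored pairs
    raise ValueError("unsupported RAID level or drive count")

# Three 1000 GB drives and one 2000 GB drive in RAID 5:
# half of the larger drive is simply left unused.
print(usable_capacity_gb([1000, 1000, 2000, 1000], 5))  # 3000
```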
Can I hot swap a drive in a RAID configuration?
If your system supports hot-swappable drives (the ability to replace or insert a drive without powering down the system), you can replace a failed drive in a RAID array with a good drive that is the same size as, or larger than, the other drives in the array. You can also insert spare drives to be configured into arrays or used as hot spares. When you add or replace a drive in an array, the RAID array begins to rebuild using the new drive.
NOTE: Never pull an active drive from an array unless it is placed in a failed state, out of service or prepared for removal.
Can I upgrade controllers without data loss?
A Data Loss situation will occur if you initialize a new controller that stores the configuration data differently than the controller it is replacing.
Think about this! This is really not a good idea, is it? Get expert advice and assistance.
How do hot spares work?
A hot spare is a drive that is on standby in case another drive fails. Depending on how the array is configured, the drive is either picked up automatically and the array is rebuilt, or you manually select the drive and rebuild the array. Most systems ship with the automatic rebuild feature enabled, so that when a drive fails the array rebuilds automatically using the hot spare.
Note: If automatic rebuild is disabled, you must manually start the rebuild process. During a rebuild you may notice degraded performance on the drives.
How do I replace a drive?
If you introduce a new drive into the same slot where a bad drive was located, the rebuild will generally start automatically (assuming that automatic rebuild is enabled on the system). In other words, a new drive inserted into the same slot as a previously bad drive acts as a dedicated hot spare for that array.
What is the rebuild rate?
In RAID 1, 5 and 10 arrays, you can rebuild a failed drive by re-creating the data that was stored on the drive before it failed. The rebuild rate is the percentage of the compute cycles dedicated to rebuilding failed drives. A rebuild rate of 100 per cent means that the system is totally dedicated to rebuilding the failed drive, while a zero per cent rebuild rate means that the rebuild occurs only when the system is not doing anything else.
What are stripe size and width?
Disk striping, which enables data to be written across multiple hard drives, partitions each drive into stripes that can vary in size. The stripes are interleaved, and the combined storage space consists of stripes from each drive. Stripe width is the number of disks involved in an array where striping is implemented. For example, a four-disk array with disk striping has a stripe width of four. Stripe size is the length of the interleaved data segments that a RAID controller writes across multiple drives. Disk striping enhances performance because multiple drives are accessed simultaneously, but it does not provide data redundancy.
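To make stripe size and width concrete, here is a minimal Python sketch of how a logical byte offset might map onto a striped array without parity. It assumes that stripe size means the length of the segment written to each drive before moving to the next; the function name locate_striped is hypothetical, and real controllers may lay data out differently.

```python
def locate_striped(logical_offset, stripe_size, stripe_width):
    """Return (disk index, byte offset on that disk) for a logical offset."""
    segment, within = divmod(logical_offset, stripe_size)
    row, disk = divmod(segment, stripe_width)
    return disk, row * stripe_size + within

STRIPE_SIZE = 64 * 1024   # 64 KiB segments, an assumed example value
STRIPE_WIDTH = 4          # the four-disk array from the text

print(locate_striped(0, STRIPE_SIZE, STRIPE_WIDTH))                      # (0, 0)
print(locate_striped(STRIPE_SIZE, STRIPE_SIZE, STRIPE_WIDTH))            # (1, 0)
print(locate_striped(4 * STRIPE_SIZE + 100, STRIPE_SIZE, STRIPE_WIDTH))  # (0, 65636)
```

Consecutive segments land on consecutive drives, which is why a large read or write keeps all four drives busy at once.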
Is disk spanning the same thing as RAID?
No. Disk spanning combines multiple drives and displays them in the operating system as one drive. For example, four 1 TB hard drives that are spanned appear as one 4 TB drive in the operating system. Disk spanning alone provides no data protection.
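By way of contrast with striping, a spanned volume simply fills one drive before moving on to the next. The Python sketch below assumes a plain end-to-end concatenation with no metadata overhead; the function name locate_spanned is hypothetical.

```python
def locate_spanned(logical_offset, disk_sizes):
    """Return (disk index, byte offset on that disk) for a spanned volume."""
    if logical_offset < 0:
        raise ValueError("offset must not be negative")
    for disk, size in enumerate(disk_sizes):
        if logical_offset < size:
            return disk, logical_offset
        logical_offset -= size   # skip past this drive entirely
    raise ValueError("offset beyond end of spanned volume")

TB = 10**12
disks = [1 * TB] * 4   # four 1 TB drives appear as one 4 TB drive

print(locate_spanned(2 * TB + TB // 2, disks))  # (2, 500000000000)
```

Each byte lives on exactly one drive with no copy or parity elsewhere, so the loss of any one drive loses the data that was stored on it.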