RAID System Technical Support Team.
RAID configured disk arrays provide redundancy, capacity and performance, but once they fail they are difficult to rebuild and restore. If the failure is not dealt with professionally, it can have adverse consequences for both business operations and finances. Our first priority is to understand the situation you and your business find yourselves in and then assist with a plan of action that mitigates further damage and uncertainty. Following the initial consultation the team will agree a data recovery plan with you and provide clear estimates of costs and timescales.
RAID Fault Diagnosis.
To rebuild a RAID system our technicians will need to remedy both hardware faults and corrupt software structures. The diagnosis process therefore begins with our technicians determining the fault conditions that have contributed to the system failure and have prevented a successful rebuild.
Firstly our technicians create individual duplicate copies of each of the hard disks in the faulty system, repairing faulty hard disk drives in the process. Once a full or adequate complement of all storage devices has been created our technicians will examine each device to determine RAID type, disk sequence and fault information data produced during fault conditions.
Data Recovery from RAID Systems needing Config Rebuild.
Once the data descriptions that constitute the RAID logical volume have been identified, critical data is extracted to alternative media. During the extraction process any damaged data structures are repaired, data integrity is checked and a file listing is produced for review. Given the volumes of data stored on RAID systems, this process can be time consuming.
RAID File Recovery and System Restore.
Once your data and file listings have been recovered, your system will require rebuild, recommissioning and system restore. This may require cooperation between our technical support team and your IT support team to ensure data is accessible to the end user. Once you have a full system restore the Datlabs technical team will undertake a system review to ensure you have the necessary contingency management and disaster recovery systems in place.
RAID Recovery Offices.
When a RAID hard disk drive presents as faulty it is essential that it is examined in a clean air and ESD compliant workshop, particularly when the hard drive concerned forms an integral part of a striped volume. It is neither commercially viable nor practical to have leading edge workshops and hard disk drive repair facilities in all major cities. Datlabs' clean air laboratory and workshop facilities are strategically located in Manchester. This RAID recovery centre of excellence has easy access to motorway and rail networks and is the home of industry leading technicians. Using our dedicated express courier service your failed server can be transported and worked on within hours of your first call.
Resolving RAID System Faults.
Our Data Recovery Technical team successfully rebuild failed multi hard disk drive RAID configured server systems. We rebuild failed Windows, Linux, Dell, IBM, HP, Fujitsu and many other system platforms. If you have a RAID system failure, failed back-up, corrupt file system or corrupt RAID configuration table, then our technical team has seen it many times before. Rebuilding and recovering data from faulty RAID configured servers and storage systems is what we do!
RAID Rebuild and Data Recovery.
If your organisation has been disrupted by a failed RAID system with a consequential loss of data our RAID data recovery experts are on standby 24 x 7 to assist in getting your business back up and running as quickly as possible.
We provide data recovery and system restore solutions for all RAID levels, and we are experienced at dealing with everything from a RAID controller problem to power surges and virus attacks.
- Repair failed hard disk drives to make them operational.
- Clone all your system hard drives and volumes.
- Restore, rebuild and configure system and file structures.
- Extract, check and test your operational data.
- Rebuild server system and restore data.
- Validate operational capability.
Typical RAID Problems:
RAID Hard Disk Drive Failure
A RAID array is essentially a number of hard drives across which data is stored or replicated for the purpose of improving system performance, security or a combination of both. RAID arrays are usually configured and managed as part of a maintenance regime, either automatic or manual, that mitigates the risk of data loss. The greatest risks to the data stored on a RAID array are hard disk drive failure, malware attack and poor maintenance procedures.
RAID Hard Disk Drive Replacement
For many arrays a drive that is accumulating errors will be forced out of service and its data reconstructed across the remaining good drives available to the array controller.
In this case the data on the failing hard disk drive must be rebuilt from the parity data on the remaining active drives and written to a hot spare. Post-failure replacement takes considerably longer due to the calculations that must take place in order to rebuild the data. To mitigate the risk of drive failures, you should always ensure that your RAID array is capable of actually performing a rebuild and that it has compatible hot swap hard drives or replacements to rebuild to. A rigorous data back-up regime is also a must for any server system.
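The parity rebuild described above can be illustrated with a minimal sketch. It assumes a single stripe held as equal-length byte strings, with the parity block computed as the XOR of the data blocks. The function names `xor_blocks` and `rebuild_missing` are illustrative only and do not correspond to any controller's firmware or API.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recreate the one missing block of a stripe from all surviving blocks.

    Works whether the missing block held data or parity, because the XOR
    of every block in a healthy stripe is zero.
    """
    return xor_blocks(surviving_blocks)


# One stripe across four drives: three data blocks plus one parity block.
data = [b"ACCT", b"2024", b"DATA"]
parity = xor_blocks(data)

# The drive holding data[1] fails; its block is rebuilt from the rest.
rebuilt = rebuild_missing([data[0], data[2], parity])
```

Every surviving block has to be read in full to recreate the missing one, which is why a rebuild across large volumes is slow and IO intensive.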
Scheduled system rebuilds are normally better undertaken when system downtime can be tolerated as they can take a considerable time where the stored data volumes are relatively large. On most RAID configured systems, rebuilds can be prioritised against other system related activities such that the rebuild will occur in preference to operational demands.
RAID Single Hard Disk Drive failure
Following the failure of a hard drive within the RAID array, the system may still be accessible. However, its subsequent operation without fault tolerance or redundancy leaves it vulnerable to a catastrophic system failure. In this case all current data should be backed up before any rebuild is attempted. It is also probable that the other hard drives of the same age making up the RAID volume are now at consequential risk of failure.
You should also be aware that a RAID rebuild process is generally IO intensive and can put a greater workload on potentially problematic hard disks within the volume or volumes. Under these circumstances a re-configuration of applications may not be wise: if your rebuild fails you may end up with more failed hard disk drives than you bargained for.
RAID Controller Problems
RAID controllers manage data storage, access and the maintenance of your multi disk system. Implementations of RAID controllers include Mylex, Adaptec, Compaq, HP and IBM. These implementations can rebuild a failed data volume from a hot standby drive or a replacement drive through a hot swap. A rebuild will however fail if two disk volumes fail simultaneously or if part of the native configuration is actually stored on a single failed volume. RAID arrays can also fail as a result of the following situations, and frequently a combination of them:
- Malfunctioned Controller
- RAID rebuild error or volume reconstruction problem
- Missing RAID partition
- Multiple disk failure in off-line state resulting in loss of RAID volume
- Wrong replacement of good disk element belonging to a working RAID volume
- Power Surge
- Data Deletion or reformat
- Virus Attack
- Loss of RAID configuration settings or system registry
- Inadvertent reconfiguration of RAID volume
- Loss of RAID disk access after system or application upgrade.
RAID 5 Bungled Rebuild
Datlabs technicians are frequently engaged by customers who have inadvertently bungled the RAID 5 rebuild process. Once a mistake has been made it is not obvious that there is no longer a simple means of rebuilding the RAID and restoring the stored data and operating system. The damage occurs if one removes several disks from the RAID 5 array, plugs them back in a different order, and then performs a RAID 5 rebuild. The RAID 5 rebuild, sometimes called a re-synch, recalculates and rewrites the XOR parity blocks of the array. A rebuild is executed automatically once a drive is removed and re-inserted, or after a power failure. The damage caused can be explained as follows:
If you have a problem with your RAID server, some of the processes listed below may help you to minimise further loss of data or at least increase your chance of a successful recovery with the right expert.
[Table: With drives swapped]
[Table: After the rebuild]
In these tables, P is the original parity and X the new parity.
Recovering RAID Configuration tables
In the above case a software recovery will not be possible. A manual recovery can be accomplished, but only by very experienced and capable technicians such as those at the Datlabs workshops and laboratories, and only if there is sufficient relevant data available to rebuild the contents of the array.
In the example, a recovery requires knowledge of the original and current block sizes and disk order. Datlabs engineers are able to reverse engineer the configuration by iterative means without compromising the stored data.
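As a rough illustration of what working "by iterative means" can look like, the sketch below tries every disk order and a list of candidate block sizes against read-only copies of the drives until a reassembled volume passes a check. It is not a description of the Datlabs method. It assumes one particular parity layout (left-asymmetric RAID 5), and `reassemble`, `find_layout` and the `looks_valid` check (for example, a test for a known file system signature) are hypothetical names introduced for this example.

```python
from itertools import permutations


def reassemble(disks, order, block_size):
    """Rebuild the logical volume from disk images under one candidate layout.

    Assumes a left-asymmetric RAID 5 layout: in stripe row r the parity block
    sits at position (n - 1 - r % n) and data fills the other positions in
    ascending order. Real controllers use several other layouts.
    """
    n = len(order)
    rows = len(disks[0]) // block_size
    volume = bytearray()
    for row in range(rows):
        parity_pos = n - 1 - (row % n)
        start = row * block_size
        for pos in range(n):
            if pos == parity_pos:
                continue  # parity carries no user data
            volume += disks[order[pos]][start:start + block_size]
    return bytes(volume)


def find_layout(disks, block_sizes, looks_valid):
    """Try every block size and disk order; return the first that validates."""
    for block_size in block_sizes:
        for order in permutations(range(len(disks))):
            if looks_valid(reassemble(disks, order, block_size)):
                return order, block_size
    return None
```

Because the search only reads from the disk images and never writes to them, a wrong guess costs time but does not alter the stored data.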
Further RAID Problems
By default the RAID controller will instigate the rebuild automatically and will in fact exacerbate the problem. The rebuild in progress will destroy areas of striped data, and by the time the effects are apparent it can be too late for remedial work to be effective.
Restoring Drives in the array:
Without detailed knowledge of the disk drive order it is easy to mistakenly pull out a disk that is not the failed one. When this occurs the failed array will in fact be missing two disks and not just one. A two disk failure situation is beyond the auto recovery capabilities of the RAID 5 configuration.
In the majority of cases it is possible to bring the array back to life by re-inserting drives in a specific order. However, it is essential that drives are labelled according to their original port in order to avoid further cock-ups, and that the faulty drive is identified, removed and labelled.
Be aware, however, that Datlabs recommends that in ALL cases you submit a failed RAID 5 array for data recovery and rebuild. IF YOU MESS UP the order in which you insert the disks, you will get an enormous number of zeros added and mixed into the data. This sort of damage is generally fatal to subsequent rebuild and recovery attempts.
Frequently Asked Questions about Faulty RAIDs
Here are a few general questions and answers that you may find of interest.
Some answers depend on the capability of your RAID and controller, but you will get the general theme, which is:
“If you haven’t done this before and don’t understand how a RAID system works, then don’t do it!”
Datlabs recommends that any actions with a RAID system are only undertaken by fully trained and competent technicians and with caution.
Can I delete my RAID array and create a new one without data loss?
Do not delete an array unless it is absolutely necessary. If for whatever crazy reason you are contemplating deleting an array, then back up the data in the array and verify that this data back-up can be restored before deleting the array.
Datlabs advice: don’t even think about it!
How can I find out what RAID levels are configured on my system?
You can generally identify your configuration using the System Manager. Typically, right-click an array (shown as a “virtual disk” in Array Manager) and select Properties to see what RAID level the array is. You can make RAID arrays easier to identify by naming them based on the RAID level and the physical disks they contain.
Do all drives in a RAID array have to be the same size?
It is recommended, for continuity and safety, that all drives are of the same capacity and manufacture. Drives in an array do not strictly have to be the same size, because every drive is treated as if it had the capacity of the smallest drive in the array. However, mixing drives of different capacities carries evident dangers in a fault situation.
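The effect of the "smallest drive" rule on capacity can be sketched as follows. This is a simplified estimate that ignores controller metadata and formatting overhead, and the function name `usable_capacity` is an illustrative assumption, not a tool supplied with any RAID system.

```python
def usable_capacity(drive_sizes_gb, raid_level):
    """Estimate usable capacity in GB for common RAID levels.

    Every member is treated as if it were the size of the smallest drive.
    """
    n = len(drive_sizes_gb)
    smallest = min(drive_sizes_gb)
    if raid_level == 0:
        return n * smallest
    if raid_level == 1:
        return smallest
    if raid_level == 5:
        if n < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (n - 1) * smallest
    if raid_level == 10:
        if n < 4 or n % 2:
            raise ValueError("RAID 10 needs an even number of at least four drives")
        return (n // 2) * smallest
    raise ValueError(f"unsupported RAID level: {raid_level}")


# Two 1000 GB drives and one 2000 GB drive in the same array.
sizes = [1000, 1000, 2000]
raid5_usable = usable_capacity(sizes, 5)
unused = sum(sizes) - len(sizes) * min(sizes)  # space ignored on the larger drive
```

In this example half of the 2000 GB drive is simply not used by the array.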
Can I hot swap a drive in a RAID configuration?
If your system supports hot-swappable drives (the ability to replace or insert a drive without powering down the system), you can replace a failed drive in a RAID array with a good drive that is the same size or larger than the other drives in the array. You can also insert spare drives to be configured into arrays or used as hot spares. When you add or replace a drive in an array, the RAID array begins to rebuild using the new drive.
NOTE: Never pull an active drive from an array unless it is placed in a failed state, out of service or prepared for removal.
Can I upgrade controllers without data loss?
A Data Loss situation will occur if you initialize a new controller that stores the configuration data differently than the controller it is replacing.
Think about this! It is really not a good idea, is it? Get expert advice and assistance.
How do hot spares work?
A hot spare is a drive that is on standby in case another drive fails. Depending on how the array is configured, the drive is either picked up automatically and the array is rebuilt, or you manually select the drive and rebuild the array. Most systems ship with the automatic rebuild feature enabled, in which case the array rebuilds automatically onto the hot spare when a drive fails.
NOTE: If automatic rebuild is disabled, you must manually start the rebuild process. During a rebuild you may notice degraded performance on the drives.
How do I replace a failed drive?
If you introduce a new drive into the same slot where a bad drive was located, the failback will generally be automatic (assuming that automatic rebuild is enabled on the system). In other words, a new drive inserted into the same slot as a previously bad drive acts as a dedicated hot spare for that array.
What is the rebuild rate?
In RAID 1, 5 and 10 arrays, you can rebuild a failed drive by re-creating the data that was stored on the drive before it failed. The rebuild rate is the percentage of the compute cycles dedicated to rebuilding failed drives. A rebuild rate of 100 per cent means that the system is totally dedicated to rebuilding the failed drive, while a 0 per cent rebuild rate means that the rebuild occurs only when the system is not doing anything else.
What are stripe size and width?
Disk striping, which enables data to be written across multiple hard drives, partitions each drive into stripes that can vary in size. The stripes are interleaved, and the combined storage space consists of stripes from each drive. Stripe width is the number of disks involved in an array where striping is implemented. For example, a four-disk array with disk striping has a stripe width of four. Stripe size is the length of the interleaved data segments that a RAID controller writes across multiple drives. Disk striping enhances performance because multiple drives are accessed simultaneously, but it does not provide data redundancy.
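To show how stripe size and stripe width together decide where a piece of data lands, here is a minimal sketch. It assumes plain striping with no parity blocks (as in RAID 0), and the function name `locate` is illustrative only.

```python
def locate(logical_offset, stripe_size, stripe_width):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    segment = logical_offset // stripe_size   # which interleaved segment
    disk = segment % stripe_width             # segments rotate across the disks
    row = segment // stripe_width             # complete rows before this one
    return disk, row * stripe_size + logical_offset % stripe_size


# Four-disk array (stripe width 4) with a 64 KiB stripe size.
STRIPE_SIZE = 64 * 1024
first_segment = locate(0, STRIPE_SIZE, 4)
second_segment = locate(STRIPE_SIZE, STRIPE_SIZE, 4)
wrapped = locate(4 * STRIPE_SIZE + 10, STRIPE_SIZE, 4)
```

Consecutive segments fall on different disks, which is where the performance gain comes from. It is also why the loss of any one disk leaves gaps throughout the whole volume.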
Is disk spanning the same thing as RAID?
No. Disk spanning combines multiple drives and displays them in the operating system as one drive. For example, four 1 TB hard drives that are spanned appear as one 4 TB drive in the operating system. Disk spanning alone provides no data protection.
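The difference from striping can be seen in a short sketch of how a spanned volume might map an address onto its drives. It assumes simple concatenation, where each drive is filled before the next one is used, and `locate_spanned` is an illustrative name rather than a real operating system call.

```python
def locate_spanned(logical_offset, drive_sizes):
    """Map a logical offset to (drive index, offset on that drive)."""
    for index, size in enumerate(drive_sizes):
        if logical_offset < size:
            return index, logical_offset
        logical_offset -= size  # skip past this drive entirely
    raise ValueError("offset is beyond the end of the spanned volume")


# Four 1 TB drives presented to the operating system as one 4 TB volume.
TB = 10**12
drives = [TB] * 4
total_capacity = sum(drives)
position = locate_spanned(5 * TB // 2, drives)  # 2.5 TB into the volume
```

Each byte lives on exactly one drive with no second copy and no parity, which is why spanning alone offers no protection.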