RAID System Diagnosis.
Our RAID system diagnosis will pinpoint and correct any underlying fault conditions that have caused your system failure. Typically, hard drive faults such as bad sectors, or electrical component failures on the controller card, are identified and resolved.
File System Recovery.
Having corrected the physical elements of the system, our technicians will duplicate (clone) each hard drive and attempt to reconstruct your damaged file system (without compromising your original data) in order to facilitate access to your data files.
File Listing and Data Integrity.
The next step is for our technicians to produce a list of available files and folders and also a statement of their condition ready for you to review.
Data and System Restore.
Once you have agreed the veracity of the available system data our technicians will recover your data to new storage media and make arrangements for its return. Restoration may involve cooperation between our technical support team and your IT support team to ensure data is accessible to the end user.
Typical RAID Failure Problems:
RAID System Hard Drive Failures
A RAID array is essentially a number of hard drives across which data is stored or replicated for the purpose of improving system performance, security or a combination of both. RAID arrays are usually configured and managed as part of a maintenance regime, either automatic or manual, that militates against the possibility of data loss. The greatest risks to the data stored on a RAID array are hard disk drive failure, malware attack and poor maintenance procedures.
RAID System Single Drive Failure.
Do you have a single RAID hard drive failure, or a multiple hard drive failure? A single hard drive failure is entirely self-repairable, although the RAID may remain operating in a degraded mode. Most RAID manufacturers provide a utility that will quickly add a new hard drive to the disk array and restore your RAID configuration to its original state so that it functions as normal.
Following the failure of a hard drive within the RAID array, the system may still be accessible; however, its subsequent operation without fault tolerance or redundancy means it remains vulnerable to a catastrophic system failure. In this case all current data should be backed up before any rebuild is attempted. It is also probable that the remaining hard drives in the RAID volume, being of the same age as the failed drive, are now at increased risk of failure.
You should also be aware that a RAID rebuild process is generally I/O intensive and can put a greater workload on potentially problematic hard disks within the volume(s). Under these circumstances a re-configuration of applications may not be wise: if your rebuild fails, you may end up with more failed hard disk drives than you bargained for.
RAID Hard Drive Replacement
For many arrays a drive that is accumulating errors will be forced out of service and its data reconstructed across the remaining good drives available to the array controller.
In this case the data on the failing hard drive must be rebuilt from the parity data on the remaining active drives and written to the standby drive. Post-failure replacement takes considerably longer due to the calculations that must take place in order to rebuild the data. To militate against the risk of drive failures, you should always try to ensure that the RAID array you have is, first of all, capable of actually performing a rebuild, and also that it has compatible hot-swap hard drives or replacements to rebuild to. A rigorous data back-up regime is also a must for any server system.
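The parity rebuild described above can be illustrated with a minimal sketch. It assumes simple XOR parity of the kind used by RAID 5; the helper name xor_blocks and the four-byte sample blocks are hypothetical, and a real controller works on much larger blocks and rotates the parity position between drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical four-drive set: three data blocks plus parity.
data = [b"ACCT", b"2024", b"DATA"]
parity = xor_blocks(data)

# The drive holding data[1] fails. Its block is the XOR of every
# surviving block in the stripe, parity included.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'2024'

# With two drives missing, the survivors only yield the XOR of the two
# lost blocks, which is not enough to separate them again.
two_lost = xor_blocks([data[0], parity])
print(two_lost == xor_blocks([data[1], data[2]]))  # True
```

Every block of the replacement drive has to be recomputed from all of the surviving drives in this way, which is why a rebuild is both slow and demanding on the remaining disks.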
Scheduled system rebuilds are normally better undertaken when system downtime can be tolerated as they can take a considerable time where the stored data volumes are relatively large. On most RAID configured systems, rebuilds can be prioritized against other system related activities such that the rebuild will occur in preference to operational demands.
RAID System Multiple Drive Failure.
In cases where a RAID array has more than one physical hard drive failure, it is almost impossible to perform an effective RAID recovery without the proper professional level data recovery tools. In essence, in order to repair a multi-drive RAID failure, you will typically have to rebuild at least one of the hard drives from scratch, make it functional, and then re-add it to the array while ensuring the data is absolutely intact.
RAID System Controller Problems
RAID controllers manage data storage, access and the maintenance of your multi-disk system. Implementations of RAID controllers include Mylex, Adaptec, Compaq, HP and IBM. These implementations can rebuild a failed data volume from a standby drive or a replacement drive. A rebuild will, however, fail if two disk volumes fail simultaneously or if part of the native configuration is actually stored on a single failed volume. RAIDs can also fail as a result of the following situations, and frequently from a combination of them:
- Malfunctioning controller
- RAID rebuild error or volume reconstruction problem
- Missing RAID partition
- Multiple disk failure in off-line state resulting in loss of RAID volume
- Wrong replacement of a good disk element belonging to a working RAID volume
- Power surge
- Data deletion or reformat
- Virus attack
- Loss of RAID configuration settings or system registry
- Inadvertent reconfiguration of RAID volume
- Loss of RAID disk access after system or application upgrade
RAID 5 Bungled Rebuild
Datlabs technicians are frequently engaged by customers who have inadvertently bungled the RAID 5 rebuild process. Once a mistake has been made, it is not obvious that there is no longer a simple means of rebuilding the RAID and restoring the stored data and operating system. The damage occurs if one removes several disks from the RAID 5 array, then plugs them back in a different order, and then performs a RAID 5 rebuild. The RAID 5 rebuild, sometimes called a re-synch, re-calculates and rewrites the XOR parity blocks of the array. A rebuild is executed automatically once the drive is removed and re-inserted, or after a power failure. The damage caused can be explained as follows:
If you have a problem with your RAID server, some of the processes listed below may help you to minimize further loss of data, or at least increase your chance of a successful recovery with the right expert.
Table: block layout with drives swapped and after the rebuild, where P is the original parity and X the new parity.
Recovering RAID System Configuration Tables
In the above case a software recovery will not be possible. A manual recovery can be accomplished, but only by very experienced and capable technicians, such as those at the Datlabs workshops and laboratories, and only where there is sufficient relevant data available to rebuild the contents of the array.
In the example, a recovery requires knowledge of the original and current block sizes and disk order. Datlabs engineers are able to reverse engineer the configuration by iterative means without compromising the stored data.
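To give a feel for why such a search has to be iterative, the sketch below simply enumerates the combinations of disk order and block size that would need to be tried. It is an illustration only, not the Datlabs procedure: the function name candidate_layouts, the drive labels and the list of block sizes are all hypothetical, and a real recovery would test each candidate read-only against cloned drives.

```python
from itertools import permutations

def candidate_layouts(disk_labels, block_sizes_kib):
    """Yield every (disk order, block size) combination to be tested."""
    for order in permutations(disk_labels):
        for size in block_sizes_kib:
            yield order, size

# Four drives with unknown original order and four plausible block sizes.
candidates = list(candidate_layouts(["A", "B", "C", "D"], [16, 32, 64, 128]))
print(len(candidates))  # 96
print(candidates[0])    # (('A', 'B', 'C', 'D'), 16)
```

The number of orderings grows factorially with the number of drives, which is one reason clear labelling of each drive and its original port matters so much.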
Further RAID Problems
By default the RAID controller will instigate the rebuild automatically and will in fact exacerbate the problem. The rebuild in progress will destroy areas of striped data, and by the time the effects are apparent it can be too late for remedial work to be effective.
Restoring Drives in a RAID Array:
Without detailed knowledge of the disk drive order it is easy to mistakenly pull out a disk that is not the failed one. When this occurs the failed array will in fact be missing two disks and not just one. A two disk failure situation is beyond the auto recovery capabilities of the RAID 5 configuration.
In the majority of cases it is possible to bring the array back to life by re-inserting drives in a specific order. It is essential, however, that drives are labelled according to their original port in order to avoid further cock-ups, and that the faulty drive is identified, removed and labelled.
Be aware, however, that Datlabs recommend that in ALL cases you submit a failed RAID 5 array for data recovery and rebuild. IF YOU MESS UP the order in which you insert the disks, you will get an enormous number of zeros added and mixed into the data. This sort of damage is generally fatal to subsequent rebuild and recovery attempts.
RAID Frequently Asked Questions
Here are a few general questions and answers that you may find of interest.
Some answers depend on the capability of your RAID and controller; however, you will get the general theme, which is:
“If you haven’t done this before and don’t understand how a RAID system works, then don’t do it!”
Datlabs recommends that any actions with a RAID system are only undertaken by fully trained and competent technicians and with caution.
Can I delete my RAID array and create a new one without data loss?
Do not delete an array unless it is absolutely necessary. If for whatever crazy reason you are contemplating deleting an array, then back up the data in the array and also verify that this back-up can be restored before deleting the array.
Datlabs advice: don’t even think about it!
How can I find out what RAID levels are configured on my system?
You can generally identify your configuration using the System Manager. Typically, right-click an array (shown as a “virtual disk” in Array Manager) and select Properties to see what RAID level the array is. You can make RAID arrays easier to identify by naming them based on the RAID level and the physical disks they contain.
Do all drives in a RAID array need to be the same size?
It is recommended, for continuity and safety purposes, that all drives are of the same capacity and manufacture. Strictly speaking the drives in an array do not have to be the same size, because every drive in the array will default to the capacity of the smallest drive. You can, however, see the dangers of having drives of different capacity installed when a fault occurs.
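The effect of the smallest drive on usable space can be shown with a short calculation. This is a sketch under common assumptions (every member is truncated to the smallest drive; RAID 5 gives up one drive's worth of space to parity; RAID 1 and RAID 10 mirror the data once); the function name usable_capacity_gb is hypothetical, and individual controllers may behave differently.

```python
def usable_capacity_gb(drive_sizes_gb, raid_level):
    """Estimate usable capacity when every member drive is treated as
    being the size of the smallest drive in the array."""
    count = len(drive_sizes_gb)
    smallest = min(drive_sizes_gb)
    if raid_level == 0:
        return smallest * count            # striping only, no redundancy
    if raid_level == 1:
        return smallest                    # every drive holds the same copy
    if raid_level == 5:
        if count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return smallest * (count - 1)      # one drive's worth of parity
    if raid_level == 10:
        if count < 4 or count % 2:
            raise ValueError("RAID 10 needs an even number of drives, four or more")
        return smallest * count // 2       # half the drives are mirrors
    raise ValueError("RAID level not covered by this sketch")

print(usable_capacity_gb([1000, 1000, 1000], 5))  # 2000
print(usable_capacity_gb([1000, 1000, 500], 5))   # 1000
```

In the second case a single 500 GB drive halves the usable space of the whole array, and the unused space on the two larger drives is wasted.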
Can I hot swap a drive in a RAID configuration?
If your system supports hot-swappable drives (the ability to replace or insert a drive without powering down the system), you can replace a failed drive in a RAID array with a good drive that is the same size as, or larger than, the other drives in the array. You can also insert spare drives to be configured into arrays or used as hot spares. When you add or replace a drive in an array, the RAID array begins to rebuild using the new drive.
NOTE: Never pull an active drive from an array unless it is placed in a failed state, out of service or prepared for removal.
Can I upgrade controllers without data loss?
A data loss situation will occur if you initialize a new controller that stores the configuration data differently from the controller it is replacing.
Think about this! This is really not a good idea, is it? Get expert advice and assistance.
How do hot spares work in a RAID System?
A hot spare is a drive that is on standby in case another drive fails. Depending on how the array is configured, the drive is either picked up automatically and the array is rebuilt, or you manually select the drive and rebuild the array. Most systems ship with the automatic rebuild feature enabled, so that when a drive fails the array rebuilds automatically using the hot spare.
NOTE: If automatic rebuild is disabled, you must manually start the rebuild process. During a rebuild you may notice degraded performance on the drives.
How do I replace a drive?
If you introduce a new drive into the same slot where a bad drive was located, the rebuild will generally be automatic (assuming that automatic rebuild is enabled on the system). In other words, a new drive inserted into the same slot as a previously bad drive acts as a dedicated hot spare for that array.
What is the rebuild rate?
In RAID 1, 5 and 10 arrays, you can rebuild a failed drive by re-creating the data that was stored on the drive before it failed. With a RAID there is a factor called the “rebuild rate”. This is essentially the amount of system resource available to the task of rebuilding failed drives. 100 per cent means that the system is dedicated to rebuilding a failed drive, whilst zero per cent means that the rebuild occurs only when the system is not doing other tasks.
What are stripe size and width?
The term “stripe” refers to a block of data that can be written to a group of hard drives as part of a RAID system. Stripes can vary in size and can generally be configured for optimum performance. The term “stripe width” is defined as the number of disks across which striping is implemented. For example, a four-disk array with disk striping has a stripe width of four. The term “stripe size” is the actual length of the interleaved data segments that a RAID controller writes across multiple drives. Striping data across a number of hard disks means that the data within a file can be accessed simultaneously, thereby improving access times and system performance. The striping of data alone, however, does not provide redundancy.
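How stripe size and stripe width together decide where a piece of data lands can be sketched as below. The example assumes plain striping with no parity (as in RAID 0) and a 64 KiB stripe size; the function name locate_block is hypothetical.

```python
def locate_block(logical_offset, stripe_size, stripe_width):
    """Map a logical byte offset to (disk index, byte offset on that disk)
    for plain striping without parity."""
    segment = logical_offset // stripe_size   # which interleaved segment
    disk = segment % stripe_width             # segments rotate across the disks
    row = segment // stripe_width             # full rows written before this one
    return disk, row * stripe_size + logical_offset % stripe_size

STRIPE_SIZE = 64 * 1024   # 64 KiB segments (an assumed value)
STRIPE_WIDTH = 4          # four-disk array

print(locate_block(0, STRIPE_SIZE, STRIPE_WIDTH))                # (0, 0)
print(locate_block(STRIPE_SIZE, STRIPE_SIZE, STRIPE_WIDTH))      # (1, 0)
print(locate_block(4 * STRIPE_SIZE, STRIPE_SIZE, STRIPE_WIDTH))  # (0, 65536)
```

Consecutive segments fall on different disks, which is what allows a large file to be read from several drives at once. It is also why the loss of any one drive leaves gaps throughout every large file.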
Is Disk Spanning the same thing as a RAID?
The simple answer is no. Disk spanning uses a number of physical hard drives and presents them to the operating system as a single unit of storage. For example, four spanned 1 TB hard drives appear as one 4 TB drive to the operating system. Disk spanning alone provides no data protection against the failure of a hard drive and the potential consequential loss of data.
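The difference from striping can be seen in a small sketch of how a spanned volume might map an address to a drive. It assumes the drives are simply joined end to end, so that each drive is filled before the next one is used; the function name locate_in_span is hypothetical.

```python
def locate_in_span(logical_offset, drive_sizes):
    """Map a logical byte offset to (disk index, byte offset on that disk)
    for drives joined end to end."""
    if logical_offset < 0:
        raise ValueError("offset must not be negative")
    for disk, size in enumerate(drive_sizes):
        if logical_offset < size:
            return disk, logical_offset
        logical_offset -= size   # skip past this drive entirely
    raise ValueError("offset is beyond the end of the spanned volume")

TB = 10**12
drives = [TB, TB, TB, TB]   # four spanned 1 TB drives

print(sum(drives) // TB)                       # 4
print(locate_in_span(0, drives))               # (0, 0)
print(locate_in_span(TB + TB // 2, drives))    # (1, 500000000000)
```

Each address lives on exactly one drive and there is no second copy or parity anywhere, so whatever was stored on a failed drive is lost.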