DELL PowerEdge RAID Failure.
Datlabs RAID Rapid Response Team are available 24 x 7 and have unparalleled experience with DELL PowerEdge RAID-configured servers, with success rates second to none.
Typically we have rebuilt and recovered arrays after:
- Failure of one or more hard drives
- Server hardware failure
- Faulty PERC controllers
- Incorrectly Rebuilt Arrays
- Systems Reconfigured in error.
When a drive fails, all logical drives that are in the same array are affected. Each logical drive in an array might be using a different fault-tolerance method, so each logical drive can be affected differently.
- RAID 0 configurations do not tolerate drive failure. If any physical drive in the array fails, all RAID 0 logical drives in the same array also fail.
- RAID 1 and RAID 10 configurations tolerate multiple drive failures if no failed drives are mirrored to one another.
- RAID 5 configurations tolerate one drive failure.
- RAID 50 configurations tolerate one failed drive in each parity group.
- RAID 6 configurations tolerate two failed drives at a given time.
- RAID 60 configurations tolerate two failed drives in each parity group.
- RAID 1 (ADM) and RAID 10 (ADM) configurations tolerate multiple drive failures if no more than two drives mirrored to one another fail.
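The tolerance rules in the list above can be expressed as a small check. The following is only an illustrative sketch: the function name `array_survives` and the way groups are passed in are assumptions made for this example, not part of any controller tool, and the ADM variants are left out.

```python
from typing import List, Set


def array_survives(level: str, groups: List[Set[str]], failed: Set[str]) -> bool:
    """Return True if the array still holds its data after `failed` drives are lost.

    `groups` lists the mirror sets (RAID 1/10) or parity groups (RAID 5/50/6/60).
    A RAID 0 array is passed as a single group. Hypothetical helper for illustration.
    """
    # Maximum number of failed drives tolerated inside one group.
    per_group_limit = {"0": 0, "5": 1, "50": 1, "6": 2, "60": 2}

    for group in groups:
        lost = len(group & failed)
        if level in ("1", "10"):
            # A mirror set is lost only when every drive in it has failed,
            # i.e. when failed drives are mirrored to one another.
            if lost >= len(group):
                return False
        elif lost > per_group_limit[level]:
            return False
    return True


# RAID 10 with two mirror pairs: one failure per pair is survivable.
print(array_survives("10", [{"d0", "d1"}, {"d2", "d3"}], {"d0", "d2"}))
# Both drives of the same pair failing is not.
print(array_survives("10", [{"d0", "d1"}, {"d2", "d3"}], {"d0", "d1"}))
```

A check like this only describes what the RAID level can tolerate in principle. It says nothing about the actual state of a controller, which may mark an array as failed for other reasons.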
Before replacing drives, ensure that the array has a current, valid backup.
Confirm that the replacement drive is of the same type as the degraded drive (either SAS or SATA, and either hard drive or solid state drive) and that its capacity is equal to or larger than that of the smallest drive in the array. The controller will immediately fail any replacement drive that has insufficient capacity.
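As a rough illustration of the checks described above, the sketch below compares a candidate replacement against the degraded drive and the rest of the array. The `Drive` class and `replacement_problems` function are hypothetical names invented for this example. In practice the drive details would come from the controller's management utility.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Drive:
    interface: str    # "SAS" or "SATA"
    media: str        # "HDD" or "SSD"
    capacity_gb: int


def replacement_problems(replacement: Drive, degraded: Drive, array: List[Drive]) -> List[str]:
    """Return a list of reasons the replacement is unsuitable (empty if none)."""
    problems = []
    if replacement.interface != degraded.interface:
        problems.append(f"interface mismatch: {replacement.interface} vs {degraded.interface}")
    if replacement.media != degraded.media:
        problems.append(f"media mismatch: {replacement.media} vs {degraded.media}")
    # The replacement must be at least as large as the smallest array member.
    smallest = min(d.capacity_gb for d in array)
    if replacement.capacity_gb < smallest:
        problems.append(f"capacity {replacement.capacity_gb} GB is below the {smallest} GB minimum")
    return problems


array = [Drive("SAS", "HDD", 600), Drive("SAS", "HDD", 600), Drive("SAS", "HDD", 900)]
candidate = Drive("SAS", "HDD", 500)
print(replacement_problems(candidate, array[0], array))
```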
Systems with External Data Storage.
Ensure that the server is the first unit to be powered down and the last unit to be powered up. This precaution ensures that the system does not erroneously mark the drives as failed when the server is powered up.
Replacing RAID Hard Disk Drives.
The most common reason for replacing a drive is that it has failed. However, another reason is to enhance the storage capacity of the system.
In a fault-tolerant configuration, hot-plug hard drives can be replaced while the server is on, but a non-hot-plug hard drive should be replaced only while the server is off.
For systems that support hot-pluggable drives, if you replace a failed drive that belongs to a fault-tolerant configuration while the system power is on, all drive activity in the array pauses for 1 or 2 seconds while the new drive is initializing. When the drive is ready, Data Recovery to the replacement drive begins automatically. For systems that support only non-hot-pluggable drives, if you replace a drive belonging to a fault-tolerant configuration while the system power is off, a POST message appears when the system is next powered up. This message prompts you to press the F1 key to start automatic Data Recovery.
If you do not enable automatic Data Recovery, the logical volume remains in a ready-to-recover condition and the same POST message appears whenever the system is restarted.
Automatic Data Recovery (Rebuild).
When you replace a drive in an array, the controller uses the fault-tolerance information on the remaining drives in the array to reconstruct the missing data (the data that was originally on the replaced drive) and then writes the data to the replacement drive. This process is called automatic Data Recovery or rebuild.
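To make the idea of reconstructing data from fault-tolerance information concrete, here is a minimal sketch of single-parity (RAID 5 style) reconstruction using XOR. It is a simplified illustration with made-up four-byte blocks, not a description of how any particular controller firmware is implemented.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread across three data drives, plus a parity block on a fourth.
data = [b"DELL", b"PERC", b"RAID"]
parity = xor_blocks(data)

# The drive holding the second block fails. Its contents are rebuilt by
# XOR-ing the surviving data blocks with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same property explains why a second failure in a single-parity group is fatal: with two blocks missing from a stripe, the XOR of the survivors no longer determines either one.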
If fault tolerance is compromised, the controller cannot reconstruct the data, and the data is likely to be lost permanently without the services of a professional data recovery provider.
Time Required For A Rebuild.
The time required for a rebuild depends on several factors:
- The priority that the rebuild is given over normal I/O operations.
- The amount of I/O activity during the rebuild operation.
- The average bandwidth capability (MBps) of the drives.
- The availability of drive cache.
- The brand, model, and age of the drives.
- The amount of unused capacity on the drives.
- For RAID 5 and RAID 6, the number of drives in the array.
- The strip size of the logical volume.
- Firmware versions of the Smart Array Controller and Hard Disk Drive.
A system could be unprotected against hard disk drive failure for an extended period during a rebuild or upgrade. When possible, perform rebuilds during periods of low system utilization.
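For a very rough sense of scale, the sketch below estimates rebuild time from just three of the factors listed above: drive capacity, average drive bandwidth, and the share of that bandwidth the rebuild receives. The function name, the `rebuild_share` parameter, and the linear model are all assumptions made for illustration. Real rebuild times also depend on the other factors and can differ considerably.

```python
def estimate_rebuild_hours(capacity_gb: float, drive_mb_per_s: float, rebuild_share: float) -> float:
    """Crude estimate: time to write the whole drive at a fraction of its bandwidth.

    rebuild_share is the assumed fraction (0 to 1] of drive bandwidth left for
    the rebuild after normal I/O takes its portion.
    """
    if drive_mb_per_s <= 0:
        raise ValueError("drive_mb_per_s must be positive")
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in the range (0, 1]")
    effective_mb_per_s = drive_mb_per_s * rebuild_share
    seconds = capacity_gb * 1000 / effective_mb_per_s  # 1 GB taken as 1000 MB
    return seconds / 3600


# Same hypothetical 3600 GB drive at 100 MB/s: idle system versus a busy one.
print(estimate_rebuild_hours(3600, 100, 1.0))
print(estimate_rebuild_hours(3600, 100, 0.5))
```

Even under this simplified model, halving the bandwidth available to the rebuild doubles the time the array spends without full protection, which is why scheduling rebuilds for quiet periods matters.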