NAS RAID Failure.
NAS RAID 5 Failure Probability.
Most NAS boxes have a Linux operating system and a proprietary RAID controller installed. Common NAS RAID 5 configurations tolerate the failure of a single hard disk drive. If two hard drives in this type of NAS RAID array fail at the same time, the system will fail to rebuild. So what is the probability of two hard disk drives in a RAID 5 array failing at the same time? Ought we to be concerned? And are the newer NAS products with Dual Disk Redundancy worth considering, given the trade-off of reduced capacity?
NAS RAID 5 Single Hard Drive Failure Probability.
Depending on its build type, design and specification, a typical NAS-ready hard disk drive such as a Western Digital Red will have a nominal Mean Time Between Failures (MTBF) specification of, say, 1,000,000 hours. With 8,760 hours in a year, that represents a chance of failure of roughly 1% per year. With three hard disk drives of similar type and specification installed in a RAID 5 configured NAS with similar utilisation, the probability of a single hard drive failure can be shown to be circa 3% per annum.
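The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: it assumes a constant failure rate and drives that fail independently of one another, and the function names are made up for the example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_failure_probability(mtbf_hours: float) -> float:
    """Chance that one drive fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def any_drive_fails(per_drive_probability: float, drive_count: int) -> float:
    """Chance that at least one of several independent drives fails within a year."""
    return 1.0 - (1.0 - per_drive_probability) ** drive_count


single = annual_failure_probability(1_000_000)  # nominal MTBF from the text
array = any_drive_fails(single, 3)              # three-drive RAID 5
print(f"single drive: {single:.2%}, three-drive array: {array:.2%}")
```

Under these assumptions the results come out a little under the rounded figures of 1% and 3% quoted above.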
NAS RAID 5 Hard Drive Failure.
With three drives in a RAID 5, a rough calculation puts the chance of a simultaneous hard disk drive failure at about one in a hundred thousand. The total probability of a double failure is 3 in a million per year. In other words, with three drives in the array the likelihood of a simultaneous hard disk drive failure is circa three times the per-drive figure. Actual in-situ probabilities will vary considerably depending on RAID controller type, batch, environment, electrical noise, ambient operating temperature and so on, and in all cases will be greater than 3 in a million per annum.
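One way to arrive at a figure of this order is to treat a "simultaneous" failure as a second drive failing while the array is still rebuilding after the first. The sketch below makes that assumption explicit. The 48-hour rebuild window is a hypothetical value chosen for illustration, not a figure from any controller specification, and the drives are again assumed to fail independently at a constant rate.

```python
HOURS_PER_YEAR = 24 * 365


def double_failure_per_year(mtbf_hours: float, drive_count: int, rebuild_hours: float) -> float:
    """Rough yearly chance of a second drive failing during a rebuild.

    Uses the small-probability approximation (rate x time), which is
    reasonable here because every term is far below 1.
    """
    failure_rate = 1.0 / mtbf_hours
    # Chance that any one of the drives fails at some point in the year.
    first_failure = drive_count * failure_rate * HOURS_PER_YEAR
    # Chance that one of the remaining drives fails inside the rebuild window.
    second_failure = (drive_count - 1) * failure_rate * rebuild_hours
    return first_failure * second_failure


# rebuild_hours=48 is an assumed value for illustration only.
estimate = double_failure_per_year(1_000_000, drive_count=3, rebuild_hours=48)
print(f"double failure: about {estimate * 1_000_000:.1f} in a million per year")
```

With these inputs the estimate is a few in a million per year. A longer rebuild window, or drives from the same batch that do not fail independently, pushes it higher.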
NAS RAID 5 Rebuild Failure.
Whether a system fails to rebuild after a hard disk drive failure depends on the RAID controller and how it is configured. If you configure your controller to tolerate no errors on rebuild, the rebuild will stop at the first un-correctable bad sector read error it encounters on the remaining "good" drives. If the drive that failed first is still readable at that sector, a full recovery of the stored data ought to be possible using professional data recovery services.
A single unreadable sector isn't unusual among the billions of sectors on a modern drive. If the sector has never been written to, there is no occasion for the drive electronics or the operating system to be aware that it is bad. If the operating system tried to write to it, the drive would automatically remap the sector and no damage would be done, without even a log entry. But if one other drive has already failed, that one bad sector means the RAID array will not rebuild successfully, no matter where on the disk it is.
Revisiting the arbitrary MTBF calculation and extrapolating a RAID 5 failure rate, in the knowledge that over half of hard disk drives remap a bad sector in their first year of operation, yields a failure probability of circa 5% per annum.
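The effect of latent bad sectors can be explored with a simple model: a rebuild is blocked if, after the first drive fails, at least one surviving drive carries an unreadable sector. The sketch below is an illustration under that assumption only. The parameter bad_sector_probability is a hypothetical input (a remapped sector is not necessarily an unreadable one), and the result depends heavily on it, so it need not match the circa 5% quoted above.

```python
def rebuild_blocked_probability(bad_sector_probability: float, surviving_drives: int) -> float:
    """Chance that at least one surviving drive has an unreadable sector."""
    return 1.0 - (1.0 - bad_sector_probability) ** surviving_drives


def annual_failed_rebuild(first_failure_probability: float,
                          bad_sector_probability: float,
                          drive_count: int) -> float:
    """Yearly chance that a drive fails AND the subsequent rebuild is blocked."""
    blocked = rebuild_blocked_probability(bad_sector_probability, drive_count - 1)
    return first_failure_probability * blocked


# Illustrative inputs: 3% yearly chance of a first failure (from the text),
# and an assumed 50% chance that a given surviving drive has a bad sector.
risk = annual_failed_rebuild(0.03, 0.5, drive_count=3)
print(f"failed rebuild: {risk:.2%} per year")
```

The point of the model is the comparison: a blocked rebuild is thousands of times more likely than two whole drives failing at once.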
Hard Disk Drive Cooling.
Hard disk drives operating at an ambient temperature 10 or 20 degrees cooler than the top end of their specification will perform better. New-to-market NAS-ready hard disk drives such as the WD Red have an operating temperature range of 0-70 degrees and are designed to run considerably cooler than their predecessors, so manufacturers are now designing and delivering products better suited to always-on situations in less than ideal environments.
Hard Disk Drive Hot Swapping.
A hot swap spare will reduce the probability of RAID system downtime. Paradoxically, however, the availability of a hot swap spare will not necessarily protect against a failed rebuild scenario. On balance it is better to hedge your bets and install a hot swap spare than to have nothing at all.
NAS RAID System Maintenance.
Maintenance personnel can reduce the possibility of failure by periodically reading and writing the entire drive surface and replacing any drives with un-correctable block errors. It is also better to configure your controller to ignore bad sectors during rebuild, so that a single unreadable sector does not halt the rebuild.
NAS RAID Dual Drive Redundancy.
When a hard drive fails in a RAID configured system, the overall propensity towards system failure is significant. Two hard disk drives can fail at the same time, and this discussion indicates that the likelihood of complete system failure increases following a single hard drive failure. Whilst enabling Dual Drive Redundancy (DDR) does give you less of the overall available storage space, this facility does provide much-needed insurance against prolonged system downtime and the potential loss of valuable data. The relatively new-to-market "designed for NAS" hard disk drives such as the Western Digital Red come in capacities large enough to cater for DDR under demanding cost/performance criteria, making DDR a viable option. Our advice, however, is to enable Dual Drive Redundancy during your initial set-up, NOT as an afterthought.
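The capacity cost of Dual Drive Redundancy is easy to put a number on. The sketch below assumes equal-sized drives and that each level of redundancy costs one drive's worth of space. The four-bay, 4 TB configuration is a hypothetical example, not a recommendation.

```python
def usable_capacity_tb(drive_count: int, drive_size_tb: float, redundant_drives: int) -> float:
    """Usable space when redundant_drives' worth of capacity is given over to redundancy."""
    if drive_count <= redundant_drives:
        raise ValueError("need more drives than the level of redundancy")
    return (drive_count - redundant_drives) * drive_size_tb


# Hypothetical four-bay NAS fitted with 4 TB drives.
single_redundancy = usable_capacity_tb(4, 4.0, redundant_drives=1)  # RAID 5 style
dual_redundancy = usable_capacity_tb(4, 4.0, redundant_drives=2)    # Dual Drive Redundancy
print(f"single: {single_redundancy} TB, dual: {dual_redundancy} TB")
```

In this example the second level of redundancy costs 4 TB of usable space, which is the price of being able to survive a second failure or an unreadable sector during a rebuild.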