Reducing Energy Overhead in Server Rooms:
A study of temperature versus reliability in server rooms and racks indicates that the effects of higher temperatures on system reliability are smaller than often assumed:
- For some reliability issues, namely DRAM failures and node outages, there is no direct correlation with higher temperatures.
- For the error conditions that do show a correlation, namely latent sector errors in disks and disk failures, the correlation is much weaker than expected.
- For (device-internal) temperatures below 50 °C, errors tend to grow linearly with temperature rather than exponentially, as existing models suggest.
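The contrast between the linear trend observed below 50 °C and the classic exponential model (failure rates roughly doubling per 10 °C) can be sketched as a toy comparison. The coefficients below are illustrative assumptions, not figures from the study:

```python
# Toy comparison of two failure-rate models as a function of
# device-internal temperature (degrees C). All coefficients are
# illustrative assumptions, not values taken from the study.

def exponential_model(temp_c, base_rate=1.0, ref_temp=30.0):
    """Classic rule of thumb: failure rate doubles every 10 C."""
    return base_rate * 2 ** ((temp_c - ref_temp) / 10.0)

def linear_model(temp_c, base_rate=1.0, ref_temp=30.0, slope=0.1):
    """Linear growth with temperature, as the study observed below ~50 C."""
    return base_rate + slope * (temp_c - ref_temp)

for t in (30, 40, 50):
    print(f"{t} C: exponential={exponential_model(t):.2f}, "
          f"linear={linear_model(t):.2f}")
```

With these made-up coefficients, the two models agree at moderate temperatures but diverge quickly as temperature rises, which is why the choice of model matters when deciding how far a set point can safely be raised.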
The results indicate that data centres could save on expensive cooling costs without significant sacrifices in system reliability. Raising what is termed the set-point temperature can reduce carbon footprint: it is estimated that data centres could save roughly 5% in energy costs for every degree the set point is raised.
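As a back-of-the-envelope illustration of that estimate, the savings can be compounded over several degrees of set-point increase. The 5%-per-degree figure is the article's estimate; treating it as multiplicative is an assumption made here for the sketch:

```python
# Back-of-the-envelope estimate of cooling-energy costs after
# raising the set point, assuming ~5% savings per degree (the
# article's figure) and that savings compound multiplicatively.

def energy_cost_fraction(degrees_raised, savings_per_degree=0.05):
    """Fraction of the original energy cost remaining after raising
    the set point by `degrees_raised` degrees."""
    return (1 - savings_per_degree) ** degrees_raised

for d in range(5):
    remaining = energy_cost_fraction(d)
    print(f"+{d} C: {remaining:.1%} of original cost, "
          f"{1 - remaining:.1%} saved")
```

Under this assumption, a 4 °C increase saves a little under 20% of cooling energy, which is in the same ballpark as simply adding 5% per degree.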
There are several reasons for caution, however. Warmer set points may allow less time to recover from a cooling failure. Another issue is managing fan activity: fans tend to kick in as the temperature rises, nullifying gains from turning down the cooling. The study also indicated that temperature fluctuation may matter more than heat itself for hardware failures. “Even failure conditions, such as node outages, that did not show a correlation with temperature, did show a clear correlation with the variability in temperature,” the authors wrote. “Efforts in controlling such factors might be more important in keeping hardware failure rates low, than keeping temperatures low.”
Russ comments:
“The maxim that electronic equipment failure rates double for every 10 °C temperature increase has been around for some time; however, given the current climate-change imperative, a broader study is needed to further reduce the data storage industry's carbon footprint.”