Failure Trends in a Large Disk Drive Population
2007
It is estimated that over 90% of all new information produced in the world is
being stored on magnetic media, most of it on hard disk drives. Despite
their importance, there is relatively little published work on the
failure patterns of disk drives, and the key factors that affect their
lifetime. Most available data are based either on extrapolation from
accelerated aging experiments or on relatively modest-sized field
studies. Moreover, larger population studies rarely have the
infrastructure in place to collect health signals from components in
operation, which is critical information for detailed failure analysis.
We present data collected from detailed observations of a large disk drive
population in a production Internet services deployment. The population
observed is many times larger than that of previous studies. In addition
to presenting failure statistics, we analyze the correlation between
failures and several parameters generally believed to impact longevity.
Our analysis identifies several parameters from the drive’s self-monitoring
facility (SMART) that correlate highly with failures. Despite this high
correlation, we conclude that models based on SMART parameters alone are
unlikely to be useful for predicting individual drive failures.
Surprisingly, we found that temperature and activity levels were much
less correlated with drive failures than previously reported.