
Hard Disk Error Detection Using S.M.A.R.T.


Problem

24/7 Network Engineers ran into a hard disk problem while building a RAID array for one of their servers. One of the disks had a few bad sectors, which caused a crash during the RAID build. With n hard disks attached, the hard disk plugin had to be called n times, once per disk.

S.M.A.R.T. Overview

S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) is a monitoring system for computer hard disk drives that detects and reports various indicators of reliability. The inability to read some sectors is not always an indication that a drive is about to fail. Unreadable sectors can appear even when the drive is functioning within specification, for example after a sudden power failure while the drive is writing. Also, even if the physical disk is damaged at one location so that a certain sector is unreadable, the drive may be able to use spare space to replace the bad area.

A predicted failure may be catastrophic, or it may be something as subtle as the inability to write to a certain sector or performance slower than the manufacturer's declared minimum. S.M.A.R.T. data is therefore often used for early detection of potential future failures. Implementing effective S.M.A.R.T. monitoring with Nagios offers the following benefits:

  • Fast detection of storage subsystem problems
  • Early detection of potential future failures
  • Reduced risk of unexpected downtime
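The bad sectors and spare-area replacement described in the overview usually show up as S.M.A.R.T. attribute counters. As a rough illustration, the Python sketch below pulls a few sector-related counters out of the attribute table that `smartctl -A` prints for many ATA drives. The attribute names in `WATCHED` and the sample table are assumptions chosen for the example; real drives may report different names or none of them, and this is not the code the engineers used.

```python
# Attribute names commonly reported by smartmontools for ATA drives.
# Assumption: the drive exposes them under exactly these names.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_sector_counters(attribute_table):
    """Return {attribute_name: raw_value} for the watched sector counters."""
    counters = {}
    for line in attribute_table.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns;
        # the raw value is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                counters[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value was not a plain integer; skip it
    return counters


# Made-up sample in the layout of `smartctl -A` output, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4711
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

counters = parse_sector_counters(SAMPLE)
for name, raw in sorted(counters.items()):
    print(name, raw)
```

A non-zero pending or reallocated count does not by itself mean the drive is about to fail, as noted above, but a count that keeps growing is a common reason to look more closely.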

Solution

24/7 Network Engineers wrote a Nagios-compatible wrapper plugin to validate the health of the attached disks. The plugin was called to scan every attached hard disk for errors. Disks found to have errors were removed from the production server to avoid data loss and a probable system crash.
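The original plugin is not shown, so the following is only a minimal sketch of what such a wrapper could look like. It assumes `smartctl` from smartmontools is installed, that the device paths are passed in by the caller, and that the health verdict can be read from the `smartctl -H` output text. The function names (`classify_health`, `check_disk`, `summarize`, `main`) are hypothetical. One invocation checks all listed disks and returns a single Nagios status code (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_health(smartctl_output):
    """Map the text printed by `smartctl -H` to a Nagios status code."""
    text = smartctl_output.upper()
    if "FAILED" in text:
        return CRITICAL
    # ATA drives report "PASSED"; SCSI drives report "SMART Health Status: OK".
    if "PASSED" in text or "HEALTH STATUS: OK" in text:
        return OK
    return UNKNOWN


def check_disk(device, run=subprocess.run):
    """Run `smartctl -H` on one device; `run` is injectable for testing."""
    try:
        result = run(["smartctl", "-H", device],
                     capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return UNKNOWN  # smartctl missing, not executable, or hung
    return classify_health(result.stdout)


def summarize(results):
    """Collapse {device: status} into one (exit_code, message) pair."""
    failed = sorted(d for d, s in results.items() if s == CRITICAL)
    unknown = sorted(d for d, s in results.items() if s == UNKNOWN)
    if failed:
        return CRITICAL, "SMART CRITICAL - failing: " + ", ".join(failed)
    if unknown or not results:
        detail = ", ".join(unknown) or "no devices given"
        return UNKNOWN, "SMART UNKNOWN - no verdict for: " + detail
    return OK, "SMART OK - %d disk(s) healthy" % len(results)


def main(argv):
    """Check every device in argv and print a one-line Nagios status."""
    results = {device: check_disk(device) for device in argv}
    code, message = summarize(results)
    print(message)
    return code
```

To use it as a plugin, a script would end with `sys.exit(main(sys.argv[1:]))` (after `import sys`) and be called with the device paths as arguments, for example `/dev/sda /dev/sdb`. Reading S.M.A.R.T. data normally requires root privileges, so the Nagios user would typically need sudo rights for `smartctl`.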
