Saturday, June 26, 2021

ZFS Diagnose and Replace Bad Disk

Discovering a Bad Disk

zpool status

I check my ZFS pools regularly. Proxmox is nice, because it will send you emails when something goes wrong, but other systems might not. Either way, I make it a habit to manually check the size and health of my storage pools once a week.

~$ zfs list
NAME    USED  AVAIL     REFER  MOUNTPOINT
tank  8.74T  1.81T     8.53T  /tank
~$ zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 2.20M in 0 days 00:00:10 with 0 errors on Sat Jun 26 04:06:30 2021
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            wwn-0x5000c50079975b01  ONLINE       0     0     3
            wwn-0x5000039fe3cf4293  ONLINE       0     0     0
            wwn-0x5000039fe3cfcb2f  ONLINE       0     0     0
            wwn-0x5000039fe3cf3491  ONLINE       0     0     0
            wwn-0x5000039fe3cfbe45  ONLINE       0     0     0

errors: No known data errors
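
For a weekly check like this, it can help to filter the status output down to just the devices with non-zero error counters. The following is a minimal sketch, not a polished tool: errored_devices is a name I made up, and it assumes the config table looks like the one above (a "NAME STATE READ WRITE CKSUM" header, one device per row, a blank line at the end).

```shell
#!/bin/sh
# errored_devices (hypothetical helper): read `zpool status` text on stdin and
# print every row of the config table whose READ, WRITE, or CKSUM counter is
# not "0".
errored_devices() {
    awk '
        # The config table starts right after its column header.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # A blank line or the "errors:" line ends the table.
        in_table && (NF == 0 || $1 == "errors:") { in_table = 0 }
        # Compare as strings so abbreviated counters such as "1.2K" also count.
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}
```

It would be used as "zpool status tank | errored_devices". A healthy pool prints nothing, which makes it easy to drop into a cron job that only sends mail when there is output.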

smartctl

We can see from the zpool status that one of our disks has a problem. The pool status is reporting an error, and one disk has logged 3 checksum errors. But is this disk bad? I can check it with smartctl.

I built my pool using disk IDs, so zpool status reports each disk by that ID. To check a disk with smartctl, pass it the /dev/disk/by-id/ path for the same ID.

Note: Some OSes need to have smartctl installed first. In Ubuntu, it can be installed using "sudo apt install smartmontools".
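
Because the pool reports its members by ID, the same trick extends to checking every disk in the pool at once. This is a rough sketch under a couple of assumptions: the pool members are named wwn-... or ata-... as they are here, and pool_disk_paths and smart_health_all are helper names invented for this example. The only smartctl option used is -H, which prints the overall health result.

```shell
#!/bin/sh
# pool_disk_paths (hypothetical helper): read `zpool status` text on stdin and
# print the /dev/disk/by-id/ path of every member named wwn-* or ata-*.
pool_disk_paths() {
    awk '$1 ~ /^(wwn|ata)-/ && NF >= 5 { print "/dev/disk/by-id/" $1 }'
}

# smart_health_all (hypothetical helper): run a SMART health check on every
# disk of the named pool. Needs zpool, smartctl, and sudo on the real system.
smart_health_all() {
    pool=$1
    zpool status "$pool" | pool_disk_paths | while read -r disk; do
        echo "== $disk"
        sudo smartctl -H "$disk"
    done
}
```

On this system it would be called as "smart_health_all tank". Swap -H for -x to get the full report for each disk.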

~$ sudo smartctl -x /dev/disk/by-id/wwn-0x5000c50079975b01
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-77-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    Z500JERL
LU WWN Device Id: 5 000c50 079975b01
Firmware Version: CC25
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jun 26 18:15:07 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   89) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 330) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   106   099   006    -    29690410
  3 Spin_Up_Time            PO----   094   093   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    77
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   083   060   030    -    214055033
  9 Power_On_Hours          -O--CK   046   046   000    -    47624
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    76
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   098   098   000    -    2
188 Command_Timeout         -O--CK   100   100   000    -    0 0 0
189 High_Fly_Writes         -O-RCK   084   084   000    -    16
190 Airflow_Temperature_Cel -O---K   070   057   045    -    30 (Min/Max 27/33)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    37
193 Load_Cycle_Count        -O--CK   098   098   000    -    5695
194 Temperature_Celsius     -O---K   030   043   000    -    30 (0 15 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    46235h+43m+57.716s
241 Total_LBAs_Written      ------   100   253   000    -    16226994503
242 Total_LBAs_Read         ------   100   253   000    -    11227817981448
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

The part of the smartctl output we are interested in is the attributes table. Some attributes are flagged "P - prefailure warning". The flag describes the kind of attribute (one that predicts failure if it crosses its threshold), so seeing it is generally OK. Others are flagged "R - error rate", and on a drive of this age those are the rows I look at first.

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   106   099   006    -    29690410
  7 Seek_Error_Rate         POSR--   083   060   030    -    214055033
189 High_Fly_Writes         -O-RCK   084   084   000    -    16
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0

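These four rows help explain what is going on with this disk. The VALUE column is a normalized score that starts high and falls as the drive wears, and an attribute only counts as failed once VALUE drops to or below THRESH. Read errors are at 106/6, seek errors are at 83/30, high fly writes are at 84/0, and CRC errors are at 200/0, so none of them has crossed its threshold, which is why the overall self-assessment still says PASSED. The scores are slipping, though: seek errors have been as low as 60, high fly writes has a raw count of 16, and further up the table Reported_Uncorrect shows 2 uncorrectable errors on a drive with 47,624 power-on hours (about 5.4 years). Together with the checksum errors from ZFS, that is enough for me: ZFS was right, and this drive is starting to fail.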
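
Reading VALUE against THRESH by eye gets tedious, so here is a small sketch that does the subtraction for every row. It assumes the brief attribute table printed by "smartctl -x" as shown above (ID#, name, flags, VALUE, WORST, THRESH, ...), and smart_margins is just a name I picked. An attribute is marked FAILING when its normalized value is at or below the threshold.

```shell
#!/bin/sh
# smart_margins (hypothetical helper): read smartctl output on stdin and, for
# each row of the attribute table, print VALUE, THRESH, and the gap between
# them. A row is FAILING when VALUE <= THRESH.
smart_margins() {
    awk '
        # The table starts after the "ID# ATTRIBUTE_NAME ..." header line.
        $1 == "ID#" { in_table = 1; next }
        # A blank line ends it.
        in_table && NF == 0 { in_table = 0 }
        # Attribute rows begin with a numeric ID; the flag legend does not.
        in_table && $1 ~ /^[0-9]+$/ && NF >= 8 {
            value = $4 + 0      # column 4: normalized VALUE
            thresh = $6 + 0     # column 6: THRESH
            status = (value <= thresh) ? "FAILING" : "ok"
            printf "%s value=%d thresh=%d margin=%d %s\n", $2, value, thresh, value - thresh, status
        }
    '
}
```

Used as "sudo smartctl -x /dev/disk/by-id/wwn-0x5000c50079975b01 | smart_margins". A small or shrinking margin is the thing to watch from week to week.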

Replacing the Disk

zpool offline & shutdown

I'm out of SATA ports in this system, so I can't attach the new disk alongside the old one. Instead we will offline the disk, physically replace it, then run the "zpool replace" command.

~$ sudo zpool offline tank wwn-0x5000c50079975b01
~$ sudo zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 2.20M in 0 days 00:00:10 with 0 errors on Sat Jun 26 04:06:30 2021
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            wwn-0x5000c50079975b01  OFFLINE      0     0     3
            wwn-0x5000039fe3cf4293  ONLINE       0     0     0
            wwn-0x5000039fe3cfcb2f  ONLINE       0     0     0
            wwn-0x5000039fe3cf3491  ONLINE       0     0     0
            wwn-0x5000039fe3cfbe45  ONLINE       0     0     0

errors: No known data errors
~$ sudo shutdown now

Replace the disk

With the system powered off, physically replace the disk. It's a good idea to write down the disk ID beforehand, to make sure the correct drive is pulled. Keeping a record also makes it easier to identify disks in the future, should another failure happen. At this point I write "BAD" in red sharpie on the label of the bad disk. Also take a good look at the new drive and write down or take a picture of any identifying numbers. We can pick a disk ID that uses these when the drive is replaced.

My replacement disk for this 3TB Seagate is going to be a new 4TB Toshiba. The Toshiba doesn't have a WWN printed on it, so I take note of the other hardware IDs. In this case we will use the ata-HWID form of the disk ID for ZFS.

zpool replace

After booting the system, another "zpool status" shows the correct drive was pulled. Now look for the new disk and, using its disk ID, run zpool replace.

~$ ll /dev/disk/by-id/ata-*
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837 -> ../../sda
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS -> ../../sde
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS-part9 -> ../../sde9
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS -> ../../sdc
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS -> ../../sdf
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS-part9 -> ../../sdf9
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS -> ../../sdd
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_HDWQ140_Y026K4H8FBJG -> ../../sdb

The disk in sdb matches the new Toshiba drive. The hardware ID matches the picture of the label that was taken. Another "hint" is that it doesn't have partitions "part1" or "part9" like the other disks.
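
The "no partitions" hint can be checked mechanically. The sketch below lists the ata- IDs that have no -partN entries next to them. unpartitioned_disks is a hypothetical helper name, and the optional directory argument exists only so it can be tried against a test directory. Treat the result as a hint to confirm against the label, not as proof: any blank disk will show up, not just the replacement.

```shell
#!/bin/sh
# unpartitioned_disks (hypothetical helper): print the ata-* disk IDs in a
# by-id directory that have no "<id>-partN" entries alongside them.
unpartitioned_disks() {
    dir=${1:-/dev/disk/by-id}
    for path in "$dir"/ata-*; do
        [ -e "$path" ] || continue                      # glob matched nothing
        case $path in *-part[0-9]*) continue ;; esac    # skip partition links
        # Expand "<id>-part*"; if nothing matches, $1 stays a literal pattern
        # that does not exist, so the disk has no partitions.
        set -- "$path"-part*
        [ -e "$1" ] || basename "$path"
    done
}
```

Run with no arguments it looks in /dev/disk/by-id. With the listing above it should print only ata-TOSHIBA_HDWQ140_Y026K4H8FBJG.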

~$ sudo zpool replace tank wwn-0x5000c50079975b01 /dev/disk/by-id/ata-TOSHIBA_HDWQ140_Y026K4H8FBJG
~$ zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jun 26 19:16:51 2021
        787G scanned at 615M/s, 364G issued at 285M/s, 10.9T total
        72.3G resilvered, 3.25% done, 0 days 10:49:28 to go
config:

        NAME                                    STATE     READ WRITE CKSUM
        tank                                    DEGRADED     0     0     0
          raidz1-0                              DEGRADED     0     0     0
            replacing-0                         DEGRADED     0     0     0
              wwn-0x5000c50079975b01            OFFLINE      0     0     0
              ata-TOSHIBA_HDWQ140_Y026K4H8FBJG  ONLINE       0     0     0  (resilvering)
            wwn-0x5000039fe3cf4293              ONLINE       0     0     0
            wwn-0x5000039fe3cfcb2f              ONLINE       0     0     0
            wwn-0x5000039fe3cf3491              ONLINE       0     0     0
            wwn-0x5000039fe3cfbe45              ONLINE       0     0     0

errors: No known data errors

Resilvering

The pool is now in a "resilvering" state. This can take hours to days depending on the pool. ZFS rebuilds the contents of the new disk from the data and parity on the other disks in the pool. When it finishes, the pool is back to full raidz1 redundancy.

It is important not to disturb the pool while resilvering is taking place. If another disk in the pool fails during this process (raidz1 has only one disk of parity), then data will be lost. raidz2 is less fragile, because it tolerates two failed disks. With only 5 disks in this pool, and a backup policy that has already copied the data off of this pool, I feel confident I can recover from a failure during resilvering.
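
Checking on the resilver only reads the pool's status, so it is safe to poll while waiting. Here is a sketch that pulls the "% done" figure out of the status text and reports it every five minutes. It assumes the progress line is worded as in the output above ("... resilvered, 3.25% done, ..."), which may differ between ZFS versions. resilver_percent and wait_for_resilver are made-up names.

```shell
#!/bin/sh
# resilver_percent (hypothetical helper): read `zpool status` text on stdin
# and print the number in front of "% done" on the resilver progress line.
resilver_percent() {
    sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# wait_for_resilver (hypothetical helper): report progress every 5 minutes
# until the status no longer says a resilver is in progress, then print the
# final status. Needs zpool on the real system.
wait_for_resilver() {
    pool=$1
    while zpool status "$pool" | grep -q 'resilver in progress'; do
        printf 'resilver: %s%% done\n' "$(zpool status "$pool" | resilver_percent)"
        sleep 300
    done
    zpool status "$pool"
}
```

Called as "wait_for_resilver tank", it exits once the pool reports the resilver has finished.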

Resources

https://help.ubuntu.com/community/Smartmontools

https://docs.oracle.com/cd/E19253-01/819-5461/gazgd/index.html





