Discovering a Bad Disk
zpool status
I check my ZFS pools regularly. Proxmox is nice because it will email you about pool events, but other systems might not. Even so, I make a habit of manually checking the size and health of my storage pools once a week.
~$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 8.74T 1.81T 8.53T /tank
~$ zpool status
pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: resilvered 2.20M in 0 days 00:00:10 with 0 errors on Sat Jun 26 04:06:30 2021
config:
NAME                        STATE     READ WRITE CKSUM
tank                        ONLINE       0     0     0
  raidz1-0                  ONLINE       0     0     0
    wwn-0x5000c50079975b01  ONLINE       0     0     3
    wwn-0x5000039fe3cf4293  ONLINE       0     0     0
    wwn-0x5000039fe3cfcb2f  ONLINE       0     0     0
    wwn-0x5000039fe3cf3491  ONLINE       0     0     0
    wwn-0x5000039fe3cfbe45  ONLINE       0     0     0
errors: No known data errors
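Proxmox emails me when a pool has a problem, but on systems that don't do that, a small cron job can approximate it. Here is a minimal sketch, assuming cron and working local mail delivery, with an example path and schedule you should adjust to taste. "zpool status -x" prints "all pools are healthy" when everything is fine, so mail only goes out when it says something else:

# /etc/cron.d/zfs-health (example path) - check daily at 08:00, mail root if any pool is unhealthy
0 8 * * * root /sbin/zpool status -x | grep -qv 'all pools are healthy' && /sbin/zpool status | mail -s 'ZFS pool needs attention' root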
smartctl
We can see from the zpool status that one of the disks has a problem. The pool is reporting an unrecoverable error, and one disk shows 3 checksum errors. But is the disk actually bad? I can check it with smartctl.
I added my pool disks by ID, so zpool status reports them using those IDs. To check a disk with smartctl, pass it the /dev/disk/by-id/ path for that same ID.
Note: On some OSes, smartctl needs to be installed first. On Ubuntu it is part of the smartmontools package: "sudo apt install smartmontools".
~$ sudo smartctl -x /dev/disk/by-id/wwn-0x5000c50079975b01
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-77-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST3000DM001-1ER166
Serial Number: Z500JERL
LU WWN Device Id: 5 000c50 079975b01
Firmware Version: CC25
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Jun 26 18:15:07 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 89) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 330) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 106 099 006 - 29690410
3 Spin_Up_Time PO---- 094 093 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 77
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
7 Seek_Error_Rate POSR-- 083 060 030 - 214055033
9 Power_On_Hours -O--CK 046 046 000 - 47624
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 76
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 098 098 000 - 2
188 Command_Timeout -O--CK 100 100 000 - 0 0 0
189 High_Fly_Writes -O-RCK 084 084 000 - 16
190 Airflow_Temperature_Cel -O---K 070 057 045 - 30 (Min/Max 27/33)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 37
193 Load_Cycle_Count -O--CK 098 098 000 - 5695
194 Temperature_Celsius -O---K 030 043 000 - 30 (0 15 0 0 0)
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 46235h+43m+57.716s
241 Total_LBAs_Written ------ 100 253 000 - 16226994503
242 Total_LBAs_Read ------ 100 253 000 - 11227817981448
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
The part of the smartctl output we are interested in is the attributes table. The flags tell us how to read each row: attributes marked "P - prefailure warning" are the ones whose normalized VALUE dropping toward THRESH predicts imminent failure, while attributes marked "R - error rate" track how often a particular kind of error occurs. The error-rate attributes are the ones worth a closer look here.
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 106 099 006 - 29690410
7 Seek_Error_Rate POSR-- 083 060 030 - 214055033
189 High_Fly_Writes -O-RCK 084 084 000 - 16
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
These four rows help explain what is going on with this disk. The normalized values are still above the manufacturer's thresholds (read errors at 106/6, seek errors at 83/30, high fly writes at 84/0, CRC errors at 200/0), which is why the overall self-assessment still reports PASSED. But the normalized High_Fly_Writes value has dropped to 84 with a raw count of 16, and UDMA_CRC_Error_Count is 0, which rules out cabling as the source of the checksum errors. Combined with the Reported_Uncorrect count of 2 in the full table, it seems like ZFS was right, and this drive is starting to fail.
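If you want a second opinion before committing to a replacement, smartctl can print just the attribute table with "-A", and a short self-test only takes a couple of minutes on this drive (per the recommended polling time above). I didn't need the extra evidence here, but the commands look like this:

~$ sudo smartctl -A /dev/disk/by-id/wwn-0x5000c50079975b01
~$ sudo smartctl -t short /dev/disk/by-id/wwn-0x5000c50079975b01
~$ sudo smartctl -l selftest /dev/disk/by-id/wwn-0x5000c50079975b01

The last command prints the self-test log once the test has finished.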
Replacing the Disk
zpool offline & shutdown
I'm out of SATA ports in this system, so I can't attach the new disk alongside the old one. Instead, we will offline the failing disk, shut down and physically replace it, then run the "zpool replace" command.
~$ sudo zpool offline tank wwn-0x5000c50079975b01
~$ sudo zpool status
pool: tank
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: resilvered 2.20M in 0 days 00:00:10 with 0 errors on Sat Jun 26 04:06:30 2021
config:
NAME                        STATE     READ WRITE CKSUM
tank                        DEGRADED     0     0     0
  raidz1-0                  DEGRADED     0     0     0
    wwn-0x5000c50079975b01  OFFLINE      0     0     3
    wwn-0x5000039fe3cf4293  ONLINE       0     0     0
    wwn-0x5000039fe3cfcb2f  ONLINE       0     0     0
    wwn-0x5000039fe3cf3491  ONLINE       0     0     0
    wwn-0x5000039fe3cfbe45  ONLINE       0     0     0
errors: No known data errors
~$ sudo shutdown now
Replace the disk
Now turn off the system and physically replace the disk. It's a good idea to write down the bad disk's ID before pulling it, so you can confirm the correct drive was removed. At this point I also write "BAD" in red sharpie on the label of the bad disk, which makes it easy to identify should it ever turn up again. Take a good look at the new drive too, and write down or photograph any identifying numbers; we will pick one of those IDs to use when the drive is added to the pool.
My replacement disk for this 3TB Seagate is going to be a new 4TB Toshiba. The Toshiba doesn't have a wwn number printed on it, so I take note of the other hardware IDs. In this case we will use the ata- ID for ZFS.
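Once the system is back up, there are a couple of ways to match the serial number from the photo to a device node. I use the /dev/disk/by-id listing below, but lsblk can also print models and serials directly, which makes a quick cross-check (treat this as a sketch; the columns available can vary a bit by distro):

~$ lsblk -o NAME,MODEL,SERIAL,SIZE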
zpool replace
After booting the system, another "zpool status" confirms the correct drive was pulled. Now find the new disk's ID and run zpool replace.
~$ ll /dev/disk/by-id/ata-*
lrwxrwxrwx 1 root root 9 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837 -> ../../sda
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS -> ../../sde
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS-part9 -> ../../sde9
lrwxrwxrwx 1 root root 9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS -> ../../sdc
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS -> ../../sdf
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS-part9 -> ../../sdf9
lrwxrwxrwx 1 root root 9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS -> ../../sdd
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_HDWQ140_Y026K4H8FBJG -> ../../sdb
The disk at sdb is the new Toshiba drive: its hardware ID matches the photo of the label taken earlier. Another hint is that it doesn't yet have "part1" or "part9" partitions like the other pool disks do.
~$ sudo zpool replace tank wwn-0x5000c50079975b01 /dev/disk/by-id/ata-TOSHIBA_HDWQ140_Y026K4H8FBJG
~$ zpool status
pool: tank
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sat Jun 26 19:16:51 2021
787G scanned at 615M/s, 364G issued at 285M/s, 10.9T total
72.3G resilvered, 3.25% done, 0 days 10:49:28 to go
config:
NAME                                    STATE     READ WRITE CKSUM
tank                                    DEGRADED     0     0     0
  raidz1-0                              DEGRADED     0     0     0
    replacing-0                         DEGRADED     0     0     0
      wwn-0x5000c50079975b01            OFFLINE      0     0     0
      ata-TOSHIBA_HDWQ140_Y026K4H8FBJG  ONLINE       0     0     0  (resilvering)
    wwn-0x5000039fe3cf4293              ONLINE       0     0     0
    wwn-0x5000039fe3cfcb2f               ONLINE       0     0     0
    wwn-0x5000039fe3cf3491              ONLINE       0     0     0
    wwn-0x5000039fe3cfbe45              ONLINE       0     0     0
errors: No known data errors
Resilvering
The pool is now in a "resilvering" state, which can take hours to days depending on how much data there is and how fast the disks are. The missing data is reconstructed from the remaining disks in the pool and written onto the new disk, bringing the pool back into compliance with the raidz1 redundancy strategy.
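To keep an eye on the resilver without sitting at the console, I just re-run zpool status periodically; something like watch does the job (the 60-second interval is arbitrary):

~$ watch -n 60 zpool status tank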
It is important not to put unnecessary load on the zpool while resilvering is taking place. If another disk in the pool fails during this process, data will be lost, since raidz1 has only one disk of parity. A raidz2 pool is less fragile here, as it can lose two disks before losing data. With only 5 disks in this pool, and a backup policy that has already copied the data off of it, I feel confident I could recover from a failure during resilvering.
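Once the resilver finishes and the pool reports ONLINE again, I like to run a scrub to confirm that every block on the pool reads back cleanly. That's my own habit rather than a required step:

~$ sudo zpool scrub tank
~$ zpool status tank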
Resources
https://help.ubuntu.com/community/Smartmontools
https://docs.oracle.com/cd/E19253-01/819-5461/gazgd/index.html