Saturday, June 26, 2021

ZFS Diagnose and Replace Bad Disk

 Discovering a Bad Disk

zpool status

I check my ZFS pools regularly. Proxmox is nice because it will email you when something goes wrong, but other systems might not. Either way, I make it a habit once a week to manually check the size and health of my storage pools.

~$ zfs list
NAME    USED  AVAIL     REFER  MOUNTPOINT
tank  8.74T  1.81T     8.53T  /tank
~$ zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 2.20M in 0 days 00:00:10 with 0 errors on Sat Jun 26 04:06:30 2021
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            wwn-0x5000c50079975b01  ONLINE       0     0     3
            wwn-0x5000039fe3cf4293  ONLINE       0     0     0
            wwn-0x5000039fe3cfcb2f  ONLINE       0     0     0
            wwn-0x5000039fe3cf3491  ONLINE       0     0     0
            wwn-0x5000039fe3cfbe45  ONLINE       0     0     0

errors: No known data errors

smartctl

We can see from the zpool status that one of our disks has a problem. The pool is reporting an unrecoverable error, and one disk has logged 3 checksum failures. But is the disk actually bad? I can check it with smartctl.

I've allocated my pool disks by ID, so zpool status reports them using those IDs. To check a disk with smartctl, simply pass it the /dev/disk/by-id path for the same disk ID.

Note: Some OSes need smartctl installed separately. On Ubuntu, it can be installed with "sudo apt install smartmontools".

~$ sudo smartctl -x /dev/disk/by-id/wwn-0x5000c50079975b01
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-77-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    Z500JERL
LU WWN Device Id: 5 000c50 079975b01
Firmware Version: CC25
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jun 26 18:15:07 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   89) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 330) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   106   099   006    -    29690410
  3 Spin_Up_Time            PO----   094   093   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    77
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   083   060   030    -    214055033
  9 Power_On_Hours          -O--CK   046   046   000    -    47624
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    76
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   098   098   000    -    2
188 Command_Timeout         -O--CK   100   100   000    -    0 0 0
189 High_Fly_Writes         -O-RCK   084   084   000    -    16
190 Airflow_Temperature_Cel -O---K   070   057   045    -    30 (Min/Max 27/33)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    37
193 Load_Cycle_Count        -O--CK   098   098   000    -    5695
194 Temperature_Celsius     -O---K   030   043   000    -    30 (0 15 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    46235h+43m+57.716s
241 Total_LBAs_Written      ------   100   253   000    -    16226994503
242 Total_LBAs_Read         ------   100   253   000    -    11227817981448
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

The part of the smartctl output we are interested in is the attributes table. Some attributes are flagged "P - prefailure warning" (which is generally OK, and mostly reflects the age of this disk), and some are flagged "R - error rate". The "error rate" attributes in particular point to a failing drive.

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   106   099   006    -    29690410
  7 Seek_Error_Rate         POSR--   083   060   030    -    214055033
189 High_Fly_Writes         -O-RCK   084   084   000    -    16
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0

These four rows help explain what is going on with this disk. The normalized values haven't fallen below the manufacturer's thresholds yet (read errors are at 106 against a threshold of 6, seek errors at 83 against 30, high fly writes at 84 against 0, and CRC errors at 200 against 0), which is why the overall self-assessment still reads PASSED. But the error-rate attributes are trending the wrong way, and the drive has also logged 2 Reported_Uncorrect events. Seems like ZFS was right, and this drive is starting to fail.
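If you want a second opinion before pulling hardware, smartctl can also run the drive's built-in self-tests. This is a generic extra check, not something the steps below depend on:

~$ sudo smartctl -t short /dev/disk/by-id/wwn-0x5000c50079975b01

Then, after the test's recommended polling time (1 minute for the short test on this drive), read the results from the self-test log:

~$ sudo smartctl -l selftest /dev/disk/by-id/wwn-0x5000c50079975b01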

Replacing the Disk

zpool offline & shutdown

I'm out of SATA ports in this system, so we will offline the disk, physically replace it, then run the "zpool replace" command. 

~$ sudo zpool offline tank wwn-0x5000c50079975b01
~$ sudo zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 2.20M in 0 days 00:00:10 with 0 errors on Sat Jun 26 04:06:30 2021
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            wwn-0x5000c50079975b01  OFFLINE      0     0     3
            wwn-0x5000039fe3cf4293  ONLINE       0     0     0
            wwn-0x5000039fe3cfcb2f  ONLINE       0     0     0
            wwn-0x5000039fe3cf3491  ONLINE       0     0     0
            wwn-0x5000039fe3cfbe45  ONLINE       0     0     0

errors: No known data errors
~$ sudo shutdown now

Replace the disk

Now power off the system and physically replace the disk. It's a good idea to write down the failing disk's ID and serial number so you can confirm the correct drive was pulled, and so the disk stays easy to identify later. At this point I write "BAD" in red sharpie on the label of the bad disk. Also take a good look at the new drive and write down or take a picture of any identifying numbers; we will pick a disk ID that matches one of these when the drive is added to the pool.
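If you want to double-check which physical drive carries that wwn before powering off, the by-id symlinks and smartctl both map it to a kernel device and a serial number (the serial Z500JERL already appears in the smartctl output above):

~$ ls -l /dev/disk/by-id/ | grep 5000c50079975b01
~$ sudo smartctl -i /dev/disk/by-id/wwn-0x5000c50079975b01 | grep -i serial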

My replacement for this 3TB Seagate is going to be a new 4TB Toshiba. The Toshiba doesn't have a wwn number printed on it, so I take note of the other hardware IDs. In this case we will use the ata- device ID for ZFS.

zpool replace

After booting the system, another "zpool status" confirms the correct drive was pulled. Now find the new disk's ID and use it to run zpool replace.

 ~$ ll /dev/disk/by-id/ata-*
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837 -> ../../sda
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-CT240BX200SSD1_1625F01DF837-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS -> ../../sde
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2EY2GS-part9 -> ../../sde9
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS -> ../../sdc
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46H2KNSGS-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS -> ../../sdf
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3MM7AS-part9 -> ../../sdf9
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS -> ../../sdd
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_46M3S1WAS-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 Jun 26 19:06 /dev/disk/by-id/ata-TOSHIBA_HDWQ140_Y026K4H8FBJG -> ../../sdb

The disk at sdb is the new Toshiba drive: its hardware ID matches the picture of the label taken earlier. Another hint is that it doesn't have "part1" or "part9" partitions like the disks already in the pool.

~$ sudo zpool replace tank wwn-0x5000c50079975b01 /dev/disk/by-id/ata-TOSHIBA_HDWQ140_Y026K4H8FBJG
~$ zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jun 26 19:16:51 2021
        787G scanned at 615M/s, 364G issued at 285M/s, 10.9T total
        72.3G resilvered, 3.25% done, 0 days 10:49:28 to go
config:

        NAME                                    STATE     READ WRITE CKSUM
        tank                                    DEGRADED     0     0     0
          raidz1-0                              DEGRADED     0     0     0
            replacing-0                         DEGRADED     0     0     0
              wwn-0x5000c50079975b01            OFFLINE      0     0     0
              ata-TOSHIBA_HDWQ140_Y026K4H8FBJG  ONLINE       0     0     0  (resilvering)
            wwn-0x5000039fe3cf4293              ONLINE       0     0     0
            wwn-0x5000039fe3cfcb2f              ONLINE       0     0     0
            wwn-0x5000039fe3cf3491              ONLINE       0     0     0
            wwn-0x5000039fe3cfbe45              ONLINE       0     0     0

errors: No known data errors

Resilvering

The pool is now resilvering. This can take hours to days depending on the size of the pool. Data is reconstructed from the other disks in the pool and written onto the new disk, bringing the pool back into compliance with the raidz1 redundancy strategy.

It is important not to do anything to the zpool while resilvering is taking place. If something happens to another disk in the pool during this window (raidz1 has only one disk of parity), data will be lost. This is less fragile with raidz2, which can lose two disks before losing data. With only 5 disks in this pool, and a backup policy that has already copied the data elsewhere, I feel comfortable that I could recover from a failure during resilvering.
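Progress can be checked at any time by re-running zpool status; wrapping it in watch keeps it on screen (the interval here is arbitrary):

~$ watch -n 60 zpool status tank

Once the resilver completes, the old offline device drops out of the configuration and the pool returns to ONLINE.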

Resources

https://help.ubuntu.com/community/Smartmontools

https://docs.oracle.com/cd/E19253-01/819-5461/gazgd/index.html






Saturday, June 5, 2021

Creating a Proxmox Container for Plex

This is a long and involved process, and some of the steps may not be necessary. For my particular application I decided I needed two things.

  1. GPU Passthrough
  2. CIFS mounts in the container

GPU passthrough may not be necessary on a privileged container, but I tackled it first, before realizing I needed to convert the container to privileged anyway.

GPU Passthrough

This is an excellent writeup on the plex forums for GPU Passthrough in Proxmox.

https://forums.plex.tv/t/plex-hw-acceleration-in-lxc-container-anyone-with-success/219289/35

Again, note this may not be strictly necessary in a privileged container. My container's conf file does have the cgroup rules and nvidia device mounts listed:

lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 236:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

Inside the container, double check that all of the nvidia devices exist after installing the drivers:

ll /dev/nv*
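Assuming the NVIDIA driver and its userspace tools were installed inside the container (the forum post above walks through that part), nvidia-smi is a quick functional check that the GPU is actually reachable:

nvidia-smi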

CIFS Mounts

Because mounting a filesystem is a kernel-level operation, the container needs to be privileged to execute "mount" commands. I decided this was an acceptable trade-off for a Plex server that is internal to my network.

The container needs to be privileged, with Nesting and CIFS enabled in the "Options" -> "Features" menu.
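The same features can also be toggled from the Proxmox host shell with pct; something like the following, where 105 stands in for your container ID (check the pct man page for your version, since the feature keys have changed over time):

pct set 105 --features nesting=1,mount=cifs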

After the container is set up, we can create automount rules in the usual way.

apt install cifs-utils
apt install autofs
echo "/mnt/hostname /etc/auto.hostname --timeout 0" >> /etc/auto.master
echo "mountname -fstype=cifs,rw,guest ://192.168.1.X/mountname" >> /etc/auto.hostname
automount -v
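The hostname, mountname, and IP above are placeholders from my setup; once the maps are loaded, simply accessing the path should trigger the mount:

ls /mnt/hostname/mountname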

Plex Install

It may be a good idea to create a non-root user for our plex container.

useradd -G sudo -m plex -s /bin/bash
passwd plex

Now we can use this non-root user to SSH into the container if needed.

At this point we can continue with the plex install, which is fairly well documented.
https://support.plex.tv/articles/200288586-installation/

On Ubuntu this involves downloading the .deb installer and running apt on that file:

apt install /home/user/Downloads/plexmediaserver_1.20....._amd64.deb
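After the package installs, the plexmediaserver service should come up on its own; a quick check, and then the first-run setup should be reachable at http://<container-ip>:32400/web :

systemctl status plexmediaserver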

If migrating plex, there is a guide here:
https://support.plex.tv/articles/201370363-move-an-install-to-another-system/

Saturday, February 20, 2021

Building a Nextcloud Container in Proxmox


Create Container

Create and Mount Data Directory

mkdir /tank/nextcloud
pct set 107 -mp0 /tank/nextcloud,mp=/data
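Inside the container (ID 107 above), the bind mount should show up at /data:

df -h /data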

Install Prerequisites

apt install apache2 mariadb-server libapache2-mod-php7.4
apt install php7.4-gd php7.4-mysql php7.4-curl php7.4-mbstring php7.4-intl
apt install php7.4-gmp php7.4-bcmath php-imagick php7.4-xml php7.4-zip

Move System Dirs to Data Directory

systemctl stop mariadb
mv /var/lib/mysql /data/
ln -s /data/mysql /var/lib/mysql
systemctl start mariadb

systemctl stop apache2
mv /etc/apache2 /data/
ln -s /data/apache2 /etc/apache2
mv /var/www /data/
ln -s /data/www /var/www
systemctl start apache2

Continue with Nextcloud Installation
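The download and unpack step itself is covered by the Nextcloud admin manual; roughly, it looks like this (the "latest" URL is just a convenience, grab a specific release if you prefer):

cd /var/www
wget https://download.nextcloud.com/server/releases/latest.tar.bz2
tar -xf latest.tar.bz2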

Note: Because the install is done as root, there may be some directory permissions that need to change.

chown www-data:www-data /var/www/nextcloud/ -R

Apache Config File

<VirtualHost *:80>
        DocumentRoot "/var/www/nextcloud"

        ErrorLog ${APACHE_LOG_DIR}/nextcloud.error
        CustomLog ${APACHE_LOG_DIR}/nextcloud.access combined

        <Directory /var/www/nextcloud/>
            Require all granted
            Options FollowSymlinks MultiViews
            AllowOverride All

           <IfModule mod_dav.c>
               Dav off
           </IfModule>

        SetEnv HOME /var/www/nextcloud
        SetEnv HTTP_HOME /var/www/nextcloud
        Satisfy Any

       </Directory>

</VirtualHost>
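Assuming the config above is saved as /etc/apache2/sites-available/nextcloud.conf, enable it along with the modules the Nextcloud docs recommend, then reload Apache:

a2enmod rewrite headers env dir mime
a2ensite nextcloud.conf
systemctl reload apache2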

Friday, December 4, 2020

Correcting umask/file permissions for Unix CIFS mount points.

Today I had a very interesting issue, where a file created from a Proxmox container had incorrect permissions on the host server.

The typical umask in Linux is 0002, but for the Proxmox root user it's 0022. This means files created by this user will have their group permissions restricted to read-only.
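As a quick worked example, a new file starts at mode 666 and the umask bits are masked off:

666 & ~0002 = 664  (rw-rw-r--)
666 & ~0022 = 644  (rw-r--r--)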

We could fix this by changing the umask on the Proxmox root user, but that could have severe and unintended consequences (messing with root user permissions never ends well). Instead, we look to smb.conf.

Samba Config

I'm running Ubuntu as the CIFS server, using "net usershare" to share the mount. We can set a global config so that any files written over the share get the same default permissions as files created directly on the host.
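For reference, sharing a path with net usershare looks something like this (the share name and path here are made up, not the ones from my setup):

net usershare add sharename /tank/sharename "" Everyone:F guest_ok=y
net usershare info sharename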

I simply uncommented and tweaked these lines in /etc/samba/smb.conf

# File creation mask is set to 0700 for security reasons. If you want to
# create files with group=rw permissions, set next parameter to 0775.
   create mask = 0664

# Directory creation mask is set to 0700 for security reasons. If you want to
# create dirs. with group=rw permissions, set next parameter to 0775.
   directory mask = 0775

Finally, don't forget to reload the Samba service.

sudo systemctl reload smbd

Now new files get the expected permissions (664) instead of being writable only by the owner (744):

-rwxr--r--  1 nobody nogroup          0 Dec  3 20:24  test4.txt*
-rw-rw-r--  1 nobody nogroup          0 Dec  3 21:07  test5.txt