A healthy rpool vdev child disk was reporting the existence of a corrupted rpool in other areas of its GPT (other partitions). 😲
The “actual” and healthy vdev child disk partition seemed OK ($disk1-part3). The system was otherwise OK and didn’t seem to be impacted during boot or runtime by this cosmetic issue.
Nonetheless this could lead to future issues or confusion, so it’s best to proactively clear it up.
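For context, a quick way to confirm the active pool itself is healthy and to see the full device paths of its vdev children:
# check pool health and show full paths to the vdev members
zpool status -P rpool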
Here is an example that I tracked down to an “old” ZFS label located on $disk1:
# get a list of importable zpools from devices in /dev/disk/by-id
zpool import -d /dev/disk/by-id/

   pool: rpool
     id: 7184717139914799043
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
I’m not sure if the old labels were a result of the legacy boot switch procedure, BUT they were certainly old and undesired, and zpool import -d was reporting at least one of them as an unavailable/corrupted rpool. I also spotted that one partition’s ZFS label was “rpool-old”, which was from the legacy bootloader switch procedure.
So it looked like I had 2 older ZFS labels hanging around on $disk1 and $disk2.
Could zpool labelclear be performed while the disk is active in the pool?
Perhaps, but why risk the command doing something unexpected or causing a problem for the active pool? I did a bit of research to see what others had experienced, and decided that detaching was the safe way to go.
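As a sketch of the general approach, assuming an rpool mirror (so detaching one child leaves a working pool) and with device names as placeholders:
# detach the affected disk's ZFS partition so the disk is no longer active in the pool
zpool detach rpool /dev/disk/by-id/$disk1-part3

# clear the stale label(s) on the offending partition(s) of the now-inactive disk
zpool labelclear -f /dev/disk/by-id/$disk1-partN

# re-attach the partition to the remaining mirror child and let it resilver
zpool attach rpool /dev/disk/by-id/$disk2-part3 /dev/disk/by-id/$disk1-part3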
It’s a good practice to zpool labelclear once a disk/partition is no longer in use
It sounds like an obvious statement, but for my own notes: the partition table points to the sectors on the disk where data for a given partition resides. When the partition table is zapped or wiped using tools such as wipefs or sgdisk, the partition’s sector data is generally left untouched and intact. This is why one can zap the GPT and recreate the previous table with the same geometry (or restore the table from a backup) and still find the partition data intact, AND why ZFS labels survive tools like sgdisk and wipefs.
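A quick way to check whether a partition still carries ZFS labels (the path here is a placeholder):
# dump any ZFS vdev labels present on the partition; "failed to unpack label" means none were found
zdb -l /dev/disk/by-id/$disk1-partN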
I’m not a fan of zeroing disks. Zeroing a whole disk once it’s no longer in use feels wasteful to me, especially for SSDs, which have limited write endurance. Zeroing or obfuscating the key parts of a disk is a good practice, for example the GPT or LUKS data areas, or the first N MiB of each partition.
Typically I use encrypted filesystems for sensitive data, so unless there is a concern about the private key being compromised, wiping encrypted partitions has limited security benefit. It’s basically random data without the decryption key.
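As a sketch of what that might look like in practice (device paths are placeholders, and both commands are destructive):
# zero just the first 16 MiB of a partition, which covers most filesystem/LUKS header areas
dd if=/dev/zero of=/dev/disk/by-id/$disk1-partN bs=1M count=16 conv=fsync

# on an SSD, discarding the whole device avoids the writes of zeroing it entirely
blkdiscard /dev/disk/by-id/$disk1
Note that ZFS also keeps two copies of its label at the end of the partition, so zeroing only the front won’t remove them, which is another reason zpool labelclear is the cleaner tool for that particular job.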
I’m going to adopt a best practice of using zpool labelclear once a disk is no longer in use AND prior to zapping the GPT or using wipefs. This should avoid a repeat of the problem described herein.
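Roughly the order I have in mind when retiring a disk (paths are placeholders, and the commands are destructive):
# 1. clear the ZFS label(s) while the partition table still points at them
zpool labelclear -f /dev/disk/by-id/$disk1-partN

# 2. only then remove filesystem signatures and zap the GPT
wipefs -a /dev/disk/by-id/$disk1
sgdisk --zap-all /dev/disk/by-id/$disk1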
I suspect that one of the reasons this situation arose in the first place is that these disks and/or partitions were at some point in use in a zpool and were either detached manually or via zpool detach. They weren't fully sanitised when they could have been (before being reused), and what remains is cruft on the disks, which was detected by zpool import -d.
Note that for CT500MX500 SSDs the wwn suffix is the last N characters of the disk serial number (lower case).
CT500MX500SSD1_XXXXE21ABA65 → wwn suffix e21aba65 - in this example 8 chars.
This would appear to vary per manufacturer. Seagate ST5000LM000-2AN170 wwn had no visible relation to the disk serial.
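To compare the different identifiers for the same physical disk, something along these lines works (the /dev/sda path is just a placeholder):
# both the model_serial style name and the wwn- name are symlinks to the same kernel device
ls -l /dev/disk/by-id/ | grep -v -- -part

# show the model, serial and wwn the kernel reports for a given disk
lsblk -d -o NAME,MODEL,SERIAL,WWN /dev/sda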
I’ve read in the past that the kernel wwn, at least for spinning rust, is incremented a little from the actual on-disk-label wwn. I cannot remember why right now, but it might be to do with multipath logic? (edit: yes, it relates to the transport address and individual port identifiers)
💡 My research on wwn in the past highlighted that the main reason for choosing wwn is that the path *should* be portable on any system, any controller, any connection type, any sub-system etc.
Consider for example a disk moving from a USB enclosure to a SATA enclosure. Most /dev/disk/by-id paths would change but wwn *should* remain static and portable. A good reason to use it for:
⚠ Note that a checkpoint is not a backup, but it does provide the ability to undo/rewind a pool to a previous state. It will not undo any changes made to the disks underneath ZFS, such as partition table changes or other changes made directly to the disks/partitions.
In case something goes wrong, we have a point we can undo/rewind the rpool to:
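Something along these lines (a sketch; rewinding discards everything that happened after the checkpoint, and for a root pool the export/import rewind would need to be done from a live/rescue environment):
# take a checkpoint of the current rpool state
zpool checkpoint rpool

# once the work is finished and verified, discard the checkpoint
zpool checkpoint -d rpool

# if something went wrong instead: export the pool and re-import it rewound to the checkpoint
zpool export rpool
zpool import --rewind-to-checkpoint rpool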