OpenZFS cheatsheet

Last updated: 2024-01-01

zpool - manage pools

The zpool command manages ZFS storage pools and their virtual devices, aka vdevs. A ZFS pool and the zpool command have characteristics similar to a physical volume manager.

zpool list

zpool status
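Some commonly useful variations (not exhaustive - see the man pages for the full option set):
# per-vdev detail
zpool list -v

# verbose status including per-device error detail
zpool status -v

# live I/O statistics per vdev, refreshing every 5 seconds
zpool iostat -v 5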


zpool features

zpool get all <pool> | grep feature@

# enable a specific feature on a zpool
zpool set feature@zpool_checkpoint=enabled <pool>

Expand pool size

# expand pool size, e.g. after replacing vdevs with larger ones.
# taking the device offline first is not required.
zpool online -e <pool> <vdev>
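If you prefer the pool to grow on its own, the autoexpand and expandsize pool properties cover the same ground - a small sketch:
# let the pool expand automatically when larger devices are detected
zpool set autoexpand=on <pool>

# check how much capacity could still be claimed
zpool get autoexpand,expandsize <pool>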

Pool checkpoints

Pool checkpoints are awesome - I highly recommend using them in your workflow. They can save the day! Read more details in the linked reference.
# checkpoint status
zpool status

# create a checkpoint
zpool checkpoint <pool>

# discard a checkpoint
zpool checkpoint --discard --wait <pool>

# rewind readonly
zpool export <pool>
zpool import -o readonly=on --rewind-to-checkpoint <pool>

# rewind discarding _ALL_ changes since checkpoint
zpool export <pool>
# ⚠ data loss warning ⚠
zpool import --rewind-to-checkpoint <pool>
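To check whether a pool currently has a checkpoint and how much space it consumes, the checkpoint pool property (read-only) can be queried in addition to zpool status:
# show checkpoint space usage
zpool get checkpoint <pool>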

Adding and removing various vdev child devices

# add log (slog) device
zpool add <pool> log <device-path>
# e.g.
zpool add hdd-store log /dev/disk/by-id/nvme-INTEL_SSDPED1D280GA_XXX

# log mirror - redundancy for sync writes
zpool add <pool> log mirror <device-path-1> <device-path-2>

# to remove a log mirror, first detach one of its devices; the remaining device can then be removed
zpool detach <pool> <device-path>

# add a stripe >> ⚠ DANGER NO REDUNDANCY ⚠ <<
zpool add <pool> <device-path>

# create/extend a mirror - attach new device to existing vdev
zpool attach <pool> <existing-device-path> <new-device-path>

# add spare(s) (for mirror, raidz(2|3) and draid pools)
zpool add <pool> spare <device-path-1> <device-path-2>

# remove vdev child device
zpool remove <pool> <device-path>

zpool create

💡 For pools to be (fully) portable, you must give the zpool command whole-disks, not just partitions, so that ZFS can label the disks with portable EFI labels. Otherwise, disk drivers on platforms of different endianness will not recognize the disks.
This citation comes from the zpool export man page []. You should consider whether this is an important factor when you create your zpool. If you find yourself in a situation where a pool is not portable between systems, you always have the option to create a new pool and zfs send a full pool replica.
zpool create does what it says on the tin. Here is a breakdown of an advanced example:
zpool create -o ashift=12 -O compression=zstd -O checksum=edonr -O atime=on -O relatime=on -O xattr=sa <pool> /dev/disk/by-id/<disk>
-o sets pool properties, and -O sets dataset properties i.e. properties that the root and child datasets will inherit.
-o ashift=12 aka alignment shift - it's critical to get this right; read more in zpool concepts []. Getting this wrong can affect overall available pool storage space and overall pool performance. 💡 If you don't specify -o ashift then OpenZFS will guess based on device parameters and an internal knowledge base of devices, which might be fine in most cases, but it can also guess wrong.
-O compression=zstd -O checksum=edonr sets those dataset properties as mentioned above. As of 2024-01-01, I consider these to be good and modern defaults. zstd is a performant, adaptive and strong compressor and edonr is a performant cryptographic checksum that supports no-op writes (nopwrite) []. 📑 Sidebar: Unfortunately there is a bit of maintainer politics around how to keep compression algorithms versioned in OpenZFS: . The PR was closed because of a falling out and the debate on how to maintain compressor versions in ZFS is still open: . If and when those topics are resolved the zstd implementation in OpenZFS will become even more performant.
-O atime=on -O relatime=on modify the dataset's behavior and handling of atime aka file access time. This dataset property can impact performance. I suggest researching it and adjusting it for your use case.
-O xattr=sa dataset property controls whether extended attributes are enabled. To cite the zfsprops man page:
The default value of on enables directory-based extended attributes. This style of extended attribute imposes no practical limit on either the size or number of attributes which can be set on a file. Although under Linux the getxattr(2) and setxattr(2) system calls limit the maximum size to 64K. This is the most compatible style of extended attribute and is supported by all ZFS implementations.
System-attribute-based xattrs can be enabled by setting the value to sa. The key advantage of sa type of xattr is improved performance. Storing extended attributes as system attributes significantly decreases the amount of disk I/O required.
<pool> this will be the name of the pool. This can be changed (renamed) when importing a pool.
/dev/disk/by-id/<disk> specifies a single thing to become a vdev child in the pool. This is where your ZFS objects and blocks will be stored. In this example the pool will have a single vdev with a single child thing and will not provide fault tolerance, but many other ZFS features will be available as usual. This single vdev, single child thing can be turned into a mirror later []. More vdevs can be added later to form pool stripes. I say thing, because ZFS supports devices, partitions or even files as vdev children. In my example I hinted at a whole-disk device.
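After creation it is worth verifying that the properties landed as intended - a minimal check against the example above:
# pool property
zpool get ashift <pool>

# dataset properties on the pool root dataset (inherited by children)
zfs get compression,checksum,atime,relatime,xattr <pool>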
zpool create can create at least 5 types of pool configuration (see the zpool create man page for examples []):
single disk pool - like the example above zpool create <pool> <device1>
mirror zpool create <pool> mirror <device1> <device2>
striped mirror zpool create <pool> mirror <device1> <device2> mirror <device3> <device4>
raidz [] zpool create <pool> raidz <devices...> - one can also use raidz2 and raidz3 for additional parity levels
draid [] - I suggest reading the draid man page to understand the following prototype. I did some pro/con analysis of various storage concepts, including draid.
zpool create <pool> draid[<parity>][:<data>d][:<children>c][:<spares>s] <devices...>
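For illustration only (the geometry and disk names are made up): a draid2 layout with 4 data disks per redundancy group, 13 children and 1 distributed spare could look like this:
# illustrative draid example - double parity, 4 data disks per group, 13 children, 1 distributed spare
zpool create <pool> draid2:4d:13c:1s /dev/disk/by-id/<disk1> ... /dev/disk/by-id/<disk13>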

Hot spares

mirrors, raidz(2|3) and draid support hot spares. In draid, spares are distributed and an active part of the pool, which reduces the time a pool spends degraded, i.e. faster resilvering. In non-draid vdev types spares are standalone and inactive until a vdev child device fails in some way, at which point resilvering begins; once resilvering is complete the spare stands in for the failed vdev child until it is permanently replaced. If you need to understand resilvering in more detail, see the linked reference. Here are some zpool create examples with spares:
# mirror with a spare
zpool create <pool> mirror <device1> <device2> spare <device3>

# raidz with a spare
zpool create <pool> raidz <devices...> spare <devicen>
Spares can be added after creation. See the section: Adding and removing various vdev child devices.
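One common pattern when a spare has to be engaged by hand (e.g. no ZED/auto-replace in play) - device names are placeholders:
# manually start the replacement with a configured spare
zpool replace <pool> <failed-device> <spare-device>

# once the failed device has been permanently replaced and resilvering is done,
# detach the spare so it returns to the spare list
zpool detach <pool> <spare-device>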

zpool destroy

# inspect the pool you want to destroy
zpool status <pool>

# list and inspect the datasets on the pool
zfs list -t all -r -oname,mounted,mountpoint <pool>

# take a moment to ensure anything using the datasets is cleaned up, and backed up
# e.g. pve storage cfg
# e.g. virtual disks
# e.g. need to create a backup of certain data?

# unmount the mounted datasets
zfs unmount /<pool>/data

# list datasets and check all are unmounted
zfs list -t all -r -oname,mounted,mountpoint <pool>

# ⚠ data loss warning ⚠
zpool destroy <pool>

# consider using wipefs to clean up the partitions etc

zpool split

💡 This operation is irrevocable - it cannot be undone in the same way that you can undo zpool detach [] or zpool offline [].
zpool split takes a mirror-based pool and splits off the last device in each mirror to create a new pool. This is the default behavior, which can be modified with various options and optional arguments. Check the man page for more details.
# the most basic form
zpool split <pool> <newpool>

# dry-run no-op to preview what would happen
zpool split -n <pool> <newpool>
💡 At the point in time of the split, <newpool> will be a replica of <pool>. But consider whether you are splitting an active pool - a pool that is not quiesced, for example an rpool (root pool) where an operating system is running; in that case the split pools will instantly diverge due to writes on <pool>.
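A sketch of the follow-up steps - by default the new pool is left exported, so import it when you are ready, or use -R to import it immediately under an alternate root:
zpool split <pool> <newpool>
zpool import <newpool>

# or split and import in one go under an alternate root
zpool split -R /mnt/<newpool> <pool> <newpool>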

zpool upgrade

# create a rollback checkpoint
zpool checkpoint <pool>

zpool upgrade <pool>

# suggested: reboot the node and verify your existing setup works as expected
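Before upgrading it can help to see what would change:
# list pools whose on-disk format is older than what the running software supports
zpool upgrade

# list all features supported by the running software
zpool upgrade -v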

zpool import

The import command is powerful and it can pay off to understand its versatility. The simplest invocations:
# import a pool from pool cache
zpool import <pool>

# rename a pool
zpool import <pool> <new-pool-name>

# one-off temporary pool rename
zpool import -o readonly=on <pool> <temp-pool-name>
These invocations rely on the presence of a cached pool config typically located in /etc/zfs/zpool.cache. i.e. the system has previously imported this pool.
Read-only import: A very useful import option is -o readonly=on, which does what it says on the tin. This can be used to inspect a pool without making modifications to the pool config/state/properties, or its data. This option prevents modification of the pool name and pool hostname properties, which can be useful in avoiding issues caused by importing a pool on a foreign system.
-o readonly=on combined with --rewind-to-checkpoint gives a sysop the look-back ability to inspect a pool at its check-pointed state without actually rewinding the pool. A highly useful option combination.
-N option prevents any pool datasets from being mounted and can be useful when working with pools on foreign systems (and/or alternate boot environments) and/or where mounting datasets could cause issues and conflicts with the existing system file hierarchy.
-f forces a pool import, for example if the pool hostname property doesn't match the importing system hostname. This often occurs when or after importing a pool in a recovery system or Live CD (and/or alternate boot environment).
-R is another useful option when working with a pool on a foreign system (and/or alternate boot environment) where mounting datasets could cause issues and conflicts with the existing system file hierarchy. -R <path> sets the following pool properties: cachefile=none [] and altroot=<path> [], which causes <path> to be prepended to any mount points within the pool to avoid conflicts. Citation from the man page:
-R can be used when examining an unknown pool where the mount points cannot be trusted, or in an alternate boot environment, where the typical paths are not valid. altroot is not a persistent property. It is valid only while the system is up.
-d dir|device option causes the import process to search for the <pool> in the specified directory or device. Often required in an alternate boot environment such as a recovery boot or Live CD.
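To show how these options compose, a typical cautious import from a recovery environment might look like this (paths are illustrative):
# read-only, nothing mounted, mounts re-rooted under /mnt, search by-id, force past the hostid check
zpool import -N -f -o readonly=on -R /mnt -d /dev/disk/by-id <pool>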
Extreme options: There are a few more options for zpool import, including some which change how extreme the measures are to recover/rollback a pool to a working transaction. They come with the following ⚠ WARNING:
-X, -F and -T options can be extremely hazardous to the health of your pool and should only be used as a last resort.
I'm going to skip the coverage of those options as they are probably better suited to a dedicated recovery guide rather than a cheatsheet.

zpool export

Typically, exporting a pool is done because you might be putting the pool drive(s) into cold storage OR plan to import the pool on another system and you’d like to ensure everything is clean and consistent. See the man page [
] for more details including information on how ZFS can support importing and exporting pools on systems with different endianness (under certain circumstances).
💡 Exporting a pool has a useful side-effect on the ARC - all cached objects/blocks belonging to the exported pool are evicted. I.e. it can be used to flush the pool's entries from the ARC, which is handy in testing, building test cases, and verifying that property settings work as you expect.
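The basic invocations:
zpool export <pool>

# force the export if datasets are busy (use with care)
zpool export -f <pool>

# export all pools
zpool export -a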

zpool detach|offline|remove|online

command - description
detach [] - Detaches a device from a mirror. Consider offline if the device is to be re-added later on.
offline [] - Takes a pool device offline. The device is quiesced. NA for spares. Use -f to force a faulted state.
remove [] - Removes devices from a pool. Allocated space will be evacuated to other pool devices. Can be cancelled with -s. For a dry-run use -n.
online [] - Brings a pool device online. NA for spares. Use -e to expand device space.
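The corresponding basic invocations (device paths are placeholders):
zpool detach <pool> <device-path>
zpool offline <pool> <device-path>
zpool online <pool> <device-path>

# remove a device - allocated space is evacuated first
zpool remove <pool> <device-path>
# cancel an in-progress removal
zpool remove -s <pool>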

vdev child device evacuation

TODO - see the linked reference.

zfs - manage datasets

The zfs command manages ZFS datasets, which are children of pools. zfs defines and manages the dataset hierarchy. A ZFS dataset has similar characteristics to a logical volume, so the zfs command is similar to a logical volume manager.
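A minimal sketch of that hierarchy idea (dataset names are placeholders) - children inherit properties from their parents unless overridden:
zfs create <pool>/data
zfs create <pool>/data/projects

# properties flow down the tree
zfs get -r compression <pool>/data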

zfs list

zfs list
zfs list -t all
zfs list -t snapshot
# types: filesystem, snapshot, volume, bookmark, or all

get zfs properties

zfs get encryption
zfs get compression

# or combined
zfs get compression,encryption,checksum
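Two variations I reach for often:
# recurse through the whole dataset tree
zfs get -r compression <pool>

# show only locally set (non-inherited) properties
zfs get -s local all <pool>/<dataset>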

zfs create

TODO some basics
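Until this is written up properly, a minimal sketch (names and values are placeholders):
# create a dataset, mounted at /<pool>/data by default
zfs create <pool>/data

# set properties at creation time, e.g. compression and a quota
zfs create -o compression=zstd -o quota=100G <pool>/backups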

dataset encryption

# currently NOT best practice to set encryption on the root of the zpool
# DO NOT DO THIS - read the linked discussion for details.
zpool create -o ashift=12 -O compression=lz4 -O encryption=aes-256-gcm -O keylocation=prompt -O keyformat=passphrase -O xattr=sa <pool> <disk>

# currently best practice
# DO THIS!
# STEP 1: create unencrypted pool
zpool create -o ashift=12 -O compression=zstd -O checksum=edonr -O atime=on -O relatime=on -O xattr=sa <pool> <disk>

# STEP 2: create child dataset as an encryption root (name of your choosing)
zfs create <pool>/enc_data -o encryption=aes-256-gcm -o keylocation=prompt -o keyformat=passphrase

# STEP 3: verify the key
zfs unmount -u <pool>/enc_data
zfs mount -l <pool>/enc_data # blocking prompt

# after reboot, to load key
zfs mount -l <pool>/enc_data
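Some related key-management commands (enc_data as in the example above):
# check key status and the encryption root
zfs get encryption,encryptionroot,keystatus <pool>/enc_data

# load / unload the key without (un)mounting
zfs load-key <pool>/enc_data
zfs unload-key <pool>/enc_data

# mount once the key is loaded
zfs mount <pool>/enc_data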

create zfs volumes aka zvols (but avoid using them - see the linked discussion)

zfs create <pool>/testvolxfs -s -V 20G -o volblocksize=4k
zfs create <pool>/vm-102-disk-2 -s -V 20G -o volblocksize=4k -o encryption=off
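On Linux the zvol shows up as a block device under /dev/zvol/, e.g. (volume name as in the example above):
ls -l /dev/zvol/<pool>/testvolxfs

# e.g. put a filesystem on it
mkfs.xfs /dev/zvol/<pool>/testvolxfs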

zfs destroy

# dry run
zfs destroy -vn <pool>/testvolxfs
# ⚠ data loss warning ⚠
zfs destroy -v <pool>/testvolxfs

# ⚠ recursive destruction data loss warning ⚠ - it is also possible to use the -r switch to recursively destroy datasets (to destroy part of a zfs hierarchy). Be very careful with this! Do practice dry runs (-vnr) to see what would happen first.

zfs send and receive

TODO - prerequisite: snapshot
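For the example below, the prerequisite recursive snapshot can be created like this:
# recursive snapshot across the whole src pool
zfs snapshot -r rpool-old@boot-transfer

# confirm it exists on every dataset
zfs list -t snapshot -r rpool-old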

Send a 1:1 full (whole) pool replica

# full and recursive clone of a pool (replica) - from src to dst
# assumes that the @boot-transfer snapshot exists recursively on the src
# also specify that the dst pool datasets should use specific compression and checksum settings
zfs send --replicate rpool-old@boot-transfer | mbuffer -s 128k -m 128M | zfs recv -F -v -u -o compression=zstd -o checksum=edonr rpool-new
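A possible incremental follow-up, assuming a later recursive snapshot (here a hypothetical @boot-transfer2) was taken on the src:
zfs send --replicate -I rpool-old@boot-transfer rpool-old@boot-transfer2 | mbuffer -s 128k -m 128M | zfs recv -F -v -u rpool-new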

zdb - ZFS storage pool debugging

In ZFS, objects are grouped together in object sets. A dataset is an “object set” object, files and directories are objects grouped into a dataset.
zdb allows us to query and read information about object sets and their child objects. Once one has obtained the coordinates for a given object, it is possible to read/extract raw data or make a backup of a dataset.
As a rule, to increase the consistency of zdb, you should use it on an exported inactive pool.
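A small example of the idea - get an object id for a file and dump its details (the file path is a placeholder):
# the inode number reported by ls -i is the zfs object id
ls -i /<pool>/<dataset>/some/file

# dump that object's details, including its block pointers, at high verbosity
zdb -dddd <pool>/<dataset> <object-id>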

References

Official zdb manual [] and source code []. If you can read and follow the code it reveals a lot about ZFS internals.
The original ~2006 ZFS on-disk format []. A more recent OpenZFS on-disk format doc [].

Glossary

SPA - Storage Pool Allocator
DVA - Data Virtual Address
ZAP - ZFS Attribute Processor
DMU - Data Management Unit
DSL - Dataset and Snapshot Layer
ZPL - ZFS POSIX Layer
ZIL - ZFS Intent Log
ZVOL - ZFS Volume

Should I run zdb on an exported or imported pool?

zdb can operate on exported or imported pools; zdb accesses block devices directly and "doesn't care" about imported and active pools. For encrypted datasets there are limitations on what can be performed with older versions of zdb (zfs < 2.2), because zdb did not understand encryption prior to that version. See the section herein on zdb support for encrypted datasets. As a rule, to increase the consistency of zdb output, you should use it on an exported, inactive pool.
Citing from the zdb man page, with my emphasis:
zdb is an "offline" tool; it accesses the block devices underneath the pools directly from userspace and does not care if the pool is imported or datasets are mounted (or even if the system understands ZFS at all). When operating on an imported and active pool it is possible, though unlikely, that zdb may interpret inconsistent pool data and behave erratically.

zdb and encrypted datasets

Older zdb versions (zfs < 2.2) have some limitations when working with encrypted datasets; for example I get permission denied when attempting certain object operations on relative object paths (see ). In this case use the zfs object id, which can be obtained via ls -i or the zdb -dd example provided herein. Keep in mind that data extracted via zdb from encrypted datasets will be... encrypted! (Unless you are using a version of zdb which supports encryption.)
Checking the OpenZFS source code [] [] from contributor robn [] shows that zdb received a new option -K, tagged for release in zfs 2.2, which provides support for loading an encryption key for operations that require it.

pool config

# show <pool> configuration from /etc/zfs/zpool.cache
zdb -C <pool>

# show the <pool> cached AND on-disk configuration
zdb -CC <pool>

# if you are running in a recovery environment, the cache is likely missing, so specify the disk
# 💡 if the pool isn't whole-disk then use the full partition path
zdb -C -e -p /path/to/disk/or/partition/containing/pool <pool>

# you can also specify a directory of devices to search
zdb -C -e -p /dev/disk/by-id/ <pool>

print block stats for a pool

Performs a block leak check and prints high level block stats for a pool:
zdb -b <pool>
# for an exported pool: zdb -b -e -p /dev/disk/by-id/ <pool>
Performs a block leak check and prints various block stats including a block size histogram table:
# 💡 specify -b up to six times to increase verbosity.
zdb -bbb <pool>

list top-level objects in a given pool

zdb -d <pool>

list all objects in a given pool

zdb -dd <pool>

list objects in a given dataset

zdb -dd <pool>/path/to/dataset
