OpenZFS cheatsheet

Last updated: 2024-01-01

zpool - manage pools

The zpool command manages ZFS storage pools and their virtual devices, aka vdevs. A ZFS pool and the zpool command have characteristics similar to a physical volume manager.

zpool list

zpool status
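Some commonly useful variations (not exhaustive - see the man pages for the full option set):
# per-vdev detail
zpool list -v

# verbose status including per-device error detail
zpool status -v

# live I/O statistics per vdev, refreshing every 5 seconds
zpool iostat -v 5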


zpool features

zpool get all <pool> | grep feature@

# enable a specific feature on a zpool
zpool set feature@zpool_checkpoint=enabled <pool>

Expand pool size

# expand pool size, e.g. after replacing vdevs with larger ones.
# taking the device offline first is not required.
zpool online -e <pool> <vdev>
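If you prefer the pool to grow on its own, the autoexpand and expandsize pool properties cover the same ground - a small sketch:
# let the pool expand automatically when larger devices are detected
zpool set autoexpand=on <pool>

# check how much capacity could still be claimed
zpool get autoexpand,expandsize <pool>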

Pool checkpoints

Pool checkpoints are awesome - I highly recommend using them in your workflow. They can save the day! Read more details in the linked reference.
# checkpoint status
zpool status

# create a checkpoint
zpool checkpoint <pool>

# discard a checkpoint
zpool checkpoint --discard --wait <pool>

# rewind readonly
zpool export <pool>
zpool import -o readonly=on --rewind-to-checkpoint <pool>

# rewind discarding _ALL_ changes since checkpoint
zpool export <pool>
# ⚠ data loss warning ⚠
zpool import --rewind-to-checkpoint <pool>
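To check whether a pool currently has a checkpoint and how much space it consumes, the checkpoint pool property (read-only) can be queried in addition to zpool status:
# show checkpoint space usage
zpool get checkpoint <pool>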

Adding and removing various vdev child devices

# add log (slog) device
zpool add <pool> log <device-path>
# e.g.
zpool add hdd-store log /dev/disk/by-id/nvme-INTEL_SSDPED1D280GA_XXX

# log mirror - redundancy for sync writes
zpool add <pool> log mirror <device-path-1> <device-path-2>

# to remove a log mirror, first detach one of its devices; the remaining device can then be removed
zpool detach <pool> <device-path>

# add a stripe >> ⚠ DANGER NO REDUNDANCY ⚠ <<
zpool add <pool> <device-path>

# create/extend a mirror - attach new device to existing vdev
zpool attach <pool> <existing-device-path> <new-device-path>

# add spare(s) (for mirror, raidz(2|3) and draid pools)
zpool add <pool> spare <device-path-1> <device-path-2>

# remove vdev child device
zpool remove <pool> <device-path>

zpool create

💡 For pools to be (fully) portable, you must give the zpool command whole-disks, not just partitions, so that ZFS can label the disks with portable EFI labels. Otherwise, disk drivers on platforms of different endianness will not recognize the disks.
This citation comes from the zpool export man page []. You should consider whether this is an important factor when you create your zpool. If you find yourself in a situation where a pool is not portable between systems, you always have the option to create a new pool and zfs send a full pool replica.
zpool create does what it says on the tin. Here is a breakdown of an advanced example:
zpool create -o ashift=12 -O compression=zstd -O checksum=edonr -O atime=on -O relatime=on -O xattr=sa <pool> /dev/disk/by-id/<disk>
-o sets pool properties, and -O sets dataset properties i.e. properties that the root and child datasets will inherit.
-o ashift=12 aka alignment shift - it's critical to get this right; read more in zpool concepts []. Getting this wrong can affect overall available pool storage space and overall pool performance. 💡 If you don't specify -o ashift then OpenZFS will guess based on device parameters and an internal knowledge base of devices, which might be fine in most cases, but it can also guess wrong.
-O compression=zstd -O checksum=edonr sets those dataset properties as mentioned above. As of 2024-01-01, I consider these to be good and modern defaults. zstd is a performant, adaptive and strong compressor and edonr is a performant cryptographic checksum that supports no-op writes (nopwrite) []. 📑 Sidebar: Unfortunately there is a bit of maintainer politics around how to keep compression algorithms versioned in OpenZFS: . The PR was closed because of a falling out and the debate on how to maintain compressor versions in ZFS is still open: . If and when those topics are resolved the zstd implementation in OpenZFS will become even more performant.
-O atime=on -O relatime=on modify the dataset's behavior and handling of atime aka file access time. This dataset property can impact performance. I suggest researching it and adjusting it for your use case.
-O xattr=sa dataset property controls whether extended attributes are enabled. To cite the zfsprops man page:
The default value of on enables directory-based extended attributes. This style of extended attribute imposes no practical limit on either the size or number of attributes which can be set on a file. Although under Linux the getxattr(2) and setxattr(2) system calls limit the maximum size to 64K. This is the most compatible style of extended attribute and is supported by all ZFS implementations.
System-attribute-based xattrs can be enabled by setting the value to sa. The key advantage of sa type of xattr is improved performance. Storing extended attributes as system attributes significantly decreases the amount of disk I/O required.
<pool> this will be the name of the pool. This can be changed (renamed) when importing a pool.
/dev/disk/by-id/<disk> specifies a single thing to become a vdev child in the pool. This is where your ZFS objects and blocks will be stored. In this example the pool will have a single vdev with a single child thing and will not provide fault tolerance, but many other ZFS features will be available as usual. This single vdev, single child thing can be turned into a mirror later []. More vdevs can be added later to form pool stripes. I say thing, because ZFS supports devices, partitions or even files as vdev children. In my example I hinted at a whole-disk device.
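After creation it is worth verifying that the properties landed as intended - a minimal check against the example above:
# pool property
zpool get ashift <pool>

# dataset properties on the pool root dataset (inherited by children)
zfs get compression,checksum,atime,relatime,xattr <pool>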
zpool create can create at least 5 types of pool configuration (see the zpool create man page for examples []):
single disk pool - like the example above zpool create <pool> <device1>
mirror zpool create <pool> mirror <device1> <device2>
striped mirror zpool create <pool> mirror <device1> <device2> mirror <device3> <device4>
raidz [] zpool create <pool> raidz <devices...> - one can also use raidz2 and raidz3 for additional parity levels
draid [] - I suggest reading the draid man page to understand the following prototype. I did some pro/con analysis of various storage concepts, including draid.
zpool create <pool> draid[<parity>][:<data>d][:<children>c][:<spares>s] <devices...>
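For illustration only (the geometry and disk names are made up): a draid2 layout with 4 data disks per redundancy group, 13 children and 1 distributed spare could look like this:
# illustrative draid example - double parity, 4 data disks per group, 13 children, 1 distributed spare
zpool create <pool> draid2:4d:13c:1s /dev/disk/by-id/<disk1> ... /dev/disk/by-id/<disk13>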

Hot spares

mirrors, raidz(2|3) and draid support hot spares. In draid, spares are distributed and an active part of the pool, which reduces the time a pool spends degraded, i.e. faster resilvering. In non-draid vdev types spares are standalone and inactive until a vdev child device fails in some way, at which point resilvering begins; once resilvering is complete the spare stands in for the failed vdev child until it is permanently replaced. If you need to understand resilvering in more detail, see the linked reference. Here are some zpool create examples with spares:
# mirror with a spare
zpool create <pool> mirror <device1> <device2> spare <device3>

# raidz with a spare
zpool create <pool> raidz <devices...> spare <devicen>
Spares can be added after creation. See the section: Adding and removing various vdev child devices.
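One common pattern when a spare has to be engaged by hand (e.g. no ZED/auto-replace in play) - device names are placeholders:
# manually start the replacement with a configured spare
zpool replace <pool> <failed-device> <spare-device>

# once the failed device has been permanently replaced and resilvering is done,
# detach the spare so it returns to the spare list
zpool detach <pool> <spare-device>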

zpool destroy

# inspect the pool you want to destroy
zpool status <pool>

# list and inspect the datasets on the pool
zfs list -t all -r -oname,mounted,mountpoint <pool>

# take a moment to ensure anything using the datasets is cleaned up, and backed up
# e.g. pve storage cfg
# e.g. virtual disks
# e.g. need to create a backup of certain data?

# unmount the mounted datasets
zfs unmount /<pool>/data

# list datasets and check all are unmounted
zfs list -t all -r -oname,mounted,mountpoint <pool>

# ⚠ data loss warning ⚠
zpool destroy <pool>

# consider using wipefs to clean up the partitions etc

zpool split

💡 This operation is irrevocable - it cannot be undone in the same way that you can undo zpool detach [] or zpool offline [].
zpool split takes a mirror-based pool and splits off the last device in each mirror to create a new pool. This is the default behavior, which can be modified with various options and optional arguments. Check the man page for more details.
# the most basic form
zpool split <pool> <newpool>

# dry-run no-op to preview what would happen
zpool split -n <pool> <newpool>
💡 At the point in time of the split, <newpool> will be a replica of <pool>. But consider whether you are splitting an active pool - a pool that is not quiesced, for example an rpool (root pool) where an operating system is running; in that case the split pools will instantly diverge due to writes on <pool>.
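A sketch of the follow-up steps - by default the new pool is left exported, so import it when you are ready, or use -R to import it immediately under an alternate root:
zpool split <pool> <newpool>
zpool import <newpool>

# or split and import in one go under an alternate root
zpool split -R /mnt/<newpool> <pool> <newpool>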

zpool upgrade

# create a rollback checkpoint
zpool checkpoint <pool>

zpool upgrade <pool>

# suggested: reboot the node and verify your existing setup works as expected
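Before upgrading it can help to see what would change:
# list pools whose on-disk format is older than what the running software supports
zpool upgrade

# list all features supported by the running software
zpool upgrade -v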

zpool import

The import command is powerful and it can pay off to understand its versatility. The simplest invocations:
# import a pool from pool cache
zpool import <pool>

# rename a pool
zpool import <pool> <new-pool-name>

# one-off temporary pool rename
zpool import -o readonly=on <pool> <temp-pool-name>
These invocations rely on the presence of a cached pool config typically located in /etc/zfs/zpool.cache. i.e. the system has previously imported this pool.
Read-only import: A very useful import option is -o readonly=on, which does what it says on the tin. This can be used to inspect a pool without making modifications to the pool config/state/properties, or its data. This option prevents modification of the pool name and pool hostname properties, which can be useful in avoiding issues caused by importing a pool on a foreign system.
-o readonly=on combined with --rewind-to-checkpoint gives a sysop the look-back ability to inspect a pool at its check-pointed state without actually rewinding the pool. A highly useful option combination.
-N option prevents any pool datasets from being mounted and can be useful when working with pools on foreign systems (and/or alternate boot environments) and/or where mounting datasets could cause issues and conflicts with the existing system file hierarchy.
-f forces a pool import, for example if the pool hostname property doesn't match the importing system hostname. This often occurs when or after importing a pool in a recovery system or Live CD (and/or alternate boot environment).
-R is another useful option when working with a pool on a foreign system (and/or alternate boot environment) where mounting datasets could cause issues and conflicts with the existing system file hierarchy. -R <path> sets the following pool properties: cachefile=none [] and altroot=<path> [], which causes <path> to be prepended to any mount points within the pool to avoid conflicts. Citation from the man page:
-R can be used when examining an unknown pool where the mount points cannot be trusted, or in an alternate boot environment, where the typical paths are not valid. altroot is not a persistent property. It is valid only while the system is up.
-d dir|device option causes the import process to search for the <pool> in the specified directory or device. Often required in an alternate boot environment such as a recovery boot or Live CD.
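To show how these options compose, a typical cautious import from a recovery environment might look like this (paths are illustrative):
# read-only, nothing mounted, mounts re-rooted under /mnt, search by-id, force past the hostid check
zpool import -N -f -o readonly=on -R /mnt -d /dev/disk/by-id <pool>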
Extreme options: There are a few more options for zpool import, including some which change how extreme the measures are to recover/rollback a pool to a working transaction. They come with the following ⚠ WARNING:
-X, -F and -T options can be extremely hazardous to the health of your pool and should only be used as a last resort.
I'm going to skip the coverage of those options as they are probably better suited to a dedicated recovery guide rather than a cheatsheet.

zpool export

Typically, exporting a pool is done because you might be putting the pool drive(s) into cold storage OR plan to import the pool on another system and you’d like to ensure everything is clean and consistent. See the man page [
] for more details including information on how ZFS can support importing and exporting pools on systems with different endianness (under certain circumstances).
💡 Exporting a pool has a useful side-effect on the ARC - all cached objects/blocks belonging to the exported pool are evicted. I.e. it can be used to flush the pool's entries from the ARC, which is handy in testing, building test cases, and verifying that property settings work as you expect.
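The basic invocations:
zpool export <pool>

# force the export if datasets are busy (use with care)
zpool export -f <pool>

# export all pools
zpool export -a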

zpool detach|offline|remove|online

command - description
detach [] - Detaches a device from a mirror. Consider offline if the device is to be re-added later on.
offline [] - Takes a pool device offline. The device is quiesced. NA for spares. Use -f to force a faulted state.
remove [] - Removes devices from a pool. Allocated space will be evacuated to other pool devices. Can be cancelled with -s. For a dry-run use -n.
online [] - Brings a pool device online. NA for spares. Use -e to expand device space.
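The corresponding basic invocations (device paths are placeholders):
zpool detach <pool> <device-path>
zpool offline <pool> <device-path>
zpool online <pool> <device-path>

# remove a device - allocated space is evacuated first
zpool remove <pool> <device-path>
# cancel an in-progress removal
zpool remove -s <pool>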

vdev child device evacuation

TODO - see the linked reference.

zfs - manage datasets

The zfs command manages ZFS datasets, which are children of pools. zfs defines and manages the dataset hierarchy. A ZFS dataset has similar characteristics to a logical volume, so the zfs command is similar to a logical volume manager.
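A minimal sketch of that hierarchy idea (dataset names are placeholders) - children inherit properties from their parents unless overridden:
zfs create <pool>/data
zfs create <pool>/data/projects

# properties flow down the tree
zfs get -r compression <pool>/data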

zfs list

zfs list
zfs list -t all
zfs list -t snapshot
# types: filesystem, snapshot, volume, bookmark, or all

get zfs properties

zfs get encryption
zfs get compression

# or combined
zfs get compression,encryption,checksum
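Two variations I reach for often:
# recurse through the whole dataset tree
zfs get -r compression <pool>

# show only locally set (non-inherited) properties
zfs get -s local all <pool>/<dataset>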

zfs create

TODO some basics
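Until this is written up properly, a minimal sketch (names and values are placeholders):
# create a dataset, mounted at /<pool>/data by default
zfs create <pool>/data

# set properties at creation time, e.g. compression and a quota
zfs create -o compression=zstd -o quota=100G <pool>/backups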

dataset encryption

# currently NOT best practice to set encryption on the root of the zpool
# DO NOT DO THIS - read the linked discussion for details.
zpool create -o ashift=12 -O compression=lz4 -O encryption=aes-256-gcm -O keylocation=prompt -O keyformat=passphrase -O xattr=sa <pool> <disk>

# currently best practice
# DO THIS!
# STEP 1: create unencrypted pool
zpool create -o ashift=12 -O compression=zstd -O checksum=edonr -O atime=on -O relatime=on -O xattr=sa <pool> <disk>

# STEP 2: create child dataset as an encryption root (name of your choosing)
zfs create <pool>/enc_data -o encryption=aes-256-gcm -o keylocation=prompt -o keyformat=passphrase

# STEP 3: verify the key
zfs unmount -u <pool>/enc_data
zfs mount -l <pool>/enc_data # blocking prompt

# after reboot, to load key
zfs mount -l <pool>/enc_data
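Some related key-management commands (enc_data as in the example above):
# check key status and the encryption root
zfs get encryption,encryptionroot,keystatus <pool>/enc_data

# load / unload the key without (un)mounting
zfs load-key <pool>/enc_data
zfs unload-key <pool>/enc_data

# mount once the key is loaded
zfs mount <pool>/enc_data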

create zfs volumes aka zvols (but avoid using them - see the linked discussion)

zfs create <pool>/testvolxfs -s -V 20G -o volblocksize=4k
zfs create <pool>/vm-102-disk-2 -s -V 20G -o volblocksize=4k -o encryption=off
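On Linux the zvol shows up as a block device under /dev/zvol/, e.g. (volume name as in the example above):
ls -l /dev/zvol/<pool>/testvolxfs

# e.g. put a filesystem on it
mkfs.xfs /dev/zvol/<pool>/testvolxfs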

zfs destroy

# dry run
zfs destroy -vn <pool>/testvolxfs
# ⚠ data loss warning ⚠
zfs destroy -v <pool>/testvolxfs

# ⚠ recursive destruction data loss warning ⚠ - it is also possible to use the -r switch to recursively destroy datasets (to destroy part of a zfs hierarchy). Be very careful with this! Do practice dry runs (-vnr) to see what would happen first.

zfs send and receive

TODO - prerequisite: snapshot
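For the example below, the prerequisite recursive snapshot can be created like this:
# recursive snapshot across the whole src pool
zfs snapshot -r rpool-old@boot-transfer

# confirm it exists on every dataset
zfs list -t snapshot -r rpool-old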

Send a 1:1 full (whole) pool replica

# full and recursive clone of a pool (replica) - from src to dst
# assumes that the @boot-transfer snapshot exists recursively on the src
# also specify that the dst pool datasets should use specific compression and checksum settings
zfs send --replicate rpool-old@boot-transfer | mbuffer -s 128k -m 128M | zfs recv -F -v -u -o compression=zstd -o checksum=edonr rpool-new
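A possible incremental follow-up, assuming a later recursive snapshot (here a hypothetical @boot-transfer2) was taken on the src:
zfs send --replicate -I rpool-old@boot-transfer rpool-old@boot-transfer2 | mbuffer -s 128k -m 128M | zfs recv -F -v -u rpool-new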

zdb - ZFS storage pool debugging

In ZFS, objects are grouped together in object sets. A dataset is an “object set” object, files and directories are objects grouped into a dataset.
zdb allows us to query and read information about object sets and their child objects. Once one has obtained the coordinates for a given object, it is possible to read/extract raw data or make a backup of a dataset.
As a rule, to increase the consistency of zdb, you should use it on an exported inactive pool.
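A small example of the idea - get an object id for a file and dump its details (the file path is a placeholder):
# the inode number reported by ls -i is the zfs object id
ls -i /<pool>/<dataset>/some/file

# dump that object's details, including its block pointers, at high verbosity
zdb -dddd <pool>/<dataset> <object-id>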

References

Official zdb manual [] and source code []. If you can read and follow the code it reveals a lot about ZFS internals.
The original ~2006 ZFS on-disk format []. A more recent OpenZFS on-disk format doc [].

Glossary

SPA - Storage Pool Allocator
DVA - Data Virtual Address
ZAP - ZFS Attribute Processor
DMU - Data Management Unit
DSL - Dataset and Snapshot Layer
ZPL - ZFS POSIX Layer
ZIL - ZFS Intent Log
ZVOL - ZFS Volume

Should I run zdb on an exported or imported pool?

zdb can operate on exported or imported pools; zdb accesses block devices directly and "doesn't care" about imported and active pools. For encrypted datasets there are limitations on what can be performed with older versions of zdb (zfs < 2.2), because zdb did not understand encryption prior to that version. See the section herein on zdb support for encrypted datasets. As a rule, to increase the consistency of zdb output, you should use it on an exported, inactive pool.
Citing from the zdb man page, with my emphasis:
zdb is an "offline" tool; it accesses the block devices underneath the pools directly from userspace and does not care if the pool is imported or datasets are mounted (or even if the system understands ZFS at all). When operating on an imported and active pool it is possible, though unlikely, that zdb may interpret inconsistent pool data and behave erratically.

zdb and encrypted datasets

Older zdb versions (zfs < 2.2) have some limitations when working with encrypted datasets; for example I get permission denied when attempting certain object operations on relative object paths (see ). In this case use the zfs object id, which can be obtained via ls -i or the zdb -dd example provided herein. Keep in mind that data extracted via zdb from encrypted datasets will be... encrypted! (Unless you are using a version of zdb which supports encryption.)
Checking the OpenZFS source code [] [] from contributor robn [] shows that zdb received a new option -K, tagged for release in zfs 2.2, which provides support for loading an encryption key for operations that require it.

pool config

# show <pool> configuration from /etc/zfs/zpool.cache
zdb -C <pool>

# show the <pool> cached AND on-disk configuration
zdb -CC <pool>

# if you are running in a recovery environment, the cache is likely missing, so specify the disk
# 💡 if the pool isn't whole-disk then use the full partition path
zdb -C -e -p /path/to/disk/or/partition/containing/pool <pool>

# you can also specify a directory of devices to search
zdb -C -e -p /dev/disk/by-id/ <pool>

print block stats for a pool

Performs a block leak check and prints high level block stats for a pool:
zdb -b <pool>
# for an exported pool: zdb -b -e -p /dev/disk/by-id/ <pool>
Performs a block leak check and prints various block stats including a block size histogram table:
# 💡 specify -b up to six times to increase verbosity.
zdb -bbb <pool>

list top-level objects in a given pool

zdb -d <pool>

list all objects in a given pool

zdb -dd <pool>

list objects in a given dataset

zdb -dd <pool>/path/to/dataset
