Home lab & data vault
Data recovery and look back aka time machine

Created: 2023-12-10 Last update: 2023-12-12
I recently tried to explain a little about my backup and recovery strategy on reddit, and a fellow redditor's comments suggested that my explanation was easy to misread. So hopefully this post will clear things up! 😉

My home lab / data vault set up

This section is a summary of the details found here:
The following diagram shows the logical implementation of the lab/vault chassis and the data-related topics. The red numbers highlight:
OpenZFS and the zpools - PRIMARY DATA - verifiable data storage and corruption detection
The cold storage / off-site - BACKUP 2nd copy of primary data - verifiable backup and corruption detection
6 data, 3 parity SnapRAID array - near-time parity - verifiable data storage:
Up to 3 data/store volumes can fail concurrently and be rebuilt from parity
Flexible scrubbing strategy
File level undeletion and corruption recovery
Note: The data in the pools is glacial and does not require real-time parity or high-availability array features. At the time of writing, my primary data does NOT use ZFS mirroring or parity (RAID-Z) features. Feel free to read how I've evaluated various setups.
[Diagram: logical implementation of the lab/vault chassis and data layers]
Here one can see a physical layer, where the hypervisor runs a modern kernel and OpenZFS, and a virtual layer that uses ZFS as its foundation.
The layers flow approximately as follows:
Redundant and battery backed power supply
Chassis and hardware with ECC RAM and up to 10 GbE connectivity
Proxmox 8 Virtual Environment - Debian based with a modern Ubuntu kernel; maintains its own patched OpenZFS (ZoL) fork
ZFS
rpool - SSD zpool mirror for proxmox and rootfs for kvms and containers
PRIMARY DATA - six 5TB single disk encrypted zpools - spinning rust with Intel 900p to boost synchronous write workloads
BACKUP - cold storage - six 5TB single disk encrypted zpools - synchronised with data pools via ZFS replication
KVM filer
six virtual “store” volumes (XFS raw partitions on ZFS) distributed across and provisioned from the six data zpools
OMV - provides filer capabilities and services such as SMB to the local network
mergerfs - provides a single file hierarchy “union” of the n store volumes (see the mount sketch after this list)
SnapRAID array 6d 3p (triple parity)
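For illustration, a mergerfs union of the store volumes might be mounted roughly like this (mount point, branch paths and options are examples, not my exact OMV configuration):
# union the six store volumes into a single file hierarchy (paths/options are illustrative)
mergerfs -o allow_other,cache.files=off,category.create=epmfs /srv/store1:/srv/store2:/srv/store3:/srv/store4:/srv/store5:/srv/store6 /srv/vault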

Data integrity topics

My approach maintains 3 layers of checksums: 1) ZFS, 2) SnapRAID, and 3) standalone sfv format hierarchical lists

L1 - ZFS checksums

ZFS implements hierarchical checksumming of all data and metadata, ensuring that the entire storage system (zpool) can be verified on use (as non-cached data is read from the underlying storage) and confirmed to be stored correctly, or remedied if corrupt (which requires parity or more than one copy of a block, such as a mirror). Checksums are stored with a block's parent block.
Multiple checksum algorithms are supported and some support advanced features such as nopwrite, where OpenZFS compares the checksums of incoming writes to the checksums of the existing on-disk data and avoids issuing any write i/o for data that has not changed.
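As a sketch (the dataset name is just an example), the checksum algorithm is a per-dataset property; nopwrite needs a cryptographically strong checksum and, generally, compression enabled:
# use a strong checksum so nopwrite can take effect (sha256/sha512/skein/edonr qualify)
zfs set checksum=sha256 data1/vault
# nopwrite generally also requires compression to be enabled on the dataset
zfs set compression=lz4 data1/vault
zfs get checksum,compression data1/vault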

L2 - SnapRAID checksums

SnapRAID is a filesystem-level, near-time parity solution that maintains checksums and parity information for a collection of drives (or paths) - the logical array. It can verify individual files or one or more drives, and offers flexible scrubbing rather than only full-array scrubs. An array supports up to 6 (hexa) parities. A major advantage of SnapRAID is that it is not fussy about the size and type of drives and supports a mixed-drive array, and because the array is logical, drives can be accessed independently of the array, i.e. data is independent of the host hardware and the SnapRAID software.
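To illustrate, a 6-data / 3-parity snapraid.conf might look roughly like this (all paths are hypothetical, not my actual layout):
# illustrative snapraid.conf for a 6d 3p array
parity   /srv/parity1/snapraid.parity
2-parity /srv/parity2/snapraid.2-parity
3-parity /srv/parity3/snapraid.3-parity
content  /var/snapraid/snapraid.content
content  /srv/store1/snapraid.content
data d1  /srv/store1/
data d2  /srv/store2/
data d3  /srv/store3/
data d4  /srv/store4/
data d5  /srv/store5/
data d6  /srv/store6/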

L3 - Standalone checksums

Motto: “To be sure, to be sure!” Each data “store” volume maintains a standalone set of checksums of its file hierarchy. The sfv checksum format is used, which is interoperable with tools such as cfv and rhash. The checksums are occasionally updated to reflect the latest changes in the file hierarchy and can be used for independent verification of file integrity. A specific use case in the past was verifying large rsync copy jobs from one location to another: being able to independently confirm that everything went as expected and that nothing bad happened, like read holes on the source system or write holes on the destination system. ZFS and SnapRAID are not infallible; a fascinating recent example was a silent corruption bug in OpenZFS (Issue #15526) that had been in the source code for a long time but was difficult and unlikely to reproduce under typical workloads. A lot of folks wished they had independent checksums to verify whether they were affected or not.
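As a sketch with rhash (paths are examples; cfv works similarly), creating and later verifying an sfv list for a store volume could look like this:
# build a recursive CRC32 list (rhash emits SFV-compatible output for CRC32), then verify it later
rhash --crc32 --recursive --output=/srv/store1.sfv /srv/store1
rhash --check /srv/store1.sfv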

Data scrubbing

Scrubbing is the technique of verifying that stored data matches known good checksums - that data is integral.
ZFS: OpenZFS offers pool-level scrubbing, which scrubs the entire pool. Typically, scrubs are scheduled to run monthly to verify the integrity of the pool and detect silent corruption or underlying storage problems.
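The corresponding commands are simple (the pool name is an example); a scheduled job typically just runs the first one:
# start a scrub and check its progress / results
zpool scrub data1
zpool status -v data1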
SnapRAID: SnapRAID’s content file stores information about the files in the array including the block map, modification and last-scrub timestamps, and the checksum. With this info SnapRAID can offer a flexible scrubbing approach that allows the user to specify what and how much of the array’s data to scrub.
# check 5% of the array for files with scrub timestamps older than 10 days:
snapraid -p 5 -o 10 scrub
My schedule: I run this 5% scrub on my array nightly and the oldest scrubbed block/file hovers around ~20 days old. My strategy is to “exercise” the data and drives on a daily basis - to become aware of data or drive problems as soon as possible so that proactive maintenance can be carried out.
I also scrub data as soon as it is synchronised into the array, as follows:
snapraid --pre-hash sync && snapraid -p new scrub
Which has the following effects:
Pre-hash the file/block delta (new data) prior to parity computation, then sync the array. This causes the data delta to be read twice, for extra integrity in case something bad happens during a sync.
Scrub the newly added data in the array to verify post-sync consistency and set fresh last-scrub timestamps. Based on my scrub schedule and logic, this means the newest blocks in the array go to the back/end of the scrub queue.
This approach aims to reduce the chances of a read or write hole or similar error during sync operations and maximise the chances that the data in the array is integral.
Scrub notifications: zed (the ZFS Event Daemon) handles OpenZFS scrub notifications on the hypervisor, and I wrote a bash script for SnapRAID scrubs that triggers a notification if something goes wrong with a nightly scrub.
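My script isn't published here, but a minimal sketch of the idea could look like the following (log path and mail address are placeholders; adapt the notification mechanism to taste):
#!/bin/bash
# nightly SnapRAID scrub with a simple failure notification (illustrative sketch)
LOG=/var/log/snapraid-scrub.log
if ! snapraid -p 5 -o 10 scrub > "$LOG" 2>&1; then
    # snapraid exits non-zero when the scrub hits errors
    mail -s "SnapRAID scrub FAILED on $(hostname)" admin@example.com < "$LOG"
fi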

Recovery layers

When drives fail or corruption happens, I can choose to recover the data via SnapRAID or from backups.

Verified Backups

A key point: backups don’t mean anything unless they have been tested (i.e. the data can actually be recovered) AND verified to be integral, and unless this testing and verification is done on a regular basis.
In the worst case scenario, I can recover all my data and disks from the second verified copy of my primary data.
For example, if a zpool is lost due to a drive failure, I have the option of creating a new pool and seeding it from a snapshot of my second copy of the data. If there is a gap between the backup and the current data (temporarily lost data), this gap can be closed using SnapRAID parity.
A very nice feature of the copy-on-write, transactional nature of OpenZFS is that this type of restore does not break ZFS replication: once the primary data, including its snapshots, is restored, the primary-to-backup replication can continue as normal.
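As an illustrative sketch (pool, dataset and snapshot names are made up, not my exact replication tooling), the replication and the worst-case restore boil down to zfs send/receive:
# routine replication: send the increment between the last two snapshots to the backup pool
zfs snapshot -r data1/vault@sync-2023-12-10
zfs send -R -I data1/vault@sync-2023-11-10 data1/vault@sync-2023-12-10 | zfs receive -F backup1/vault
# worst case: seed a rebuilt primary pool from the backup copy, snapshots included
zfs send -R backup1/vault@sync-2023-12-10 | zfs receive -F data1/vault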

Parity

In my setup with 6d 3p, SnapRAID provides near-time parity for up to 3 concurrent volume failures.
SnapRAID provides an easy-to-use CLI to rebuild volumes, rebuild parity, fix detected corruption or recover deleted files and hierarchies.
For example, if a zpool is lost due to a drive failure, I have the option to:
create a new pool and an empty volume in that new pool, and provision it to the KVM
restore data to that new volume with SnapRAID
verify the restored data with SnapRAID (a sketch of these commands follows)
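A sketch of those SnapRAID commands, assuming the replaced volume is named d1 in snapraid.conf (log path is an example):
# rebuild the contents of the replaced volume from parity
snapraid -d d1 -l /var/log/snapraid-fix.log fix
# verify the rebuilt files against the stored hashes (audit only, no parity access)
snapraid -d d1 -a check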

Snapshots

In certain scenarios ZFS snapshots can be used to look back at and recover data. Consider two scenarios:
I want to roll back a virtual volume to a specific point in time. Solution: I can roll back to one of my snapshots.
I want to mount a virtual volume at a specific point in time and interrogate or restore files. Solution: I can use the special .zfs folder to mount virtual volumes read-only and do the needful 😉 If necessary, I can read-only mount all volumes at a given point in time, union them with mergerfs and do the needful (see the sketch below).
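For illustration, assuming the store volumes are raw XFS image files sitting on ZFS datasets (as the .zfs folder trick implies; all dataset, snapshot and image names here are examples), a volume from an old snapshot can be loop-mounted read-only and several such mounts unioned with mergerfs:
# mount one store volume image from a snapshot, read-only (norecovery skips XFS log replay)
mount -o ro,loop,norecovery /data1/vault/.zfs/snapshot/2023-11-01/store1.img /mnt/store1-2023-11-01
# optionally union several read-only mounts to browse the whole vault at that point in time
mergerfs -o ro /mnt/store1-2023-11-01:/mnt/store2-2023-11-01 /mnt/vault-2023-11-01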

zpool checkpoints

A very useful feature of ZFS is pool checkpoints. Creating a checkpoint is similar to taking a snapshot of a dataset: a pool checkpoint marks a point in time in the pool (a transaction) and gives the user the ability to rollback (or rewind) to that point in time. This is great for at least three use cases:
Create a pool checkpoint before upgrading zpools and/or ZFS versions - if something goes wrong - rollback.
Create a pool checkpoint before destroying zfs datasets, if something goes wrong - rollback.
Create a pool checkpoint before major operating system changes, e.g. distribution upgrades.
AFAIK, for an rpool (root pool), where the rootfs and operating system are installed on ZFS, a bootable Live CD such as System Rescue with ZFS support is required to rewind pool checkpoints, as the rewind is performed during the pool import. See my OpenZFS Cheatsheet.
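The checkpoint workflow is short (the pool name is an example); note that the rewind happens at import time:
# create a checkpoint before risky changes; discard it once everything looks good
zpool checkpoint data1
zpool checkpoint -d data1
# to rewind instead: export, then import with the rewind flag (from a rescue environment for rpool)
zpool export data1
zpool import --rewind-to-checkpoint data1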

Verifying checksums after recovery

An important step after recovery is to verify that the recovered data matches known good checksums.

Data drives can be accessed individually

Say my chassis becomes inoperable or inaccessible: I have the ability to boot a Live CD, e.g. System Rescue with ZFS support, import the zpools and get access to my data, either my primary copy or my backup copy. This works because the drives are part of a logical array and not “locked in” to a ZFS or vendor array.
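A rough sketch of that rescue workflow (the pool name is an example; the pools here are encrypted, hence the key load step):
# list importable pools on the attached drives, then import one read-only without mounting
zpool import -d /dev/disk/by-id
zpool import -d /dev/disk/by-id -o readonly=on -N data1
# load the encryption key (prompts for the passphrase) and mount the datasets
zfs load-key -r data1
zfs mount -a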

Recovering a deleted or corrupt file

As mentioned in the recovery layers section, recovery can be performed as follows:
Recover data from backup copy OR
Recover data using SnapRAID OR
Recover data using dataset snapshots

If a scrub detects corruption?

Use one of the recovery layers to restore and verify the data.
If the corruption happens to be in an XFS raw partition file stored on ZFS, care must be taken to coordinate the restore using one of the recovery layers - it may be best to create a new zpool, restore and verify the data, and then destroy the corrupt zpool. A documented case and detailed recovery strategy can be read here.

Recovering from a bad operational state

For the hypervisor bad states could include:
Failed operating system upgrade
Bad zpool upgrade
Bad zfs dataset destruction
Ideally one would have a recent pool checkpoint before undertaking these kinds of steps, in which case it is possible to rewind the pool to the checkpoint, reverting everything that happened after it.
Worst case, restore from backups. I do perform infrequent backups of the hypervisor rpool via zfs send, and I have Proxmox backups of my KVMs and containers. These are stored on USB storage attached to the hypervisor. The backups are tested and verified.
For KVMs and containers, either roll back to a good ZFS snapshot OR restore from a backup.

Looking back at older snapshots aka time machine

I keep snapshots for ~3 years. When I’m pruning old snapshots I like to use the special ZFS .zfs snapshot folder to mount volumes from older snapshots and compare my vault between two points in time, verifying what I’m pruning. This is a rudimentary time machine and works very well, especially combined with tools like Beyond Compare. The workflow is something like this:
Pick the two points in time (snapshots) that I’d like to compare.
LEFT side of the comparison: use the special .zfs snapshot folder to mount volumes read-only, use a temporary mergerfs mount and smb share.
RIGHT side of the comparison: either repeat the step for the snapshot to compare, or just compare current state of the vault.
Perform the comparison, typically with Beyond Compare, save anything I might still want to keep, verify everything else is no longer required. Logic is something like:
verify all LEFT orphans are duplicates of data on the RIGHT or can be genuinely discarded. I use AllDup to help with this, and the standalone sfv checksums that I maintain for each store volume.
verify all 1:1 diffs are as expected
review the RIGHT orphans are as expected
Prune the discardable snapshots from my PRIMARY data zpools. I might keep the snapshot(s) being pruned in the backups for a little longer.
Update my backup/snapshot diary so I know why I did what I did, and when.
This technique can also be used on an ad hoc basis to inspect my vault at a given point in time and perform file recovery and comparisons. It helps me answer the question: "What changed between time X and Y?" and make good snapshot pruning decisions.

httm – The Hot Tub Time Machine is Your ZFS Turn-Back-Time Method

Robert Swinford has created a project called httm that can be used for CLI-based, time-machine-like interaction with your ZFS snapshots. Some related blog posts by Robert go into detail on httm and some use cases, including a recording of httm in action.
My vault is stored on XFS on top of ZFS, so I’m dealing with large raw XFS partitions on the ZFS filesystem and don’t have a regular use case for httm outside of my rpool. You may find it useful for your use cases. Kudos to Robert for writing and sharing his project with the community.

