Introduction
Data loss is not a question of if, but when. Hardware failures, ransomware attacks, accidental deletions, and datacenter outages are realities every infrastructure team must plan for. A multi-site backup strategy built on ZFS snapshots and encrypted offsite replication provides a resilient foundation that balances performance, storage efficiency, and security. This guide covers the full stack: snapshot fundamentals, incremental send/receive pipelines, native ZFS encryption, offsite replication over SSH tunnels, automated retention policies, and restore verification procedures.
ZFS Snapshot Fundamentals
ZFS snapshots are point-in-time, read-only copies of a dataset or pool. Unlike traditional backup tools that copy data block by block, ZFS snapshots are instantaneous and consume zero additional space at creation — they only consume space as the live dataset diverges from the snapshot state. This copy-on-write property makes snapshots extremely cheap to create and ideal as the base unit of a backup pipeline.
Creating a snapshot is a single command:
zfs snapshot rpool/data/vms@2026-04-03_02:00
Snapshots are identified by the @ separator. The dataset name precedes it, the snapshot name follows. List all snapshots on a pool with:
zfs list -t snapshot -o name,creation,used,refer -s creation
The used column shows the space unique to that snapshot, i.e. data that would be reclaimed if the snapshot were destroyed — a key metric for understanding retention cost. Rolling back to a snapshot discards all data written after it was taken; rolling back past the most recent snapshot additionally requires -r, which destroys the intervening snapshots:
zfs rollback rpool/data/vms@2026-04-03_02:00
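Snapshots can also be taken recursively, which captures a dataset and all of its children at a single consistent point in time — useful when related datasets (for example, a VM's disks) must stay in sync with each other:

```shell
# Atomically snapshot rpool/data and every child dataset beneath it
zfs snapshot -r rpool/data@2026-04-03_02:00

# Destroy the whole recursive snapshot set in one step when no longer needed
zfs destroy -r rpool/data@2026-04-03_02:00
```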
Incremental Send and Receive
ZFS send/receive is the native mechanism for replicating datasets between pools, hosts, or storage backends. A full send serializes the entire contents of a snapshot into a byte stream that can be piped directly into zfs receive on a remote host.
# Full initial send
zfs send rpool/data/vms@2026-04-03_02:00 | ssh backup01 zfs receive tank/backups/vms
After the initial full send, incremental sends transmit only the blocks that changed between two snapshots:
# Incremental send between two snapshots
zfs send -i rpool/data/vms@2026-04-03_02:00 rpool/data/vms@2026-04-04_02:00 \
| ssh backup01 zfs receive tank/backups/vms
The -I flag (capital I) sends all intermediate snapshots between two points, useful when the remote is multiple snapshots behind:
zfs send -I rpool/data/vms@2026-04-01 rpool/data/vms@2026-04-04 \
| ssh backup01 zfs receive tank/backups/vms
For large datasets across slow links, pipe through mbuffer to smooth out I/O bursts and add progress visibility:
zfs send -I rpool/data/vms@2026-04-01 rpool/data/vms@2026-04-04 \
| mbuffer -s 128k -m 1G \
| ssh -c [email protected] backup01 \
"mbuffer -s 128k -m 1G | zfs receive -F tank/backups/vms"
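Long-running sends over unreliable links can also be made resumable. Receiving with -s preserves partial state when the stream is interrupted; the receiving dataset then exposes a receive_resume_token property, and the sender can restart from that token instead of from scratch:

```shell
# Receive with -s so an interrupted stream leaves resumable state behind
zfs send -I rpool/data/vms@2026-04-01 rpool/data/vms@2026-04-04 \
| ssh backup01 "zfs receive -s -F tank/backups/vms"

# After an interruption, fetch the resume token from the receiving side...
TOKEN=$(ssh backup01 zfs get -H -o value receive_resume_token tank/backups/vms)

# ...and resume the send from where it stopped
zfs send -t "$TOKEN" | ssh backup01 "zfs receive -s -F tank/backups/vms"
```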
Native ZFS Encryption
Since OpenZFS 0.8, native encryption is available at the dataset level. Unlike block-device encryption such as LUKS, which encrypts an entire disk below the filesystem, ZFS encryption operates per dataset and allows granular key management. Creating an encrypted dataset with a passphrase:
zfs create \
-o encryption=aes-256-gcm \
-o keyformat=passphrase \
-o keylocation=prompt \
rpool/data/encrypted-vms
For automated operation, store the key as a raw file. With keyformat=raw and aes-256-gcm, ZFS expects exactly 32 bytes of raw binary key material, so generate the key without base64 encoding:
# 32 bytes of raw binary key material (required by keyformat=raw)
openssl rand -out /etc/zfs/keys/encrypted-vms.key 32
chmod 600 /etc/zfs/keys/encrypted-vms.key
zfs create \
-o encryption=aes-256-gcm \
-o keyformat=raw \
-o keylocation=file:///etc/zfs/keys/encrypted-vms.key \
rpool/data/encrypted-vms
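With a file-based key, the key must be loaded before the dataset can be mounted — at boot, or after a restore — so the key management commands are worth knowing:

```shell
# Load the key from the configured keylocation, then mount the dataset
zfs load-key rpool/data/encrypted-vms
zfs mount rpool/data/encrypted-vms

# Check key status across the hierarchy (available / unavailable)
zfs get -r keystatus rpool/data

# Unload the key to lock the dataset again (it must be unmounted first)
zfs unmount rpool/data/encrypted-vms
zfs unload-key rpool/data/encrypted-vms
```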
When sending encrypted datasets offsite, use the -w (raw) flag to transmit data in its encrypted form. The receiving host never sees plaintext — a critical property for offsite replication to untrusted or third-party storage:
zfs send -w -I rpool/data/encrypted-vms@yesterday rpool/data/encrypted-vms@today \
| ssh storage.example-corp.com zfs receive tank/offsite/encrypted-vms
Offsite Replication to Remote Storage Boxes
Many hosted storage providers offer SFTP/SSH access with large storage quotas. The challenge is that these targets do not run ZFS, so you cannot use zfs receive remotely. Instead, serialize the stream to a compressed file. Note that a raw (-w) send of an encrypted dataset is already high-entropy ciphertext, so gzip will shrink it very little; the compression step pays off mainly for unencrypted streams:
SNAP_DATE=$(date +%Y-%m-%d)
DATASET="rpool/data/encrypted-vms"
PREV_SNAP="${DATASET}@$(date -d yesterday +%Y-%m-%d)"
TODAY_SNAP="${DATASET}@${SNAP_DATE}"
zfs send -w -i "$PREV_SNAP" "$TODAY_SNAP" \
| pv \
| gzip \
| ssh -i /root/.ssh/storage_box \
-p 23 \
[email protected] \
"cat > /backups/vms/incremental_${SNAP_DATE}.zfs.gz"
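Restoring from these file-based archives reverses the pipeline: replay the initial full stream and then each incremental, oldest first, into zfs receive. A sketch — the filename of the initial full stream is hypothetical, since the script above only writes incrementals:

```shell
# Replay serialized streams oldest-first into a local pool. Because they were
# sent raw (-w), they arrive still encrypted; the original key must be loaded
# before the restored dataset can be mounted.
ssh -p 23 -i /root/.ssh/storage_box [email protected] \
"cat /backups/vms/full_2026-04-01.zfs.gz" \
| gunzip \
| zfs receive rpool/restore/encrypted-vms

ssh -p 23 -i /root/.ssh/storage_box [email protected] \
"cat /backups/vms/incremental_2026-04-02.zfs.gz" \
| gunzip \
| zfs receive rpool/restore/encrypted-vms
```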
For ZFS-capable offsite targets — a second Proxmox node, a VPS with ZFS — create a dedicated replication user with only the necessary permissions:
# On the remote backup host
useradd -m -s /bin/bash zfsrepl
zfs allow zfsrepl receive,create,mount,destroy tank/backups
Restrict the SSH key on the receiving end using command= in authorized_keys to prevent interactive shell access:
command="zfs receive -F tank/backups/vms",no-port-forwarding,no-x11-forwarding,no-agent-forwarding ssh-ed25519 AAAA... zfsrepl@proxmox01
Retention Policies and Automated Pruning
Unconstrained snapshots accumulate indefinitely and eventually exhaust pool space. A tiered retention policy balances granularity against storage cost. A common scheme for production infrastructure:
- Hourly snapshots — retain 24 (covering the last day)
- Daily snapshots — retain 7 (covering the last week)
- Weekly snapshots — retain 4 (covering the last month)
- Monthly snapshots — retain 12 (covering the last year)
The sanoid tool implements this policy declaratively. Define a policy in /etc/sanoid/sanoid.conf:
[rpool/data/vms]
use_template = production
recursive = yes
[template_production]
frequently = 0
hourly = 24
daily = 7
weekly = 4
monthly = 12
autosnap = yes
autoprune = yes
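sanoid does nothing until invoked; it is typically driven every few minutes from cron (or a systemd timer), reading the configuration above on each run. A minimal crontab entry, assuming the stock install path:

```shell
# /etc/cron.d/sanoid -- snapshot and prune per /etc/sanoid/sanoid.conf
*/15 * * * * root /usr/sbin/sanoid --cron >> /var/log/sanoid.log 2>&1
```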
The companion tool syncoid handles replication with matching declarative simplicity, automatically finding the newest snapshot common to source and target so incremental sends pick up cleanly after network interruptions. For custom scripted pruning:
#!/bin/bash
DATASET="rpool/data/vms"
KEEP_DAILY=7
zfs list -t snapshot -H -o name -s creation "$DATASET" \
| grep "@daily-" \
| head -n -${KEEP_DAILY} \
| while read -r snap; do
echo "Destroying old snapshot: $snap"
zfs destroy "$snap"
done
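Before letting a pruning script loose, ZFS's own dry-run mode is useful for previewing what a destroy would reclaim. The % range syntax addresses a span of snapshots in a single command:

```shell
# -n: dry run only, -v: report what would be destroyed and the space freed
zfs destroy -nv rpool/data/vms@daily-2026-03-01%daily-2026-03-07
```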
Verification: Restore Tests and Checksum Validation
A backup that has never been tested is not a backup — it is a hypothesis. Automated restore verification should run weekly at minimum. The verification process has two components: checksum validation and functional restore.
ZFS scrubs validate the integrity of all data on a pool against stored checksums:
zpool scrub rpool
zpool status rpool | grep -A3 "scan:"
For offsite file backups, generate checksums at send time and store them alongside the archive:
zfs send -w "$TODAY_SNAP" \
| tee >(sha256sum > /tmp/checksum.txt) \
| gzip > /tmp/backup.zfs.gz
scp /tmp/checksum.txt [email protected]:/backups/vms/backup_${SNAP_DATE}.sha256
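Note that the checksum above covers the uncompressed stream (sha256sum reads the tee'd copy before gzip), so verification must decompress first. A sketch of the check, with hypothetical paths:

```shell
# Recompute the stream checksum and compare against the stored value; both
# sides read stdin, so the sha256sum filename fields ("-") match exactly
gunzip -c /backups/vms/incremental_2026-04-04.zfs.gz \
| sha256sum \
| diff -q - /backups/vms/backup_2026-04-04.sha256 \
&& echo "checksum OK"
```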
For functional restore tests, clone the backup snapshot to an isolated dataset and verify filesystem integrity:
#!/bin/bash
RESTORE_SNAP="tank/backups/vms@$(date -d 'last sunday' +%Y-%m-%d)"
TEST_DATASET="tank/restore-test/vms"
zfs destroy -r "$TEST_DATASET" 2>/dev/null
# Set the mountpoint explicitly; otherwise the clone inherits its parent's
# mountpoint and the check below would test the wrong path
zfs clone -o mountpoint=/mnt/restore-test "$RESTORE_SNAP" "$TEST_DATASET"
if mountpoint -q "/mnt/restore-test"; then
echo "RESTORE OK: $(date)" | tee -a /var/log/backup-verify.log
else
echo "RESTORE FAILED: $(date)" | tee -a /var/log/backup-verify.log
/usr/local/bin/send_alert.sh "ZFS restore verification failed on $(hostname)"
fi
zfs destroy -r "$TEST_DATASET"
Cron Automation with Logging and Alerting
The full backup pipeline — snapshot, replicate, prune, verify — should be fully automated with structured logging and failure alerts. A production cron layout on a Proxmox host:
# /etc/cron.d/zfs-backup
0 * * * * root /usr/local/bin/zfs-snapshot-hourly.sh >> /var/log/zfs-backup/hourly.log 2>&1
0 2 * * * root /usr/local/bin/zfs-replicate-offsite.sh >> /var/log/zfs-backup/replicate.log 2>&1
0 3 * * * root /usr/local/bin/zfs-prune.sh >> /var/log/zfs-backup/prune.log 2>&1
0 4 * * 0 root /usr/local/bin/zfs-verify-restore.sh >> /var/log/zfs-backup/verify.log 2>&1
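A slow offsite replication can run long enough to collide with the next scheduled invocation. One guard, sketched here with util-linux flock, serializes each job on a lock file so overlapping runs exit immediately instead of stacking up:

```shell
# flock -n: fail fast if a previous run still holds the lock
0 2 * * * root flock -n /run/lock/zfs-replicate.lock /usr/local/bin/zfs-replicate-offsite.sh >> /var/log/zfs-backup/replicate.log 2>&1
```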
Each script should log a structured entry on success and send an alert on non-zero exit. A minimal alerting wrapper using curl to post to a webhook:
send_alert() {
local message="$1"
curl -s -X POST https://alerts.example-corp.com/webhook \
-H "Content-Type: application/json" \
-d "{\"text\": \"BACKUP ALERT on $(hostname): ${message}\"}"
}
# The ERR trap is not inherited by functions and subshells unless errtrace
# is enabled, so pair the trap with set -E (or set -o errtrace)
set -Eeuo pipefail
trap 'send_alert "Script failed at line $LINENO"' ERR
Practical Example: Proxmox Host Backing Up VM and CT Datasets
A Proxmox VE host stores virtual machines under rpool/data as ZVOL-backed or directory-backed datasets. The following is a complete replication script for a Proxmox environment replicating to a secondary node at backup01.example-corp.com:
#!/bin/bash
set -euo pipefail
DATASETS=("rpool/data/vm-100-disk-0" "rpool/data/vm-101-disk-0" "rpool/data/subvol-200-disk-0")
REMOTE="backup01.example-corp.com"
REMOTE_POOL="tank/backups"
SNAP_NAME="auto-$(date +%Y-%m-%dT%H:%M)"
LOG="/var/log/zfs-backup/replicate.log"
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG"; }
for DATASET in "${DATASETS[@]}"; do
SHORTNAME=$(basename "$DATASET")
REMOTE_DS="${REMOTE_POOL}/${SHORTNAME}"
log "Snapshotting ${DATASET}@${SNAP_NAME}"
zfs snapshot "${DATASET}@${SNAP_NAME}"
  # Previous snapshot is the second-newest after the one just created
  LAST_LOCAL=$(zfs list -t snapshot -H -o name -s creation "$DATASET" | tail -2 | head -1)
  LAST_SNAP=$(echo "$LAST_LOCAL" | cut -d@ -f2)
  if [ "$LAST_SNAP" = "$SNAP_NAME" ]; then
    # Only one snapshot exists: seed the remote with a full send
    log "No prior snapshot; sending full stream to ${REMOTE}:${REMOTE_DS}"
    zfs send -w "${DATASET}@${SNAP_NAME}" \
      | ssh -i /root/.ssh/zfsrepl -c [email protected] "$REMOTE" \
        "zfs receive -F ${REMOTE_DS}"
  else
    log "Sending incremental ${LAST_SNAP} -> ${SNAP_NAME} to ${REMOTE}:${REMOTE_DS}"
    zfs send -w -i "${DATASET}@${LAST_SNAP}" "${DATASET}@${SNAP_NAME}" \
      | ssh -i /root/.ssh/zfsrepl -c [email protected] "$REMOTE" \
        "zfs receive -F ${REMOTE_DS}"
  fi
log "Replication complete for ${SHORTNAME}"
done
log "All datasets replicated successfully"
Summary
A ZFS-based multi-site backup strategy delivers snapshot efficiency, encrypted offsite transfer, and automated lifecycle management in a single coherent toolchain. The key principles are: take frequent local snapshots, replicate incrementally to a secondary site with -w (raw) mode to preserve encryption end-to-end, enforce tiered retention policies with automated pruning, and run scheduled restore verification so failures are discovered during a drill — not a disaster. Combined with structured logging and alerting, this approach provides the observability needed to maintain confidence in backup integrity across heterogeneous infrastructure.
