Notes on NFS — or why you might need Proxmox Backup Server
The previous post ended with "let's go" — first Synology backup running tonight. In the morning I opened the management panel. Interface doesn't load. :)
SSH
Server is reachable, ping goes through, SSH works. But uptime says:
load average: 26.29, 24.46, 23.54
Load of 26 on a Xeon E-2388G is not normal operating mode — that's every core pinned and the major services lying down for a rest.
Three pveproxy workers sitting in D-state since 02:59.
D-state is uninterruptible sleep: the process is blocked inside the kernel waiting on I/O and ignores all signals until that I/O completes. SIGKILL doesn't help.
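A quick way to spot these — a sketch using standard procps `ps` output (column widths and WCHAN names vary by distro and kernel):

```shell
# Show uninterruptible (D-state) processes plus the kernel function
# they are blocked in; on an NFS hang the WCHAN column typically
# shows something like rpc_wait_bit_killable or nfs_wait_on_request.
ps -eo pid,stat,wchan:30,comm | awk 'NR==1 || $2 ~ /^D/'
```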
In the logs:
100.64.0.4:/volume1/pve-backups ... hard,fatal_neterrors=none,timeo=600,retrans=2
In tailscale logs from 03:36:
netcheck: UDP is blocked, trying HTTPS
timeout opening TCP 100.64.0.5 => 100.64.0.4:111
What happened
→ Backup job started at 03:00
→ At 03:36 — tailscaled missed a WireGuard keepalive under the load of active backup I/O
→ UDP connection went dark: NAT on the home router expired
→ Tailscale switched to DERP relay over HTTPS
→ DERP runs on the same physical machine (I know)
→ hard + fatal_neterrors=none mount: processes wait forever, no errors returned
→ Load climbs → headscale VM starves → DERP degrades → Tailscale can't reconnect
→ No exit. Deadlock.
NFS over Tailscale with DERP relay on the same machine isn't just a bad config. It's an architecture that will deadlock on any UDP blip. Not "might" — will.
Result: 16 D-state processes, load 29, pveproxy unresponsive, headscale VM unreachable over SSH. VMID 9000 (465 MB) finished writing to .vma.zst at 03:01. VMID 1013001 — 15 GB of an unfinished .vma.dat out of 500 GB.
How I fixed it
reboot -f didn't execute — SSH couldn't deliver the command at load 29. Had to use SysRq:
echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger
After reboot — rebuilt the connection scheme.
Tailscale is out of the backup path entirely. MikroTik port-forwards TCP 2049 to the Synology, source-restricted to the Moscow server's public IP only. Two firewall rules: accept from that IP, drop everything else on that port. RouterOS is genuinely great for this.
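The MikroTik side can be sketched roughly like this — addresses and the interface name are placeholders, the post doesn't show the actual rules:

```
/ip firewall nat
add chain=dstnat protocol=tcp dst-port=2049 in-interface=wan1 \
    action=dst-nat to-addresses=192.168.88.10 to-ports=2049
/ip firewall filter
add chain=forward protocol=tcp dst-port=2049 src-address=203.0.113.10 action=accept
add chain=forward protocol=tcp dst-port=2049 action=drop
```

Rule order matters in RouterOS: the accept for the known source IP has to sit above the catch-all drop.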
It works.
In storage.cfg:
options vers=4,soft,timeo=30,retrans=3
vers=4 — no portmapper needed, just port 2049.
soft — timeo=30 is 3 seconds per attempt (timeo counts in tenths of a second); after retrans=3 failed retries the process gets an I/O error instead of hanging forever. Backup aborts, host keeps running.
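Back-of-envelope math for the failure window, assuming timeo counts in tenths of a second and the kernel makes retrans retries after the first attempt (a simplification — the real retry logic backs off):

```shell
timeo=30    # tenths of a second per attempt -> 3 s
retrans=3   # retries after the first attempt
# Roughly (1 + retrans) attempts before the soft mount returns EIO:
echo "worst case ~$(( timeo * (retrans + 1) / 10 )) s before the error"
```

A dozen seconds of stall instead of an unbounded hang — that's the whole trade.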
Ran manually. VMID 9000 — 55 seconds, 473 MB. VMID 1013001 went from scratch: 225 MB/s off ZFS, ZSTD compressing on the fly before sending.
→ hard NFS over any tunnel isn't fault tolerance — it's "I'll hang forever instead of failing"
→ DERP relay on the same machine as the NFS client is a backup exit through the same wall
→ WireGuard keepalive under I/O load — that's UDP behavior, not a bug
→ NFSv4 doesn't need portmapper. One port, 2049. That's enough.
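Easy to verify from the client with nothing but bash — if TCP 2049 answers, an NFSv4 mount needs no rpcbind/portmapper round-trip. The hostname here is a placeholder:

```shell
# Probe the NFSv4 port directly; /dev/tcp is a bash redirection path.
# "synology.lan" is a placeholder for the NAS hostname or IP.
if timeout 3 bash -c 'exec 3<>/dev/tcp/synology.lan/2049' 2>/dev/null; then
  echo "2049 open: NFSv4 mount can proceed"
else
  echo "2049 unreachable"
fi
```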
The previous post in this series ended on "let's go." This one ends on "it works."