Incremental backup is a great thing, especially on a laptop that is easy to lose. There are of course many ways to accomplish this with widely used tools like rsync. My previous solution used duplicity to back up to S3 Glacier, with GPG-encrypted and compressed archives, from a btrfs snapshot volume. This worked, but it was slow regardless of network speed. Calculating the differences between the current and existing snapshot states is a very expensive operation, and unfortunately GPG is also quite slow for large amounts of data.
What was especially unnecessary about this setup was the time spent calculating the differences between snapshots. btrfs snapshots themselves are great, but the far superior option is using btrfs send to another btrfs volume, since the filesystem already knows what changed between two snapshots. The only trouble was having another btrfs volume to send to. Additionally, I needed to find a good utility for wrapping up the logic around btrfs send and snapshot.
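To make the idea concrete, here is a minimal sketch of one incremental cycle using plain btrfs commands. The function name backup_plan and every path in the usage line are hypothetical placeholders, not my actual layout. It only prints the commands so they can be reviewed first, and it assumes paths without spaces.

```shell
# Print the commands for one incremental backup cycle.
# $1 = subvolume to back up
# $2 = snapshot directory on the same filesystem
# $3 = name of the previous (parent) snapshot, already present on both sides
# $4 = name of the new snapshot
# $5 = mount point of the receiving btrfs volume
backup_plan() {
    src=$1 snapdir=$2 parent=$3 new=$4 dest=$5
    # Snapshots must be read-only (-r) before they can be sent.
    echo "btrfs subvolume snapshot -r $src $snapdir/$new"
    # -p names the parent snapshot, so only the changes since then are streamed.
    echo "btrfs send -p $snapdir/$parent $snapdir/$new | btrfs receive $dest"
}

# Hypothetical usage:
# backup_plan /mnt/pool/rootfs /mnt/pool/snapshots rootfs.1 rootfs.2 /mnt/backup
```

For a remote target, the receive side of the pipe would typically run over ssh instead. That bookkeeping, along with pruning old snapshots, is what a wrapper utility has to take care of.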
I decided the best way to achieve that was to set up a storage machine using btrfs. I wanted something small and low power, as the only place I have to put it is my office at home. For the case, ITX form factor was a natural choice, and I went with a Fractal Design Node 304. As with all of Fractal Design’s cases it’s a pleasure to work on, even considering the necessarily compact size. For the mainboard and CPU, I picked a Supermicro A1SRi-2758F. This has an 8-core Atom C2758 processor, which is sort of overkill, but I wanted to run other things as well. There are cheaper setups of course, but ECC was important to me, to avoid any potential bit-rot over time. Continuing with the overkill, I put in 16GB of ECC RAM and a Seasonic SS-400FL2 fanless PSU. For the all-important hard drives, I shucked the drives out of 4 Seagate Expansion 8TB external hard drives. These were super cheap at under $150 apiece when I bought them. Since I am not doing a lot of in-place modifications, the fact that they are SMR drives is fine by me. For the operating system, I just used a SATA SSD I had lying around from an old laptop.
Here it is assembled, bad cable management and all
For the software, I went a little more bare-bones and simply installed Debian testing instead of a dedicated NAS distro like FreeNAS. For the RAID setup, I chose btrfs native RAID10. I didn’t want to use RAID5 here as rebuild times would be dangerously high, and I suspect the unrecoverable read error rate of these cheap drives is unsuitable for parity-based RAID in general. Creation is super simple:
mkfs.btrfs -m raid10 -d raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd
For health monitoring, besides the usual smartctl, I check btrfs’s device stats:
parshimers@yam ~> sudo btrfs device stats /dev/sda
[/dev/sda].write_io_errs 0
[/dev/sda].read_io_errs 0
[/dev/sda].flush_io_errs 0
[/dev/sda].corruption_errs 0
[/dev/sda].generation_errs 0
This is easy to turn into a cron job to check if any of these values are nonzero:
@hourly /bin/btrfs device stats /dev/sda | grep -vE ' 0$'
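The cron line above only looks at /dev/sda. Here is a minimal sketch of the same check as a reusable filter that also sets an exit status. The name nonzero_stats is made up for this example, and the usage line assumes the array is mounted at /mnt/cellar, so that btrfs reports the counters for every member device.

```shell
# Read `btrfs device stats` output on stdin, print only the counters that are
# nonzero, and return 1 if any were found (0 otherwise).
nonzero_stats() {
    # NF == 2 skips blank or malformed lines; $2 + 0 forces a numeric compare.
    awk 'NF == 2 && $2 + 0 != 0 { print; bad = 1 } END { exit bad }'
}

# Hypothetical usage, covering all devices of the mounted array:
# btrfs device stats /mnt/cellar | nonzero_stats
```

As with the grep version, cron only sends mail when something is printed, so a healthy array stays quiet.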
Similarly, to maintain the array, I do a scrub every month:
@monthly /bin/btrfs scrub start -Bq /mnt/cellar
And collect smartctl info, just to be safe:
@monthly /usr/sbin/smartctl --test=long /dev/sda > /dev/null 2>&1
@monthly /usr/sbin/smartctl -a -d ata /dev/sda | /bin/grep "Serial\|Firmware\|result:\|Reallocated_Sector_Ct\|Spin_Retry_Count\|Reallocated_Event_Count\|Current_Pending_Sector\|Offline_Uncorrectable\|Extended offline"
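The grep above mails the full attribute lines every month whether or not anything is wrong. As a rough sketch of a quieter variant, the filter below prints only the watched attributes whose raw value is nonzero. It assumes the usual ten-column attribute table from smartctl -a, with the raw value in the last column, and smart_alerts is a name invented for this example.

```shell
# Read `smartctl -a` output on stdin and print "ATTRIBUTE RAW_VALUE" for each
# watched attribute whose raw value (column 10) is nonzero.
smart_alerts() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Spin_Retry_Count|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 + 0 != 0 { print $2, $10 }
    '
}

# Hypothetical usage:
# smartctl -a -d ata /dev/sda | smart_alerts
```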
Now all that I need is a good utility to wrap up the functionality that btrfs provides for incremental backup, to make it a seamless process. Not much is necessary; a simple bash script could suffice for most things. However, I wanted:
- Privilege separation. Even keyed ssh to root is not wonderful.
- Understandable configuration separated from code, in files.
- The same utility for local and remote targets.
The best utility I have found so far is btrbk. It does pretty much everything, and is easy to configure. For my use, the config is just a few lines on the client side:
snapshot_preserve_min   2d
snapshot_preserve       14d
target_preserve_min     no
target_preserve         20d 10w *m
snapshot_dir            rootfs/backup
backend                 btrfs-progs-sudo
ssh_identity            /etc/btrbk/ssh/id_rsa
volume /mnt/btr_pool
  subvolume rootfs
    target send-receive ssh://grusonii.echinocact.us/mnt/cellar/w541
    ssh_user btrbk
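With a config like this in place, a backup run comes down to two commands. This is a rough sketch that assumes the config is saved as /etc/btrbk/btrbk.conf, which is btrbk's default location but is an assumption here.

```shell
# Preview the planned snapshot, send, and prune actions without changing anything.
btrbk -c /etc/btrbk/btrbk.conf dryrun

# Perform the backup for real once the preview looks right.
btrbk -c /etc/btrbk/btrbk.conf run
```

The second command can then go into cron alongside the other jobs, on whatever schedule suits the machine.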
The btrfs-progs-sudo backend restricts the ‘btrbk’ user somewhat via sudo, with the following commands allowed in sudoers:
btrbk ALL=NOPASSWD: /bin/btrfs, /sbin/btrbk, /usr/sbin/btrbk, /usr/share/btrbk/scripts/ssh_filter_btrbk.sh, /bin/readlink
This obviously does not restrict much, since that user can still delete the filesystem or copy it entirely. However, it removes the need for remote root login, which is an easy target for low-effort bots and the like.
Local backup is even simpler. On this machine I have a separate hard drive installed for media and backups, and pointing btrbk at it only takes a local target path:
snapshot_preserve_min   2d
snapshot_preserve       14d
target_preserve_min     no
target_preserve         20d 10w *m
snapshot_dir            rootfs/backup
volume /mnt/btr_pool
  subvolume rootfs
    target send-receive /mnt/heap/btrbk