Backups

Original author: Tommy Nguyen

Last modified: Tue Aug 23 20:21

While there is a variety of backup software out there, this article will only cover two: duplicity and git-annex. git-annex is not backup software in of itself, but it is still useful for such a purpose.

BTRFS

As BTRFS is the default in Fedora now, one may want to take advantage of the snapshot feature. I’ve written a series of articles on combining Borgmatic and Snapper.

Duplicity

Duplicity uses a GPG encrypted tar format. The primary advantage of duplicity is that the archives are very small compared to alternatives (see gilbertchen’s benchmarks). The two major disadvantages is that backup/restore time is lengthy and that the incremental backups are useless without the full backup in the chain. One thing I will give to Duplicity compared to Duplicacy is that the command-line interface is superior.

Danger

Duplicity is undergoing a Python 2 to 3 transition, which seems to be resulting in a lot of bugs. Use at your own risk.

Warning

I’m not sure if this is a bug or not, but you should be aware that for include/exclude patterns, specifying a path like /path/to/go will also match /path/to/gopher, even when not using a wildcard.

GPG

In order to instruct Duplicity to use gpg2, pass --gpg-binary='gpg2'. You can then either use the PASSPHRASE environment variable or the --use-agent flag. If you go with the former route, your passphrase will be in cleartext, but if it’s in your crontab or protected with 700 permissions only your user can read it. For the latter, you may opt to keep your passphrase cached so that you can run the backups unattended. Add these to your ~/.gnupg/gpg-agent.conf:

default-cache-ttl 34560000
max-cache-ttl 34560000

Then reload the GPG agent (either echo RELOADAGENT | gpg-connect-agent or gpgconf --kill gpg-agent).

Keychain

Keychain is a front-end for ssh-agent and gpg-agent. It will cache your keys and export environment variables (SSH_AUTH_SOCK, etc.) that can be sourced for non-interactive scripts like crontabs. At the time of writing, Fedora ships version 2.8.0, which is too old for our purposes. The latest version at the time of writing (2.8.5) allows us to use GPG2. Since it’s just a shell script, installation is simple:

git clone git@github.com:funtoo/keychain.git
cd keychain && make
cp keychain ~/.local/bin/

Unless specified, Keychain will not start gpg-agent nor use gpg2. Further, you need to explicitly specify which keys to use (i.e, id_rsa). You will also need to invoke Keychain from your shell startup scripts. For Bash, this will look like:

# Environment variables automatically sourced, no need to do it manually here
eval `keychain --agents gpg,ssh --gpg2 --eval id_rsa some_gpg_key_id`

For Fish:

if status --is-interactive
    keychain --agents gpg,ssh --gpg2 --eval id_rsa some_gpg_key_id
end

if test -f ~/.keychain/(hostname)-gpg-fish
    source ~/.keychain/(hostname)-gpg-fish
end

if test -f ~/.keychain/(hostname)-fish
    source ~/.keychain/(hostname)-fish
end

To avoid future issues, make sure you have a permanent hostname. You can set it with:

hostnamectl set-hostname hostname

Note

At the time of writing the Fish example in the man Keychain page is broken. This example was pulled from issue #4583 in the Fish issue tracker.

Finally, add this to the top of your cron jobs:

[ -z "$HOSTNAME" ] && HOSTNAME=$(uname -n)
[ -f "$HOME/.keychain/$HOSTNAME-sh" ] && \
    source "$HOME/.keychain/$HOSTNAME-sh" 2>/dev/null
[ -f "$HOME/.keychain/$HOSTNAME-sh-gpg" ] && \
    source "$HOME/.keychain/$HOSTNAME-sh-gpg" 2>/dev/null

Unattended backups

Note

If you intend to use systemd, it cannot be used within a (user) cron tab. It can only run within a login session or be run as root.

However, two commands you may find useful are flock and systemd-inhibit. flock will allow you to prevent jobs from overlapping. You can also wake up the system by writing a systemd unit and using the WakeSystem property. Example:

[Unit]
Description=Weekly backup

[Timer]
Unit=weekly_backup.service
OnCalendar=Sun 23:00:00
WakeSystem=true

[Install]
WantedBy=multi-user.target

And the corresponding service file:

[Unit]
Description=Weekly backup

[Service]
Type=oneshot
ExecStartPre=/bin/sleep 1m
ExecStart=/bin/systemd-inhibit /bin/su -c "/usr/bin/flock -w 0 /path/to/cron.lock # ...

We sleep before running systemd-inhibit because there’s a race condition if it runs while the system is still waking from suspend. See this mailing list post for details.

Note

The service files should not have an [Install] section. When you enable the units, only enable the timers.

Read man systemd.time for what format OnCalendar takes. You can verify the time format is correct by using systemd calendar. Since WakeSystem requires privileges, this cannot be a per-user unit. So place them inside /etc/systemd/system.

flock ensures that if there’s a conflict, the monthly (i.e, full backup) job will take precedence. You can run fuser -v /path/to/cron.lock to see what processes are holding a lock.

systemd-inhibit on the other hand will prevent the system from suspending until the given command is complete. Per the documentation, it can inhibit a variety of operations. By default, this is idle:sleep:shutdown but laptop users will find handle-lid-switch useful.

Alternatively, if you choose not to use systemd-inhibit, you can simply adjust the power management inactivity value. For example, on XFCE this would look like:

xfconf-query -c xfce4-power-manager -p /xfce4-power-manager/inactivity-on-ac -s 0

This has the advantage of not requiring root privileges.

git-annex

git-annex is a location/metadata tracker that’s built on top of git. It essentially adds new verbs (prefixed with git annex) to any configured repository. There are a few things to keep in mind:

  • git annex init may not initialize the repository with the latest version. i.e, if you have git-annex v6, the repository may be v5. In that case, you should run git annex upgrade

  • git annex sync needs to be run in each repository, not just one, if you are using a distributed rather than centralized workflow

  • In v6, once a file is unlocked, it remains unlocked. If you make frequent changes to files you should use git annex unlock since direct mode is deprecated

As far as I’m aware, git-annex doesn’t track permissions or xattrs (important for SELinux). However, etckeeper has some helper scripts which store and restore metadata: 20store-metadata and 20restore-etckeeper respectively. Rename the scripts to git-store-metadata and git-restore-metadata and add them to your PATH. You will need to set the VCS environment variable to git.

In order to restore security contexts, you can simply use chcon -R --reference=source_dir/ target_dir/, where source_dir contains the context you want to apply to target_dir.

The following helper script should get you started:

#!/bin/bash

set -x
set -o pipefail
shopt -s dotglob

# Import environment variables SSH_AUTH_SOCK, etc.
[ -z "$HOSTNAME" ] && HOSTNAME=$(uname -n)
[ -f "$HOME/.keychain/$HOSTNAME-sh" ] && \
    source "$HOME/.keychain/$HOSTNAME-sh" 2>/dev/null
[ -f "$HOME/.keychain/$HOSTNAME-sh-gpg" ] && \
    source "$HOME/.keychain/$HOSTNAME-sh-gpg" 2>/dev/null

cd "$HOME/backup"

# ...snip...
# Copy your files to backup here
# If using cp, make sure you use -a to preserve permissions and xattrs
# If using rsync, make sure you use -avzAX
# ...snip...

git-store-metadata
git annex add .
git annex sync --content --message="$(date +%F)"

# For each remote we need to run sync in order to actually
# propagate the changes. Doing sync from the initial directory
# only creates a branch with the changes. Running sync in the target
# directory performs the merge.
for remote in $(git remote)
do
    URL=$(git remote get-url "$remote")
    cd "$URL"
    git annex sync --content --message="$(date +%F)"
    git-restore-metadata
done

Previously it was stated that git annex will create a symlink. This was incorrect. It’s the act of locking the file that does so. If you wish to always add files as unlocked (and manually lock files that you don’t intend on modifying), then use this option:

git annex config --set annex.addunlocked true

To always add files to the annex (otherwise git-annex will use regular git add in some situations instead):

git annex config --set annex.largefiles anything

Finally, git-annex ignores dot files by default. Change this with:

git annex config --set annex.dotfiles true