Backups

While there is a variety of backup software out there, this article will only cover two: duplicity and git-annex. git-annex is not backup software in and of itself, but it is still useful for that purpose.

Duplicity

Duplicity uses a GPG-encrypted tar format. Its primary advantage is that the archives are very small compared to alternatives (see gilbertchen’s benchmarks). The two major disadvantages are that backup and restore times are lengthy and that incremental backups are useless without the full backup they chain from. One thing I will give Duplicity over Duplicacy is that its command-line interface is superior.

GPG

To instruct Duplicity to use gpg2, pass --gpg-binary='gpg2'. You can then either use the PASSPHRASE environment variable or the --use-agent flag. With the former, your passphrase is stored in cleartext, but if it lives in your crontab or in a file protected with 700 permissions, only your user (and root) can read it. With the latter, you may opt to keep your passphrase cached so that backups can run unattended. The values below are in seconds (34560000 is 400 days). Add these to your ~/.gnupg/gpg-agent.conf:

default-cache-ttl 34560000
max-cache-ttl 34560000

Then reload the GPG agent (either echo RELOADAGENT | gpg-connect-agent or gpgconf --kill gpg-agent).
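
As a rough sketch of the agent route, the wrapper below runs a full backup with gpg2 and --use-agent. The key ID, source directory, and target URL are placeholders, and the DRY_RUN switch and the run helper are made up for this example; by default the script only prints the command so you can check it first.

```shell
#!/bin/sh
# Sketch only: the key ID, source and target below are placeholders.
GPG_KEY="${GPG_KEY:-some_gpg_key_id}"
SRC="${SRC:-$HOME/Documents}"
DEST="${DEST:-file:///mnt/backup/documents}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# run it for real when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# --use-agent makes duplicity ask gpg-agent for the passphrase, so the
# PASSPHRASE environment variable does not need to be set.
full_backup() {
    run duplicity full --gpg-binary=gpg2 --use-agent \
        --encrypt-key "$GPG_KEY" "$SRC" "$DEST"
}

full_backup
```

Run it with DRY_RUN=0 once the printed command looks right.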

Keychain

Keychain is a front-end for ssh-agent and gpg-agent. It caches your keys and exports environment variables (SSH_AUTH_SOCK, etc.) that can be sourced by non-interactive scripts such as cron jobs. At the time of writing, Fedora ships version 2.8.0, which is too old for our purposes; the latest version (2.8.5) allows us to use GPG2. Since it’s just a shell script, installation is simple:

git clone git@github.com:funtoo/keychain.git
cd keychain && make
cp keychain ~/.local/bin/

Unless told otherwise, Keychain will neither start gpg-agent nor use gpg2. Further, you need to explicitly specify which keys to use (e.g., id_rsa). You will also need to invoke Keychain from your shell startup scripts. For Bash, this will look like:

# Environment variables automatically sourced, no need to do it manually here
eval `keychain --agents gpg,ssh --gpg2 --eval id_rsa some_gpg_key_id`

For Fish:

if status --is-interactive
    keychain --agents gpg,ssh --gpg2 --eval id_rsa some_gpg_key_id
end

if test -f ~/.keychain/(hostname)-gpg-fish
    source ~/.keychain/(hostname)-gpg-fish
end

if test -f ~/.keychain/(hostname)-fish
    source ~/.keychain/(hostname)-fish
end
Note
At the time of writing, the Fish example in the Keychain man page is broken. This example was pulled from issue #4583 in the Fish issue tracker.

Finally, add this to the top of your cron jobs:

[ -z "$HOSTNAME" ] && HOSTNAME=`uname -n`
[ -f $HOME/.keychain/$HOSTNAME-sh ] && \
    . $HOME/.keychain/$HOSTNAME-sh 2>/dev/null
[ -f $HOME/.keychain/$HOSTNAME-sh-gpg ] && \
    . $HOME/.keychain/$HOSTNAME-sh-gpg 2>/dev/null

Unattended backups

Note
If you intend to use systemd, it cannot be used within a (user) crontab. It can only run within a login session or be run as root.

However, two commands you may find useful are flock and systemd-inhibit. flock prevents jobs from overlapping. You can also wake the system up for a job by writing a systemd timer unit and setting its WakeSystem property. Example:

[Unit]
Description=Weekly backup

[Timer]
Unit=weekly_backup.service
OnCalendar=Sun 23:00:00 EST
WakeSystem=true

[Install]
WantedBy=multi-user.target

And the corresponding service file:

[Unit]
Description=Weekly backup

[Service]
Type=oneshot
ExecStart=/bin/systemd-inhibit /bin/su -c "/usr/bin/flock -w 0 /path/to/cron.lock # ...
Note
The service files should not have an [Install] section. When you enable the units, only enable the timers.

Read man systemd.time for the format OnCalendar takes. You can verify that a time expression is correct with systemd-analyze calendar. Since WakeSystem requires privileges, these cannot be per-user units, so place them inside /etc/systemd/system.
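
For instance, a small check along these lines can be run before installing the timer. The check_calendar function name is made up for this sketch; systemd-analyze calendar itself prints the normalized form of the expression and when it would next elapse.

```shell
#!/bin/sh
# Check an OnCalendar expression before putting it in a timer unit.
check_calendar() {
    if command -v systemd-analyze >/dev/null 2>&1; then
        systemd-analyze calendar "$1"
    else
        echo "systemd-analyze not found; cannot check '$1'" >&2
        return 127
    fi
}

# The expression from the timer above, without the timezone suffix.
check_calendar "Sun 23:00:00" || echo "check skipped or failed"
```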

flock ensures that if there’s a conflict, the monthly (i.e., full backup) job will take precedence. You can run fuser -v /path/to/cron.lock to see which processes are holding the lock.
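
To see the locking behaviour without touching the real jobs, here is a small sketch that uses a temporary file in place of /path/to/cron.lock. The weekly_job function and the sleep durations are made up for the demonstration, and it assumes a util-linux flock recent enough to treat -w 0 as non-blocking.

```shell
#!/bin/sh
# A temporary file stands in for /path/to/cron.lock.
LOCK=$(mktemp)

# Stand-in for the weekly job: -w 0 gives up at once if the lock is taken.
weekly_job() {
    if flock -w 0 "$LOCK" true; then
        echo "weekly job ran"
    else
        echo "weekly job skipped: lock is held"
    fi
}

# Stand-in for the monthly job: take the lock and hold it for a few seconds.
flock "$LOCK" sleep 3 &
sleep 1          # give the background job time to acquire the lock

weekly_job       # attempted while the lock is held
wait             # let the "monthly" job finish and release the lock
weekly_job       # attempted again with the lock free

rm -f "$LOCK"
```

The first attempt should report that the lock is held, and the second should go through once the background job has exited.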

systemd-inhibit, on the other hand, prevents the system from suspending until the given command is complete. Per the documentation, it can inhibit a variety of operations. By default, this is idle:sleep:shutdown, but laptop users will find handle-lid-switch useful.
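
A minimal sketch of adding the lid switch to the default set could look like the wrapper below. The inhibited function name, the script name in the comment, and the --who/--why strings are arbitrary choices for this example; like the service above, it is assumed to run as root.

```shell
#!/bin/sh
# Wrap a command so that idle, sleep, shutdown and lid-switch handling are
# blocked while it runs. --who and --why are free-form labels that show up
# in `systemd-inhibit --list`.
inhibited() {
    systemd-inhibit \
        --what=idle:sleep:shutdown:handle-lid-switch \
        --who=backup \
        --why="Backup in progress" \
        "$@"
}

# Used as a wrapper script, e.g.: ./inhibited.sh /path/to/backup.sh
if [ "$#" -gt 0 ]; then
    inhibited "$@"
fi
```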

git-annex

git-annex is a location/metadata tracker that’s built on top of git. It essentially adds new verbs (prefixed with git annex) to any configured repository. There are a few things to keep in mind:

  • git annex init may not initialize the repository with the latest version. For example, if you have git-annex v6, the repository may still be v5. In that case, you should run git annex upgrade.
  • git annex sync needs to be run in each repository, not just one, if you are using a distributed rather than centralized workflow.
  • In v6, once a file is unlocked, it remains unlocked. If you make frequent changes to files, you should use git annex unlock, since direct mode is deprecated.
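
For the second point, a loop along these lines saves running the command by hand in every clone. It is only a sketch: the sync_all name is made up, the repository paths are whatever you pass in, and it assumes non-bare repositories with a .git directory.

```shell
#!/bin/sh
# Run `git annex sync` in each repository given on the command line.
sync_all() {
    for repo in "$@"; do
        if [ -d "$repo/.git" ]; then
            echo "syncing $repo"
            git -C "$repo" annex sync
        else
            echo "skipping $repo: not a git repository" >&2
        fi
    done
}

# Used as a script, e.g.: ./sync_all.sh ~/annex /mnt/usb/annex
sync_all "$@"
```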

The following helper script should get you started: