Python-based rsync backup script

This is a backup script I created in Python which uses the eminent rsync utility to perform the actual backup operation.  It addresses a few things I really want that are lacking from rsync’s options. One is the ability to specify, in a single config file, which paths (directories and/or files) to back up and which to ignore.

If you’re familiar with rsync you will know that it actually does have the --exclude-from <file> option which will read from a designated file for paths to exclude.  But unfortunately that file only contains paths to exclude, and paths to include must be specified on the command line.

In the past I wrote a Bash script to do this; however, using Bash was extremely tedious due to string quoting/escaping issues.  I finally decided to write the script in Python, which is a vast improvement.

This is a first-draft version of the script and it is intended to run under Cygwin on Windows.  It will run on Linux but is not (yet) optimized for it.  I intend to improve this script even further, but as it stands right now it is very robust and works well.

The magic of this script is how it uses the config file to include and exclude paths. For example, let’s assume you want to back up two directories, /cygdrive/c/Users/smith/testdir and /cygdrive/c/Users/jones/testdir. Both of these would be specified in the config file just as they are:

/cygdrive/c/Users/smith/testdir
/cygdrive/c/Users/jones/testdir

Now let’s say both of these users have a directory named stuff under testdir and we want to exclude the stuff directory for both users. We can do so with:

-testdir/stuff

Note that the hyphen (-) at the beginning of the line specifies that it is an exclude line. Exclude lines must begin with a hyphen followed by an anchor directory.

Exclude lines must always be anchored in at least the lowest level directory specified in a backup path. By anchored I mean that the last directory in the backup path /cygdrive/c/Users/smith/testdir overlaps with the first part of the exclude path testdir/stuff. In this example testdir is the anchor.

If you only want to exclude testdir/stuff from jones but not smith, then you can move the anchor point back one level to specify only the jones directory:

-jones/testdir/stuff

in which case jones is the anchor directory.

Similarly you could exclude only testdir/stuff for smith but not for jones:

-smith/testdir/stuff
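
One way to picture the anchoring is as a match against the trailing components of each path. The sketch below is a simplified model under that assumption, not rsync's actual implementation: the helper name tail_matches is hypothetical, and wildcards are ignored entirely.

```python
def tail_matches(path, pattern):
    """Return True if `pattern` equals the trailing components of `path`.

    Simplified model only: no wildcard handling, whole components only.
    """
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    if len(pattern_parts) > len(path_parts):
        return False
    return path_parts[-len(pattern_parts):] == pattern_parts


# The two "stuff" directories from the example above.
paths = [
    "/cygdrive/c/Users/smith/testdir/stuff",
    "/cygdrive/c/Users/jones/testdir/stuff",
]

for pattern in ("testdir/stuff", "jones/testdir/stuff", "smith/testdir/stuff"):
    excluded = [p for p in paths if tail_matches(p, pattern)]
    print(pattern, "->", excluded)
```

With these inputs the first pattern matches both paths, while the other two each match only the one user's directory.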

I actually didn’t know rsync was this intelligent with its handling of excludes until I started testing it while creating this script. The ability to shift anchor points up the tree and to use multiple source directories with exclude statements is amazing.

The config file for this backup script is extremely powerful in this way. One further thing to note is that spaces in pathnames must be escaped with a backslash. For example to exclude a directory “My Files” under testdir for smith use:

-smith/testdir/My\ Files

To include a directory “some stuff” in smith’s home use:

/cygdrive/c/Users/smith/some\ stuff

I could have finagled the script to allow specifying paths without having to backslash-escape whitespace; however, I deliberately did not want to do this because the backslash escape is exactly how whitespace in paths is represented on the Unix command line, for example when typing such a path at the shell prompt.
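
To see how a backslash-escaped config line resolves to a single path, Python's standard shlex module can be used, since it follows POSIX shell splitting rules. This is only an illustration of the escaping convention; the script itself relies on the shell to interpret the backslashes.

```python
import shlex

# Config-style lines with backslash-escaped spaces, as in the examples above.
include_line = r"/cygdrive/c/Users/smith/some\ stuff"
exclude_line = r"-smith/testdir/My\ Files"

# shlex.split applies POSIX shell rules, so each line stays one argument.
print(shlex.split(include_line))
print(shlex.split(exclude_line))

# Without the backslash the same text would split into two arguments.
print(shlex.split("/cygdrive/c/Users/smith/some stuff"))
```

The first two calls each yield a one-element list containing the path with a real space in it; the last yields two elements, which is why the escape is required.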

You can also include comment lines in the config file. Comment lines must contain # as the first character of the line. Blank lines in the config file are ignored.

The order of lines in the config file does not matter as they all will be parsed and sorted before processing.

You should not put trailing slashes at the end of pathnames in the config file but the script will remove them anyway for safety’s sake.

Obviously with any script that writes to disk you should be careful. Anything in the backup directory not specified as part of an include path in the config file will be deleted.
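
Because of the deletion behavior, it can be worth previewing a run first. rsync has a --dry-run option that reports what would be transferred or deleted without changing anything. The sketch below assumes the command is held as a list of arguments (the script itself builds a single string), and with_dry_run is a hypothetical helper name.

```python
def with_dry_run(cmd):
    """Return a copy of an rsync argument list with --dry-run added."""
    if "--dry-run" in cmd:
        return list(cmd)
    # Insert the flag right after the program name; the input is not modified.
    return [cmd[0], "--dry-run"] + list(cmd[1:])


# Example argument list using the same options as the script.
cmd = ["rsync", "-rvR", "--size-only", "--delete", "--delete-excluded",
       "/cygdrive/c/Users/smith/testdir", "/cygdrive/d/backup/"]

print(" ".join(with_dry_run(cmd)))
```

Running the printed command shows the planned changes; once the output looks right, the same command without --dry-run performs the real backup.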

Python is growing into an extremely powerful systems administration language. I would not be surprised, and would in fact be happy, if it supplants the Bash shell in the future (see this for an exciting project in the present). It can already perform a growing number of OS-related functions (also see here).

#!/usr/bin/python

# backupsync.py - a script to backup directories to a specified location
#
#   paths to backup/exclude from backup are specified in config file
#
#   created by alaya.net 2017.12.26
#   last edited on Tue 2017.12.26

import sys
import re
import shutil
from subprocess import run

# require Python interpreter >= v.3.5 (subprocess.run was added in 3.5)
assert sys.version_info >= (3,5)

# don't run if not under a Unix-type environment
#  this could potentially happen e.g. if system default Python is Anaconda
#  but Cygwin is also installed
if sys.platform == 'win32':
    # note the trailing spaces: adjacent string literals are joined as-is
    sys.stderr.write("This script is incompatible with the Windows native "
                     "environment.\nIt should be run under a Unix-like "
                     "environment such as Linux or Cygwin.\n")
    sys.exit(1)


# exit with error if rsync not found
if not shutil.which('rsync'):
    sys.stderr.write("rsync executable not found in PATH\n")
    sys.exit(1)

confFile = "/cygdrive/c/Users/smith/backupsync.conf"
destDir = "/cygdrive/d/backup"

excludesList = []   # rsync --exclude=... options built from "-" lines
includesList = []   # source paths to back up
excludes = ""
includes = ""

with open(confFile, 'r') as f:
    for line in f:
        if not re.search(r'^#|^\s*$', line): # ignore comment and blank lines
            line = re.sub(r"/$", "", line) # strip any trailing slashes
            if re.search(r'^-', line):   # exclude line
                excludesList.append(re.sub(r"^-", "--exclude=", line))
            else:                       # include line
                includesList.append(line)


excludesList.sort()
includesList.sort()

for i in excludesList:
    excludes += " " + i.strip()

for i in includesList:
    includes += " " + i.strip()

cmd = ("rsync -rvR --size-only --delete --delete-excluded "
        + excludes + includes + " " + destDir + "/")

run(cmd, shell=True)
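
As a possible variation, the command could be built as a list of arguments and passed to run() without shell=True, so that wildcard patterns such as ntuser* reach rsync untouched by the shell. This is a sketch under stated assumptions rather than the script's actual method: build_rsync_command is a hypothetical name, shlex.split stands in for the shell's handling of backslash escapes, and a line that splits into more than one argument is treated as an error.

```python
import shlex


def build_rsync_command(config_lines, dest_dir):
    """Build an rsync argument list from config lines (hypothetical helper)."""
    excludes, includes = [], []
    for raw in config_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        parts = shlex.split(line)  # resolves backslash-escaped spaces
        if len(parts) != 1:
            raise ValueError("unescaped whitespace in line: " + line)
        path = parts[0].rstrip("/")  # strip any trailing slashes
        if path.startswith("-"):
            excludes.append("--exclude=" + path[1:])
        else:
            includes.append(path)
    return (["rsync", "-rvR", "--size-only", "--delete", "--delete-excluded"]
            + sorted(excludes) + sorted(includes) + [dest_dir + "/"])


sample = ["# paths to backup:", "/cygdrive/c/Users/smith/", "",
          r"-smith/AppData/Roaming/Opera\ Software/cache", "-smith/ntuser*"]
print(build_rsync_command(sample, "/cygdrive/d/backup"))
```

The resulting list can be handed to run() directly, for example run(cmd), with no shell involved. One trade-off is that shlex.split raises ValueError on a line containing an unbalanced quote character, so such paths would need escaping as well.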

Below is a sample backupsync.conf config file:

# Config file for backupsync.py
# lines beginning with "-" are exclude lines

# paths to backup:
/cygdrive/c/Users/smith
/cygdrive/e

# paths to exclude:
-smith/ntuser*
-smith/NTUSER*
-smith/AppData/Local/Temp
-smith/AppData/Roaming/Local/Temp
-smith/AppData/Cache
-smith/AppData/Roaming/Pub/Cache
-smith/AppData/Roaming/Opera\ Software/cache
-smith/AppData/Local/Vivaldi/User\ Data/cache
-smith/AppData/Local/Vivaldi/User\ Data/Default/Cache
-e/Android

Because of the default permissions of the top-level /cygdrive and /cygdrive/c directories in Cygwin, when they are copied to the backup directory you may need to chmod them to the appropriate permissions for your user in order for the backup to run. This will be addressed in the next version.

