IMHO this beats cloud storage

This is an old-school command, but I find that it is still way better than cloud storage for sharing a directory between two or more systems:

rsync [options] source destination

If you’re using Cygwin under Windows, then the command would most likely be:

rsync -rv --size-only --delete spectra:~/edu/ ~/edu/

Explanation:

On my other machine, called spectra, which I was doing homework on earlier today at the cafe, I have a directory under my home (C:\Users\me) named edu. I also have a directory named edu on my home system.

After working at the cafe, when I come home I want to sync the edu directory with my home system.

Now of course one way to do this would be to use a cloud storage system like Dropbox (or, much better, ownCloud). But I have found that for something like basic syncing of a directory, cloud storage is overkill. You have to have a client running all the time on both systems, constantly monitoring the filesystem, and I’ve seen too many slowdowns because of this when working heavily in synced directories.

Another serious issue is sync latency. OK, you may have a directory set up under cloud storage, but how often does the client running on your machine actually sync it with the cloud? It’s not instantaneous. The interval can be as long as 15 minutes. And what if it didn’t sync before you need to access it on the other computer? Also, how do you even know whether it has already synced, without checking the log of the sync client?

And one final consideration is network traffic. If you’re at a cafe working, do you really need to be pushing syncs to your cloud while you’re working, using up bandwidth? I don’t need to sync until I’m actually done working. In fact I don’t really even want to sync before then. It’s kind of silly, because I’m still editing, creating, and deleting files, and anything synced in the meantime will just have to be synced again.

All of which leads to: rsync. rsync is old-school, but it is a really clean, highly efficient, and basically the most excellent tool for syncing directories between two machines.

To repeat the above command:

rsync -rv --size-only --delete spectra:~/edu/ ~/edu/

Let me explain what’s going on.

First, I’m using the -r and -v switches. -r means be recursive, that is, sync everything below the given directory. This is necessary when syncing a directory. If you are only syncing one file, then you don’t need -r. Otherwise you do.

-v means be verbose. Whatever is being synced will be listed on the screen. When syncing a really large directory, -v can sometimes be omitted, but otherwise it’s nice to actually see what the command is doing.

--size-only is something I need under Cygwin on Windows. It has to do with the way rsync decides whether a file needs to be synced. By default rsync does a quick check that compares each file’s size and modification time. On a Linux-to-Linux sync that usually works well, especially when timestamps are preserved with -t. In my Cygwin setup the timestamps on the two sides often do not match, partly because -r alone does not copy modification times, so rsync would re-transfer files that have not changed. --size-only tells it to compare sizes only. I wish this were not necessary, and it does represent a slight degradation of rsync, since an edit that leaves a file’s size unchanged will be skipped, but it still works.
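To make the cost of --size-only concrete, here is a small local sketch. It assumes rsync is installed and uses two throwaway directories (the names src, dst and f.txt are made up for the demonstration, and no remote host is involved). It shows that a same-size edit is skipped by --size-only, while --checksum, which is slower because it reads every file, picks it up.

```shell
#!/bin/sh
# Two throwaway directories holding a file of identical size
# but different content (hypothetical names, local only).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
printf 'aaaa' > "$demo/src/f.txt"
printf 'bbbb' > "$demo/dst/f.txt"

# --size-only: both files are 4 bytes, so rsync skips the transfer.
rsync -r --size-only "$demo/src/" "$demo/dst/"
after_size_only=$(cat "$demo/dst/f.txt")
echo "after --size-only: $after_size_only"

# --checksum: content is compared, so the file is updated.
rsync -r --checksum "$demo/src/" "$demo/dst/"
after_checksum=$(cat "$demo/dst/f.txt")
echo "after --checksum: $after_checksum"

rm -rf "$demo"
```

For homework files this rarely matters, since most edits change the file size, but it is worth knowing about.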

Now let’s say that I’m about to go out to do my homework, and I can’t remember if I synced the edu directory of my home system with spectra. Then I can easily do a test run of rsync to see if it would actually sync anything:

rsync -rvn --size-only --delete ~/edu/ spectra:~/edu/

The -n switch means do a dry (test) run. No files are actually transferred but, because the -v switch is used, it will show if any files actually would be transferred.
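If you would rather have a yes/no answer than read the file list, the dry run can be wrapped in a small shell function. This is only a sketch: the function name needs_sync is my own invention, and it relies on rsync’s --itemize-changes option, which prints one line per item that would change and nothing at all when the two sides already match.

```shell
#!/bin/sh
# needs_sync SRC DEST (hypothetical helper)
#   exit 0: a real run would change DEST
#   exit 1: nothing to do
#   exit 2: rsync itself failed (for example, host unreachable)
needs_sync() {
    changes=$(rsync -rn --size-only --delete --itemize-changes "$1" "$2") || return 2
    [ -n "$changes" ]
}

# Example use before leaving the house (needs ssh access to spectra):
#   needs_sync ~/edu/ spectra:~/edu/ && echo "spectra is out of date"
```

Because -n is always passed, the function never transfers or deletes anything.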

There is one final important aspect of the syncing which I did not yet cover, which is indicated by the --delete switch. The --delete switch is potentially dangerous but also powerful. It tells rsync to delete any files on the destination which are not on the source.

Think about this: if I’m at the cafe working and I decide to rename some documents, then when I get home I sync spectra to my home machine. Without the --delete switch, rsync will copy the renamed documents over, but the copies with the old names will still remain on the home system. This would not be good. With --delete, rsync removes the copies with the old names and makes sure that the contents of the edu directory on both machines are identical.
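The rename scenario is easy to try out locally before trusting --delete with real files. The sketch below assumes rsync is installed and uses two throwaway directories standing in for the two machines (the names cafe, home, draft.txt and final.txt are made up for the demonstration).

```shell
#!/bin/sh
# Two throwaway directories play the roles of the two machines.
demo=$(mktemp -d)
mkdir -p "$demo/cafe" "$demo/home"

# Both sides start out with the same file.
echo "notes" > "$demo/cafe/draft.txt"
cp "$demo/cafe/draft.txt" "$demo/home/draft.txt"

# At the cafe, the file gets renamed.
mv "$demo/cafe/draft.txt" "$demo/cafe/final.txt"

# Without --delete: final.txt is copied, the stale draft.txt stays behind.
rsync -r --size-only "$demo/cafe/" "$demo/home/"
without_delete=$(ls "$demo/home" | tr '\n' ' ')
echo "without --delete: $without_delete"

# With --delete: draft.txt is removed because the source no longer has it.
rsync -r --size-only --delete "$demo/cafe/" "$demo/home/"
with_delete=$(ls "$demo/home" | tr '\n' ' ')
echo "with --delete: $with_delete"

rm -rf "$demo"
```

The same logic cuts the other way: if the source and destination are swapped by mistake, --delete will remove the new files, which is a good reason to do a dry run with -n first.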

And of course if you haven’t set up Cygwin then you definitely should. (If you’re on a Mac then you don’t need Cygwin: macOS comes with rsync, and a newer version is fairly easy to install from a console.) You will also need to have ssh (OpenSSH) set up on both systems, since that is used for the file transfer.

It may seem cumbersome to many people to set this stuff up and learn how to use it, but there are certain basic operations related to information processing which it turns out are most efficient when done via a command line. This is related to the fact that we humans use words and that the way we perform functions on data is based on text.

For me, nothing is as powerful, flexible, and efficient as rsync for keeping directories in sync, and the benefit of knowing how to use it will extend into many other things.