Using Rsync to Transfer Data
Rsync is a data transfer tool. To use rsync you provide it with a source and a destination (the target). For each file in the source, rsync first checks whether the file exists in the target. If not, rsync transfers a copy of the file to the target. If the file does exist on the target, rsync checks whether the two files are the same. If they are not, it determines the smallest amount of data that needs to be transferred to make the target file look like the source file. This makes rsync ideal for mirroring sets of files. Because rsync skips files that are up to date, it is also useful for resuming a transfer that was interrupted.
However, this powerful piece of software can easily be used to create a mess or damage files. If you wish to use rsync, you should read through this document carefully to make sure you understand its proper use. It also requires a bit more familiarity with how files are stored and accessed in UNIX. If you are not yet comfortable with this, you should visit our New Users Guide, which has some references for learning to work with UNIX. You might also refer to our UNIXary, which defines some commonly used terms.
JUST TO SAY IT AGAIN, YOU MUST THINK WHILE/BEFORE USING THIS TOOL, AND MAKE SURE YOU ARE SYNCING THE RIGHT THINGS.
UNIX users (and Mac users, via Terminal.app) can use rsync by typing
rsync opts src tgt
src and tgt are the locations that you want to keep synced. The rule is that tgt will be made to look like src. There are several ways to specify them. First, you can just use short names. For example,
rsync a b
will sync object a to object b, where both a and b are in your current working directory. Here "object" means either a file or a directory. In terms of the syntax above, src = a, tgt = b, and opts = "" (i.e., we are not providing any additional options to modify the behavior of rsync). If a and b are both files, then b is updated to look like a. If b is a directory, then b/a is updated to look like a. Keep in mind that "updated" means "created" if the target doesn't exist. If a is a directory, then rsync will not copy it because, by default, rsync skips directories. You can get around this by using the -a option. In other words,
rsync -a a b
will transfer the directory a into b. You will almost always be using -a, since it sets a number of other options that help rsync mirror sets of files between two locations. With this use of rsync, b/a is created or updated. If you use a/ as the source instead, then the contents of a are placed directly in b. You can also use the trailing slash with a target. For example, if a is a file, and you run
rsync -a a b
then either b (if it is a file) or b/a (if b is a directory) is updated. If you want to guard against a typo overwriting a file, you can use b/ as the target. Then if what you typed names an existing file rather than a directory, rsync will refuse to transfer the file. To sum up:
src  tgt   meaning
a    b     sync object a (file or directory) to b (file or directory)
a/   b     sync the contents of directory a to b
a    b/    sync a into directory b
a/   b/    sync the contents of directory a to directory b
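The trailing-slash rules are easiest to check in a scratch directory. The following is a small sketch meant to be run as a script; the file and directory names (notes.txt, b, c, d) are made up for illustration, and everything happens in a temporary directory created by mktemp.

```shell
# Stop quietly if rsync is missing; there is nothing to demonstrate.
command -v rsync >/dev/null || { echo "rsync is not installed" >&2; exit 0; }

# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir a
echo "hello" > a/notes.txt

# No trailing slash on the source: the directory itself lands inside b.
rsync -a a b
ls b        # a
ls b/a      # notes.txt

# Trailing slash on the source: only the contents of a land in c.
rsync -a a/ c
ls c        # notes.txt

# A file as the source, with a trailing slash on the target directory.
rsync -a a/notes.txt d/
ls d        # notes.txt
```

Running the first two commands side by side shows the difference that matters most in practice: b ends up with one extra level (b/a/notes.txt), while c does not (c/notes.txt).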
Syncing to Another Directory or a Remote Host
Until now we've just assumed that we are syncing files in the same directory. However, rsync will also sync files between directories. For example, if you have modified files in your home directory, you might want to make sure the changed files are in your personal directory in your lab folder. In this case you could run,
rsync -a a /share/FooLab/bar1234/ProjectFiles
which creates or updates /share/FooLab/bar1234/ProjectFiles/a. You can reverse this, of course, and use rsync to update your current working directory from a source that is in another directory. rsync also allows either the source or the target to be on a remote host. This lets you sync files to a remote host using the syntax
rsync -a a remote_host:dir
which places the object a in the directory dir on the host remote_host. If dir is a full path, then that is where the syncing is done. If it is not, then the syncing is done in ~/dir, where ~ is the directory you are logged in to when you access the remote system using ssh. If you don't provide a directory at all, then ~/a is synced.
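The local half of this section can be tried without a remote host. The sketch below, meant to be run as a script, uses a temporary directory as a stand-in for both a home directory and a lab share; the layout and the file name report.txt are assumptions made for the example, not real paths. The remote form appears only as a comment because it needs a real host and an ssh login.

```shell
# Stop quietly if rsync is missing; there is nothing to demonstrate.
command -v rsync >/dev/null || { echo "rsync is not installed" >&2; exit 0; }

# Stand-ins for a home directory and a lab share (hypothetical layout).
work=$(mktemp -d)
home="$work/home"
share="$work/share/FooLab/bar1234/ProjectFiles"
mkdir -p "$home/a" "$share"
echo "draft 1" > "$home/a/report.txt"

# Push: run from "home", with the target given as a full path.
cd "$home" || exit 1
rsync -a a "$share"
cat "$share/a/report.txt"      # draft 1

# Pull: the same command with source and target swapped. The file that
# changed on the share is brought back into the current directory.
echo "draft 2 (revised)" > "$share/a/report.txt"
rsync -a "$share/a" .
cat "$home/a/report.txt"       # draft 2 (revised)

# The remote form only adds "host:" in front of one side, for example
#   rsync -a a remote_host:dir
```

Note that the direction is decided purely by argument order: whichever side is written last is the one that gets changed.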
Other Options
Besides -a to set up mirroring, rsync comes with a number of other options. For example, you can use --verbose to see the list of files that are being transferred. Over time you may find that files are being transferred that you don't care about. You might be using rsync to copy the contents of your home directory, for instance, and notice files being synced that you don't need. When I rsync my home directory, I notice that cache files created by Firefox are transferred. In this case, using
--exclude=.mozilla/firefox/eog3nqtx.default/Cache
ensures that these files don't get copied over. You can use --exclude multiple times, and if you have enough patterns, you can put them into a file; then you can use --exclude-from=FILE to tell rsync to use the named file as a list of patterns to exclude. You may also want to use --delete, which tells rsync to remove files from the target that have been deleted from the source; by default it doesn't do this. Another way to avoid rsyncing lots of junk is to only rsync the things you are interested in. You can put a list of objects in a file and pass it with either --include-from=FILE or --files-from=FILE, where FILE is again the file name. The difference is that the file used with --include-from can contain wildcards (as can the one used with --exclude-from). For more information about wildcards, run
man rsync
and read about it. Also, --dry-run, --list-only, and --itemize-changes are good ways to see what rsync is doing, or to test what it might do.
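Several of these options can be tried together in a scratch directory. The sketch below is meant to be run as a script; the directory layout and the file names (exclude.txt, old.txt, paper.txt) are invented for the example. It previews a sync with --dry-run, applies an exclude file, and then shows what --delete changes.

```shell
# Stop quietly if rsync is missing; there is nothing to demonstrate.
command -v rsync >/dev/null || { echo "rsync is not installed" >&2; exit 0; }

work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src/Cache src/docs dst
echo "paper" > src/docs/paper.txt
echo "junk"  > src/Cache/blob
echo "old"   > src/old.txt

# Patterns to skip, one per line (the file name is arbitrary).
printf '%s\n' 'Cache' '*.tmp' > exclude.txt

# Preview first: --dry-run changes nothing, and --itemize-changes
# lists what would be transferred.
rsync -a --dry-run --itemize-changes --exclude-from=exclude.txt src/ dst/
ls -A dst     # prints nothing: the dry run copied no files

# Real run: Cache is skipped, everything else is copied.
rsync -a --exclude-from=exclude.txt src/ dst/
ls dst        # lists docs and old.txt

# Remove a file from the source. A plain sync leaves the stale copy.
rm src/old.txt
rsync -a --exclude-from=exclude.txt src/ dst/
ls dst        # still lists docs and old.txt

# With --delete the target is pruned to match the source.
rsync -a --delete --exclude-from=exclude.txt src/ dst/
ls dst        # lists only docs
```

Because --delete removes files, it is worth pairing it with --dry-run the first time you use it on a real target.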
Rsync as a Backup Tool
CURRENTLY THIS SCRIPT IS KNOWN TO WORK ON MAC. IF YOU HAVE PROBLEMS ON OTHER SYSTEMS, LET US KNOW.
Also, before syncing the contents of your laptop onto your lab share, you should consult with your PI, and make sure that this is an acceptable use of the space they purchased.
CCBB users in the Bull, Cannatella, Chen, Hillis, and Hofmann labs can take advantage of this to keep their Linux or Mac laptops backed up. To do this, first go into your lab's folder /share/PILab (replace PI with the name of your lab's head). In this folder there should be another folder with your username. This folder is accessible only by you and by your PI. Once you are in your personal folder, make a directory called SYNC2FILES, and go into it. Finally, make a folder for the device that you want to back up (e.g., LAPTOP). Please refer to the file server portion of our docs for info on how to do this.
Next, go to your home directory and download our backup script (Rsync backup script). Run the command
tar xvf ccbb-rsync-backup.tar
which will extract the backup script, and create the following
~/bin/sync2files
~/ccbb-rsync-backup/exclude
~/ccbb-rsync-backup/include
~/ccbb-rsync-backup/sync2files
(and others, but these are the ones that you will actually care about). First, use a text editor to edit ~/ccbb-rsync-backup/sync2files. At the top, you should provide the name of your lab head and your EID. Also, provide the folder name that you created above. These 3 things ensure that
Some caveats:
- This script is not 100% tested, so you will need to make sure that it is actually working. If there are problems, please mention them so that we can make fixes.
- This is not necessary if you are only using a cluster account, or another UNIX server that we maintain, because these are already being backed up. This script is meant only for people who want easy backups of their laptops without having to worry about dragging, and dropping specific files to the server.
- Much more needs to be added to the exclude file. For sure, we do not have the space for you to back up your personal music collection, iPhoto library, and other non-work-related files. Also, we cannot store security-sensitive files (see our ISORA page for more info), and in fact you should not be storing them on your laptop or desktop anyway. If someone can send us information as to where things like iTunes store their files, or where other files that can be safely ignored are stored, then we can make this process better by making OS-specific exclude files.
Once you have your files ready to go, then you can type
~/bin/sync2files
and provide your password when prompted.