Quickly Transferring Files In Linux: Transfer files intelligently with rsync
This weekend I spent some time migrating my Minecraft server a good 700 miles closer to my house. The new server is ~30 milliseconds closer. If you play games, you probably know how huge that can be.
So, how did I move data to the new server? FTP, right? No. Definitely no.
Why not FTP?
FTP is plain-text
Unless you're using something like FTPS, you'll be sending your server credentials across the wire in plain text, so anyone sitting between you and your new server has all the info they need to log in.
Per-file overhead can kill speed
My Minecraft server had just over 1000 files in the world save. Each file transfer over FTP adds a significant cost that shows itself as downtime. Not so cool.
No Deltas
If I use FTP, I have to fully stop the server before sending any data across, since there isn't a practical way for me to know whether something changed after I take the first cut. That means even more downtime.
How about rsync?
As you can probably guess, we can avoid all of these problems with rsync. It has much lower per-file overhead, only sends files that have changed, and is usually transported over SSH.
We'll copy our local ‘minecraft’ folder to our new server as jdoe at example.com with one quick command.
rsync -avz --progress minecraft jdoe@example.com:/home/jdoe
sending incremental file list
created directory test
minecraft/
minecraft/ForgeModLoader-server-0.log
1452252 100% 135.37MB/s 0:00:00 (xfer#1, to-check=1038/1040)
minecraft/ForgeModLoader-server-0.log.lck
0 100% 0.00kB/s 0:00:00 (xfer#2, to-check=1037/1040)
minecraft/ForgeModLoader-server-1.log
1525594 100% 76.57MB/s 0:00:00 (xfer#3, to-check=1036/1040)
...
sent 904047183 bytes received 18332 bytes 139087002.31 bytes/sec
total size is 903869415 speedup is 1.00
So, how did that strange line work?
Decomposing the Command
There's actually a lot going on in this short command. Let's start with -a. This one letter tells rsync that we want to archive this folder, which amongst other things means preserving file ownership and permissions and recursing into subfolders.
Next was -v for verbose. That might seem a bit unimpressive after all that was packed into the previous argument, but without this argument, rsync sits with a blank screen until file transfer completes, letting you guess if everything is working properly.
Past this we've got -z in order to compress the data for transit. This increases CPU demand, but can greatly lower the amount of bandwidth required.
Finally, we've added --progress so that during file transfer we can see speed, time elapsed, and percentage complete. Like -v this is optional, but ends up being quite nice.
This brings us to the source for our file copy. For our case, it is our local folder name without a trailing slash. The lack of slash is important here. Since we didn't include a slash, the source folder is copied into the destination folder. With a slash, the contents of our minecraft folder would have been copied into our destination, without a parent ‘minecraft’ folder.
Lastly, we add our destination folder. In this case, we're taking the form ‘username@server:/filepath’. Also, since we're using this format, rsync knows to transport over SSH.
Copying the Delta
After getting the first cut of the data onto the server, I spent a bit of time verifying that everything was working. Once I was satisfied that it was, it was time to copy any changes from the old server.
An important detail here is that not only can files be updated and created, they can also be deleted. With FTP, a deleted file would have forced us to take a brand new cut of the data. In this case, we'll modify our previous command to only show differences.
rsync -avz --progress --delete --dry-run minecraft jdoe@example.com:/home/jdoe
sending incremental file list
minecraft/world/DIM1/region/
deleting minecraft/world/DIM1/region/r.0.0.mca
sent 27571 bytes received 114 bytes 55370.00 bytes/sec
total size is 902472679 speedup is 32597.89 (DRY RUN)
We've made two changes: --delete and --dry-run. We added --delete in order to delete any files on the target server not present on the source server. With that, we added --dry-run for sanity. This gives us a last chance to see what would happen if we were to run the command. Since a typo in the source or destination could cause data loss, I highly advise using a dry run before deleting.
In this case, everything looks correct, so we'll remove the --dry-run. A second or two later, our destination folder should be a mirror of our source folder.