The perfect backup
I think the perfect backup (in any operating system) has to satisfy at least these few requirements.
Complete automation
No user action should be needed after the initial setup. Otherwise the "quality" of the backup would depend on the user's attention and dedication, which is unacceptable.
Physical separation of the media
The backup has to stay on a medium that is completely separate and far from the one holding the original data. The meaning of "far" depends on the level of reliability you want for your backup:
- you want to be safe against accidental file deletion? A separate folder or better a separate partition of your hard disk is enough
- you want to be safe against hardware failure? Then you need at least a separate hard disk for your backup
- you want to be safe against a general hardware failure (electric surge) or against the event of your computer being stolen? Then an external hard disk is great (suggested solution)
- note that having a computer and its backup on an external hard disk doesn't guarantee any backup in case of thieves in your house or other accidental events (flooding, fire, general electric network surge/failure). Then you may want to keep your backup in another room / building (at work, for example)
- you want to be safe against major geographically-located events (big flooding, earthquakes, war)? Banks do, for example. Then you need a backup somewhere else in the world, for example on a server in another country.
You may think this is beyond your needs, and in fact you're probably ok with an external hard disk on your desk. Still, it depends on the importance of your data: you don't want to lose your almost-done PhD thesis only because one night thieves broke into your room and took both the laptop and the pocket hard disk...
Easy restore
Restoring from a backup must be the easiest thing ever! In particular you should have the option of restoring only some files, as well as the option of having everything restored in one shot.
Plus, there should be no need for any special software to restore! You don't want your backup to depend on a software house, or to find out that your 5-year-old backup is not accessible any more because that software is not available for your new OS (ask Win98 users if they can access in WinXP the backups they made with the official Microsoft-supported backup utility in those years...).
Backup with rsync
The main tool that makes what I consider the perfect backup possible is rsync, an open source utility that provides fast incremental file transfers. What makes this utility so useful is that it copies the content of a set of folders to another location while actually transferring only the few files that have changed since the last "synchronization".
There is a good article that explains how to use this utility to back up your files. The key idea is to regularly make a whole copy of your data into your backup location. Rsync makes this copy fast thanks to the incremental transfer just described.
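To make the idea concrete, here is a minimal sketch of such an incremental copy. The temporary folders and file names are made up for the illustration, and the flags are just one reasonable choice (-a preserves dates and permissions, --delete mirrors deletions, -i lists what gets transferred), not necessarily the ones used in the article mentioned above.

```shell
# Throwaway source and backup folders (hypothetical, for illustration only)
src=$(mktemp -d)
dst=$(mktemp -d)
echo "first version" > "$src/notes.txt"
echo "never changes" > "$src/photo.txt"

if command -v rsync >/dev/null 2>&1; then
    # First run: everything is copied
    rsync -a --delete "$src/" "$dst/"

    # Change one file only, then synchronize again
    echo "second version" > "$src/notes.txt"
    transferred=$(rsync -a -i --delete "$src/" "$dst/")

    # The itemized list shows what was transferred in the second run
    echo "$transferred"
    restored=$(cat "$dst/notes.txt")
else
    echo "rsync is not installed, skipping the demo"
fi
rm -rf "$src" "$dst"
```

In the second run the unchanged file should not show up in the list, because rsync skips files whose size and modification time have not changed.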
Actually one can do even better: keep a series of "snapshots" of the data, for example one every week, so that not only is the data backed up regularly, but older versions also remain available to restore from.
One may think that keeping a series of weekly snapshots is very expensive in terms of disk space, but it's not! By exploiting the hard links supported by the Ext3 filesystem, you only need the space taken by the files that have changed from one snapshot to the next. And the backup you get is nothing more than a series of folders, containing all your data at different times in the past. Hard links are completely transparent to the user.
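The hard link trick can be tried by hand. The following is only a sketch: it uses GNU cp with the -l option to build a second "snapshot" made of hard links, and the folder and file names are invented for the example. It is not meant to reproduce exactly what the backup tools do internally.

```shell
# Two fake snapshots in a temporary folder (names are made up)
work=$(mktemp -d)
mkdir "$work/weekly.1"
echo "thesis chapter 1" > "$work/weekly.1/chapter1.txt"
echo "thesis chapter 2" > "$work/weekly.1/chapter2.txt"

# New snapshot: every file is hard-linked instead of being copied
cp -al "$work/weekly.1" "$work/weekly.0"

# Simulate a file that changed: replace it with a brand new file
rm "$work/weekly.0/chapter2.txt"
echo "thesis chapter 2, revised" > "$work/weekly.0/chapter2.txt"

# Same inode number means same data on disk, stored only once
inode_old_1=$(stat --format=%i "$work/weekly.1/chapter1.txt")
inode_new_1=$(stat --format=%i "$work/weekly.0/chapter1.txt")
inode_old_2=$(stat --format=%i "$work/weekly.1/chapter2.txt")
inode_new_2=$(stat --format=%i "$work/weekly.0/chapter2.txt")
links_1=$(stat --format=%h "$work/weekly.0/chapter1.txt")

echo "chapter1: inodes $inode_old_1 / $inode_new_1, $links_1 links"
echo "chapter2: inodes $inode_old_2 / $inode_new_2"
rm -rf "$work"
```

The unchanged chapter1.txt shows the same inode in both snapshots (it takes space only once), while the revised chapter2.txt gets an inode of its own.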
All these things are done in an automatic and consistent way by a wonderful, open source utility: rsnapshot.
Rsnapshot
Rsnapshot is the open source utility that implements what I think is the perfect backup. It automatically takes snapshots of your data at the desired frequency (for example every week) and stores each snapshot on your backup medium quickly, thanks to rsync.
All the snapshots you've taken in the past are then accessible as regular folders, even if they actually do not take all the space of a series of copies - only the differences are actually recorded on the medium.
Plus, there is the space-saving feature of lowering the frequency of older backups. For example you can take weekly snapshots, saving the last 4 of them, and then keep only one snapshot per month for older backups. The rotation of backups is done automatically.
Another important feature is that you can access, and then back up, not only the folders on your hard drive, but also shared folders on another machine (even a Windows machine), remote folders, etc.
Last but not least, to access your backups you only need a computer that is able to read the Ext3 filesystem, that is almost any Linux computer, but also any Windows computer with the appropriate, free, driver.
The only thing one has to spend some time on is setting up rsnapshot. Documentation is widely available on the official website (where a HOWTO is present too), together with a lot of unofficial guides out there on the net. Here I just want to copy and paste my configuration, so that you can use it if it fits your needs. By the way, the Linux distribution I'm using is Ubuntu, but most of these things should work on any Linux system.
The main (only) configuration file for rsnapshot is /etc/rsnapshot.conf. It is commented in detail, so you should have no problems configuring it just by going through the lines (after you've read the short guide on their website). It is also divided into sections, so I will just post here the sections that I configured to fit my needs.
Snapshot root directory
# All snapshots will be stored under this root directory.
snapshot_root /media/LACIE_ext3/
# If no_create_root is enabled, rsnapshot will not automatically create
# the snapshot_root directory. This is particularly useful if you are
# backing up to removable media, such as a FireWire or USB drive.
no_create_root 1
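I think these setup lines are straightforward and well explained by the comments. /media/LACIE_ext3 is the Ext3 partition on my external LaCie hard disk. Keep in mind that in the rsnapshot configuration file the fields on each line must be separated by tabs, not spaces.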
External program dependencies
Unchanged.
Backup intervals
# The interval names (hourly, daily, ...) are just names and have no
# influence on the length of the interval. The numbers set the number
# of snapshots to keep for each interval (hourly.0, hourly.1, ...).
# The length of the interval is set by the time between two executions
# of rsnapshot <interval name>, this is normally done via cron.
# Feel free to adapt the names, and the sample cron file under
# /etc/cron.d/rsnapshot to your needs. The only requirement is that
# the intervals must be listed in ascending order.
interval weekly 4
interval monthly 36
This section configures the structure of backups in time. In this case I decided to have 4 backups in the past, spaced by one week (that is, covering the last month). When backups become older than a month, only one backup per month is kept. If for example I look at my backups today, February 10th 2008, they look like
drwxr-xr-x 3 root root 4096 2008-02-03 11:03 weekly.0
drwxr-xr-x 3 root root 4096 2008-02-03 10:05 weekly.1
drwxr-xr-x 3 root root 4096 2008-01-28 21:04 weekly.2
drwxr-xr-x 3 root root 4096 2008-01-27 00:04 weekly.3
drwxr-xr-x 3 root root 4096 2008-01-22 20:18 monthly.0
drwxr-xr-x 3 root root 4096 2008-01-06 13:04 monthly.1
drwxr-xr-x 3 root root 4096 2007-12-07 16:03 monthly.2
drwxr-xr-x 3 root root 4096 2007-11-05 15:04 monthly.3
drwxr-xr-x 3 root root 4096 2007-10-29 10:04 monthly.4
drwxr-xr-x 3 root root 4096 2007-10-08 20:04 monthly.5
drwxr-xr-x 3 root root 4096 2007-09-02 10:03 monthly.6
drwxr-xr-x 3 root root 4096 2007-08-05 17:34 monthly.7
...
Pay attention: the name you give to each backup interval has no effect on the frequency of backups! It's just a reminder for you. The actual frequency of backups depends on how you schedule the backup utility to run (see below).
The only effect of this setup is that when you call
rsnapshot weekly
then the backup utility rotates all the weekly backups, deleting the oldest one (weekly.3), renaming the others (weekly.1 becomes weekly.2 and so on) and creating a new weekly.0 with today's backup. When you call
rsnapshot monthly
then the backup utility renames all the monthly backups (monthly.1 becomes monthly.2 and so on) and renames the oldest weekly backup (weekly.3) to monthly.0.
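The weekly rotation described above can be mimicked with a few lines of shell. This is only a simplified sketch of the renaming scheme, not rsnapshot's actual code: rotate_snapshots is a hypothetical helper, and the "snapshots" are empty folders with a label file inside.

```shell
# rotate_snapshots DIR NAME KEEP
# Hypothetical helper that mimics the rotation described in the text
rotate_snapshots() {
    dir=$1; name=$2; keep=$3
    # Delete the oldest snapshot (weekly.3 when KEEP is 4)
    rm -rf "$dir/$name.$((keep - 1))"
    # Shift the others: weekly.2 -> weekly.3, weekly.1 -> weekly.2, ...
    i=$((keep - 2))
    while [ "$i" -ge 0 ]; do
        if [ -d "$dir/$name.$i" ]; then
            mv "$dir/$name.$i" "$dir/$name.$((i + 1))"
        fi
        i=$((i - 1))
    done
    # Slot 0 is now free for today's snapshot
    mkdir "$dir/$name.0"
}

# Four fake weekly snapshots in a temporary folder
root=$(mktemp -d)
for n in 0 1 2 3; do
    mkdir "$root/weekly.$n"
    echo "snapshot taken $n weeks ago" > "$root/weekly.$n/label"
done

rotate_snapshots "$root" weekly 4
echo "today's snapshot" > "$root/weekly.0/label"

newest=$(cat "$root/weekly.0/label")
oldest=$(cat "$root/weekly.3/label")
count=$(ls "$root" | wc -l)
cat "$root"/weekly.*/label
rm -rf "$root"
```

After the rotation there are still four folders: the one that used to be weekly.2 is now weekly.3, and the old weekly.3 is gone.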
Global options
Unchanged.
Backup points / scripts
# LOCALHOST
backup /home/saverio/ localhost/
backup /etc/ localhost/
This section tells rsnapshot which folders you want to back up (in this case, my own home directory and the /etc directory). The second argument of each line specifies the parent directory on the backup medium. Pay attention to the trailing slashes!
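With these backup points, each snapshot should contain the original paths below the localhost/ folder (for example weekly.0/localhost/home/saverio/), so restoring a file is just a plain copy. The sketch below assumes that layout and rebuilds it in a temporary folder; the snapshot weekly.2 and the file thesis.tex are made up for the example.

```shell
# Fake snapshot tree in a temporary folder (the real one would live
# under the snapshot_root, e.g. /media/LACIE_ext3/)
snapshot_root=$(mktemp -d)
mkdir -p "$snapshot_root/weekly.2/localhost/home/saverio"
echo "draft from two weeks ago" \
    > "$snapshot_root/weekly.2/localhost/home/saverio/thesis.tex"

# Restoring a single file is a plain copy; -a keeps dates and permissions
restore_dir=$(mktemp -d)
cp -a "$snapshot_root/weekly.2/localhost/home/saverio/thesis.tex" \
    "$restore_dir/thesis.tex"

restored=$(cat "$restore_dir/thesis.tex")
echo "$restored"
rm -rf "$snapshot_root" "$restore_dir"
```

Restoring everything in one shot works the same way, copying the whole folder instead of a single file.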
Scheduling the backup
The next important step to set up your backup is to schedule the execution of the backup command. Commands in Linux are scheduled by daemons (programs running in the background). The scheduler I used, available in Ubuntu, is cron. Under the /etc directory there is a series of folders related to cron:
/etc/cron.d
contains a series of files, one per scheduled job
/etc/cron.daily
contains a series of executable files to be executed daily
/etc/cron.weekly
contains a series of executable files to be executed weekly
and so on. The solution I adopted is to add a file in the cron.d directory. It's a file called /etc/cron.d/rsnapshot with a single line in it:
3 * * * * root /usr/bin/nice /home/saverio/bin/rsnapshot_weekly
which, in crontab syntax, says that the command rsnapshot_weekly has to be executed every hour, at minute 03 (that is at 9:03, 10:03, 11:03, etc.). The command is executed with root privileges and, thanks to nice, at a lower scheduling priority, so that it doesn't slow down whatever else you are doing (see "man nice").
Let's now take a look at the rsnapshot_weekly command in my /home/saverio/bin directory. It's a script I made:
#!/bin/bash
# Run a weekly backup every Sunday, or as soon as possible if a Sunday
# has passed without a backup.
# Every month it also runs the monthly backup.
a_week=7
a_month=31
timestamp_file_weekly=/home/saverio/.backup_timestamp_weekly
log_file=/home/saverio/.backup.log
date_now=$(date "+%Y%m%d")
day_of_week=$(date "+%u")
time_now=$(date "+%s")
date >> "$log_file"
if [ -e "$timestamp_file_weekly" ]; then
    date_of_backup=$(stat --format=%y "$timestamp_file_weekly" |
        tr -d - | awk '{print $1}')
    time_of_backup=$(stat --format=%Y "$timestamp_file_weekly")
    echo "last weekly backup $date_of_backup" >> "$log_file"
    # Days since last Sunday >= days since last backup: already done
    if (( day_of_week % 7 >= time_now / 86400 - time_of_backup / 86400 )); then
        echo "no Sunday is passed without doing a backup, doing nothing" \
            >> "$log_file"
        exit 0
    fi
    echo "a Sunday is passed without doing a backup!" >> "$log_file"
    # YYYYMMDD dates differ by less than 31 only within the same month
    if (( date_now - date_of_backup >= a_month )); then
        echo "a new month started: time to do a monthly backup" \
            >> "$log_file"
        /usr/bin/rsnapshot monthly &&
            echo "monthly backup: DONE!" >> "$log_file"
    fi
fi
echo "let's do a weekly backup" >> "$log_file"
date > /home/saverio/.today
/usr/bin/rsnapshot weekly && touch "$timestamp_file_weekly" &&
    echo "weekly backup: DONE!" >> "$log_file"
You can also download the script here.
What this script does is run a weekly backup every Sunday. It correctly deals with the cases in which, for example, the external hard disk is not connected to the laptop: the timestamp is updated only when rsnapshot succeeds. Moreover, it deals with the possibility that one Sunday I don't turn on the computer, or I never connect the external hard disk. In this case, the script will do the backup the first time it finds the external hard disk connected in the following days.
It also launches the monthly backup (which is not a real backup, just the rotation of old backups) on the first weekly backup of each new month, which is normally the first Sunday of the month.
You don't have to go through all the code of the script: you just have to set two options in it. The first is timestamp_file_weekly, the path of an empty file that is only used to keep track of the last backup. You can create this empty file with
touch timestamp_filename
The second parameter you have to enter in the script is a path to a log_file, where a detailed log is kept (see below).
Logging
The script I wrote creates a detailed log of all its actions to allow debugging. This is an example of the log file on my machine:
Sat Feb 2 23:03:01 CET 2008
last weekly backup 20080128
no Sunday is passed without doing a backup, doing nothing
Sun Feb 3 10:03:01 CET 2008
last weekly backup 20080128
a Sunday is passed without doing a backup!
a new month started: time to do a monthly backup
monthly backup: DONE!
let's do a weekly backup
Sun Feb 3 11:03:01 CET 2008
last weekly backup 20080128
a Sunday is passed without doing a backup!
a new month started: time to do a monthly backup
monthly backup: DONE!
let's do a weekly backup
weekly backup: DONE!
Sun Feb 3 12:03:01 CET 2008
last weekly backup 20080203
no Sunday is passed without doing a backup, doing nothing
Every entry starts with the date and time, and you can see that backups are performed correctly. As the log file grows indefinitely (slowly, as it's a text file), you have to deal with that. The easiest way is to use the logrotate utility already available in Ubuntu (and other Linux distributions). Basically, it's a utility that runs periodically and compresses old log files, keeping them small and deleting them when they are too old. All you have to do to tell logrotate to manage your backup log is to add the file /etc/logrotate.d/backup with the following lines in it
/home/saverio/.backup.log {
rotate 6
monthly
compress
missingok
}
which tells logrotate to rotate the log files monthly, keeping the last 6 logs and compressing them with gzip. No error is given if a log file is missing, for example because it has been deleted by the user.
That's it! You have your system being backed up regularly and automatically!
I hope you liked this page!

