Back Up Files on Linux

How hard is it to back up files on Linux? It's not very hard, but it's not very straightforward either, and for every way that works, there are several that don't. Windows is supposed to be the operating system that struggles with the low-level nuts and bolts. Because Linux is the workhorse OS of the web and is known for its stability and its ability to manipulate files, you'd think there would be an easy way just to archive a bunch of files. But there isn't.

My first attempt used 7-Zip, a high-compression, open-source, patent-free archiver that works very well on Windows. There is a Linux client available, and although it doesn't preserve Linux owner and group information, that didn't matter much to me. I just wanted to make sure that I had a backup of all my important Linux web configuration files.

So I created a list of files that I wanted to back up, and named it backup.lst:

/etc/sysconfig/network-scripts/ifcfg*
/usr/local/apache2/conf/httpd.conf
/usr/local/ssl/certs/*
/usr/local/tomcat/conf/server.xml

Then I created a backup shell script, backup.sh:

#!/bin/bash
7z a -mx9 backup.7z @backup.lst

That seemed to work just like I wanted it to, until I looked inside backup.7z. 7-Zip had stored all the files in the archive without path information. There's a recent SourceForge feature request that suggests adding a "store full path" option to 7-Zip, but it's not there yet.

From reading several forum threads I figured that 7-Zip must require relative paths to store the path information, so I switched to relative paths:

etc/sysconfig/network-scripts/ifcfg*
usr/local/apache2/conf/httpd.conf
usr/local/ssl/certs/*
usr/local/tomcat/conf/server.xml

Then I changed the shell script to switch to the root of the file system before performing the backup:

#!/bin/bash
# Remember where backup.lst and the archive live before changing directory
dir=$PWD
pushd /
7z a -mx9 "$dir/backup.7z" @"$dir/backup.lst"
popd

But then 7-Zip hit another snag. The path /usr/local/tomcat is a symbolic link to the real Tomcat directory (whose name contains the Tomcat version number). Apparently 7-Zip cannot follow links when using relative paths. So I tossed out 7-Zip and tried something else.
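For what it's worth, a link like this can be inspected from the shell before deciding how to archive it. The snippet below is only a sketch, assuming the GNU coreutils version of readlink; the function name resolve_path is made up for illustration.

```shell
#!/bin/bash
# resolve_path is a hypothetical helper, not part of 7-Zip or tar.
# readlink -f (GNU coreutils) follows every symbolic link in the path
# and prints the final, canonical location.
resolve_path() {
    readlink -f -- "$1"
}
```

On my setup, calling resolve_path /usr/local/tomcat would print the versioned directory that the link points to.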

On Linux the customary backup approach is to use tar to create an archive and then use a separate program to compress it. bzip2 appeared to be the best of the commonly available compressors, and indeed it compressed my files even better than 7-Zip did. In addition, tar preserves the file owner information. So I brought back my original backup.lst file and changed my backup.sh script to contain:

tar cvf backup.tar -T backup.lst

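What are all those errors? It turns out that tar file lists do not work with wildcards: tar treats each line of the list as a literal file name. Neither does piping in the contents of the file, because xargs hands the words straight to tar without a shell ever expanding them: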

cat backup.lst | xargs tar cvf backup.tar

It turns out that bash applies word splitting and wildcard expansion (globbing) to the unquoted output of a command substitution, an obscure corner of the shell that automatically expands the wildcards:

tar cvf backup.tar $(cat backup.lst)
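The behavior is easier to see in isolation. The following is a small sketch, assuming bash with its default settings; count_args is a made-up helper and the file names are throwaway examples. Quoting the command substitution keeps the list as one literal word, while leaving it unquoted lets bash split it into words and expand the wildcard.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

# Build a scratch directory with two files and a one-line list file.
demo=$(mktemp -d)
touch "$demo/one.conf" "$demo/two.conf"
echo "$demo/*.conf" > "$demo/list"

# Quoted: the whole list arrives as a single argument, wildcard untouched.
count_args "$(cat "$demo/list")"

# Unquoted: bash splits the text into words, then expands the wildcard.
count_args $(cat "$demo/list")

# Clean up the scratch directory.
rm -rf "$demo"
```

The quoted call reports one argument and the unquoted call reports two, one per matching file. The same word splitting is also why this trick breaks on any path that contains a space.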

So (drumroll) the final shell script for the simple task of backing up files is below. Whatever kudos Linux gets for lean efficiency, it loses for its crypticness.

#!/bin/bash
tar cvf backup.tar $(cat backup.lst)
bzip2 -v backup.tar

Who knows why this shell script even works, and how would anyone besides an expert know about it?
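For readers who would rather not depend on unquoted expansion, here is one possible variant. It is a sketch only, assuming bash, GNU tar, and an installed bzip2; expand_list and make_backup are hypothetical helper names. It expands each pattern in the list itself and hands tar the resulting file names on standard input, so paths containing spaces survive.

```shell
#!/bin/bash
# expand_list is a hypothetical helper: it reads one pattern per line from
# the file named in $1 and prints every existing path that matches.
expand_list() {
    local pattern path
    # An empty IFS turns off word splitting, so a line with spaces stays
    # in one piece; wildcard expansion still happens.
    local IFS=
    while read -r pattern; do
        # Skip blank lines in the list file.
        [ -n "$pattern" ] || continue
        for path in $pattern; do
            # A pattern that matches nothing is left as literal text,
            # so keep only paths that really exist.
            if [ -e "$path" ]; then
                printf '%s\n' "$path"
            fi
        done
    done < "$1"
}

# make_backup LIST ARCHIVE: archive and bzip2-compress in one step.
# -T - makes GNU tar read file names from standard input; -j selects bzip2.
make_backup() {
    expand_list "$1" | tar -cjf "$2" -T -
}
```

With these two functions defined, make_backup backup.lst backup.tar.bz2 should build the compressed archive, and tar -tjf backup.tar.bz2 lists what ended up inside it.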