I have tons of archives. Mainly graphics files that I have created in the past, plus the roughly 15,000 photos that I have taken with my D70 since buying it. Not to mention the thousands of photos that I took before the D70. Then I also have my scans. I scan almost every piece of paper that comes into my house: bills, receipts, and the like. (I will post an article about that some day.)
I like to keep everything accessible on a shared server so that I can easily get to it. But all of this is in the 100+ Gig range. It is easy enough to store all that somewhere. The problem is backup.
Sure, I can split the archive into 4.5G chunks and write it to DVD, but how do I know it worked? MD5SUM, you say? Yes.
In the past I have used programs like Tripwire or AIDE to do server integrity sweeps. Why not use one of them to make sure that when I back up my data it really is saved? Well, the problem is that those are pretty rigid and have config files stored in /etc/blah. They are meant more for intrusion detection, which is not really what I need.
I could write a script that recurses directories and stores the md5sum along with each file name. Then, after making a backup, I could run it again recursively against the copied data. Again, not good. That would require me to write software, and that is a bad idea. Anything that creates real work for me is a bad idea.
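Just so it is clear what I am trying to avoid, that script would probably boil down to something like this (a rough sketch, assuming GNU find and md5sum; /archive and /backup are made-up paths):

cd /archive && find . -type f -exec md5sum {} \; > /tmp/archive.md5 <- walk the tree and record a checksum for every file, with relative paths.
cd /backup && md5sum -c /tmp/archive.md5 <- re-check the copied data against that list later.

Simple enough, but it is still one more thing for me to write and maintain.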
Enter md5deep. After searching for a few hours and looking at all the integrity programs, I stumbled upon md5deep. Wow. It is just what I was looking for. It allows me to do something like this:
md5deep -r /etc > /tmp/file.database.txt <- this recurses the directory and generates a flat text file with all the md5sums.
md5deep -r /etc -X /tmp/file.database.txt <- this reads that database, recurses the directory again, and tells me if any files don't match the database.
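The database itself is just a flat text file, one checksum and one path per line, something like this (a made-up example, not the real hash of a real file):

d41d8cd98f00b204e9800998ecf8427e  /etc/hostname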
Couldn't be simpler.
To test it I will run it on my photo archive, then make a backup to DVD-R, run the checker, and see if what I think is on the discs really is on them.
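If it all works, the test should come down to a couple of commands, something like this (paths are made up; /mnt/dvd is wherever the burned disc gets mounted):

md5deep -r /archive/photos > /tmp/photos.database.txt <- build the database from the originals before burning.
md5deep -r /mnt/dvd -X /tmp/photos.database.txt <- run it against the mounted disc; anything it prints has a checksum that is not in the database, which means something got mangled.

Since the matching is done on checksums rather than file names, it should not matter that the paths on the disc are different from the originals. The one caveat is that this only flags files on the disc that don't match; it won't notice a file that never made it onto the disc at all.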