At this particular office, the backup system was a DAT tape drive. In case you're not familiar, DAT cartridges are just a couple of inches wide. We always tried to design backup systems so that a full backup fit onto one cartridge.
That way, we didn't have to rely on humans to remember to put in a second tape in the morning and then swap tapes again in the afternoon. One tape per day. One full backup per tape.
We also had a tape rotation system that dramatically improved the reliability of restores when we needed them. Generally speaking, we tried to keep about ten tapes in rotation and remove one from the rotation permanently at the end of each month. That way, we had several recent restore points plus twelve "permanent" offsite restore points per year in case we had to go back more than a few weeks.
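For anyone who wants to see the rotation on paper, here is a minimal sketch of it in Python. It is only an illustration under some assumptions of mine: one backup every calendar day, the least recently written tape gets reused, the tape written on the last day of the month is the one retired, and a blank tape replaces it. The function name and the tape labels are made up for the example.

```python
from collections import deque
from datetime import date, timedelta


def simulate_rotation(start, days, pool_size=10):
    """Simulate one full backup per day on a rotating pool of tapes.

    Returns (rotation, archive): rotation maps tape label -> date of the
    backup it currently holds; archive lists (label, date) pairs retired
    permanently at each month end.
    """
    # Hypothetical labels; a real shop would use whatever is on the sticker.
    pool = deque(f"tape-{i:02d}" for i in range(1, pool_size + 1))
    next_label = pool_size + 1
    contents = {}   # tape label -> date of the backup on it
    archive = []    # month-end tapes pulled out of rotation for good

    for offset in range(days):
        today = start + timedelta(days=offset)
        tape = pool.popleft()          # reuse the least recently written tape
        contents[tape] = today         # tonight's full backup overwrites it
        if (today + timedelta(days=1)).month != today.month:
            # Assumed policy: retire the month-end tape, add a blank one.
            archive.append((tape, contents.pop(tape)))
            pool.append(f"tape-{next_label:02d}")
            next_label += 1
        else:
            pool.append(tape)
    return contents, archive


rotation, archive = simulate_rotation(date(2023, 1, 1), 365)
print(len(archive))    # month-end tapes retired over the year
print(len(rotation))   # recent restore points still in rotation
```

Run over a full calendar year, this retires one tape per month, which is where the twelve "permanent" restore points come from, while the tapes still in rotation cover the most recent days.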
I'm glad that the days of tape backup for the smallest businesses are behind us. Too many opportunities for human error (designing the backup system, implementing it, running it every day, testing it). It's also slow.
BUT I am a huge fan of tape. As I've always said, it is the most reliable and robust backup you will ever find. Tape is nearly indestructible. At the end of the day, tape isn't the problem: People are the problem.
Anyway . . .
One day the server crashed at AAA. I forget all the details, but we had to restore from backup. Lisa, the office manager, had been given the task of taking tapes home every night for safe storage. And, at the end of the month, she took one offsite for permanent storage. Of course these were all labeled.
My faith in our backup systems is absolute. We've never failed to restore 99-100% of the data from a backup that we designed, implemented, and maintained. So I was very confident when I told Lisa, "Remember all those tapes you've been taking home? This is the day we need them. Please go bring in every tape from the last year."
We did whatever we needed to do to prep the server so we could restore from tape, and Lisa went home to get the tapes. She brought them to us in a one-gallon zip-lock baggie.
It was dripping water. The bag had water on the inside. All the tapes were wet!
I asked what happened. She said that she knew they were important and she stored them with all of her most important documents - in the freezer!!!
Okay. You gotta work with what you have.
So, we opened the door on each cartridge and shook out as much water as we could. Then we laid them out on paper towels and tried to read the smeared labels. We put them in the best chronological order we could.
The plan was pretty simple:
1) Remove the tape drive from the server case so that it's still connected but physically outside the server. That way, any water inside the drive will stay in the drive and out of the server.
2) Restore as much data as we could.
3) Order a new (and larger capacity) tape drive.
4) Find a better place to store backup tapes.
The result: 100% success! The client lost work from the day of the crash, but we recovered 100% of the data from the wet backup tapes.
I swear: True story, except the names were changed.
Lesson learned: Improve our training process with regard to offsite storage of backups. You can never guess what clients are going to do, so remove as much ambiguity as you can from every process they're involved in.
And of course this improved our training across all other clients as well.
We created a one-time task of finding out where all of our clients were actually putting those take-home tapes. Some of them needed to make changes. No one else was keeping them in the freezer.
:-)
I love how you've stated a "lesson learned" from this story, Karl. Amazing! Many people would simply blame the client. I love your attitude towards this! :-)