Friday, July 11, 2025

Absolutely Nothing is More Important than Testing Backups

- Lessons Learned, Episode 34

Everyone in IT loves to give lip service to backups. We back up everything. We have backups of backups. And yet, MOST MSPs do a horrible job of taking care of the single most important thing you ever have to do with a client: Test the Backups!


If you've followed the series, you know that I have a long history of backing up a large variety of systems. One of my chores as an outsourced manager for HP's Roseville, CA plant was to manage the backup of all servers. This was a full backup every night - and the tapes went offsite for a year before they were used again. So, five days a week, that's 250 full backups per year.

Too much? Well, what do you think the cost of lost data would be at a plant with 5,000 technical people? If you said, "immeasurable," that's a good start.

As I moved into consulting with individual businesses, one rule has remained true for more than three decades: fifty percent of all installed backups are not working. This is true today. This is true across all backup systems. This is true everywhere, all the time.

Why do backups fail? There are three primary reasons.

1) Technicians set them up wrong

2) No one tests them

3) Stuff breaks

Let's start with "stuff breaks." You can't stop the fact that hardware fails, components fail, electricity goes out at the wrong time, Windows updates break scheduled programs, the employee who's supposed to swap media doesn't always do it, etc. Stuff happens. What you CAN do is to test the backups to see if they're working. 

You can do something about #1 by being very well trained on the most important technology you deploy. And #2 is inexcusable. Testing backups should be the first thing - the highest priority job - at every client, every month. 

Looking at a dashboard to see if the BDR self-reports a green light is NOT testing the backup. Looking at screenshots of successful automated self-tests is not testing the backups. 

A human being who works for you needs to access a client's system, mount an image, and restore some data. If you use tape (as do Amazon, Google, and Microsoft), someone needs to restore from tape. If you back up to hard drives, someone needs to restore from hard drive. In all cases, someone needs to restore individual emails to an alternate location and verify success.

Every month.

Every client.

No exceptions.
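
For the file portion of that test, "verify success" can be as simple as comparing checksums. The Python sketch below is one way to do it, not a feature of any particular backup product. The function names and the two folders are made up for illustration: source_dir holds a sample of files you know have not changed since the backup ran, and restored_dir is the alternate location you restored them to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Compare every file under source_dir with its restored copy.

    Returns a list of problems. An empty list means every file was
    restored and matched its original byte for byte.
    """
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            # The restore did not bring this file back at all.
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            # The file came back, but its contents differ from the original.
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems
```

A check like this only proves something if a person actually ran the restore first. It does not replace mounting the image or restoring the emails; it just gives the technician a record of what matched and what did not.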

And if you have technicians, I highly recommend that every technician be rotated through each client so that every technician has some experience restoring data from every client (or at least several clients). That way, no one is seeing something for the first time in an emergency.
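
If you want to put that rotation on a calendar, a simple round-robin does the job. This is only a sketch under my own assumptions: the function name is hypothetical, month_index is just a counter that goes up by one each month, and the technician and client names in the usage note are placeholders.

```python
def restore_test_assignments(technicians, clients, month_index):
    """Map each client to the technician who runs its restore test this month.

    Shifting the starting technician by one each month means that, over
    len(technicians) consecutive months, every technician tests every client.
    """
    if not technicians:
        raise ValueError("at least one technician is required")
    return {
        client: technicians[(position + month_index) % len(technicians)]
        for position, client in enumerate(clients)
    }
```

With technicians ["Ana", "Ben", "Cy"] and clients ["Acme", "Bolt"], month 0 gives Acme to Ana and Bolt to Ben, and month 1 gives Acme to Ben and Bolt to Cy.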

I won't repeat the story here, but in the last real job I had before I became a consultant, we had a system failure that cost about $20 million for one day's downtime. See my blog post here: https://blog.smallbizthoughts.com/2024/07/one-piece-of-your-security-strategy-is.html. Given the hardware and backup systems of the day, this was the smallest possible outage. That's why companies have insurance.

Today, some clients can afford systems that keep downtime to less than a day. Some can afford to be back up within an hour. But everyone can afford to totally rebuild, because the backup systems are so much better.

For more than ten years, I have been shocked and amazed that any business ever pays for a ransomware incident. In my opinion, this is never necessary because they should have a complete backup at all times, and that system should be tested.

If you cannot restore a client from last night's backup, either you sold them the wrong thing, they failed to buy what you recommended, or you are failing to do your job. Your job does not end when the backup is installed. It does not end when the client pays their bill. The job ends when you have finished testing that backup by restoring data. And that has to be repeated every month.

Period.
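
Keeping yourself honest about "every month" takes nothing more than a list of dates. Here is a minimal sketch, assuming you record the date of the last human-performed restore test for each client. The function name and the shape of the log are my own invention for illustration, not part of any PSA or backup tool.

```python
from datetime import date
from typing import Dict, List, Optional


def clients_missing_test(
    last_tested: Dict[str, Optional[date]], today: date
) -> List[str]:
    """Return the clients with no restore test in the current calendar month.

    last_tested maps a client name to the date of its most recent restore
    test, or None if no test has ever been recorded.
    """
    current_month = (today.year, today.month)
    return sorted(
        client
        for client, tested_on in last_tested.items()
        if tested_on is None
        or (tested_on.year, tested_on.month) != current_month
    )
```

Anything this returns near the end of the month is a client whose backup you cannot vouch for yet.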

The single most important thing you do as a technician is to test backups. If you're not doing this, you're not doing your job. You certainly are not providing managed services, and you should not call yourself a managed service provider.

Ultimately, backups don't fail; technicians fail.

Sales tip: Ask a prospect for a copy of the report they got this month showing that their backup was tested and is working. I have never met a small business owner who could show this to me. And it has opened many doors that led to full network assessments and new clients.

Fifty percent of all backups are failing right now. How are your clients doing? Prove it.

Feedback Welcome.

-----

Episode 34

This episode is part of the ongoing Lessons Learned series. For all the information, and an index of Lessons Learned episodes, go to the Lessons Learned Page.

Leave comments and questions below. And join me next week, right here.

Subscribe to the blog so you don't miss a thing.

:-)


1 comment:

  1. One quibble-- I would have titled this "Absolutely Nothing is More Important than Testing Restores." In the end, no one pays for a backup. By itself, a backup is useless. People are paying for the ability to restore. Otherwise-- this bears repeating, again and again until network administrators and consultants actually believe it.


Feedback Welcome

Please note, however, that spam will be deleted, as will abusive posts.

Disagreements welcome!