Daily Monitoring of Client Machines
What are the "guts" of managed services? Managing client systems: monitoring, automated ticketing, patching, fixing, and applying updates. It's preventive maintenance. In my book Service Agreements for SMB Consultants: A Quick-Start Guide to Managed Services, I talk about building a "roll your own" monitoring system with Small Business Server and a few other tools. That system works very well. But whether you use a system like that or invest in Continuum, Level Platforms, LabTech, or something else, daily monitoring is critical to delivering on the promises of managed services.
As with any other tools, you can set up all the alerts and monitoring you want. With the right combination you can even have tickets created automatically. But if you don't actively manage that whole system, then you're not really doing what you need to do.
Okay, so what does it mean to actively manage? Basically, it means that you check to make sure things are working. When they're not, you create service tickets. When problems keep recurring, you escalate the issue. Here's a simple checklist. I'm going to assume you have a remote monitoring and management (RMM) tool. If you do not, you will have to check each of these items manually.
Note on the Exchange "Monitor" folder: If you have reports emailed to you from Small Business Server or various backup programs, have them sent to a mail-enabled public folder named Monitor. That way, you will have all such reports in one place, and you can easily address them and then move the emails to the appropriate client folder.
I won't go into a lecture about how important backups are. You already know that.
[ Insert your personal rant about backups here. ]
Checklist: Daily Backup Monitoring of Client Systems
1. Check the "all in one" report on system backups
a. Open the Daily Backup Monitoring Record spreadsheet located on SharePoint at "Tech Documents\Daily Monitoring Record.xlsx" and update the previous night’s column with backup successes and failures.
2. Sort the Monitor mailbox in Exchange Public Folders and make necessary edits to the Daily Monitoring Record spreadsheet
a. Check whether any older backup jobs are still marked with an ‘R’. If so, replace the ‘R’ with an ‘X’ or an ‘O’ according to whether the job completed or failed.
b. Once documented, move all emails to their respective client folders so the Monitor folder is empty.
c. If any backups are not accounted for in either the Monitor mailbox or the RMM portal, log in to the server itself and check the backup software for the status of those backups. Update the Daily Monitoring Record with your findings.
d. Create Service Requests (tickets) as needed and set them to the appropriate priority (see below).
e. If you have an outsourced help desk, add all necessary details to the SRs so that the help desk does not have to ask for further information. Then assign all such tickets to the help desk to investigate.
3. Review all backup-related tickets in the PSA and move them forward
a. Check SRs to see if help desk is waiting for information or action by us.
b. If a backup job has failed more than once, adjust the priority as needed.
c. If a backup fails four days in a row and has been assigned to the outsourced help desk, take it back in-house and send an urgent email to the service manager.
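For anyone who keeps the Daily Monitoring Record in something scriptable, step 2a can be sketched roughly as follows. This is a minimal illustration, not part of the SOP itself. It assumes that ‘R’ means a job whose outcome was not yet known, ‘X’ means completed, and ‘O’ means failed (the checklist wording suggests that order but does not state it), and the dictionary layout and function names are hypothetical.

```python
# Marks used in the Daily Monitoring Record. The meaning assigned to each
# letter is an assumption inferred from the order of words in step 2a.
RUNNING, COMPLETED, FAILED = "R", "X", "O"


def reconcile(record, results):
    """Return a copy of the record with each 'R' replaced by its outcome.

    record:  dict mapping (client, night) -> mark ('R', 'X' or 'O')
    results: dict mapping (client, night) -> True if the job completed,
             False if it failed
    Entries with no known result keep their 'R' so that someone logs in
    to the server and checks the backup software directly (step 2c).
    """
    updated = dict(record)
    for key, mark in record.items():
        if mark == RUNNING and key in results:
            updated[key] = COMPLETED if results[key] else FAILED
    return updated


def unaccounted(record):
    """List the (client, night) entries that are still marked 'R'."""
    return sorted(key for key, mark in record.items() if mark == RUNNING)


if __name__ == "__main__":
    record = {("Client A", "Mon"): "R", ("Client B", "Mon"): "R"}
    results = {("Client A", "Mon"): True}
    updated = reconcile(record, results)
    print(updated)
    print("Check on the server:", unaccounted(updated))
```

Anything returned by unaccounted() is a job you still have to look up by hand before the record is complete.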
Backup Jobs and Ticket Priorities
Note the SOP on setting ticket priorities: If a backup fails once, the ticket is created as Priority 3 (medium). If it fails twice in a row, you can leave it at P3 as we do, or you might decide to move to P2. If a backup fails three days in a row, it must be a P2 (high priority). See the discussion of setting priorities for service tickets.
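The priority rule above, together with the four-day rule from step 3c of the checklist, is simple enough to write down as code. The following is only a sketch of those rules as stated. The function names, the list-of-booleans history, and the escalate_on_second switch (for shops that prefer P2 after two failures) are hypothetical.

```python
def consecutive_failures(history):
    """Count failed nights at the end of the history.

    history: list of booleans, oldest first, True meaning the backup
    succeeded that night.
    """
    count = 0
    for succeeded in reversed(history):
        if succeeded:
            break
        count += 1
    return count


def backup_ticket_priority(failures_in_a_row, escalate_on_second=False):
    """Return 3 (medium), 2 (high), or None when no ticket is needed."""
    if failures_in_a_row <= 0:
        return None
    if failures_in_a_row >= 3:
        return 2  # three days in a row must be a P2
    if failures_in_a_row == 2 and escalate_on_second:
        return 2  # optional: some shops move to P2 on the second failure
    return 3


def take_back_in_house(failures_in_a_row, assigned_to_help_desk):
    """Four failures in a row on a help desk ticket: pull it back in-house."""
    return assigned_to_help_desk and failures_in_a_row >= 4


if __name__ == "__main__":
    history = [True, True, False, False, False]
    streak = consecutive_failures(history)
    print(streak, backup_ticket_priority(streak))
```

A successful night resets the count, so only the current unbroken run of failures drives the priority.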
You should have lots of backups (at least one per client), so do not let failed backup jobs go unattended! There are very few things that jump right to the top of your priority list: a failed backup is one!
Daily attention matters! When you track backups every day, you quickly learn the little quirks in each client system. This is particularly true when a specific piece of software mis-reports the results of a backup job. Grrrr. Annoying, but at least you know your system works even if the software is having problems.
It is extremely rare for a client to respect the importance of backups. Even though it's in their best interest, they just don't comprehend how critical backups are. Of course it's not their job to care: That's what they're paying you for.
Other Daily Monitoring
With your RMM tool, you should be monitoring all key functions on client machines. This is an automated, minute-by-minute version of the things you check in your monthly maintenance checklist (disk space, processor usage, stopped services, critical events, etc.). For a sample of things to monitor, download the latest version of the 68-Point Checklist (version 2.0) from the White Papers page at SMB Books. No credit card required. Instant download in PDF format.
For the most part, daily monitoring of desktop machines is very basic. Other than virus updates and Windows updates, there's not much that needs to be monitored. You should be able to find (or create) a dashboard in your RMM so you can view all desktops/laptops with green, yellow, and red dots.
Servers should also be basic, but of course they're more important. Each server has at least one critical function, so you need to verify that that function is working, along with the basics of disk usage, services, etc. Again, a nice dashboard with lights goes a long way. See the graph.
Just as with backups, your daily once-over of the servers will keep you in touch with the weird stuff that somehow develops in some machines. You'd think that every SBS 2011 server with the same hardware, same patch level, and same hard drives would be the same . . . but Noooooooooo . . .
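If your RMM does not give you the green, yellow, and red lights out of the box, the idea behind them is easy to sketch: each check reports a color, and the machine shows the worst color among its checks. The example below is an illustration only. The 20 percent and 10 percent free disk space thresholds are made-up defaults, not recommendations from any RMM vendor, and the function names are hypothetical.

```python
GREEN, YELLOW, RED = "green", "yellow", "red"

# Higher number means a worse status.
_SEVERITY = {GREEN: 0, YELLOW: 1, RED: 2}


def disk_status(free_percent, warn_below=20.0, crit_below=10.0):
    """Color for a disk based on free space (thresholds are assumptions)."""
    if free_percent < crit_below:
        return RED
    if free_percent < warn_below:
        return YELLOW
    return GREEN


def machine_status(check_results):
    """Return the worst color among the checks; green if there are none."""
    return max(check_results, key=_SEVERITY.__getitem__, default=GREEN)


if __name__ == "__main__":
    checks = [disk_status(15.0), GREEN, GREEN]
    print(machine_status(checks))
```

Rolling up to the worst color means one red check is never hidden behind a dozen green ones, which is the whole point of a once-over dashboard.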
Time and Tickets
It is important that service tickets are created for any work to be performed. The task of daily monitoring is an administrative task and the time should be logged to an internal ticket. For each client task that needs to be performed, a separate client ticket should be created. If your systems are patched up as they should be, the daily monitoring should take no more than 15-30 minutes for 1,000 machines monitored. Do not let the tech get side-tracked into fixing things at this point and muddling up the time.
Remember the mantra: All work is performed against a service ticket! That means the tech who does daily monitoring can create tickets, but must not work them at this time. Once daily monitoring is complete, any new tickets will be in the system, prioritized properly. Who knows? By the time the monitoring task is finished, the service manager may have already assigned some new tickets, or even worked them.
Anyway, daily monitoring is critical to keeping your fingers on the pulse of your clients' machines. It also allows you to create service tickets, work them, and close them before the client is aware that there was an issue. Just make sure they get a report each month telling them all the wonderful things you do without their knowledge.
Your Comments Welcome.
- - - - -
About this Series
SOP Friday - or Standard Operating Procedure Friday - is a series dedicated to helping small computer consulting firms develop the right processes and procedures to create a successful and profitable consulting business.
Find out more about the series, and view the complete "table of contents" for SOP Friday at http://www.smallbizthoughts.com/events/SOPFriday.html.
- - - - -
Next week's topic: The Tech on Call for The Day - Managing Daily Workflow
[NOTE: This blog post updated October 2012 to reflect the new location of the 68-Point Checklist version 2.0.]
:-)
Upon request, I have added a screen shot of the daily backup monitoring checklist.
- kp