What is the life of an Incident Manager?

Blogs · 5 min · April 5, 2023

David Macarthur gives us a peek into his life as an Incident Manager at Form3. He explains what incident management is, how to effectively prepare for incidents and the role of incident managers when incidents do occur. Investing in good Incident Management is essential to operating a good business and David shares his thoughts on how to do just that.

Introduction

My name is David Macarthur and I've been in Service Delivery for most of my working life. Incident Management has been my speciality for over a decade, and I've worked with some of the big players that provide it as an outsourced service. For those who are not in this field, I want to give a quick summary of what we do and how we do it, and maybe even spark the interest of someone who might pursue a career in Incident Management.

What is Incident Management?

Before we get into what the day-to-day is like, let's explore what Incident Management is.

An incident, in its simplest terms, is anything outside of business-as-usual (BAU) operations that causes impact.

Incident Management is how we respond to that incident. It’s a response, not a reaction. Reactions are in the moment; they can be emotionally charged and leave you having to make a plan on the fly. Responding involves following a plan that Incident Managers have in place for as many scenarios as possible.

Preparing for incidents

On a good day, Incident Management looks very much like most people’s office life:

  • We have meetings about things that have happened and things that are yet to happen.
  • We brainstorm ideas for process improvement. Incident Managers focus on what can go wrong and plan for a response to these possible failures.
  • Waiting is a large part of an Incident Manager’s job in a well-planned company. When everything functions as it should, Incident Managers are able to invest time in predicting what could go wrong. We do this through careful analysis of previous incidents, but also through experience and spotting holes others may miss.

For someone not involved in the incident process, it may seem perfectly fine to have nothing more than a vendor help desk to phone, but let's consider the following scenarios:

  • What happens when that line is down?
  • What happens if the person is unable to get the resources they need?

That’s what Incident Managers plan for. The what-if.

From a management perspective, someone may think that a redundant system with automated failover removes the need for an engineer on call, but this approach has downsides too:

  • What if it doesn’t switch over automatically?
  • Does the engineer know how to do a manual failover?
  • Do they know how to do it at 3am, bleary-eyed, having just woken up? What if they don’t?

These are the questions Incident Managers are there to ask. The answers to these questions on the days without Incidents build a strong foundation for when an incident does occur. More often than not, Incident Managers focus on building redundancy and seeing things from the perspective of an operational live service. If you ever see an Incident Manager seemingly doing nothing, know they are waiting, planning and ready.
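
To make the failover what-ifs concrete, here is a minimal sketch of how such a response plan could be written down as an ordered chain of fallbacks. It is an illustration only, not a description of any real tooling: the function name `respond_to_primary_failure`, the `FailoverOutcome` type and the three callables are all hypothetical stand-ins for whatever automation, paging and runbook a team actually has.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FailoverOutcome:
    recovered: bool
    steps: List[str]  # what happened, in order, for the post-incident review


def respond_to_primary_failure(
    automated_failover: Callable[[], bool],  # hypothetical: True if the switch happened
    page_on_call: Callable[[], bool],        # hypothetical: True if an engineer answered
    manual_failover: Callable[[], bool],     # hypothetical: True if the runbook worked
) -> FailoverOutcome:
    """Walk the what-if chain: automation, then a paged engineer, then escalation."""
    steps: List[str] = []

    # What if it doesn't switch over automatically?
    if automated_failover():
        steps.append("automated failover succeeded")
        return FailoverOutcome(True, steps)
    steps.append("automated failover failed")

    # Is anyone reachable to do it by hand?
    if not page_on_call():
        steps.append("on-call engineer not reached, escalate")
        return FailoverOutcome(False, steps)
    steps.append("on-call engineer paged")

    # Does the engineer know how to do a manual failover, even at 3am?
    if manual_failover():
        steps.append("manual failover succeeded")
        return FailoverOutcome(True, steps)
    steps.append("manual failover failed, escalate")
    return FailoverOutcome(False, steps)


# Example: automation fails, but the paged engineer recovers the service by hand.
outcome = respond_to_primary_failure(lambda: False, lambda: True, lambda: True)
print(outcome.recovered, outcome.steps)
```

The point of the sketch is that every branch has a planned next step, so nobody has to invent one in the middle of the night.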

Managing the incidents

On the days that incidents occur, Incident Managers have alerts to investigate. All of the preparation they have done kicks in, and following the established plans minimises the consequences.

At Form3, this process looks like this:

  • Within minutes of an alert being triggered, an Incident Manager and an engineer are paged through PagerDuty and will join a Zoom bridge.
  • Our engineers are spectacularly talented individuals with a wealth of experience. Their focus is on finding the root cause of the incident and investigating possible mitigations.
  • The Incident Manager is free to become the voice of the customer on that call and focus on restoring service as quickly as possible.
  • The on-call engineer and the Incident Manager work together to assess options and make mitigation decisions. These are evidence-based, knowledgeable decisions made from a customer perspective. Incident Managers ask questions, lean on the expertise of our engineering and product colleagues, and together we make a decision.

In Incident Management, we work with the entire company, across every product we deliver, to ensure that when something goes wrong, we are there, we have a plan, and we get it fixed as quickly as possible.

Conclusions

The cycle continues throughout the year. Following the days with incidents and alerts, Incident Managers review and analyse the response for any weak links to improve on. The more focus we put into this, the fewer incidents we have and the stronger the company gets.

Good Incident Management is a crucial part of any organisation, and woe betide any business that thinks it can take it for granted. One poorly managed incident can have knock-on effects for months to come, can damage a company’s reputation and can even drive away a new customer.

A good company invests in good incident management and Form3 does just that!

Written by

David Macarthur, Incident Management Specialist

David is one of our Incident Managers at Form3. He has a focus on continual improvement in our processes across the whole company. He is also passionate about accessibility, diversity, inclusion and leadership.

You can find David on LinkedIn where he has several articles on various topics.