Blogs· 4min July 26, 2023
In this blog post, David introduces us to the five Ws of information gathering - Who? What? When? Where? Why? Answering the five Ws helps Incident Managers get a deeper understanding of the cause and impact of incidents, not just their remedy, leading to more robust solutions. Fixing the cause of an outage is only the beginning, and the five Ws pave the way for team collaboration during investigations.
If you're new to the ideas of Incident Management, you can read David's introductory blog post giving a quick intro to the day-to-day life of an Incident Manager.
When an incident occurs, one question tends to come to mind first. Ok, we have a problem. How do we fix it? A perfectly valid and logical response, but not always the one that meets our goals as Incident Managers in the most effective way. Before we can even start to think about how we fix an issue, we have a series of rapid-fire questions we need to process.
Who? What? When? Where? Why?
Those are the questions we really need to answer before we can move on to how.
If we have no idea who is impacted by this issue, or what kind of expertise we are likely to need to fix it, how are we going to know the answer to our overall question of how do we fix it?
If we have no idea what is not working as we expect it to, and no idea what the impact is, how can we effectively communicate with our customers, and how can we start working on a resolution?
If we can't see the point at which an issue started, it becomes very hard to identify a cause and a resolution. Some issues, such as a high CPU or low memory warning, can be just that: a warning. So we need to know when that warning is going to become a problem, and we need to be able to articulate that to our customers if needed. Likewise, we as a company have a duty of care. We need to acknowledge that an engineer may be at the end of an on-call period and that a handover may be more beneficial.
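As a rough illustration of asking when a warning becomes a problem, the sketch below projects a rising metric forward in a straight line. The function name, the sample format and the threshold are all assumptions made for this example, not something taken from our tooling, and real metrics rarely grow this neatly.

```python
from typing import List, Optional, Tuple


def minutes_until_threshold(
    samples: List[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate minutes after the last sample until `threshold` is reached.

    `samples` is a hypothetical list of (minute, value) pairs, oldest first.
    The estimate draws a straight line through the first and last samples.
    Returns None when there is too little data or the metric is not rising.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        return None
    if v1 >= threshold:
        return 0.0  # already a problem, not just a warning
    rate = (v1 - v0) / (t1 - t0)  # change in value per minute
    if rate <= 0:
        return None  # flat or falling: no projected breach
    return (threshold - v1) / rate


# CPU rose from 70% to 80% over ten minutes; when would it hit 95%?
estimate = minutes_until_threshold([(0, 70.0), (10, 80.0)], threshold=95.0)
```

An estimate like this is only a conversation starter, but it gives us something concrete to say to customers and helps us judge whether a handover can happen before the warning turns into an incident.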
In infrastructure, a blocker can be a strong sign of where we can start looking for a fix. If we look at the flow of information, we can start at the start, with the health check, but how long will that take? If we know where our stopping point is, our blocker, we can "skip" five steps and focus on the place the logs tell us to check first.
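To make the idea of a stopping point concrete, here is a minimal sketch that walks an ordered flow and returns the first step with no successful log entry. The step names and the "step status" log format are invented for this example; real logs in logz.io or elsewhere will look different.

```python
from typing import Iterable, List, Optional


def find_blocker(steps: List[str], log_lines: Iterable[str]) -> Optional[str]:
    """Return the first step in `steps` that never logged "ok".

    Each log line is assumed (hypothetically) to look like "<step> <status>".
    Returns None when every step reported success.
    """
    completed = set()
    for line in log_lines:
        step, _, status = line.partition(" ")
        if status.strip() == "ok":
            completed.add(step)
    for step in steps:
        if step not in completed:
            return step  # everything before this point can be skipped
    return None


# Hypothetical flow and logs for illustration only.
flow = ["health_check", "auth", "validate", "route", "submit"]
logs = ["health_check ok", "auth ok", "validate ok", "route failed"]
blocker = find_blocker(flow, logs)
```

Here the blocker is the routing step, so the investigation can begin there instead of rechecking the three steps that already succeeded.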
Why can be one of the most effective initial tools in our arsenal. It is perhaps the key question Incident Managers can at times forget, because why can be seen as a Problem Management question. Why did it break? Well, that's in root cause analysis. Yet why can point an investigation in a direction very quickly, because it can identify something out of the normal, something unusual and unexpected.
Why can lead us to the answers to so many questions and open our minds to more questions that could have been overlooked in a rush. The Incident Management process should never be a rush; it should be a smooth process of decisive but deliberate choices. All our questions lead us to more questions, often more questions than answers. We must work together to choose which of these questions need answers, to find our priorities.
One of the huge benefits at Form3 is the tooling we have and, more than that, the tooling Incident Management has. It can be easy to think that a technical tool, for monitoring for example, should be used by Development to provide information to Operations. Good dashboards and metrics that Incident Management have access to can answer most of these questions without an Incident Manager ever needing to ask an engineer. Our Incident Managers frequently rely on logz.io and Grafana as their sources of information.
When we work with Development in a DevOps environment, be that DevOps or SecDevOps, we give Incident Management more tools and more options so our engineers can focus on the fix, while we focus on the impact and the five Ws.
The five Ws can be the best place to start in an incident scenario, but not all at once. We don't need answers to why, what, when and who to reach how, but having them means we know the purpose of being there, and as our decisions are made, they build the foundation of the best possible solution, the best way to restore service. I've seen so many people seek how with a need for instant gratification. Sometimes it works; often, however, it removes the confidence of technical teams who must say we don't know.
I say, we don't know yet, but we have questions and that's the perfect start.
David is one of our Incident Managers at Form3. He has a focus on continual improvement in our processes across the whole company. He is also passionate about accessibility, diversity, inclusion and leadership.
You can find David on LinkedIn where he has several articles on various topics.