Everyone is frustrated with traditional IT processes that rely on a ticketing system such as Jira to request access to systems. They are cumbersome, and it's hard to verify that the person made the change exactly as requested in the ticket. Travis talks us through how the team at Teleport solved this problem by using infrastructure as code to manage access to all internal systems. He also explains how you can go about migrating to infrastructure as code and some of the gotchas you need to watch out for.
Travis Gary runs the IT department at Teleport. The access management tools that Teleport provide have been really important for companies going remote, who now have to embrace zero trust and change how they approach security.
Travis recently spoke at Conf42. His talk "Using Infra-as-code, not Jira tickets to pass audits" is all about moving off Jira-ticket-driven workflows to infrastructure as code. He will share more of his thoughts on this important topic with us in this episode.
As a system administrator, begin by thinking about your current workflows. You might receive a ticket, log into the environment that needs changing and then make that change manually in the console. This is sometimes known as ClickOps.
This process doesn't scale to a large number of changes or to large systems, as you might see in fast-growing organisations. Furthermore, making manual changes through the console is neither secure nor repeatable. Changes are not reviewed, so there are no guarantees that they are made correctly or that they won't break anything.
Infrastructure as code (IaC) allows you to describe processes, including errors and rollbacks, in code. It is generally not written in a general-purpose programming language, but in a simple language that allows you to describe your resources. Under the hood, the tooling calls backend APIs and endpoints to create and manage these resources.
It is important to remember that IaC is stateful. It brings resources back to their described configuration and state, regardless of what has previously been done to them, manually or otherwise. IaC is very well suited to IT processes - what's defined in the code is what will exist. This is really easy to review and audit, closing the gap between what a ticket prescribes and reality.
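To make the idea of bringing resources back to their described state more concrete, here is a minimal sketch in Python. It is not how Terraform or Teleport implement it; the user names, role names and the `reconcile` function are all hypothetical and only illustrate the principle that the code, not the console, decides what exists.

```python
# Hypothetical desired state, as it would be described in code:
# which roles each user should hold.
desired = {
    "alice": {"db-read"},
    "bob": {"db-read", "db-admin"},
}

# Hypothetical actual state, including changes made manually in a console.
actual = {
    "alice": {"db-read", "db-admin"},  # extra role granted by hand (drift)
    "carol": {"db-read"},              # user that is not described in code
}


def reconcile(desired, actual):
    """Return the converged state and the actions needed to reach it."""
    actions = []
    for user in sorted(set(desired) | set(actual)):
        want = desired.get(user, set())
        have = actual.get(user, set())
        for role in sorted(want - have):
            actions.append(("grant", user, role))
        for role in sorted(have - want):
            actions.append(("revoke", user, role))
    # After applying the actions, reality matches the description exactly.
    new_state = {user: set(roles) for user, roles in desired.items()}
    return new_state, actions


new_state, actions = reconcile(desired, actual)
for action in actions:
    print(action)
```

Whatever was done by hand, the manually granted role and the undeclared user are both revoked, because neither appears in the described state.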
In the IaC world, changes are made using pull requests and branches. This opens up the opportunity for automated testing of our changes, reducing the need for manual QA verification. All changes can then be reviewed by the necessary experts, but anyone can propose a change and submit it for review. The ability to propose changes without any gatekeeping is great for the developer agility of distributed teams.
Terraform is an example of an infrastructure as code tool. It allows a remote service to apply your changes. It can also generate a plan on an open pull request, so you can see the changes that will be made before they take place. This is another powerful mechanism for verifying the current state against the proposed state.
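The idea behind a plan can be sketched in a few lines of Python. This is a toy illustration, not Terraform's implementation or output format; the resource names and the `plan` function are hypothetical. It compares the current state with the proposed state and reports what would change, without changing anything.

```python
def plan(current, proposed):
    """Classify each resource as create, update or delete; apply nothing."""
    changes = {"create": [], "update": [], "delete": []}
    for name in sorted(set(current) | set(proposed)):
        if name not in current:
            changes["create"].append(name)
        elif name not in proposed:
            changes["delete"].append(name)
        elif current[name] != proposed[name]:
            changes["update"].append(name)
    return changes


# Hypothetical state of the platform today.
current = {
    "role.db_read": {"members": ["alice"]},
    "role.db_admin": {"members": ["bob"]},
}

# Hypothetical state described by the pull request under review.
proposed = {
    "role.db_read": {"members": ["alice", "carol"]},
    "role.ci_deploy": {"members": ["pipeline"]},
}

result = plan(current, proposed)
for kind, names in result.items():
    print(kind, names)
```

A reviewer reading this output on a pull request can see that one role would be created, one updated and one deleted, and can challenge the deletion before it happens.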
Keeping your infrastructure definitions in code also allows you to check who made changes and when, as well as giving you the possibility to search through changes. This makes it easier to work asynchronously, across locations and timezones.
The migration process requires a cultural shift - you have to have full buy-in from your developers and consider the developer experience. A lot of the benefits of IaC migrations come at the tail end of the process, so you really need support from your developers to go on the journey.
Making one change will often be faster using ClickOps, but audits and security are much better in IaC. Initially, there might be friction, but the gains will come as we shift left, bringing the process closer to the development cycle. Platform stability will improve with tighter control, making for quieter on-call rotas as well.
Compared to the cultural change, the technical changes are quite simple. Terraform uses a declarative language that is relatively easy for engineers to pick up. From a security perspective, the burden moves from one space to another. Terraform still needs access to powerful credentials to be able to make changes to infrastructure.
At Teleport, the engineers have a "hack yourself" mentality, so they have played capture-the-flag games against their infrastructure as code repositories. As you migrate to IaC, you need to consider that you are shifting security concerns from humans to the pipeline.
Generally, the aim of IaC is to remove all admin users from the system. This can make recovery really difficult when something goes wrong, because there is nobody left to carry out the "break glass procedure". One way Teleport has handled this is with an alerting pattern and admin roles. When something goes wrong, an incident is created and a limited number of users can instantly take on the admin role to fix the platform.
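As a rough illustration of that pattern, the Python sketch below only hands out an admin grant when there is an incident and the requester is on a short list of responders. It is not Teleport's implementation: the responder list, the `request_admin` function and the one-hour expiry are assumptions made for the example, and the expiry in particular is not something described in the episode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical allow-list: the limited set of users who may break glass.
BREAK_GLASS_RESPONDERS = {"alice", "bob"}

# Assumption for this sketch: admin access expires on its own.
GRANT_DURATION = timedelta(hours=1)


@dataclass(frozen=True)
class AdminGrant:
    user: str
    incident_id: str
    expires_at: datetime

    def is_active(self, now):
        return now < self.expires_at


def request_admin(user, incident_id, now):
    """Grant the admin role only for an incident and an approved responder."""
    if not incident_id:
        raise PermissionError("an incident is required to take the admin role")
    if user not in BREAK_GLASS_RESPONDERS:
        raise PermissionError(f"{user} is not an approved responder")
    return AdminGrant(user, incident_id, now + GRANT_DURATION)


now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = request_admin("alice", "INC-1", now)
print(grant.user, grant.incident_id, grant.is_active(now))
```

Tying every grant to an incident identifier keeps the audit trail intact even when the normal pull request flow is bypassed.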
Teleport run their own podcast titled "Access Control Podcast". It has some great episodes about IaC and security, so make sure to give it a listen if you liked this episode.
Adelina is a polyglot engineer and developer relations professional, with a decade of technical experience at multiple startups in London. She started her career as a Java backend engineer, later moved to Go, and then transitioned to a full-time developer relations role. She has published multiple online courses about Go on the LinkedIn Learning platform, helping thousands of developers up-skill with Go. She has a passion for public speaking, having presented on cloud architectures at major European conferences. Adelina holds an MSc in Mathematical Modelling and Computing.