Everyone is frustrated with traditional IT processes that rely on a ticketing system such as Jira to request access to systems. They are cumbersome, and it's hard to verify that the person made the change exactly as requested in the ticket. Travis talks us through how Teleport has solved this problem by using infrastructure as code to manage access to all internal systems. He also explains how you can go about migrating to infrastructure as code and which gotchas you need to watch out for.
Travis Gary runs the IT department at Teleport. Teleport's access management tools have been really important for companies going remote, which now have to embrace zero trust and change how they approach security.
Travis recently spoke at Conf42. His talk "Using Infra-as-code, not Jira tickets to pass audits" is all about moving off Jira-ticket-driven workflows and onto infrastructure as code. In this episode, he shares more of his thoughts on this important topic with us.
As a system administrator, begin by thinking about your current workflows. You might receive a ticket, log into the environment that needs changing and then make that change manually in the console. This is sometimes known as ClickOps.
This process doesn't scale to a large number of changes or to large systems, as you might see in fast-growing organisations. Furthermore, making manual changes through the console is neither secure nor repeatable. Changes are not reviewed, so there is no guarantee that they are made correctly or that they won't break anything.
Infrastructure as code (IaC) allows you to describe processes, including errors and rollbacks, in code. It is generally not a full programming language, but a simpler language that lets you describe your resources. Under the hood, the tool calls backend APIs and endpoints to create and manage these resources.
It is important to remember that IaC is stateful. It brings resources back to their described configuration and state, regardless of what has previously been done to them, manually or otherwise. IaC is very well suited to IT processes: what's defined in the code is what will exist. This is easy to review and audit, and it closes the gap between what a ticket prescribes and what actually happens.
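To make the idea of converging on the described state more concrete, here is a minimal Python sketch of a reconciliation step. It assumes access is modelled as a mapping from group name to a set of usernames, and `FakeDirectory` is a hypothetical in-memory stand-in for a real identity provider's API. Real tools such as Terraform also keep a state file and refresh it against the live system; this sketch skips that and compares the code directly against the live data.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDirectory:
    """Hypothetical in-memory stand-in for an identity provider's group API."""

    groups: dict[str, set[str]] = field(default_factory=dict)
    calls: list[str] = field(default_factory=list)  # log of "API calls" made

    def add_member(self, group: str, user: str) -> None:
        self.groups.setdefault(group, set()).add(user)
        self.calls.append(f"add {user} to {group}")

    def remove_member(self, group: str, user: str) -> None:
        self.groups[group].discard(user)
        self.calls.append(f"remove {user} from {group}")


def reconcile(desired: dict[str, set[str]], api: FakeDirectory) -> int:
    """Make the directory match `desired` exactly; return the number of changes."""
    changes = 0
    # Visit every group that is either declared in code or exists in the directory.
    for group in sorted(set(desired) | set(api.groups)):
        want = desired.get(group, set())
        have = set(api.groups.get(group, set()))  # copy, since we mutate via the API
        for user in sorted(want - have):  # declared in code but missing
            api.add_member(group, user)
            changes += 1
        for user in sorted(have - want):  # present but not declared (manual drift)
            api.remove_member(group, user)
            changes += 1
    return changes
```

Running `reconcile` against a directory where someone added `mallory` to `admins` by hand, and never added `bob` to `engineering`, removes `mallory` and adds `bob`. A second run makes no changes, which is the property that makes repeated applies safe.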
In the IaC world, changes are made using branches and pull requests. This opens up the opportunity for automated testing of our changes, reducing the need for manual QA verification. Anyone can propose a change and submit it for review, and the relevant experts then review it before it is merged. The ability to propose changes without any gatekeeping is great for the agility of distributed development teams.
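As an illustration of the kind of automated test a pipeline could run on a pull request, the sketch below checks proposed group memberships against a few policy rules before anyone reviews them. The group name, the admin limit and the allowlist are all hypothetical examples, not Teleport's actual policy.

```python
ADMIN_GROUP = "admins"  # hypothetical name of the privileged group
MAX_ADMINS = 3  # hypothetical cap on standing admin access
ALLOWED_ADMINS = {"carol", "dave", "erin"}  # hypothetical allowlist


def check_access_policy(groups: dict[str, set[str]]) -> list[str]:
    """Return human-readable policy violations; an empty list means the change passes."""
    violations = []
    admins = groups.get(ADMIN_GROUP, set())
    if len(admins) > MAX_ADMINS:
        violations.append(f"{ADMIN_GROUP} has {len(admins)} members, limit is {MAX_ADMINS}")
    # Anyone in the admin group who is not on the allowlist is a violation.
    for user in sorted(admins - ALLOWED_ADMINS):
        violations.append(f"{user} is not allowed in {ADMIN_GROUP}")
    # Empty groups are usually leftovers; ask the author to delete them instead.
    for group, members in sorted(groups.items()):
        if not members:
            violations.append(f"{group} has no members; delete it instead")
    return violations
```

A CI job could call `check_access_policy` on the parsed proposal and fail the build, for example with a non-zero exit code, whenever the returned list is non-empty, so reviewers only see changes that already pass the basic rules.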
Terraform is an example of an infrastructure as code tool. It lets a remote service, rather than an individual engineer, apply your changes. It can also generate a plan on an open pull request, so you can see the changes that will be made before they take place. This is another powerful mechanism for verifying the current state against the proposed state.
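A plan is essentially a diff between the state that exists now and the state the pull request proposes. The sketch below computes such a diff for resources modelled as name-to-attribute maps. The resource names are hypothetical, and the `+`, `-` and `~` markers only loosely mimic `terraform plan` output rather than reproducing its format.

```python
Resources = dict[str, dict[str, str]]  # resource name -> attribute map


def plan(current: Resources, proposed: Resources) -> list[str]:
    """Return one line per change needed to move from `current` to `proposed`.

    Nothing is modified: this only reads its inputs, like a dry run.
    """
    lines = []
    for name in sorted(set(current) | set(proposed)):
        before, after = current.get(name), proposed.get(name)
        if before is None:
            lines.append(f"+ {name} will be created")
        elif after is None:
            lines.append(f"- {name} will be destroyed")
        elif before != after:
            # Report each attribute whose value differs (missing keys show as None).
            changed = sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))
            for key in changed:
                lines.append(f"~ {name}.{key}: {before.get(key)!r} -> {after.get(key)!r}")
    return lines
```

Because `plan` only reads its inputs, it is safe to run on every pull request, and posting its output as a comment gives reviewers the exact list of changes they are approving.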
Keeping your infrastructure definitions in code also lets you check who made changes and when, as well as giving you the ability to search through those changes. This makes it easier to work asynchronously across locations and time zones.
The migration process requires a cultural shift: you need full buy-in from your developers, and you have to consider the developer experience. A lot of the benefits of an IaC migration come at the tail end of the process, so you really need your developers' support to go on the journey.
A one-off change will often be faster with ClickOps, but audits and security are much better with IaC. There might be some friction initially, but the gains come as we shift left, moving the process closer to the development cycle. Tighter control also improves platform stability, making for quieter on-call rotas.
Compared to the cultural change, the technical changes are quite simple. Terraform's configuration language is declarative and relatively easy for engineers to pick up. From a security perspective, the burden moves from one place to another: Terraform still needs access to powerful credentials to be able to make changes to infrastructure.
At Teleport, the engineers have a "hack yourself" mentality, so engineers have played capture-the-flag games against the company's infrastructure as code repositories. As you migrate to IaC, remember that you are shifting security concerns from humans to the pipeline.
Generally, the aim of IaC is to remove all admin users from the system. This can make recovery really difficult when something goes wrong, because the "break glass" procedure becomes harder. One way Teleport has handled this is with an alerting pattern and admin roles: when something goes wrong, an incident is created and a limited number of users can instantly take on the admin role to fix the platform.
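The break-glass pattern can be sketched as a guarded, time-limited elevation. This sketch assumes a hypothetical allowlist of responders, a set of open incident IDs standing in for an incident tracker, and a one-hour expiry. None of these names are Teleport APIs, and a real system would also raise an alert and record each elevation in an audit log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

BREAK_GLASS_USERS = {"carol", "dave"}  # hypothetical responders allowed to elevate
ELEVATION_WINDOW = timedelta(hours=1)  # hypothetical maximum admin session length


@dataclass(frozen=True)
class Elevation:
    """A temporary admin grant tied to a specific incident."""

    user: str
    incident_id: str
    expires_at: datetime


def request_admin(
    user: str,
    incident_id: str,
    open_incidents: set[str],
    now: Optional[datetime] = None,
) -> Elevation:
    """Grant time-limited admin only to an allowed responder during an open incident."""
    if user not in BREAK_GLASS_USERS:
        raise PermissionError(f"{user} is not a break-glass responder")
    if incident_id not in open_incidents:
        raise PermissionError(f"incident {incident_id} is not open")
    if now is None:
        now = datetime.now(timezone.utc)
    return Elevation(user, incident_id, now + ELEVATION_WINDOW)


def is_admin(elevation: Elevation, now: Optional[datetime] = None) -> bool:
    """Admin rights lapse automatically once the window has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < elevation.expires_at
```

Tying the grant to an incident ID and an expiry means nobody holds standing admin rights, while responders can still act immediately when the platform is broken.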
Teleport run their own podcast titled "Access control podcast". It has some great episodes about IaC and security, so make sure to give it a listen if you liked this episode.