.tech Podcast - The core components of container systems

Blogs · 4 min · May 17, 2023

Michael Kerrisk is a Linux expert and trainer. He joins us to explain what containers are and to take a deep dive into the four core components of containers: namespaces, capabilities, cgroups and seccomp. He also explains how Docker uses them to power container systems as we know them today.

Michael Kerrisk is a Linux expert who runs a Linux System Programming course that is very popular with Form3 engineers. He started out working with UNIX, the predecessor of Linux, and has carried that knowledge into his Linux courses: Linux was, roughly speaking, a re-implementation of the UNIX kernel that had been written more than 20 years earlier at Bell Laboratories. His primary focus is not the kernel internals but the interface the kernel presents to the world, which is largely the same as that of classical UNIX. Michael has always had a passion for teaching and spent years as a university teacher before starting his corporate career. He joined the training department of a previous employer and began delivering its system programming course. This was the ideal job for him, as it brought together his two passions: UNIX and teaching. He is also the author of "The Linux Programming Interface", a detailed guide and reference for Linux and UNIX system programming.

The container illusion

Michael explains that a container is an illusion: the illusion, for a group of processes on a system, that no one else is on that system. The processes in a container believe they are the only processes on the system, and they get private resources that appear to be visible only to them. This isolation is fundamental to containers.

Containers also come with a set of standards. These standards make it possible to develop a container runtime, and they let someone producing a container deliver it to any environment that supports that runtime.

This portability is specifically what the Docker idea is about, but containers also have more general uses: we can also freeze, restore and migrate containers. The Open Container Initiative (OCI) sets the standards for container interfaces across cloud providers and services.

Core components of container systems

Michael explains that container systems are built from four core components, and he delves into each of them in detail.

Namespaces

Namespaces are the most important part of the isolation provided by containers. A namespace wraps a global resource so that a group of processes appears to have its own private instance of that resource. For example, the UTS namespace isolates the hostname, making it possible for every container to have and report its own hostname. Another important namespace type is the mount namespace, which isolates the mount list, making it possible for different processes to see different sets of mounted file systems.
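A minimal Python sketch of the UTS example, assuming Linux, Python 3.12 or later (for os.unshare) and root privileges, since creating a UTS namespace requires CAP_SYS_ADMIN. A forked child moves into a new UTS namespace and changes its hostname, while the parent's hostname stays the same. Each process's namespace membership is visible as a symbolic link such as /proc/self/ns/uts, whose target (for example uts:[4026531838]) carries an inode number that identifies the namespace; parse_ns_link and describe are illustrative helper names of our own.

```python
import os
import re
import socket


def parse_ns_link(target: str) -> tuple[str, int]:
    """Decode a /proc/<pid>/ns/* link target such as 'uts:[4026531838]'."""
    match = re.fullmatch(r"([a-z_]+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def describe(who: str) -> None:
    # Processes that show the same inode number share the same UTS namespace.
    _, inode = parse_ns_link(os.readlink("/proc/self/ns/uts"))
    print(f"{who}: hostname={socket.gethostname()!r} uts-ns={inode}", flush=True)


def demo() -> None:
    describe("parent")
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.unshare(os.CLONE_NEWUTS)           # private copy of the UTS namespace
            socket.sethostname("container-demo")  # changes only that private copy
            describe("child ")
            status = 0
        except (OSError, AttributeError) as exc:  # EPERM without root; no os.unshare before 3.12
            print("child: could not create a UTS namespace:", exc, flush=True)
        finally:
            os._exit(status)  # never fall through into the parent's code
    os.waitpid(pid, 0)
    describe("parent")  # the child's hostname change is not visible here


if __name__ == "__main__":
    try:
        demo()
    except OSError as exc:  # e.g. /proc unavailable or fork not permitted
        print("demo skipped:", exc)
```

Without root privileges, the child reports a permission error and the parent's output is unchanged.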

Capabilities

The motivation for capabilities is the coarse privilege model of traditional UNIX. In this model, the superuser (root) can bypass the usual limitations and rules, while regular users must abide by them. There is no way to grant a process just a subset of root's privileges.

The general idea of capabilities is to make it possible to create programs that are less powerful than programs running as root. The power of the superuser is split into 41 separate capabilities. Limiting the power granted to programs also reduces the risk to our systems if a program is compromised, because the attacker has less power to do damage.
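The kernel exposes each process's capability sets as hexadecimal bit masks in /proc/<pid>/status (the CapInh, CapPrm, CapEff, CapBnd and CapAmb lines), with one bit per capability. The Python sketch below, assuming Linux, decodes those masks into names using the capability numbers 0 to 40 from <linux/capability.h>; decode_caps and capability_sets are our own helper names. Run as root, the effective set typically contains every capability the kernel supports; run as an ordinary user, it is usually empty.

```python
# Capability names indexed by bit number (0-40), as defined in <linux/capability.h>.
CAP_NAMES = """
CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD
LEASE AUDIT_WRITE AUDIT_CONTROL SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG
WAKE_ALARM BLOCK_SUSPEND AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE
""".split()


def decode_caps(mask_hex: str) -> list[str]:
    """Turn a hex capability mask (e.g. the CapEff value) into capability names."""
    mask = int(mask_hex, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits beyond the known table come from newer kernels.
            names.append(f"CAP_{CAP_NAMES[bit]}" if bit < len(CAP_NAMES) else f"bit {bit}")
        mask >>= 1
        bit += 1
    return names


def capability_sets(pid: str = "self") -> dict[str, list[str]]:
    """Read the Cap* lines of /proc/<pid>/status and decode each mask."""
    sets = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key.startswith("Cap"):
                sets[key] = decode_caps(value.strip())
    return sets


if __name__ == "__main__":
    try:
        for key, names in capability_sets().items():
            print(f"{key}: {len(names)} -> {', '.join(names) or '(none)'}")
    except OSError as exc:
        print("cannot read /proc:", exc)
```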

User namespaces combine isolation with elevated privilege. They allow a process to hold capabilities inside an isolated container but not outside it, so it can perform privileged operations only on the resources governed by that container. For example, a process can mount file systems or configure network interfaces inside the container in which it has been granted capabilities, but not on the host.
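This can be observed without root, assuming Linux, Python 3.12 or later, and a kernel that permits unprivileged user namespaces (some distributions restrict them). In the sketch below, a forked child creates a new user namespace, maps its own UID to root inside it, and reports a full effective capability set. Its attempt to change the hostname of the UTS namespace it still shares with the host is expected to fail, because that namespace is owned by the parent user namespace; once it creates a UTS namespace of its own, the same call succeeds. cap_eff, try_sethostname and child are illustrative names of our own.

```python
import os
import socket


def cap_eff(status_text: str) -> int:
    """Return the effective capability mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line")


def try_sethostname(name: str) -> str:
    try:
        socket.sethostname(name)
        return "ok"
    except PermissionError:
        return "EPERM"


def child() -> None:
    outer_uid = os.geteuid()
    os.unshare(os.CLONE_NEWUSER)  # new user namespace: full capabilities inside it
    with open("/proc/self/uid_map", "w") as uid_map:
        uid_map.write(f"0 {outer_uid} 1\n")  # our outer UID appears as UID 0 inside
    with open("/proc/self/status") as status:
        print(f"uid={os.getuid()} CapEff={cap_eff(status.read()):#x}", flush=True)
    # The host's UTS namespace is owned by the parent user namespace,
    # where this process has no capabilities, so this should fail.
    print("sethostname, host UTS namespace:", try_sethostname("nope"), flush=True)
    # A UTS namespace created now is owned by our new user namespace.
    os.unshare(os.CLONE_NEWUTS)
    print("sethostname, own UTS namespace:", try_sethostname("inside"), flush=True)


def demo() -> None:
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            child()
            status = 0
        except (OSError, AttributeError) as exc:  # user namespaces disabled, or Python < 3.12
            print("child failed:", exc, flush=True)
        finally:
            os._exit(status)  # never fall through into the parent's code
    os.waitpid(pid, 0)


if __name__ == "__main__":
    try:
        demo()
    except OSError as exc:  # fork not permitted
        print("demo skipped:", exc)
```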

cgroups

cgroups (control groups) serve to measure and limit the use of various kinds of resources, such as memory and CPU. They make it possible to set a shared limit for a group of processes, which the older UNIX mechanisms (per-process resource limits set with setrlimit()) could not do. Shared limits are helpful because an application typically consists of multiple processes.
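On a system using cgroups v2 (the unified hierarchy, usually mounted at /sys/fs/cgroup), limits are set by writing to files in a cgroup's directory: memory.max takes a byte count, and cpu.max takes a quota and a period in microseconds. The Python sketch below assumes cgroups v2, root privileges, and that the memory and cpu controllers are enabled in the parent's cgroup.subtree_control; the cgroup name demo and the helpers cpu_max, memory_max and plan are hypothetical. It caps a new cgroup at 256 MiB of memory and half a CPU, then moves the current process into it.

```python
import os
import sys
from pathlib import Path
from typing import Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroups v2 mount point


def cpu_max(cores: Optional[float], period_us: int = 100_000) -> str:
    """cpu.max value: '<quota> <period>' in microseconds, or 'max <period>'."""
    if cores is None:
        return f"max {period_us}"
    return f"{round(cores * period_us)} {period_us}"


def memory_max(limit_bytes: Optional[int]) -> str:
    """memory.max value: a byte count, or 'max' for no limit."""
    return "max" if limit_bytes is None else str(limit_bytes)


def plan(name: str, mem_bytes: int, cores: float, pid: int) -> list[tuple[Path, str]]:
    """The (file, value) writes that limit a cgroup and move `pid` into it."""
    group = CGROUP_ROOT / name
    return [
        (group / "memory.max", memory_max(mem_bytes)),
        (group / "cpu.max", cpu_max(cores)),
        # Writing a PID to cgroup.procs moves that process into the cgroup;
        # children it creates later start there too and share the same limits.
        (group / "cgroup.procs", str(pid)),
    ]


if __name__ == "__main__":
    writes = plan("demo", 256 * 1024 * 1024, 0.5, os.getpid())
    if "--apply" not in sys.argv:
        for path, value in writes:
            print(f"would write {value!r} to {path}")
    else:
        try:
            writes[0][0].parent.mkdir(exist_ok=True)  # creating the directory creates the cgroup
            for path, value in writes:
                path.write_text(value)
            print("process", os.getpid(), "is now limited by", writes[0][0].parent)
        except OSError as exc:  # not root, no cgroups v2, or controller not enabled
            print("could not set up the cgroup:", exc)
```

Run without arguments, the script only prints the writes it would perform; with --apply (as root) it performs them. Separating the plan from the writes keeps the value formatting easy to check without touching the system.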

We can also arrange limits hierarchically through parent-child relationships between cgroups. This is especially useful now that we have the idea of containers inside containers. This is exactly how Docker resource constraints are implemented.
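Because usage in a cgroup also counts against all of its ancestors, the memory limit that actually applies to a process is the tightest memory.max anywhere on the path from its cgroup up to the root; a nested container therefore cannot escape its outer container's limit. The Python sketch below assumes Linux with cgroups v2: it reads the current process's cgroup from the "0::" line of /proc/self/cgroup and walks up the tree. effective_limit, own_cgroup and memory_max_chain are our own helper names, and the sketch considers only the hard limit in memory.max.

```python
from pathlib import Path
from typing import Iterable, Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroups v2 mount point


def effective_limit(values: Iterable[str]) -> Optional[int]:
    """Tightest of several memory.max values; None if every level says 'max'."""
    limits = [int(v) for v in values if v != "max"]
    return min(limits) if limits else None


def own_cgroup() -> Path:
    # With cgroups v2, /proc/self/cgroup holds a line like '0::/user.slice/...'.
    for line in Path("/proc/self/cgroup").read_text().splitlines():
        if line.startswith("0::"):
            return Path(line[3:])
    raise RuntimeError("no cgroups v2 entry found")


def memory_max_chain(cgroup: Path) -> list[str]:
    """Collect memory.max from the cgroup and each ancestor (the root has none)."""
    values = []
    for path in [cgroup, *cgroup.parents]:
        file = CGROUP_ROOT / path.relative_to("/") / "memory.max"
        if file.exists():
            values.append(file.read_text().strip())
    return values


if __name__ == "__main__":
    try:
        chain = memory_max_chain(own_cgroup())
        print("memory.max values up the tree:", chain)
        print("effective limit in bytes (None = unlimited):", effective_limit(chain))
    except (OSError, RuntimeError) as exc:
        print("cannot inspect cgroups:", exc)
```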

seccomp

The general idea behind seccomp is that the kernel provides about 400 system calls, but most applications use only a small subset of them. seccomp lets us sandbox a process by restricting the system calls available to it. This limits what programs can do and reduces the damage an attacker can cause if the process is compromised.
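Under the hood, a seccomp filter is a small classic BPF program that the kernel runs on each system call, returning a verdict such as "allow" or "fail with this errno". The Python sketch below builds one by hand with ctypes and installs it with prctl() in a forked child, after setting no_new_privs so that no root privileges are needed. It assumes Linux on x86-64: the system call number 63 for uname and the AUDIT_ARCH_X86_64 check are architecture-specific. deny_syscall_program, install and child are our own names. Once the filter is in place, os.uname() should fail with EPERM in the child while other system calls keep working; real tools usually generate such filters with libseccomp rather than by hand.

```python
import ctypes
import errno
import os
import platform

# Constants from <linux/filter.h>, <linux/seccomp.h>, <linux/audit.h> and <linux/prctl.h>.
BPF_LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS: load a 32-bit field of struct seccomp_data
BPF_JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K: compare the accumulator with a constant
BPF_RET_K = 0x06      # BPF_RET | BPF_K: return a constant verdict
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_KILL_PROCESS = 0x80000000
AUDIT_ARCH_X86_64 = 0xC000003E
PR_SET_NO_NEW_PRIVS, PR_SET_SECCOMP, SECCOMP_MODE_FILTER = 38, 22, 2
NR_UNAME_X86_64 = 63  # x86-64 numbering; other architectures differ

Instruction = tuple[int, int, int, int]  # (code, jt, jf, k), as in struct sock_filter


def deny_syscall_program(nr: int, arch: int = AUDIT_ARCH_X86_64) -> list[Instruction]:
    """Classic BPF program: fail syscall `nr` with EPERM, allow everything else."""
    return [
        (BPF_LD_W_ABS, 0, 0, 4),   # A = seccomp_data.arch
        (BPF_JEQ_K, 0, 4, arch),   # unexpected architecture: jump to the kill verdict
        (BPF_LD_W_ABS, 0, 0, 0),   # A = seccomp_data.nr
        (BPF_JEQ_K, 0, 1, nr),     # denied syscall: fall through, otherwise skip one
        (BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM),
        (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
    ]


class SockFilter(ctypes.Structure):
    _fields_ = [("code", ctypes.c_ushort), ("jt", ctypes.c_ubyte),
                ("jf", ctypes.c_ubyte), ("k", ctypes.c_uint)]


class SockFprog(ctypes.Structure):
    _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.POINTER(SockFilter))]


def install(program: list[Instruction]) -> None:
    libc = ctypes.CDLL(None, use_errno=True)
    filters = (SockFilter * len(program))(*[SockFilter(*ins) for ins in program])
    prog = SockFprog(len(program), ctypes.cast(filters, ctypes.POINTER(SockFilter)))
    ul = ctypes.c_ulong
    # no_new_privs allows an unprivileged process to install a filter.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ul(1), ul(0), ul(0), ul(0)) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_NO_NEW_PRIVS failed")
    if libc.prctl(PR_SET_SECCOMP, ul(SECCOMP_MODE_FILTER), ctypes.byref(prog), ul(0), ul(0)) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_SECCOMP failed")


def child() -> int:
    try:
        install(deny_syscall_program(NR_UNAME_X86_64))
    except OSError as exc:
        print("could not install filter:", exc, flush=True)
        return 1
    print("getpid() still allowed:", os.getpid(), flush=True)
    try:
        os.uname()
    except PermissionError as exc:
        print("uname() blocked by seccomp:", exc, flush=True)
        return 0
    print("uname() unexpectedly succeeded", flush=True)
    return 1


def demo() -> None:
    if platform.machine() != "x86_64":
        print("this sketch only handles x86-64")
        return
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            status = child()
        finally:
            os._exit(status)  # the filter dies with the child
    os.waitpid(pid, 0)


if __name__ == "__main__":
    try:
        demo()
    except OSError as exc:  # fork not permitted
        print("demo skipped:", exc)
```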

By default, Docker's seccomp profile allows common system calls and disallows around 30-40 potentially dangerous ones. We can create our own Docker seccomp security profiles to modify this default behaviour.
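A custom profile is a JSON document passed to docker run with --security-opt seccomp=<file>. The Python sketch below prints a small profile that allows every system call by default but makes mkdir and mkdirat fail with an error; the syscall choice and the file name no-mkdir.json are illustrative only. Such a file replaces Docker's default profile rather than being merged with it, so a real profile would normally start from a copy of the default one.

```python
import json


def deny_profile(syscalls: list[str]) -> dict:
    """Seccomp profile that allows all syscalls except `syscalls`, which fail with an error."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {"names": sorted(syscalls), "action": "SCMP_ACT_ERRNO"},
        ],
    }


if __name__ == "__main__":
    # Print to stdout so the output can be redirected to a file.
    print(json.dumps(deny_profile(["mkdir", "mkdirat"]), indent=2))
```

Assuming the script is saved as make_profile.py, running python3 make_profile.py > no-mkdir.json and then docker run --rm --security-opt seccomp=no-mkdir.json alpine mkdir /tmp/x should make the mkdir command fail with a permission error.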

Docker builds on these four core components and abstracts away their complexity. A deeper understanding of the underlying mechanisms helps us better understand the behaviour of our systems. Michael's training courses allow you to do just that.
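Docker ultimately hands these settings to a low-level OCI runtime (runc by default) as a configuration file, config.json, defined by the OCI runtime specification. The Python sketch below prints a trimmed configuration with one section per component; the rootfs path, hostname, capability list, limits and blocked syscall are illustrative assumptions, and a configuration generated by Docker contains many more fields.

```python
import json


def oci_config(hostname: str, mem_bytes: int, cpu_quota_us: int) -> dict:
    """A trimmed OCI runtime config touching namespaces, capabilities, cgroups and seccomp."""
    caps = ["CAP_CHOWN", "CAP_KILL", "CAP_NET_BIND_SERVICE"]  # illustrative subset
    return {
        "ociVersion": "1.0.2",
        "root": {"path": "rootfs", "readonly": True},
        "hostname": hostname,  # applied inside the container's own UTS namespace
        "process": {
            "user": {"uid": 0, "gid": 0},
            "args": ["sh"],
            "cwd": "/",
            # Capabilities: root inside the container keeps only this subset.
            "capabilities": {key: list(caps) for key in ("bounding", "effective", "permitted")},
        },
        "linux": {
            # Namespaces: private views of PIDs, mounts, hostname, IPC and networking.
            "namespaces": [{"type": t} for t in ("pid", "mount", "uts", "ipc", "network")],
            # cgroups: memory limit in bytes, CPU quota and period in microseconds.
            "resources": {
                "memory": {"limit": mem_bytes},
                "cpu": {"quota": cpu_quota_us, "period": 100_000},
            },
            # seccomp: allow by default, make one system call fail.
            "seccomp": {
                "defaultAction": "SCMP_ACT_ALLOW",
                "syscalls": [{"names": ["keyctl"], "action": "SCMP_ACT_ERRNO"}],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(oci_config("demo", 512 * 1024 * 1024, 50_000), indent=2))
```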

Written by

Adelina Simion Technology Evangelist

Adelina is a polyglot engineer and developer relations professional with a decade of technical experience at multiple startups in London. She started her career as a Java backend engineer, later moved to Go, and then transitioned to a full-time developer relations role. She has published multiple online courses about Go on the LinkedIn Learning platform, helping thousands of developers upskill in Go. She has a passion for public speaking and has presented on cloud architectures at major European conferences. Adelina holds an MSc in Mathematical Modelling and Computing.