Security is usually a matter of trade-offs. Questions like "Is X secure?" rarely have direct yes-or-no answers. A technology can mitigate certain classes of risk even as it exacerbates others.
Containers are one such recent technology, and their security impact is complex. Although some of the common risks of containers are beginning to be understood, many of their upsides are yet to be widely recognized. To emphasize the point, this post will highlight three advantages of containers that sysadmins and DevOps teams can use to make installations more secure.
To give this discussion focus, we will consider an example application: a simple imageboard application. This application allows users to create and respond in threads of anonymous image and text content. Original posters can control their posts via "tripcodes" (which are basically per-post passwords). The application consists of the following "stack":
- nginx to serve static content, reverse proxy the active content, act as a cache-layer, and handle SSL
- node.js to do the heavy lifting
- mariadb to enable persistence
The Base Case
The base-case for comparison is the complete stack being hosted on a single machine (or virtual machine). It is true that this is a simple case, but this is not a straw man. A large portion of the web is served from just such unified instances.
The Containerized Setup
The stack naturally splits into three containers:
- container X, hosting nginx
- container J, hosting node.js
- container M, hosting mariadb
Additionally, three /var locations are created on the host: (1) one for static content (a blog, theming, etc.), (2) one for the actual images, and (3) one for database persistence. The node.js container will have a mount for the image-store, the mariadb container will have a mount for the database, and the nginx container will have mounts for both the image-store and static content.
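As a rough sketch of this layout, assuming Docker as the container runtime: the host paths under /var/imageboard, the container-side mount points, the container names, and the imageboard/* image names below are all hypothetical. The script only prints the launch commands, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical host locations for the three stores (not from the post).
STATIC_DIR=/var/imageboard/static
IMAGE_DIR=/var/imageboard/images
DB_DIR=/var/imageboard/db

# Print one "docker run" line per container instead of executing it.
launch_commands() {
  # Container X: nginx gets the static content and the image-store.
  echo "docker run -d --name x-nginx -v $STATIC_DIR:/srv/static -v $IMAGE_DIR:/srv/images imageboard/nginx"
  # Container J: node.js gets only the image-store.
  echo "docker run -d --name j-node -v $IMAGE_DIR:/srv/images imageboard/node"
  # Container M: mariadb gets only the database directory.
  echo "docker run -d --name m-mariadb -v $DB_DIR:/var/lib/mysql imageboard/mariadb"
}

launch_commands
```

Piping the output to a shell would actually start the containers. At this point the mounts are unrestricted.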
Advantage #1: Isolated Upgrades
Let's look at an example patch Tuesday under both setups.
The Base Case
The sysadmin has prepared a second staging instance for testing the latest patches from her distribution. Among the updates is a critical one for SSL that prevents a key-leak from a specially crafted handshake. After applying all updates, she starts her automatic test suite. Everything goes well until the test for tripcodes. It turns out that the node.js code uses the SSL library to hash the tripcodes for storage and the fix either changed the signature or behavior of those methods. This puts the sysadmin in a tight spot. Does she try to disable tripcodes? Hold back the upgrade?
The Contained Case
Here the sysadmin has more work to do. Instead of updating and testing a single staging instance, she will update and test each individual container, promoting them to production on a container-by-container basis. The nginx and mariadb containers succeed and she replaces them in production. Her keys are safe. As with the base case, the tripcode tests don't succeed. Unlike the base case, the sysadmin has the option of holding back just the node.js container's SSL library, and because the flaw is a key exposure at handshake (which nginx, not node.js, handles), this is not an emergency requiring her to rush developers for a fix.
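The container-by-container promotion can be sketched as a simple loop. Here run_tests and promote are hypothetical stand-ins for a real test suite and deployment step. The stub run_tests merely simulates the outcome described above, where only the node.js container fails.

```shell
#!/bin/sh
# Hypothetical stand-in for the real test suite: pretend that only the
# node.js container fails (its tripcode test breaks after the SSL update).
run_tests() {
  [ "$1" != "node" ]
}

# Hypothetical stand-in for swapping the updated container into production.
promote() {
  echo "promoted: $1"
}

held_back=""
for c in nginx mariadb node; do
  if run_tests "$c"; then
    promote "$c"
  else
    # Keep the old container in production until developers ship a fix.
    held_back="$held_back $c"
  fi
done
echo "held back:$held_back"
```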
Of course, isolated upgrades aren't unique to containers. node.js provides them itself, in the form of npm. So---depending on code specifics---the base case sysadmin might have been able to hold back the SSL library used for tripcodes. However, containers grant all application frameworks isolated upgrades, regardless of whether they provide them themselves. Further, they easily provide them to bigger portions of the stack.
Containers also simplify isolated upgrades. Technologies like rubygems or python virtualenvs create reliance on yet another curated collection of dependencies. It's easy for sysadmins to be in a position where they need three or more such curated collections to update before their application is safe from a given vulnerability. Container-driven isolated upgrades let sysadmins lean on single collections, such as Linux distributions. These are much more likely to have---for example---paid support or guaranteed SLAs. They also unify the dependency management to the underlying distribution's update mechanism.
Containers can also make existing isolation mechanisms easier to manage. While the above case might have been handled via node.js's npm mechanism, containers would have allowed the developers to deal with that complexity, simply handing an updated container to the sysadmin.
Of course, isolated upgrades are not always an advantage. In large-use environments the resource savings from shared images/memory may make it worth the additional headaches to move all applications forward in lock-step.
Advantage #2: Containers Simplify Real Isolation
"Containers do not contain." However, what containers do well is group related processes and create natural (if undefended) trusts boundaries. This---it turns out---simplifies the task of providing real containment immensely. SELinux, cgroups, iptables, and kernel capabilities have a---mostly undeserved---reputation of being complicated. Complemented with containers, these technologies become much simpler to leverage.
The Base Case
A sysadmin trying to lock down their installation in the traditional case faces a daunting task. First, they must identify which processes should be allowed to do what. Does node.js as used in this application use /tmp? What kernel capabilities does mariadb need? The need to answer these questions is one of the reasons technologies such as SELinux are considered complicated. They require a deep understanding of the behavior of not just application code, but the application runtime and the underlying OS itself. The tools available to troubleshoot these issues are often limited (e.g. strace).
Even if the sysadmin is able to nail down exactly what processes in her stack need what capabilities (kernel or otherwise) the question of how to actually bind the application by those restrictions is still a complicated one. How will the processes be transitioned to the correct SELinux context? The correct cgroup?
The Contained Case
In contrast, a sysadmin trying to secure a container has four advantages:
- It is trivial (and usually automatic) to transition an entire container into a particular SELinux context and/or cgroup (Docker has --security-opt, OpenShift PID-based groups, etc.).
- Operating system behavior need not be locked down, only the container/host relationship.
- The container is---usually---placed on a virtual network and/or interface (often the container runtime environment even has supplemental lock-down capabilities).
- Containers naturally provide for experimentation. You can easily launch a container with a varying set of kernel capabilities.
Most frameworks for launching containers do so with sensible "base" SELinux types. For example, both Docker and systemd-nspawn (when using SELinux under RHEL or Fedora) launch all containers with variations of svirt types based on previous work with libvirt. Additionally, many container launchers also borrow libvirt's philosophy of giving each launched container unique Multi-Category Security (MCS) labels that can optionally be set by the admin. Combined with read-only mounting and the fact that an admin only needs to worry about container/host interactions, this MCS functionality can go a long way towards restricting an application's behavior.
For this application, it is straight-forward to:
- Label the static, image, and database stores with unique MCS labels (e.g. c1, c2, and c3).
- Launch the nginx container with labels and binding options (i.e. :ro) appropriate for reading only the image and static stores (-v /path:/path:ro and --security-opt=label:level:s0:c1,c2 for Docker).
- Launch the node.js container binding the image store read/write and with a label giving it only access to that store.
- Launch the mariadb container with only the data persistence store mounted read/write and with a label giving it access only to that store.
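The four steps above might look roughly like the following on an SELinux-enabled Docker host. This is a sketch under several assumptions: the host paths, mount points, and image names are hypothetical. The container_file_t file type is assumed (older policies name it svirt_sandbox_file_t). The label=level: spelling is the one used by newer Docker releases, where older ones used the colon form quoted above. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical host locations, one MCS category per store.
STATIC_DIR=/var/imageboard/static   # c1
IMAGE_DIR=/var/imageboard/images    # c2
DB_DIR=/var/imageboard/db           # c3

mcs_commands() {
  # Label each store with its own category. The file type is an assumption.
  echo "chcon -R -t container_file_t -l s0:c1 $STATIC_DIR"
  echo "chcon -R -t container_file_t -l s0:c2 $IMAGE_DIR"
  echo "chcon -R -t container_file_t -l s0:c3 $DB_DIR"
  # nginx: both stores mounted read-only, level covering c1 and c2.
  echo "docker run -d --name x-nginx --security-opt label=level:s0:c1,c2 -v $STATIC_DIR:/srv/static:ro -v $IMAGE_DIR:/srv/images:ro imageboard/nginx"
  # node.js: image-store read/write, level covering only c2.
  echo "docker run -d --name j-node --security-opt label=level:s0:c2 -v $IMAGE_DIR:/srv/images imageboard/node"
  # mariadb: database store read/write, level covering only c3.
  echo "docker run -d --name m-mariadb --security-opt label=level:s0:c3 -v $DB_DIR:/var/lib/mysql imageboard/mariadb"
}

mcs_commands
```

Because a process can only reach files whose categories its own level includes, the node.js container (c2) cannot touch the database store (c3) even if a mount were added by mistake.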
Should you need to go beyond what MCS can offer, most container frameworks support launching containers with specific SELinux types. Even when working with derived or original SELinux types, containers make everything easier as you need only worry about the interactions between the container and host.
With containers, there are many tools for restricting inter-container communication. Alternatively, for all container frameworks that give each container a unique IP, iptables can also be applied directly. With iptables---for example---it is easy to:
- Block the nginx container from doing anything but speaking HTTP to the node.js container and HTTPS to the outside world.
- Block the node.js container from doing anything but speaking HTTP to the nginx container and using the database port of the mariadb container.
- Block mariadb from doing anything but receiving requests from the node.js container on its database port.
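A minimal sketch of such rules, assuming each container has a fixed IP on a virtual bridge and that traffic between them traverses the host's FORWARD chain. The addresses and the node.js port are hypothetical, and the right chain depends on the container runtime. The rules are printed, not applied.

```shell
#!/bin/sh
# Hypothetical container addresses and application port.
NGINX_IP=172.18.0.2
NODE_IP=172.18.0.3
DB_IP=172.18.0.4
NODE_PORT=3000   # assumed port the node.js application listens on
DB_PORT=3306     # mariadb's default port

firewall_rules() {
  # Allow replies on connections that an earlier rule permitted.
  echo "iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
  # Outside world -> nginx: HTTPS only.
  echo "iptables -A FORWARD -d $NGINX_IP -p tcp --dport 443 -j ACCEPT"
  # nginx -> node.js: HTTP on the application port.
  echo "iptables -A FORWARD -s $NGINX_IP -d $NODE_IP -p tcp --dport $NODE_PORT -j ACCEPT"
  # node.js -> mariadb: database port only.
  echo "iptables -A FORWARD -s $NODE_IP -d $DB_IP -p tcp --dport $DB_PORT -j ACCEPT"
  # Drop everything else to or from the three containers.
  for ip in $NGINX_IP $NODE_IP $DB_IP; do
    echo "iptables -A FORWARD -s $ip -j DROP"
    echo "iptables -A FORWARD -d $ip -j DROP"
  done
}

firewall_rules
```

Order matters here: the ACCEPT rules are appended before the catch-all DROP rules, so only the three listed flows (plus their replies) get through.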
For preventing DDoS or other resource-based attacks, we can use the container launcher's built-in tools (e.g. Docker's ulimit options) or cgroups directly. Either way it is easy to---for example---restrict the node.js and mariadb containers to some hard resource limit (40% of RAM, 20% of CPU and so on).
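As an illustration, the percentages above can be turned into Docker's --memory and --cpus flags. The helper limit_flags is hypothetical, as are the example figures of 8000000 kB of RAM and 4 CPUs. On a real host the inputs could come from /proc/meminfo and nproc.

```shell
#!/bin/sh
# Hypothetical helper: given total RAM in kB and a CPU count, print Docker
# flags that cap a container at 40% of RAM and 20% of CPU.
limit_flags() {
  total_kb=$1
  ncpu=$2
  # Divide before multiplying to keep the intermediate value small.
  mem_kb=$((total_kb / 100 * 40))
  # --cpus takes a decimal number of CPUs, so use awk for the fraction.
  cpus=$(awk -v n="$ncpu" 'BEGIN { printf "%.2f", n * 0.20 }')
  echo "--memory=${mem_kb}k --cpus=${cpus}"
}

# Example figures only; print the resulting launch command for review.
echo "docker run -d --name j-node $(limit_flags 8000000 4) imageboard/node"
```

The same flags would be passed when launching the mariadb container. Docker implements both on top of cgroups, so this is the "built-in tools" route rather than a separate mechanism.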
Finally, container frameworks combined with unit tests are a great way for finding a restricted set of kernel capabilities with which to run an application layer. Whether the framework encourages starting with a minimal set and building up (systemd-nspawn) or with a larger set and letting you selectively drop (Docker), it's easy to keep launching containers until you find a restricted---but workable---collection.
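The "selectively drop" search can be sketched as a loop that tries removing one capability at a time and keeps the removal whenever the tests still pass. The tests_pass function below is a hypothetical stub: it pretends the application needs exactly SETGID and NET_BIND_SERVICE. A real version would launch the container with the trial set and run the unit tests. The candidate list is a small, illustrative subset of Docker's capability names.

```shell
#!/bin/sh
# Hypothetical stub for "launch with these capabilities and run the tests".
# It simulates an application that needs SETGID and NET_BIND_SERVICE.
tests_pass() {
  case " $* " in
    *" SETGID "*) ;;
    *) return 1 ;;
  esac
  case " $* " in
    *" NET_BIND_SERVICE "*) ;;
    *) return 1 ;;
  esac
  return 0
}

candidates="CHOWN DAC_OVERRIDE SETGID SETUID NET_BIND_SERVICE KILL"
needed=$candidates
for cap in $candidates; do
  # Build the current set minus this one capability.
  trial=""
  for c in $needed; do
    [ "$c" = "$cap" ] || trial="$trial $c"
  done
  # Keep the smaller set only if the application still works with it.
  if tests_pass $trial; then
    needed=$trial
  fi
done

# Turn the surviving set into Docker flags: drop everything, add back the rest.
flags="--cap-drop=ALL"
for c in $needed; do
  flags="$flags --cap-add=$c"
done
echo "$flags"
```

A single pass like this finds a workable set, not necessarily the smallest one when capabilities interact. The result is only as good as the test coverage behind tests_pass.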
The isolation-to-configuration ratio of the above work is extremely high compared to "manual" SELinux/cgroup/iptables isolation. There is also much less to "go wrong" as it is much easier to understand the container/host relationship and its needs than it is to understand the process/OS relationship. Among other upsides, the above configuration: prevents a compromised nginx from altering any data on the host (including the image-store and database), prevents a compromised mariadb from altering anything other than the database, and---depending on what exact kernel capabilities are absolutely required---may go a long way towards prevention of privilege escalation.
While containers do not allow for any forms of isolation not already possible, in practice they make configuring isolation much simpler. They limit isolation to container/host instead of process/OS. By binding containers to virtual networks or interfaces, they simplify firewall rules. Container implementations often provide sensible SELinux or other protection defaults that can be easily extended.
The trade-off is that containers expose an additional container/container attack-surface that is not as trivial to isolate.
Advantage #3: Containers Have More Limited and Explicit Dependencies
The Base Case
Containers are meant to eliminate "works for me" problems. A common cause of "works for me" problems in traditional installations is hidden dependencies. An example is a software component depending on a common command line utility without a developer knowing it. Besides creating instability over installation types, this is a security issue. A sysadmin cannot protect against a vulnerability in a component they do not know is being used.
The flip-side of unknown dependencies and of much greater concern is extraneous or cross-over components. Components needed by one portion of the stack can actually make other components not designed with them in mind extremely dangerous. Many privilege escalation flaws involve abusing access to suid programs that, while essential to some applications, are extraneous to others.
The Contained Case
Obviously, container isolation helps prevent component dependency cross-over, but containers also help to minimize extraneous dependencies. Containers are not virtual machines. Containers do not have to boot, they do not have to support interactive usage, they are usually single user, and they can be simpler than a full operating system in any number of ways. Thus containers can eschew service launchers, shells, sensitive configuration files, and other cruft that (from an application perspective) serves only as an attack surface.
Truly minimal custom containers will more or less look like just the top few layers of their RPM/Deb/pkg "pyramid" without any of the bottom layers. Even "general" purpose containers are undergoing a healthy "race to the bottom" to have as minimal a starting footprint as possible. The Docker version of RHEL 7, an operating system not exactly famous for minimalism, is itself less than 155 megs uncompressed.
Container isolation means that when a portion of your application stack has a dependency, that dependency's attack surface is available only to that portion of your application. This is in stark contrast to traditional installations where attack surfaces are always additive. Exploitation almost always involves chaining multiple vulnerabilities, so this advantage may be one of containers' most powerful.
A common security complaint regarding containers is that in many ways they are comparable to statically linked binaries. The flip side is that this puts pressure on developers and maintainers to minimize the size of these blobs, which minimizes their attack surface. Shellshock is a good example of the kind of vulnerability this mitigates. It is nearly impossible for a traditional installation to not have a highly complex shell, but many containers ship without a shell of any kind.
Beyond containers themselves this pressure has resulted in the rise of the minimal host operating system (e.g. Atomic, CoreOS, RancherOS). This has brought a reduced attack surface (and in the case of Atomic a certain degree of immutability) to the host as well as the container.
Containers Is As Containers Do
Other security advantages of containers include working well in immutable and/or stateless paradigms, good content auditability (especially compared to virtual machines), and---potentially---good verifiability. A single blog post can't cover all of the upsides of containers, much less the upsides and downsides. Ultimately, a large part of understanding the security impact of containers is coming to terms with the fact that containers are neither degenerate virtual machines nor superior jails. They are a unique technology whose impact needs to be assessed on its own.