Docker is the public face of Linux containers, which are built on two of Linux's unsung heroes: control groups (cgroups) and namespaces. Like virtualization, containers are appealing because they help solve two of the oldest problems to plague developers: "dependency hell" and "environmental hell."
Closely related, dependency and environmental hell are best thought of as the chief causes of "works for me" situations. Dependency hell describes the complexity inherent in a modern application's tangled graph of the external libraries and programs it needs to function. Environmental hell is the name for the operating-system portion of that same problem (e.g. the particular bash implementation on which that quick script you wrote unknowingly relies).
Namespaces provide the solution in much the same way that virtual memory simplified writing code on a multi-tenant machine: by providing the illusion that an application suite has the computer all to itself. In other words, via isolation. When a process or process group is isolated via these namespace features, we say it is "contained." In this way, virtualization and containers are conceptually related, but containers isolate in a completely different way, and conflating the two is just the first of a series of misconceptions that must be cleared up in order to use containers as securely as possible. Virtualization fully isolates programs, to the point that one virtual machine can run Linux, for example, while another runs BSD. Containers are not so isolated. Here are a few of the ways that "containers do not contain":
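Namespace membership is visible directly under `/proc`. A minimal sketch (Linux only) that lists the namespaces of the current process; two processes are isolated from each other for a given resource exactly when the inode in the corresponding link differs:

```python
import os

# Each symlink in /proc/self/ns identifies one namespace this process
# belongs to (mnt, pid, net, uts, ipc, user, ...). The link target looks
# like "pid:[4026531836]"; the number is the namespace's inode.
for entry in sorted(os.listdir("/proc/self/ns")):
    print(entry, "->", os.readlink(f"/proc/self/ns/{entry}"))
```

Run the same snippet on the host and inside a container and the inodes differ for the namespaced resources, while everything *not* in this directory is, by definition, still shared.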
- Containers all share the same kernel. If a contained application is hijacked with a privilege escalation vulnerability, all running containers *and* the host are compromised. Similarly, it isn't possible for two containers to use different versions of the same kernel module.
- Several resources are *not* namespaced. For example, the normal ulimit mechanism is still needed to control per-process resources such as file handles, and the kernel keyring is another resource that is not namespaced. Many beginning users of containers find it counter-intuitive that file descriptors can be exhausted or that Kerberos credentials are shared between containers when they believe they have exclusive system access. A badly behaving process in one container could use up all the file handles on a system and starve the other containers, and diagnosing that shared resource usage is not feasible from within a container.
- By default, containers inherit many system-level kernel capabilities. While Docker has many useful options for restricting kernel capabilities, you need a deeper understanding of an application's needs to run it inside a container than you would to run it in a VM. Containers and the applications within them will depend on the capabilities of the kernel on which they reside.
- Containers are not "write once, run anywhere". Since they use the host kernel, applications must be compatible with said kernel. Just because many applications don't depend on particular kernel features doesn't mean that no applications do.
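The shared kernel is easy to observe for yourself: inside any container, the reported kernel release is the host's, because the image ships no kernel of its own. A minimal sketch:

```python
import platform

# Containers do not bundle a kernel. platform.release() reports the
# version of the single shared host kernel, whether this runs on the
# host or inside any container on that host.
print(platform.release())
```

This is why an application with a hard dependency on a particular kernel feature is only as portable as the hosts that provide it.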
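Likewise, the non-namespaced ulimit mechanism mentioned above can be inspected and adjusted per process; a sketch using Python's `resource` module to read and lower the classic open-file-descriptor limit:

```python
import resource

# RLIMIT_NOFILE is the traditional ulimit on open file descriptors.
# It is enforced per process by the one shared kernel rather than being
# namespaced, so it is the kind of knob needed to keep a file-handle-
# hungry process in one container from starving the others.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft)
print("hard limit:", hard)

# An unprivileged process may lower its own soft limit (up to the hard
# limit); raising the hard limit requires privilege.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(256, soft), hard))
```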
For these and other reasons, Docker images should be designed and used with consideration for the host system on which they run. By only consuming images from trusted sources, you reduce the risk of deploying compromised containerized applications. Docker images should be considered as powerful as RPMs and should only be installed from sources you trust. You wouldn't expect your system to remain secure if you randomly installed untrusted RPMs, nor should you if you "docker pull" random Docker images.
In a future article we will discuss the topic of untrusted images.