Understanding user namespaces with rootless containers
Table of Contents
- Introduction
- What is a user namespace in the context of containers?
- Using multiple ranges in `/etc/subuid` and `/etc/subgid`
- Using multiple users in `/etc/subuid` and `/etc/subgid`
- How are user namespaces used by rootless users who are using container tools such as `podman`, `buildah`, and `skopeo`?
- Why does the root user not use user namespaces for containers?
- Troubleshooting user namespace issues within containers.
- Checking if user namespaces are enabled
- Checking if mapping of user or group IDs is working:
- Checking if the UID or GID within the container is too large to be covered by the existing ranges for the user
Introduction
This article provides examples and explanations of user namespaces, specifically as they apply to container tools that rely on them when run as a non-root user, such as `podman`, `buildah`, and `skopeo`.
There is some prerequisite knowledge required to fully understand the context of this article. It's suggested that you are familiar with the following concepts:
- Linux Containers
- Rootless Containers
- Linux Namespaces
- Accessing Linux Namespaces with tools such as `nsenter`
What is a user namespace in the context of containers?
A user namespace is a facility within Linux that allows a non-root user to map a range of UIDs and GIDs on the host system to a different range of UIDs and GIDs within the namespace. This mapping is primarily used in containerization technologies to allow the non-root user creating and running containers to execute processes and create files as if they are a user other than themselves inside of the container.
As an example, consider a user named "bob" with a UID of `1000`. By properly configuring the `/etc/subuid` and `/etc/subgid` files, this user is given a range of UIDs and GIDs on the host to use as their own. For example, consider the below contents of `/etc/subuid` and `/etc/subgid`:
$ cat /etc/subuid
bob:10000:65536
$ cat /etc/subgid
bob:10000:65536
These files have three fields.
- A username or UID of the user who is receiving the mapped UIDs or GIDs.
- The starting UID or GID on the host that is used by the mapping.
- The number of consecutive UIDs or GIDs, counted from the starting ID, that are allocated to the user.
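The three fields can be illustrated with a small parsing sketch. The names `SubIdRange` and `parse_subid_line` are made up for this example and are not part of any container tool; the sketch only assumes the `owner:start:count` layout described above.

```python
from typing import NamedTuple


class SubIdRange(NamedTuple):
    owner: str  # username or numeric UID, exactly as written in the file
    start: int  # first host ID in the range
    count: int  # number of consecutive host IDs in the range


def parse_subid_line(line: str) -> SubIdRange:
    """Split one 'owner:start:count' line into its three fields."""
    owner, start, count = line.strip().split(":")
    return SubIdRange(owner, int(start), int(count))


entry = parse_subid_line("bob:10000:65536")
# The last host ID in the range is start + count - 1.
last_host_id = entry.start + entry.count - 1
print(entry.owner, entry.start, last_host_id)  # bob 10000 75535
```

Because the third field is a count rather than an end value, the last host ID covered by `bob:10000:65536` is `75535`.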
To provide a visual mapping of the above `/etc/subuid` and `/etc/subgid`, consider the following chart:
                 bob
                 1000     10000    75535
                  │         │        │
                  ▼         ▼        ▼
          ┌───────┬┬────────┬────────┬────────────────────────┐
          │       ││        │        │                        │
          │       ││        │        │                        │
host      │       ││        │        │                        │
          │       ││        │        │                        │
          │       ││        │        │                        │
          │       ││        │        │                        │
          └───────┴┴────────┴────────┴────────────────────────┘
                  │         │        │
          ┌───────┘         │        │
          │                 │        │
          │ ┌───────────────┘        └────────────────────────┐
          │ │                                                 │
          ▼ ▼                                                 ▼
          ┌───────────────────────────────────────────────────┐
          │                                                   │
          │                                                   │
user      │                                                   │
namespace │                                                   │
          │                                                   │
          │                                                   │
          └───────────────────────────────────────────────────┘
          0 1                                             65536
In the above example, "bob" is mapped into the user namespace as UID `0`. If the user `0` in the namespace (which would be interpreted as `root`) launches a process, the host sees it as launched by the `bob` user. If a user inside the user namespace named `appuser` with a UID of `1001` runs a program, that UID is translated by its offset into the subordinate range: namespace UID `1` corresponds to host UID `10000`, so namespace UID `1001` corresponds to `10000 + (1001 - 1)`, meaning the UID of `11000` would be used on the host.
As we can see in the above example, the 65536 host UIDs starting at `10000` (that is, `10000` through `75535`) are usable by the `bob` user in user namespaces.
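The translation from an ID inside the namespace to an ID on the host can be sketched as a small lookup. This is an illustrative sketch rather than what any container tool runs internally; `to_host_id` and `BOB_UID_MAP` are hypothetical names, and the map rows use the same three columns (namespace start, host start, count) that appear in `/proc/<pid>/uid_map`.

```python
def to_host_id(ns_id, id_map):
    """Translate an ID inside the namespace to the host ID it maps to.

    id_map is a list of (ns_start, host_start, count) tuples.
    Returns None when the ID is not mapped in the namespace.
    """
    for ns_start, host_start, count in id_map:
        if ns_start <= ns_id < ns_start + count:
            # Same offset into the host range as into the namespace range.
            return host_start + (ns_id - ns_start)
    return None


# Mapping for bob (UID 1000) with the single entry bob:10000:65536
BOB_UID_MAP = [(0, 1000, 1), (1, 10000, 65536)]

print(to_host_id(0, BOB_UID_MAP))      # 1000
print(to_host_id(1001, BOB_UID_MAP))   # 11000
print(to_host_id(65536, BOB_UID_MAP))  # 75535
print(to_host_id(65537, BOB_UID_MAP))  # None
```

The last call returns `None` because namespace ID `65537` lies beyond the single 65536-ID range allocated to bob.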
This mechanism is a core component of rootless containers, as it allows a non-root user to execute processes and create files as different users within containers without being granted the real UIDs or GIDs of those users on the host, which would pose a security risk. By specifying the ranges a user is allowed to work with via `/etc/subuid` and `/etc/subgid` on a system, any files or processes using IDs within the range can be attributed to the `bob` user.
Using multiple ranges in `/etc/subuid` and `/etc/subgid`
Multiple ranges can be specified in the `/etc/subuid` and `/etc/subgid` files for one user. Consider the following, which extends the example above for "bob":
$ cat /etc/subuid
bob:10000:65536
bob:220000:6000
$ cat /etc/subgid
bob:10000:65536
bob:220000:6000
In this example, bob has a total allocation of `71536` UIDs and GIDs. These UIDs and GIDs start within the namespace at `1` (with `0` being reserved for `bob`'s own UID and GID on the host). This total of `71536` IDs is made up of two ranges on the host: `10000` to `75535`, which maps to IDs `1`-`65536` within the user namespace, and `220000` to `225999`, which maps to IDs `65537`-`71536` within the user namespace. The following visual shows how this is mapped:
                 bob
                 1000     10000    75535       220000  225999
                  │         │        │            │       │
                  ▼         ▼        ▼            ▼       ▼
          ┌───────┬┬────────┬────────┬────────────┬───────┬───┐
          │       ││        │        │            │       │   │
          │       ││        │        │            │       │   │
host      │       ││        │        │            │       │   │
          │       ││        │        │            │       │   │
          │       ││        │        │            │       │   │
          │       ││        │        │            │       │   │
          └───────┴┴────────┴────────┴────────────┴───────┴───┘
                  │         │        │            │       │
          ┌───────┘         │        │            │       │
          │                 │        │            │       │
          │ ┌───────────────┘        └───┬────────┘       └───┐
          │ │                            │                    │
          ▼ ▼                            ▼                    ▼
          ┌──────────────────────────────┬────────────────────┐
          │                              │                    │
          │                              │                    │
user      │                              │                    │
namespace │                              │                    │
          │                              │                    │
          │                              │                    │
          └──────────────────────────────┴────────────────────┘
          0 1                          65536              71536
In this way, all ranges listed for a user in the `/etc/subuid` and `/etc/subgid` files are mapped contiguously inside the user namespace; you cannot have empty or unmapped ranges within the user namespace, although the mapped ranges may be spread across non-adjacent UID and GID ranges on the host.
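The contiguous packing of several host ranges can be sketched as follows. This assumes the default behaviour described in this article (the user's own ID becomes `0`, and subordinate ranges are packed from `1` upward in file order); `build_id_map` is a hypothetical helper, not a function of any container tool.

```python
def build_id_map(own_id, ranges):
    """Return (ns_start, host_start, count) rows for a rootless user.

    own_id is the user's own host UID or GID; ranges is a list of
    (host_start, count) pairs in the order they appear in the file.
    """
    rows = [(0, own_id, 1)]  # ID 0 in the namespace is the user itself
    next_ns_id = 1           # subordinate ranges are packed from 1 upward
    for host_start, count in ranges:
        rows.append((next_ns_id, host_start, count))
        next_ns_id += count  # the next range starts right after this one
    return rows


for row in build_id_map(1000, [(10000, 65536), (220000, 6000)]):
    print(*row)
```

For bob's two ranges this prints the rows `0 1000 1`, `1 10000 65536`, and `65537 220000 6000`: the second host range begins at namespace ID `65537`, immediately after the first one ends, even though the two ranges are far apart on the host.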
Using multiple users in `/etc/subuid` and `/etc/subgid`
It is similarly possible to list multiple users in the `/etc/subuid` and `/etc/subgid` files. That said, it is important to ensure the ranges do not overlap, as overlap can lead to permission issues between two users when their containers use the same UIDs or GIDs on the host. While overlapping ranges may not cause immediate issues, it's highly suggested to ensure no overlaps exist to prevent potential issues.
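A check for overlapping ranges could look like the sketch below. The users "alice" and "carol" and their ranges are invented purely for illustration, and `find_overlaps` is a hypothetical helper.

```python
from itertools import combinations


def find_overlaps(entries):
    """Return pairs of (owner, start, count) entries whose host ranges intersect."""
    overlaps = []
    for a, b in combinations(entries, 2):
        a_end = a[1] + a[2]  # one past the last host ID of range a
        b_end = b[1] + b[2]  # one past the last host ID of range b
        if a[1] < b_end and b[1] < a_end:
            overlaps.append((a, b))
    return overlaps


entries = [
    ("bob", 10000, 65536),
    ("alice", 70000, 65536),   # hypothetical user; starts inside bob's range
    ("carol", 200000, 65536),  # hypothetical user; overlaps nobody
]
for first, second in find_overlaps(entries):
    print(first[0], "overlaps", second[0])  # bob overlaps alice
```

Here bob's range ends at host ID `75535` while alice's begins at `70000`, so host IDs `70000` through `75535` would be shared by both users.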
How are user namespaces used by rootless users who are using container tools such as `podman`, `buildah`, and `skopeo`?
In a rootless container, we are executing as an arbitrary user other than root; therefore this user cannot execute processes and interact with files as any arbitrary user or group, as it would be denied permission to do so. This is where the concept of a "user namespace" is required. A user namespace represents a mapping of UIDs and GIDs on the system that this user is allowed to interact with and masquerade as.
Consider the following example using the user "bob" with the UID of `1000` on the host, using the below `/etc/subuid`:
$ cat /etc/subuid
bob:10000:65536
The user launches a container from an `httpd` image, which runs its process as UID `1001` within the container:
$ podman run -d -p 8080:80 --name httpd registry.redhat.io/rhel8/httpd-24
$ podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
df2143c98b8e registry.redhat.io/rhel8/httpd-24:latest /usr/bin/run-http... 7 seconds ago Up 7 seconds ago 0.0.0.0:8080->80/tcp httpd
Inside of the container, the `default` user with a UID of `1001` is being used, and the UID mapping can be seen from inside of the container via the `uid_map` file in `/proc` for the process:
$ podman exec -it httpd id
uid=1001(default) gid=0(root) groups=0(root)
$ podman exec -it httpd cat /proc/self/uid_map
0 1000 1
1 10000 65536
If we create a file as this `default` user in the container, and inspect it also from within the container, we can see it is owned by `1001`:
$ podman exec -it httpd touch /var/www/html/testfile
$ podman exec -it httpd ls -n /var/www/html/testfile
-rw-r--r--. 1 1001 0 0 May 3 14:49 /var/www/html/testfile
However, from the host, if we manually inspect the file outside of the context of the user namespace, we can see the `default` user's UID of `1001` is interpreted on the host as `11000`:
$ podman inspect httpd | jq '.[].GraphDriver.Data.UpperDir'
"/home/bob/.local/share/containers/storage/overlay/4a983d97bfa19d3cef195e52a5897fc83a9b7af4d7c5cac756ed7082c43203ff/diff"
$ ls -n /home/bob/.local/share/containers/storage/overlay/4a983d97bfa19d3cef195e52a5897fc83a9b7af4d7c5cac756ed7082c43203ff/diff/var/www/html/testfile
-rw-r--r--. 1 11000 1000 0 May 3 10:49 /home/bob/.local/share/containers/storage/overlay/4a983d97bfa19d3cef195e52a5897fc83a9b7af4d7c5cac756ed7082c43203ff/diff/var/www/html/testfile
This is because of the aforementioned mappings; the user `11000` on the host is within the range that the `/etc/subuid` file assigns to the `bob` user, and `bob` subsequently can create files and launch processes as this UID on the host.
This is vital for operations within containers that must occur as specific users while running in a rootless environment. Using this method, normal users who have properly configured `/etc/subuid` and `/etc/subgid` files can masquerade as other users within containers, allowing the appearance of a full Linux environment within the container while being mapped to a secure range of IDs on the host. Should a user in a container break out of their namespaced environment, they cannot gain any privileges that the user "bob" would not have.
Why does the root user not use user namespaces for containers?
When running container tools such as `podman`, `buildah`, or `skopeo` as a non-root user, special care is needed to ensure that user and group IDs can be properly allocated and managed within the container while remaining easy to trace back, on the host, to the user who created the container in the first place. The root user does not need this indirection.
For example, consider running a Red Hat Apache container image as root. If we launch the container like so, we create an `httpd` container which will run the `httpd` process, not as the root user inside of the container but as the user `default` with a UID of 1001:
# podman run -d -p 8080:80 --name httpd registry.redhat.io/rhel8/httpd-24
# podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bb1b2780d132 registry.redhat.io/rhel8/httpd-24 /usr/bin/run-http... 7 seconds ago Up 6 seconds ago 0.0.0.0:8080->80/tcp httpd
# podman exec -it httpd id
uid=1001(default) gid=0(root) groups=0(root)
If this user creates a file or launches a process while running in a rootful container, it is simply seen as being performed by user `1001` on the host. To prove this point, we can create a file within the container as the `default` user with the UID of `1001`:
# podman exec -it httpd touch /var/www/html/testfile.txt
# podman exec -it httpd ls -lattr /var/www/html/testfile.txt
-rw-r--r--. 1 default root 0 May 3 12:41 /var/www/html/testfile.txt
Inside of the container, we can see it is clearly owned by the `default` user as we'd expect. If we get the location of the file on the host itself, and not within the container, we can see that the file that was just made is still owned by `1001` directly on the host:
# podman inspect httpd | jq '.[].GraphDriver.Data.MergedDir'
"/var/lib/containers/storage/overlay/a9c449534602e0dfe51254d3353a4c51d263857b83401113a164bc17c0251844/merged"
# ls -lattr /var/lib/containers/storage/overlay/a9c449534602e0dfe51254d3353a4c51d263857b83401113a164bc17c0251844/merged/var/www/html/testfile.txt
-rw-r--r--. 1 1001 root 0 May 3 08:41 /var/lib/containers/storage/overlay/a9c449534602e0dfe51254d3353a4c51d263857b83401113a164bc17c0251844/merged/var/www/html/testfile.txt
Although the host has no configured user with a UID of `1001`, any files made by the user within the rootful container are simply written with whatever UID and GID the container has configured. This is not an issue with rootful containers, as in most cases these users exist only within their limited container context and do not write to the host, save for bind-mounted volumes. It is trivial for the root user on a host to masquerade as other users, as the root user can simply switch UIDs and GIDs to whatever is desired.
Troubleshooting user namespace issues within containers.
While not every single scenario can be covered, a list of common troubleshooting steps is provided below.
If none of the below situations apply to your issue or environment, and you have a valid support agreement with Red Hat, please Contact Technical Support for further assistance.
Checking if user namespaces are enabled
User namespaces can be entirely disabled. The kernel tunable `user.max_user_namespaces` sets the maximum number of user namespaces that may be created; a value of `0` disallows them entirely.
The current limit of user namespaces can be seen with the following command:
$ sysctl user.max_user_namespaces
user.max_user_namespaces = 14803
As the root user, this value can be changed to `0` to disallow any user namespaces (disabling tools like `podman`, `skopeo`, and `buildah` from being used by non-root users) or to any larger value. To set this temporarily, the root user can execute the following command with any desired value:
# sysctl -w user.max_user_namespaces=63556
For persistence, the value must also be written to a `sysctl.d` file:
# echo "user.max_user_namespaces = 63556" >> /etc/sysctl.d/user-ns.conf
It is not possible to give an exact number for this value. Choose one appropriate to your environment and expected workloads.
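The same check can be scripted. The sketch below assumes the tunable is exposed at `/proc/sys/user/max_user_namespaces`, the usual procfs path for a sysctl named `user.max_user_namespaces`; the function names are hypothetical.

```python
from pathlib import Path

# Assumed procfs location of the user.max_user_namespaces sysctl.
DEFAULT_PATH = "/proc/sys/user/max_user_namespaces"


def user_namespace_limit(path=DEFAULT_PATH):
    """Return the configured limit of user namespaces as an integer."""
    return int(Path(path).read_text().strip())


def user_namespaces_enabled(path=DEFAULT_PATH):
    """Return True when at least one user namespace may be created."""
    return user_namespace_limit(path) > 0
```

Calling `user_namespaces_enabled()` with no argument reads the live value on the host; the `path` parameter exists only so the function can be pointed at a different file, for example when testing.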
Checking if mapping of user or group IDs is working:
To check if your user namespace is properly mapping from your `/etc/subuid` or `/etc/subgid` files, you can use the `podman unshare` command to get the current mapping.
For an `/etc/subuid` file containing the following:
$ cat /etc/subuid
bob:10000:65536
The following command should produce this output:
$ podman unshare cat /proc/self/uid_map
0 1000 1
1 10000 65536
This shows that UID `0` in the container is mapped to UID `1000` on the host, and UIDs `1` through `65536` are mapped starting at UID `10000` on the host and extending out `65536` UIDs, to the end of the range, which is UID `75535`.
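To make the three columns of `uid_map` easier to read, the output can be expanded into explicit ranges. This is a small sketch; `describe_id_map` is a hypothetical helper, and the sample text is the output shown above.

```python
def describe_id_map(text):
    """Turn uid_map/gid_map text into human-readable range descriptions."""
    descriptions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        ns_start, host_start, count = (int(field) for field in line.split())
        descriptions.append(
            f"namespace {ns_start}-{ns_start + count - 1} -> "
            f"host {host_start}-{host_start + count - 1}"
        )
    return descriptions


sample = """0 1000 1
1 10000 65536
"""
for description in describe_id_map(sample):
    print(description)
```

For the sample this prints `namespace 0-0 -> host 1000-1000` followed by `namespace 1-65536 -> host 10000-75535`.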
If there are errors or the range does not look exactly as it does in `/etc/subuid`, ensure the syntax of the file is correct or reach out to Red Hat Support.
Checking if the UID or GID within the container is too large to be covered by the existing ranges for the user
When running `podman`, `skopeo`, or `buildah` commands as a rootless user, a very large UID or GID might be requested for a user. This happens when the creator of the container image creates a user within the container with a UID that exceeds the range available to your rootless user. Consider this error:
requested 1000320999:12 for /home/largeuid: lchown /home/largeuid: invalid argument
Likely the UID `1000320999` is not covered by the ranges allocated to the user in the `/etc/subuid` file. This would have to be resolved by extending the range of UIDs for that particular user to also encompass `1000320999`, as the following entry does:
bob:1000000:1000400000
The above configuration is likely not specific for your environment and will need to be adjusted. In this example of a resolution, a single range of host UIDs from `1000000` to `1001399999` is allocated, which encompasses UIDs `1` to `1000400000` within the container.
This solution to the problem unfortunately creates a large expanse of UIDs on the host that are unusable by other users, which can create problems in environments using these UIDs for other operations such as LDAP or Active Directory. There is no current method with the `/etc/subuid` or `/etc/subgid` files to create a range that "skips" over IDs on the host; the entire range up to the large ID must be allocated on the host.
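Whether a requested ID fits can be estimated from the range sizes alone. The sketch below assumes the default packing described in this article, where ID `0` is the user itself and IDs `1` through the sum of all range sizes come from the subordinate ranges; `covers` is a hypothetical helper.

```python
def covers(container_id, counts):
    """Return True if container_id is mapped, given the sizes of the user's ranges.

    ID 0 is always mapped to the user itself; IDs 1..sum(counts) come
    from the subordinate ranges, packed contiguously.
    """
    return 0 <= container_id <= sum(counts)


print(covers(1001, [65536]))                   # True
print(covers(1000320999, [65536]))             # False
print(covers(1000320999, [1000400000]))        # True
```

The second call corresponds to the `lchown` error above: with only 65536 subordinate IDs, the requested UID `1000320999` has nothing to map to on the host.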