Understanding user namespaces with rootless containers


Introduction

This article explains user namespaces through examples, focusing on how containerization tools such as podman, buildah, and skopeo use them when run as a non-root user.

There is some prerequisite knowledge required to fully understand the context of this article. It's suggested that you are familiar with the following concepts:

What is a user namespace in the context of containers?

A user namespace is a facility within Linux that allows a non-root user to map a range of UIDs and GIDs on the host system to a different range of UIDs and GIDs within the namespace. This mapping is primarily used in containerization technologies to allow the non-root user creating and running containers to execute processes and create files as if they are a user other than themselves inside of the container.

As an example, consider a user named "bob" with a UID of 1000. Entries in the /etc/subuid and /etc/subgid files give this user a range of UIDs and GIDs on the host to use as their own. For example, consider the below contents of /etc/subuid and /etc/subgid:

$ cat /etc/subuid
bob:10000:65536

$ cat /etc/subgid
bob:10000:65536

These files have three fields.

  • A username or UID of the user who is receiving the mapped UIDs or GIDs.
  • The starting UID or GID on the host that is used by the mapping.
  • The number of IDs in the range, counted from the starting UID or GID.
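
The three fields can also be read programmatically. The following is a minimal Python sketch and not part of any container tool; the names SubIdRange and parse_subid_line are made up for illustration, and the input is assumed to be one well-formed line.

```python
from typing import NamedTuple


class SubIdRange(NamedTuple):
    """One line of /etc/subuid or /etc/subgid (hypothetical helper type)."""
    owner: str  # first field: username or numeric UID
    start: int  # second field: first host ID in the range
    count: int  # third field: number of IDs in the range

    @property
    def last(self) -> int:
        # The range covers start .. start + count - 1 on the host.
        return self.start + self.count - 1


def parse_subid_line(line: str) -> SubIdRange:
    """Split a 'owner:start:count' line into its three fields."""
    owner, start, count = line.strip().split(":")
    return SubIdRange(owner, int(start), int(count))


entry = parse_subid_line("bob:10000:65536")
print(entry.owner, entry.start, entry.last)
```

For the entry above, the range starts at host ID 10000 and ends at 10000 + 65536 - 1.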

To provide a visual mapping of the above /etc/subuid and /etc/subgid, consider the following chart:

                 bob
                 1000      10000    75535
                   │         │        │
                   ▼         ▼        ▼
           ┌───────┬┬────────┬────────┬────────────────────────┐
           │       ││        │        │                        │
           │       ││        │        │                        │
   host    │       ││        │        │                        │
           │       ││        │        │                        │
           │       ││        │        │                        │
           │       ││        │        │                        │
           └───────┴┴────────┴────────┴────────────────────────┘

                   │         │        │
           ┌───────┘         │        │
           │                 │        │
           │ ┌───────────────┘        └────────────────────────┐
           │ │                                                 │
           ▼ ▼                                                 ▼
           ┌───────────────────────────────────────────────────┐
           │                                                   │
           │                                                   │
   user    │                                                   │
namespace  │                                                   │
           │                                                   │
           │                                                   │
           └───────────────────────────────────────────────────┘
           0 1                                             65536

In the above example, "bob" (UID 1000 on the host) is mapped into the user namespace as UID 0. If UID 0 in the namespace (which is interpreted as root there) launches a process, the host sees it as launched by the bob user. If a user named appuser with a UID of 1001 runs a program inside the user namespace, the host sees a different UID: namespace UID 1 corresponds to the start of the range, host UID 10000, so namespace UID 1001 maps to 10000 + (1001 - 1) = 11000 on the host.

As the above example shows, the 65536 host UIDs starting at 10000 (that is, 10000 through 75535) are usable by the bob user in user namespaces.
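
The translation from a UID inside the namespace to a UID on the host is simple arithmetic. The sketch below assumes the single-range layout used in this example (bob is UID 1000, range 10000:65536); ns_to_host is a hypothetical helper name, not an existing API.

```python
def ns_to_host(ns_uid, owner_uid=1000, sub_start=10000, sub_count=65536):
    """Translate a UID inside the user namespace to the UID seen on the host.

    Defaults mirror the example: bob is UID 1000 and owns bob:10000:65536.
    """
    if ns_uid == 0:
        # Namespace root is the unprivileged user who created the namespace.
        return owner_uid
    if 1 <= ns_uid <= sub_count:
        # Namespace UID 1 is the first subordinate ID, so subtract 1.
        return sub_start + (ns_uid - 1)
    raise ValueError(f"UID {ns_uid} is not mapped in this namespace")


for uid in (0, 1, 1001, 65536):
    print(uid, "->", ns_to_host(uid))
```

Any namespace UID above 65536 raises an error here, reflecting that it has no counterpart on the host.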

This mechanism is a core component of rootless containers, as it allows a non-root user to execute processes and create files as different users within containers without requiring the actual UIDs or GIDs on the host, which would pose a security risk. Because /etc/subuid and /etc/subgid specify the ranges a user is allowed to work with, any files or processes using IDs within the range can be attributed to that user (here, bob).

Using multiple ranges in `/etc/subuid` and `/etc/subgid`

Multiple ranges can be specified in the /etc/subuid and /etc/subgid files for one user. Consider the following example using the example above for "bob":

$ cat /etc/subuid
bob:10000:65536
bob:220000:6000

$ cat /etc/subgid
bob:10000:65536
bob:220000:6000

In this example, bob has a total allocation of 71536 UIDs and GIDs. Within the namespace these IDs start at 1 (ID 0 is reserved for bob's own UID and GID on the host). The 71536 IDs are made up of two ranges on the host: 10000 to 75535, which backs IDs 1-65536 within the user namespace, and 220000 to 225999, which backs IDs 65537-71536 within the user namespace. The following visual shows how this is mapped:

                 bob
                 1000      10000    75535      220000    225999
                   │         │        │            │       │
                   ▼         ▼        ▼            ▼       ▼
           ┌───────┬┬────────┬────────┬────────────┬───────┬───┐
           │       ││        │        │            │       │   │
           │       ││        │        │            │       │   │
   host    │       ││        │        │            │       │   │
           │       ││        │        │            │       │   │
           │       ││        │        │            │       │   │
           │       ││        │        │            │       │   │
           └───────┴┴────────┴────────┴────────────┴───────┴───┘

                   │         │        │            │       │
           ┌───────┘         │        │            │       │
           │                 │        │            │       │
           │ ┌───────────────┘        └───┬────────┘       └───┐
           │ │                            │                    │
           ▼ ▼                            ▼                    ▼
           ┌──────────────────────────────┬────────────────────┐
           │                              │                    │
           │                              │                    │
   user    │                              │                    │
namespace  │                              │                    │
           │                              │                    │
           │                              │                    │
           └──────────────────────────────┴────────────────────┘
           0 1                        65536                71536

In this way, the IDs inside a user namespace built from the /etc/subuid and /etc/subgid files are contiguous; there are no empty or unmapped gaps within the user namespace, even though the ranges that back them may be scattered across the UID and GID space on the host.
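
How several host ranges are stacked into one contiguous namespace range can be sketched as follows. This is an illustration only: build_uid_map is a hypothetical helper, and it assumes the ranges are applied in the order they appear in the file. The rows use the same three-column layout as the uid_map file in /proc (namespace start, host start, count).

```python
def build_uid_map(owner_uid, ranges):
    """Stack (host_start, count) ranges into contiguous namespace IDs.

    Returns rows of (ns_start, host_start, count).
    """
    rows = [(0, owner_uid, 1)]  # namespace ID 0 is the owner on the host
    next_ns = 1                 # subordinate IDs begin at namespace ID 1
    for host_start, count in ranges:
        rows.append((next_ns, host_start, count))
        next_ns += count        # the next range begins where this one ends
    return rows


rows = build_uid_map(1000, [(10000, 65536), (220000, 6000)])
for row in rows:
    print("%10d %10d %10d" % row)
```

The second range begins at namespace ID 65537, immediately after the first, even though the host ranges are far apart.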

Using multiple users in `/etc/subuid` and `/etc/subgid`

It is similarly possible to list multiple users in the /etc/subuid and /etc/subgid files. Make sure that the ranges assigned to different users do not overlap: an overlap can lead to permission issues between the two users when their containers end up using the same UIDs or GIDs on the host. An overlap may not cause immediate problems, but it is strongly recommended to avoid one.
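
A check for overlapping ranges could look like the sketch below. The users alice and carol and their ranges are invented for the example, and find_overlaps is a hypothetical helper, not a feature of podman or of the shadow utilities.

```python
from itertools import combinations


def find_overlaps(entries):
    """Return (owner_a, owner_b, first_shared_id, last_shared_id) tuples.

    entries: (owner, start, count) tuples as read from /etc/subuid.
    """
    clashes = []
    for (own_a, start_a, cnt_a), (own_b, start_b, cnt_b) in combinations(entries, 2):
        end_a = start_a + cnt_a  # exclusive end of range a
        end_b = start_b + cnt_b  # exclusive end of range b
        if start_a < end_b and start_b < end_a:
            clashes.append((own_a, own_b, max(start_a, start_b), min(end_a, end_b) - 1))
    return clashes


entries = [
    ("bob", 10000, 65536),    # 10000 .. 75535
    ("alice", 75536, 65536),  # starts right after bob: no overlap
    ("carol", 70000, 1000),   # 70000 .. 70999: inside bob's range
]
print(find_overlaps(entries))
```

Placing each new user's range directly after the previous one, as with alice here, is one simple way to avoid clashes.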

How are user namespaces used by rootless users who are using container tools such as `podman`, `buildah`, and `skopeo`?

In a rootless container, we are executing as an arbitrary user other than root; therefore this user cannot execute processes and interact with files as any arbitrary user or group, as it would be denied permission to do so. This is where the concept of a "user namespace" is required. A user namespace represents a mapping of UIDs and GIDs on the system that this user is allowed to interact with and masquerade as.

Consider the following example using the user "bob" with the UID of 1000 on the host, using the below /etc/subuid:

$ cat /etc/subuid
bob:10000:65536

Suppose the user launches a container from an httpd image, which runs its process as UID 1001 within the container:

$ podman run -d -p 8080:80 --name httpd registry.redhat.io/rhel8/httpd-24

$ podman ps -a
CONTAINER ID  IMAGE                                     COMMAND               CREATED        STATUS            PORTS                 NAMES
df2143c98b8e  registry.redhat.io/rhel8/httpd-24:latest  /usr/bin/run-http...  7 seconds ago  Up 7 seconds ago  0.0.0.0:8080->80/tcp  httpd

Inside of the container, the default user is the one being utilized with a UID of 1001 and the UID mapping can be seen from inside of the container via the uid_map file in /proc for the process:

$ podman exec -it httpd id
uid=1001(default) gid=0(root) groups=0(root)

$ podman exec -it httpd cat /proc/self/uid_map
         0       1000          1
         1      10000      65536

If we create a file as this default user in the container and inspect it from within the container, we can see that it is owned by UID 1001:

$ podman exec -it httpd touch /var/www/html/testfile

$ podman exec -it httpd ls -n /var/www/html/testfile
-rw-r--r--. 1 1001 0 0 May  3 14:49 /var/www/html/testfile

However, from the host, if we manually inspect the file outside of the context of the user namespace, we can see the default user's UID of 1001 is interpreted on the host as 11000:

$ podman inspect httpd | jq '.[].GraphDriver.Data.UpperDir'
"/home/bob/.local/share/containers/storage/overlay/4a983d97bfa19d3cef195e52a5897fc83a9b7af4d7c5cac756ed7082c43203ff/diff"

$ ls -n /home/bob/.local/share/containers/storage/overlay/4a983d97bfa19d3cef195e52a5897fc83a9b7af4d7c5cac756ed7082c43203ff/diff/var/www/html/testfile
-rw-r--r--. 1 11000 1000 0 May  3 10:49 /home/bob/.local/share/containers/storage/overlay/4a983d97bfa19d3cef195e52a5897fc83a9b7af4d7c5cac756ed7082c43203ff/diff/var/www/html/testfile

This is because of the aforementioned mappings; the user 11000 on the host is within the mapped range of the /etc/subuid file as belonging to the bob user, and bob subsequently can create files and launch processes as this UID on the host.
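
The reverse lookup, from an owner UID seen on the host back to the UID inside the container, can be sketched by parsing the uid_map output shown earlier. The helper names parse_uid_map and host_to_ns are assumptions made for this illustration.

```python
UID_MAP = """\
         0       1000          1
         1      10000      65536
"""


def parse_uid_map(text):
    """Turn uid_map text into (ns_start, host_start, count) tuples."""
    return [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]


def host_to_ns(host_uid, rows):
    """Return the namespace UID for a host UID, or None if it is unmapped."""
    for ns_start, host_start, count in rows:
        if host_start <= host_uid < host_start + count:
            return ns_start + (host_uid - host_start)
    return None


rows = parse_uid_map(UID_MAP)
print(host_to_ns(11000, rows))  # owner of testfile as seen on the host
print(host_to_ns(1000, rows))   # bob himself
```

Host UID 11000 resolves to 1001 inside the container, matching the ls output from both sides. A host UID outside every row has no counterpart in the namespace, which this sketch reports as None.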

This is vital for operations that must run as specific users within containers in a rootless environment. Using this method, normal users who have properly configured /etc/subuid and /etc/subgid files can masquerade as other users within containers, which gives the appearance of a full Linux environment within the container while mapping everything to a secure range of IDs on the host. Should a process in a container break out of its namespaced environment, it cannot gain any privileges that the user "bob" does not already have.

Why does the root user not use user namespaces for containers?

When container tools such as podman, buildah, or skopeo run as a non-root user, user and group IDs must be allocated and managed properly within the container while remaining easy to trace back, on the host, to the user who created the container. Rootful containers do not need this extra step.

For example, consider running a Red Hat Apache container image as root. If we launch the container like so, we create an httpd container which will run the httpd process, but does so not as the root user inside of the container but as the user default with a UID of 1001:

# podman run -d -p 8080:80 --name httpd registry.redhat.io/rhel8/httpd-24

# podman ps -a
CONTAINER ID  IMAGE                              COMMAND               CREATED        STATUS            PORTS                 NAMES
bb1b2780d132  registry.redhat.io/rhel8/httpd-24  /usr/bin/run-http...  7 seconds ago  Up 6 seconds ago  0.0.0.0:8080->80/tcp  httpd

# podman exec -it httpd id
uid=1001(default) gid=0(root) groups=0(root)

If this user creates a file or launches a process while running in a rootful container, it is simply seen as being performed as user 1001 on the host. To prove this point, we can create a file as the user within the container as the default user with the UID of 1001:

# podman exec -it httpd touch /var/www/html/testfile.txt

# podman exec -it httpd ls -lattr /var/www/html/testfile.txt
-rw-r--r--. 1 default root 0 May  3 12:41 /var/www/html/testfile.txt

Inside of the container, we can see it is clearly owned by the default user as we'd expect. If we get the location of the file on the host itself, and not within the container, we can see that the file that was just made is still owned by 1001 directly on the host:

# podman inspect httpd | jq '.[].GraphDriver.Data.MergedDir'
"/var/lib/containers/storage/overlay/a9c449534602e0dfe51254d3353a4c51d263857b83401113a164bc17c0251844/merged"

# ls -lattr /var/lib/containers/storage/overlay/a9c449534602e0dfe51254d3353a4c51d263857b83401113a164bc17c0251844/merged/var/www/html/testfile.txt
-rw-r--r--. 1 1001 root 0 May  3 08:41 /var/lib/containers/storage/overlay/a9c449534602e0dfe51254d3353a4c51d263857b83401113a164bc17c0251844/merged/var/www/html/testfile.txt

Although the host has no configured user with UID 1001, any files made by the user within the rootful container are simply written with the UID and GID of whatever the container has configured. This is not an issue with rootful containers, as in most cases these users exist only within their limited container context and do not write to the host, save for bind-mounted volumes. It is trivial for the root user on a host to masquerade as other users, as the root user can simply switch UIDs and GIDs to whatever is desired.

Troubleshooting user namespace issues within containers

While not every single scenario can be covered, a list of common troubleshooting steps is provided below.

If none of the below situations apply to your issue or environment, and you have a valid support agreement with Red Hat, please Contact Technical Support for further assistance.

Checking if user namespaces are enabled

User namespaces can be entirely disabled. The kernel tunable user.max_user_namespaces sets the maximum number of user namespaces that can be created; a value of 0 disallows them altogether.

The current limit of user namespaces can be seen with the following command:

$ sysctl user.max_user_namespaces
user.max_user_namespaces = 14803

As the root user, this value can be changed to 0 to disallow any user namespaces (disabling tools like podman, skopeo, and buildah from being used by non-root users) or to any larger value. To set this temporarily the root user can execute the following command with any desired value:

# sysctl -w user.max_user_namespaces=63556

To make the value persist across reboots, add it to a file under /etc/sysctl.d:

# echo "user.max_user_namespaces = 63556" >> /etc/sysctl.d/user-ns.conf

It is not possible to give an exact number for this value. Choose one appropriate to your environment and expected workloads.
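
The same check can be scripted. The sketch below reads the tunable from /proc/sys/user/max_user_namespaces, the file that backs the sysctl key shown above; the function names are hypothetical, and the wording of the messages is only a suggestion.

```python
from pathlib import Path


def read_user_ns_limit(path="/proc/sys/user/max_user_namespaces"):
    """Return the limit as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_limit(limit):
    """Summarize what a given limit means for rootless container tools."""
    if limit is None:
        return "unknown: the tunable could not be read"
    if limit == 0:
        return "disabled: no user namespaces may be created"
    return f"enabled: up to {limit} user namespaces"


print(describe_limit(read_user_ns_limit()))
```

On a system where the file is missing or unreadable, the sketch reports "unknown" rather than guessing.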

Checking if mapping of user or group IDs is working

To check if your user namespace is properly mapping from your /etc/subuid or /etc/subgid files, you can use the podman unshare command to get the current mapping.

For an /etc/subuid file containing the following:

$ cat /etc/subuid
bob:10000:65536

The following command should produce this output:

$ podman unshare cat /proc/self/uid_map
         0       1000          1
         1      10000      65536

This shows that UID 0 in the container is mapped to UID 1000 on the host, and that UIDs 1 through 65536 in the container are mapped to the 65536 host UIDs starting at UID 10000 and ending at UID 75535.

If there are errors or the range does not look exactly as it does in /etc/subuid, ensure the syntax of the file is correct or reach out to Red Hat Support.
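
The comparison can also be automated. The following sketch derives the expected rows from the /etc/subuid text and compares them with captured output of podman unshare cat /proc/self/uid_map. Both helper names are hypothetical, and the sketch assumes that ranges appear in the map in the same order as in the file.

```python
def expected_uid_map(subuid_text, user, uid):
    """Rows (ns_start, host_start, count) implied by /etc/subuid for one user."""
    rows = [(0, uid, 1)]
    next_ns = 1
    for line in subuid_text.splitlines():
        if not line.strip():
            continue
        owner, start, count = line.strip().split(":")
        if owner in (user, str(uid)):  # entries may name the user or the UID
            rows.append((next_ns, int(start), int(count)))
            next_ns += int(count)
    return rows


def actual_uid_map(uid_map_text):
    """Rows parsed from the contents of a uid_map file."""
    return [tuple(int(field) for field in line.split())
            for line in uid_map_text.splitlines() if line.strip()]


subuid = "bob:10000:65536\n"
observed = "         0       1000          1\n         1      10000      65536\n"
print(expected_uid_map(subuid, "bob", 1000) == actual_uid_map(observed))
```

A result of False would indicate that the mapping in effect differs from what the file describes.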

Checking if the UID or GID within the container is too large to be covered by the existing ranges for the user

When running podman, skopeo, or buildah commands as a rootless user, a very large UID or GID might be requested for a user. This happens when the creator of the container image creates a user within the container with a UID that has exceeded the range available to your rootless user. Consider this error:

requested 1000320999:12 for /home/largeuid: lchown /home/largeuid: invalid argument

Most likely, UID 1000320999 falls outside the range available to the user in the /etc/subuid file. To resolve this, extend the range of UIDs for that particular user so that it also encompasses 1000320999, for example:

bob:1000000:1000400000

The above configuration is unlikely to suit your environment as-is and will need to be adjusted. In this example of a resolution, a single range of host UIDs from 1000000 to 1001399999 is allocated, which backs UIDs 1 to 1000400000 within the container.

This solution to the problem unfortunately reserves a large expanse of UIDs on the host that become unusable by other users, which can create problems in environments using these UIDs for other purposes such as LDAP or Active Directory. There is currently no method with the /etc/subuid or /etc/subgid files to create a range that "skips" over IDs on the host; the entire range up to the large ID must be allocated on the host.
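
Whether a given container UID fits into a user's allocation can be estimated before pulling an image. This is a rough sketch under the mapping rules described in this article (namespace IDs 1 through N require N subordinate IDs in total); is_covered and last_host_id are hypothetical helper names.

```python
def is_covered(ns_id, ranges):
    """True if a UID inside the container has a host ID behind it.

    ranges: (host_start, count) pairs for one user from /etc/subuid.
    """
    total = sum(count for _start, count in ranges)
    # ID 0 is the user themselves; IDs 1..total come from the ranges.
    return 0 <= ns_id <= total


def last_host_id(start, count):
    """Last host ID consumed by a 'start:count' range."""
    return start + count - 1


wanted = 1000320999
print(is_covered(wanted, [(10000, 65536)]))          # original allocation
print(is_covered(wanted, [(1000000, 1000400000)]))   # extended allocation
print(last_host_id(1000000, 1000400000))
```

The last line shows how far the extended range reaches on the host, which is the span that must stay free of other uses.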
