Why is podman exec failing with the error message "exceeded num_locks (2048)" in Red Hat Enterprise Linux?


Environment

  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 9
  • Red Hat Enterprise Linux 10
  • Podman

Issue

  • Why does podman exec fail with the error message "exceeded num_locks (2048)" in Red Hat Enterprise Linux?

    Error: error allocating lock for new container: allocation failed; exceeded num_locks (2048)
    
  • What is podman system renumber?

Resolution

Rootful container

  1. Copy the containers.conf file to /etc/containers/

    # cp /usr/share/containers/containers.conf /etc/containers/containers.conf
    
  2. Uncomment the num_locks setting in the [engine] section of /etc/containers/containers.conf and change its value to 4096:

    num_locks = 4096
    
  3. Move the lock file out of /dev/shm, or delete it if it is no longer needed:

    # mv /dev/shm/libpod_lock /tmp
    
  4. Before starting new containers, locks must be renumbered:

    # podman system renumber
    
  5. Remove all unused containers (both dangling and unreferenced), pods, networks, and optionally, volumes from local storage.

    # podman system prune
    # podman volume prune
    
  6. Restart all containers after cleanup. All containers should start without any errors.
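
Steps 1 and 2 above can be scripted. The following is a minimal sketch, assuming the copied file still contains the stock num_locks line (commented out or not). set_num_locks is a hypothetical helper name used only for illustration; it is not a Podman command.

```shell
#!/bin/sh
# Sketch only: set_num_locks is a hypothetical helper, not part of Podman.
# It rewrites an existing num_locks line (commented out or not) in place.
set_num_locks() {
    conf=$1
    value=$2
    pattern='^[[:space:]]*#?[[:space:]]*num_locks[[:space:]]*='

    # Refuse to guess where the key belongs if no such line exists.
    if ! grep -Eq "$pattern" "$conf"; then
        echo "no num_locks line in $conf; add it under [engine] manually" >&2
        return 1
    fi

    # Write to a temporary file first, then copy the result back so the
    # original file keeps its ownership and permissions.
    sed -E "s/${pattern}.*/num_locks = ${value}/" "$conf" > "$conf.tmp" &&
        cat "$conf.tmp" > "$conf" &&
        rm -f "$conf.tmp"
}

# Example (as root, after step 1):
#   set_num_locks /etc/containers/containers.conf 4096
```

The helper deliberately fails instead of appending a new line, because a num_locks key placed outside the [engine] section would not take effect.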

Rootless container

  1. Edit the containers.conf file of the currently logged-in user and increase the number of locks to 4096 in the [engine] section:

    $ cat $HOME/.config/containers/containers.conf
    [engine]
    num_locks = 4096
    
  2. Move the lock file out of /dev/shm, or delete it if it is no longer needed:

    $ mv /dev/shm/libpod_rootless_lock_$UID /tmp/
    
  3. Before starting new containers, locks must be renumbered:

    $ podman system renumber
    
  4. Restart all containers after cleanup. All containers should start without any errors.
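
The lock file name differs between the rootful and rootless cases. The following is a minimal sketch that prints the path described in this article for a given UID; libpod_lock_path is a hypothetical helper name. It uses id -u because the $UID variable is set by bash but not by every POSIX shell.

```shell
#!/bin/sh
# Sketch only: libpod_lock_path is a hypothetical helper, not part of Podman.
# It prints the shared-memory lock file path used in this article
# for a given UID (defaults to the current user).
libpod_lock_path() {
    uid=${1:-$(id -u)}
    if [ "$uid" -eq 0 ]; then
        echo "/dev/shm/libpod_lock"
    else
        echo "/dev/shm/libpod_rootless_lock_${uid}"
    fi
}

# Show which lock file applies to the current user and whether it exists.
lock=$(libpod_lock_path)
if [ -e "$lock" ]; then
    ls -l "$lock"
else
    echo "no lock file at $lock"
fi
```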

Additional Information:

  1. For a mirror registry using Quay: if too many ephemeral volumes have been created, the Quay container (and optionally the Quay database container) might not start. Removing all volumes with the prune command also removes the volumes that Quay depends on, which results in complete loss of the registry. If you are using OMR Quay, take extra care during volume pruning so that these named volumes are not removed. Their names usually contain the word "storage". To list every volume except those, you can use the following command:

    # podman volume ls | awk 'NR>1{ print $2 }' | grep -v "storage"
    

    To remove volumes interactively you can use the following command:

    # podman volume ls | awk 'NR>1{ print $2 }' | grep -v "storage" | xargs -p -n1 podman volume rm
    

    Omitting the -p flag in xargs deletes all listed volumes one at a time without asking for confirmation.

  2. For RHOSP, please refer to the following article: RHOSP 15: minor undercloud upgrade breaks podman. RHEL: podman upgrade and reboot breaks existing containers
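
For the volume filtering in item 1, the following is a minimal dry-run sketch that prints the removal candidates without deleting anything. removal_candidates is a hypothetical helper name, and the default pattern "storage" is taken from the commands above; adjust it to match the volume names in your environment.

```shell
#!/bin/sh
# Sketch only: removal_candidates is a hypothetical helper, not part of Podman.
# Reads volume names on stdin and prints those that do NOT match the
# protected pattern (default: "storage", as in the commands above).
removal_candidates() {
    protect=${1:-storage}
    grep -v -e "$protect" || true   # an empty result is not an error
}

# Dry run: only print what would be removed; nothing is deleted here.
# --quiet prints volume names only, so no header line has to be skipped.
if command -v podman >/dev/null 2>&1; then
    podman volume ls --quiet | removal_candidates storage
fi
```

Reviewing this list first, and only then piping it to xargs, reduces the risk of removing a volume that the registry needs.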

Root Cause

  • Every Podman container gets a lock at creation time.
  • The maximum number of locks is controlled by the num_locks parameter in containers.conf.
  • When these locks are exhausted, new containers cannot be created until some existing containers or volumes are removed.
  • To avoid this, increase num_locks in containers.conf and run podman system renumber, which prepares the new locks (and reallocates lock numbers to fit the new struct).
  • The default num_locks value is 2048.
  • More information on the locks and their renumbering can be found in the official Podman documentation.
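
To see how close a host is to the limit, lock holders can be counted. The following is a rough sketch, assuming that each container, pod, and volume holds one lock; the result is an approximation, not an exact reading of the lock table. The function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.
# Assumption: each container, pod, and volume holds one lock.
count_lock_holders() {
    containers=$(podman ps --all --quiet | wc -l)
    pods=$(podman pod ps --quiet | wc -l)
    volumes=$(podman volume ls --quiet | wc -l)
    echo $((containers + pods + volumes))
}

# Print usage against a limit (default 2048, the value in the error message).
report_lock_usage() {
    limit=${1:-2048}
    used=$(count_lock_holders)
    echo "locks in use: $used of $limit"
    [ "$used" -lt "$limit" ]   # non-zero exit status when no locks are left
}

# Only query Podman when it is installed.
if command -v podman >/dev/null 2>&1; then
    report_lock_usage 2048 || echo "no free locks left" >&2
fi
```

Run it as the user that owns the containers, because rootful and rootless Podman use separate lock files.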

Diagnostic Steps

Rootful container

  • Check all existing volumes.
# podman volume ls | wc -l
2048
  • Error when running a container:
# podman run --rm -it registry.redhat.io/ubi9/ubi:latest
Error: error allocating lock for new container: allocation failed; exceeded num_locks (2048)

Rootless container

  • Switch to the rootless container user.
$ machinectl shell -q awx@.host
  • Check all existing volumes.
$ podman volume ls | wc -l
2048
  • Error when running a container:
$ podman run registry.redhat.io/ansible-automation-platform-23/ee-supported-rhel8:latest date
Error: error creating named volume "6953b0427c0fedc39ae319f5926effdd9d0d475f7777274e62588a92618a1b1b": allocating lock for new volume: allocation failed; exceeded num_locks (2048)
$

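Stopped containers, unused networks, and dangling images can be cleaned up by executing podman system prune. Note that it does not remove volumes unless the --volumes option is given, so the volume count in the example below is unchanged. Because this command removes everything listed in its warning, use it with caution, especially if you have real workloads running on the VM: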

$ podman system prune
WARNING! This command removes:
    - all stopped containers
    - all networks not used by at least one container
    - all dangling images
    - all dangling build cache

Are you sure you want to continue? [y/N] y
Deleted Images
Total reclaimed space: 0B

$ podman volume ls | wc -l
2048
$

The same caution applies to the podman volume prune command: it removes all dangling volumes, named as well as ephemeral, that are not attached to any container. Deleting the wrong volumes can cause data loss. Review the output of podman volume ls before deleting volumes.

$ podman volume prune
WARNING! This will remove all volumes not used by at least one container. The following volumes will be removed:
0008cff4e2efb55b645296b5ced791a977624a1a6bf17ce5598a5cabab65f418
0062839868ad9fa9da3bf2433715ecce2544022f0642d13f32073c63c860a681
...
ffb0629475bd9568c70d1704d7985ab008cdbc7970b281552ef0e68a2b7a8c1b
ffd6004adfb2d8b2984343b4f56714b647b508c5f2ebcb97d2e070535df5d9fa
$ 

$ podman volume ls | wc -l
0
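
Before pruning, it can help to list the dangling volumes that carry a human-chosen name, because those are the ones most likely to hold data worth keeping. The following is a minimal sketch; named_volumes_only is a hypothetical helper name, and treating 64-character hexadecimal names as ephemeral is a heuristic assumption based on the names in the prune output above.

```shell
#!/bin/sh
# Sketch only: named_volumes_only is a hypothetical helper, not part of Podman.
# Heuristic (an assumption): ephemeral volumes have 64-character hex names,
# like the ones in the prune output above; anything else is treated as named.
named_volumes_only() {
    grep -Ev '^[0-9a-f]{64}$' || true   # no named volumes is not an error
}

# List dangling volumes with human-chosen names so they can be reviewed
# before running "podman volume prune". Nothing is removed here.
if command -v podman >/dev/null 2>&1; then
    podman volume ls --quiet --filter dangling=true | named_volumes_only
fi
```

An empty result means that every dangling volume matches the ephemeral naming pattern; it does not prove that those volumes are safe to delete.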

