Unable to create a large number of ports in an OpenStack cloud

Solution In Progress - Updated -

Environment

  • Red Hat OpenStack Platform 10.0
  • Red Hat OpenStack Platform 13.0

Issue

  • Unable to create a large number of ports in an RHOSP environment.
  • The customer's use case requires OpenStack to handle around 16000 ports in a single cloud.
  • When stress testing an OpenStack deployment by creating a large number of Neutron resources, one of the three controllers starts producing errors like the following:

  • /var/log/openvswitch/ovs-vswitchd.log

    netlink_socket|ERR|connect(0): Argument list too long
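To check which controller is affected and when the errors started, the log can be scanned for that message. This is a minimal sketch: `netlink_errors` and `first_seen` are hypothetical helper names, and the timestamp extraction assumes the default ovs-vswitchd log layout where each line starts with a timestamp followed by `|`.

```python
import re
from typing import Iterable, List, Optional

# Pattern for the ovs-vswitchd error quoted above. The number inside
# connect(...) is matched loosely in case it is not always 0.
NETLINK_E2BIG = re.compile(r"netlink_socket\|ERR\|connect\(\d+\): Argument list too long")


def netlink_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the netlink connect() error."""
    return [line.rstrip("\n") for line in lines if NETLINK_E2BIG.search(line)]


def first_seen(errors: List[str]) -> Optional[str]:
    """Timestamp of the first matching line, assuming '<timestamp>|...' lines."""
    return errors[0].split("|", 1)[0] if errors else None
```

On each controller, something like `with open("/var/log/openvswitch/ovs-vswitchd.log") as f: errs = netlink_errors(f)` followed by `len(errs)` and `first_seen(errs)` shows whether that node is hitting the limit.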
    

Resolution

Workaround

  1. Increase the file descriptor limit for ovs-vswitchd

    # mkdir -p /etc/systemd/system/ovs-vswitchd.service.d/
    # echo -e "[Service]\nLimitNOFILE=262144" > /etc/systemd/system/ovs-vswitchd.service.d/limit_no_file.conf
    # systemctl daemon-reload
    # systemctl restart openvswitch
    
  2. Limit n-handler-threads to at most 13.

    # ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=13
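When the workaround has to be applied to several controllers, the two steps can be scripted. This is a sketch only, assuming Python 3.6+ and root on the node; the helper names are hypothetical, while the drop-in path, file name, limit and commands are the ones from the steps above.

```python
import subprocess
from pathlib import Path

# Same drop-in directory and file name as in step 1.
DROPIN_DIR = Path("/etc/systemd/system/ovs-vswitchd.service.d")
DROPIN_NAME = "limit_no_file.conf"


def render_dropin(limit: int) -> str:
    """Contents of the systemd drop-in that raises the open-file limit."""
    return f"[Service]\nLimitNOFILE={limit}\n"


def write_dropin(limit: int = 262144, directory: Path = DROPIN_DIR) -> Path:
    """Write the drop-in, creating the directory like `mkdir -p`; returns its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / DROPIN_NAME
    path.write_text(render_dropin(limit))
    return path


def reload_and_restart() -> None:
    """Run `systemctl daemon-reload`, then `systemctl restart openvswitch`."""
    subprocess.run(["systemctl", "daemon-reload"], check=True)
    subprocess.run(["systemctl", "restart", "openvswitch"], check=True)


def set_handler_threads(n: int = 13) -> None:
    """Step 2: set other_config:n-handler-threads on the Open_vSwitch record."""
    subprocess.run(
        ["ovs-vsctl", "set", "Open_vSwitch", ".", f"other_config:n-handler-threads={n}"],
        check=True,
    )
```

Calling `write_dropin()`, `reload_and_restart()` and `set_handler_threads()` in that order mirrors the shell steps; keep in mind that restarting openvswitch briefly interrupts switching on that node.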
    

Note:

  • By default, OVS creates roughly (cpu_cores * 3/4) handler threads, and each handler thread consumes one file descriptor per port, so descriptor usage grows as ports * handler threads. Adjust the thread count based on the number of cores and the number of OVS ports created on each controller node. This behavior was changed in openvswitch-2.9.0-66.
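The note above implies a simple budget: descriptors needed is about ports times handler threads. A sketch of that arithmetic, assuming the pre-2.9.0-66 model (one descriptor per port per handler thread), the note's 3/4-of-cores rule of thumb, and a hypothetical reserve of 1024 descriptors for everything that is not a per-port socket:

```python
def approx_default_handlers(cpu_cores: int) -> int:
    """Rule of thumb from the note above: about 3/4 of the CPU cores (at least 1)."""
    return max(1, cpu_cores * 3 // 4)


def netlink_fds(ports: int, handler_threads: int) -> int:
    """Per-port descriptors before openvswitch-2.9.0-66: one per port per handler."""
    return ports * handler_threads


def max_ports(nofile_limit: int, handler_threads: int, reserved_fds: int = 1024) -> int:
    """Ports that fit under the fd limit.

    reserved_fds is a hypothetical allowance for descriptors that are not
    per-port sockets; tune it from what the process actually holds.
    """
    return max(0, (nofile_limit - reserved_fds) // handler_threads)
```

For a hypothetical 56-core controller, `approx_default_handlers(56)` gives 42, so a 262144 limit covers only about `max_ports(262144, 42)` = 6217 ports; capping the handlers at 13 raises that to about `max_ports(262144, 13)` = 20086 ports.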

Root Cause

  • The error below indicates that every network resource create request requires a file descriptor in ovs-vswitchd to handle it.

    netlink_socket|ERR|connect(0): Argument list too long
    
  • The number of open file descriptors grows simply because of the large number of port-create requests.

  • In this scenario, fewer ports are added to the kernel datapath than were requested, which is why the stress test cannot proceed beyond a certain point.

Diagnostic Steps

  • The test environment has 3 controller nodes and 49 compute nodes.
  • Stress testing is done with a custom Python application that uses openstacksdk to make API calls. A pseudocode version of the test scenario is as follows:

    repeat:
      server := nova.create_server()
      for [1..16]:
        network := neutron.create_network()
        neutron.create_subnet(network)
        neutron.create_and_attach_port(server, network)
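A runnable approximation of that loop with openstacksdk is sketched below. It is not the customer's application: the image, flavor and boot-network IDs are hypothetical placeholders, `cidr_for` is a hypothetical helper that gives each subnet its own /24, and the loop runs a fixed number of servers instead of repeating forever. Only the connection helper imports openstacksdk, so the loop logic can be exercised without it.

```python
from typing import Any

# Hypothetical placeholders: substitute IDs that exist in your cloud.
IMAGE_ID = "REPLACE-WITH-IMAGE-ID"
FLAVOR_ID = "REPLACE-WITH-FLAVOR-ID"
BOOT_NETWORK_ID = "REPLACE-WITH-BOOT-NETWORK-ID"
NETWORKS_PER_SERVER = 16


def cidr_for(server_index: int, net_index: int) -> str:
    """Hypothetical helper: give every subnet its own 10.x.y.0/24 block."""
    n = server_index * NETWORKS_PER_SERVER + net_index
    return f"10.{(n // 256) % 256}.{n % 256}.0/24"


def create_server_with_ports(conn: Any, server_index: int) -> int:
    """One pass of the 'repeat' loop; returns the number of ports attached."""
    server = conn.compute.create_server(
        name=f"stress-server-{server_index}",
        image_id=IMAGE_ID,
        flavor_id=FLAVOR_ID,
        networks=[{"uuid": BOOT_NETWORK_ID}],
    )
    # Wait for the server to finish building before attaching interfaces.
    server = conn.compute.wait_for_server(server)
    for j in range(NETWORKS_PER_SERVER):
        network = conn.network.create_network(name=f"stress-net-{server_index}-{j}")
        conn.network.create_subnet(
            network_id=network.id, ip_version=4, cidr=cidr_for(server_index, j)
        )
        port = conn.network.create_port(network_id=network.id)
        conn.compute.create_server_interface(server, port_id=port.id)
    return NETWORKS_PER_SERVER


def run_stress_test(conn: Any, servers: int) -> int:
    """Repeat the scenario for a fixed number of servers; returns ports attached."""
    return sum(create_server_with_ports(conn, i) for i in range(servers))


def connect(cloud: str) -> Any:
    """Open a connection from a clouds.yaml entry (the cloud name is an assumption)."""
    import openstack  # imported lazily so the helpers above work without openstacksdk

    return openstack.connect(cloud=cloud)
```

Usage would look like `run_stress_test(connect("mycloud"), servers=260)`, where `mycloud` is a hypothetical clouds.yaml entry.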
    
  • The stress test was run with 260 instances, 4200 networks and subnets, and 12100 ports.

  • ovs-vswitchd has about 127k open files, and its 'nofile' limit is configured to 262144:

    # ls -1 /proc/`pgrep ovs-vswitchd`/fd | wc -l
    127470
    
    # cat /proc/`pgrep ovs-vswitchd`/limits | grep "Max open files"
    Max open files        262144      262144      files
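The two shell checks above can also be done from Python, for example to poll every controller during a test run. A sketch, assuming Linux `/proc` and root access to read another process's fd directory; `fd_usage` and `parse_max_open_files` are hypothetical helper names.

```python
import os
from typing import Tuple


def _limit_value(field: str) -> int:
    # /proc/<pid>/limits prints "unlimited" when no limit is set; map it to -1.
    return -1 if field == "unlimited" else int(field)


def parse_max_open_files(limits_text: str) -> Tuple[int, int]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()  # ['Max', 'open', 'files', soft, hard, 'files']
            return _limit_value(fields[3]), _limit_value(fields[4])
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid: int) -> Tuple[int, int]:
    """Return (open_fds, soft_limit), the same data as the ls | wc -l and grep checks."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_max_open_files(f.read())
    return open_fds, soft
```

Pass it the PID reported by `pgrep ovs-vswitchd`. With the values shown above, 127470 / 262144 is about 48.6% of the soft limit.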
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
