When rebooting one of our networkers, the second begins logging messages of the form "haproxy-metadata-proxy-75b51f21-bcc6-46e0-8576-6cfcf82dd7c6[52355]: Proxy listener reached system FD limit at 11. Please check system tunables." and becomes unusable

Solution In Progress

Issue

  • One of our OSP 13 network nodes became unresponsive this morning. On the console, we saw messages of the form:
audit: backlog limit exceeded

and AVC messages were accumulating quickly in /var/log/audit/audit.log*:

[root@overcloud-controller-0 audit]# ls -tlr
total 38452
-r--------. 1 root root 8388727 Mar 23 10:21 audit.log.4
-r--------. 1 root root 8388685 Mar 23 11:44 audit.log.3
-r--------. 1 root root 8388749 Mar 23 13:26 audit.log.2
-r--------. 1 root root 8388872 Mar 23 14:16 audit.log.1
-rw-------. 1 root root 5802944 Mar 23 14:45 audit.log
[root@overcloud-controller-0 audit]# wc -l *
   26883 audit.log
   28869 audit.log.1
   35549 audit.log.2
   37034 audit.log.3
   39155 audit.log.4
  167490 total
  • We were not able to log in to the server. After a reboot it came up fine, but while it was rebooting, the second networker began to fail with messages of the form:
haproxy-metadata-proxy-75b51f21-bcc6-46e0-8576-6cfcf82dd7c6[52355]: Proxy listener reached system FD limit at 11. Please check system tunables
  • At this point, it is no longer possible to connect to the host via ssh, and the volume of logging on the console makes the console unusable. This is a critical problem: either network node should be able to operate when the other node is offline.

  • Note that due to another issue, we are running a custom kernel and have perf tracing enabled.

  • Errors similar to this are seen in /var/log/containers/neutron/dhcp-agent.log:

2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent [-] Unable to reload_allocations dhcp for b7607204-3717-4293-9565-e9edfa29a01b.: OSError: [Errno 23] Too many open files in system: '/var/lib/neutron/dhcp/b7607204-3717-4293-9565-e9edfa29a01b/tmpRsbj4W'
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 144, in call_driver
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 530, in reload_allocations
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     self._spawn_or_reload_process(reload_with_HUP=True)
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 467, in _spawn_or_reload_process
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     self._output_config_files()
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 511, in _output_config_files
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     self._output_opts_file()
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 946, in _output_opts_file
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     file_utils.replace_file(name, '\n'.join(options))
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron_lib/utils/file.py", line 60, in replace_file
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib64/python2.7/tempfile.py", line 458, in NamedTemporaryFile
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib64/python2.7/tempfile.py", line 239, in _mkstemp_inner
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/eventlet/green/os.py", line 109, in open
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent OSError: [Errno 23] Too many open files in system: '/var/lib/neutron/dhcp/b7607204-3717-4293-9565-e9edfa29a01b/tmpRsbj4W'
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent 
  • Errors similar to this are seen in /var/log/containers/neutron/metadata-agent.log:
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent [-] Unexpected error.: ConnectionError: HTTPConnectionPool(host='overcloud.internalapi.localdomain', port=8775): Max retries exceeded with url: /2009-04-04/meta-data/local-ipv4 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f54014b0490>: Failed to establish a new connection: [Errno 23] Too many open files in system',))
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/metadata/agent.py", line 207, in _proxy_request
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 518, in request
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 639, in send
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 502, in send
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent ConnectionError: HTTPConnectionPool(host='overcloud.internalapi.localdomain', port=8775): Max retries exceeded with url: /2009-04-04/meta-data/local-ipv4 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f54014b0490>: Failed to establish a new connection: [Errno 23] Too many open files in system',))
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent 
  • Errors similar to this are seen in /var/log/containers/neutron/l3-agent.log:
2020-03-23 12:03:01.377 28366 ERROR neutron.agent.linux.utils [-] Rootwrap error running command: ['ip', 'netns', 'exec', 'qrouter-5f85a98f-bf26-4b39-8e3d-d4b26abad392', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/5f85a98f-bf26-4b39-8e3d-d4b26abad392/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/5f85a98f-bf26-4b39-8e3d-d4b26abad392.pid', '-r', '/var/lib/neutron/ha_confs/5f85a98f-bf26-4b39-8e3d-d4b26abad392.pid-vrrp']: OSError: [Errno 23] Too many open files in system
2020-03-23 12:04:01.377 28366 ERROR neutron.agent.linux.external_process [-] keepalived for router with uuid 5f85a98f-bf26-4b39-8e3d-d4b26abad392 not found. The process should not have died
2020-03-23 12:04:01.378 28366 WARNING neutron.agent.linux.external_process [-] Respawning keepalived for uuid 5f85a98f-bf26-4b39-8e3d-d4b26abad392
  • Errors similar to this are seen in /var/log/containers/neutron/openvswitch-agent.log:
2020-03-23 10:23:57.402 28499 ERROR oslo.messaging._drivers.impl_rabbit [-] [38bfac9a-5756-4b39-aaf8-d8dfad1360e3] AMQP server on rabbitmq.internalapi.localdomain:5672 is unreachable: [Errno 23] Too many open files in system. Trying again in 1 seconds.: error: [Errno 23] Too many open files in system
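
The agent logs above all report "[Errno 23] Too many open files in system". On Linux, errno 23 is ENFILE, the system-wide limit on open file handles, as opposed to EMFILE (errno 24), which is the per-process limit. The following is a minimal sketch, not part of any Red Hat tooling, of how one might check how close a node is to the system-wide limit. It assumes the standard /proc/sys/fs/file-nr interface is readable on the affected node; the function names are illustrative only.

```python
import errno
import os

# Standard Linux procfs path; assumed to be readable on the affected node.
FILE_NR_PATH = "/proc/sys/fs/file-nr"


def classify_errno(code):
    """Say which open-file limit an errno value refers to."""
    if code == errno.ENFILE:
        return "system-wide"  # "Too many open files in system"
    if code == errno.EMFILE:
        return "per-process"  # "Too many open files"
    return "unrelated"


def parse_file_nr(text):
    """Split file-nr content into (allocated, unused, maximum) integers."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def usage_ratio(allocated, maximum):
    """Fraction of the system-wide file handle limit that is allocated."""
    return float(allocated) / maximum


if __name__ == "__main__":
    print("errno %d: %s (%s limit)" % (
        errno.ENFILE, os.strerror(errno.ENFILE), classify_errno(errno.ENFILE)))
    # Only read procfs when it exists, so the sketch is harmless elsewhere.
    if os.path.exists(FILE_NR_PATH):
        with open(FILE_NR_PATH) as handle:
            allocated, unused, maximum = parse_file_nr(handle.read())
        print("allocated=%d max=%d usage=%.1f%%" % (
            allocated, maximum, 100 * usage_ratio(allocated, maximum)))
```

Run on a network node while the errors are occurring, a usage figure close to 100% would be consistent with the ENFILE errors in the agent logs. It does not by itself show which process is holding the handles.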

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
