When rebooting one of our networkers, the second begins logging messages of the form "haproxy-metadata-proxy-75b51f21-bcc6-46e0-8576-6cfcf82dd7c6[52355]: Proxy listener reached system FD limit at 11. Please check system tunables." and becomes unusable

Solution In Progress - Updated -

Issue

  • One of our OSP 13 network nodes became unresponsive this morning. On the console, we saw messages of the form:
audit: backlog limit exceeded

and AVC logging is piling up quickly in /var/log/audit/audit.log*:

[root@overcloud-controller-0 audit]# ls -tlr
total 38452
-r--------. 1 root root 8388727 Mar 23 10:21 audit.log.4
-r--------. 1 root root 8388685 Mar 23 11:44 audit.log.3
-r--------. 1 root root 8388749 Mar 23 13:26 audit.log.2
-r--------. 1 root root 8388872 Mar 23 14:16 audit.log.1
-rw-------. 1 root root 5802944 Mar 23 14:45 audit.log
[root@overcloud-controller-0 audit]# wc -l *
   26883 audit.log
   28869 audit.log.1
   35549 audit.log.2
   37034 audit.log.3
   39155 audit.log.4
  167490 total
  • We were not able to log in to the server. After a reboot it came up fine, but while it was rebooting the second networker began to fail with messages of the form:
haproxy-metadata-proxy-75b51f21-bcc6-46e0-8576-6cfcf82dd7c6[52355]: Proxy listener reached system FD limit at 11. Please check system tunables
  • At this point, it is no longer possible to connect to the host via ssh, and the volume of logging on the console makes the console unusable. This is a critical problem: either network node should be able to operate when the other node is offline.

  • Note that due to another issue, we are running a custom kernel and have perf tracing enabled.

  • Errors similar to this are seen in /var/log/containers/neutron/dhcp-agent.log:

2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent [-] Unable to reload_allocations dhcp for b7607204-3717-4293-9565-e9edfa29a01b.: OSError: [Errno 23] Too many open files in system: '/var/lib/neutron/dhcp/b7607204-3717-4293-9565-e9edfa29a01b/tmpRsbj4W'
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 144, in call_driver
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 530, in reload_allocations
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     self._spawn_or_reload_process(reload_with_HUP=True)
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 467, in _spawn_or_reload_process
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     self._output_config_files()
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 511, in _output_config_files
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     self._output_opts_file()
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 946, in _output_opts_file
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent     file_utils.replace_file(name, '\n'.join(options))
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron_lib/utils/file.py", line 60, in replace_file
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib64/python2.7/tempfile.py", line 458, in NamedTemporaryFile
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib64/python2.7/tempfile.py", line 239, in _mkstemp_inner
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/eventlet/green/os.py", line 109, in open
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent OSError: [Errno 23] Too many open files in system: '/var/lib/neutron/dhcp/b7607204-3717-4293-9565-e9edfa29a01b/tmpRsbj4W'
2020-03-23 12:01:44.621 28367 ERROR neutron.agent.dhcp.agent 
  • Errors similar to this are seen in /var/log/containers/neutron/metadata-agent.log:
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent [-] Unexpected error.: ConnectionError: HTTPConnectionPool(host='overcloud.internalapi.localdomain', port=8775): Max retries exceeded with url: /2009-04-04/meta-data/local-ipv4 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f54014b0490>: Failed to establish a new connection: [Errno 23] Too many open files in system',))
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/metadata/agent.py", line 207, in _proxy_request
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 518, in request
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 639, in send
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 502, in send
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent ConnectionError: HTTPConnectionPool(host='overcloud.internalapi.localdomain', port=8775): Max retries exceeded with url: /2009-04-04/meta-data/local-ipv4 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f54014b0490>: Failed to establish a new connection: [Errno 23] Too many open files in system',))
2020-03-23 07:08:16.753 19504 ERROR neutron.agent.metadata.agent 
  • Errors similar to this are seen in /var/log/containers/neutron/l3-agent.log:
2020-03-23 12:03:01.377 28366 ERROR neutron.agent.linux.utils [-] Rootwrap error running command: ['ip', 'netns', 'exec', 'qrouter-5f85a98f-bf26-4b39-8e3d-d4b26abad392', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/5f85a98f-bf26-4b39-8e3d-d4b26abad392/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/5f85a98f-bf26-4b39-8e3d-d4b26abad392.pid', '-r', '/var/lib/neutron/ha_confs/5f85a98f-bf26-4b39-8e3d-d4b26abad392.pid-vrrp']: OSError: [Errno 23] Too many open files in system
2020-03-23 12:04:01.377 28366 ERROR neutron.agent.linux.external_process [-] keepalived for router with uuid 5f85a98f-bf26-4b39-8e3d-d4b26abad392 not found. The process should not have died
2020-03-23 12:04:01.378 28366 WARNING neutron.agent.linux.external_process [-] Respawning keepalived for uuid 5f85a98f-bf26-4b39-8e3d-d4b26abad392
  • Errors similar to this are seen in /var/log/containers/neutron/openvswitch-agent.log:
2020-03-23 10:23:57.402 28499 ERROR oslo.messaging._drivers.impl_rabbit [-] [38bfac9a-5756-4b39-aaf8-d8dfad1360e3] AMQP server on rabbitmq.internalapi.localdomain:5672 is unreachable: [Errno 23] Too many open files in system. Trying again in 1 seconds.: error: [Errno 23] Too many open files in system
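
The "[Errno 23] Too many open files in system" errors in the agent logs above correspond to ENFILE, the system-wide open file limit, rather than EMFILE (errno 24), which is the per-process limit. On Linux the system-wide counters are exposed in /proc/sys/fs/file-nr as three numbers: allocated file handles, allocated but unused handles, and the maximum (fs.file-max). The following is a minimal diagnostic sketch for checking how close a node is to that limit. The helper names parse_file_nr and describe_errno are hypothetical and chosen for illustration; this is not the resolution of this article.

```python
import errno
import os

FILE_NR_PATH = "/proc/sys/fs/file-nr"  # Linux-only procfs path


def parse_file_nr(text):
    """Parse the three whitespace-separated fields of file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {
        "allocated": allocated,
        "unused": unused,
        "maximum": maximum,
        # Fraction of the system-wide limit that is currently in use.
        "in_use_fraction": (allocated - unused) / maximum,
    }


def describe_errno(code):
    """Return the symbolic name and the OS message for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    # Errno 23 is the value reported in the agent logs.
    print(describe_errno(23))
    # Only read the counters where procfs exposes them.
    if os.path.exists(FILE_NR_PATH):
        with open(FILE_NR_PATH) as handle:
            print(parse_file_nr(handle.read()))
```

An in_use_fraction close to 1.0 would be consistent with the ENFILE errors shown above. It does not by itself identify which process is holding the file handles.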

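Because the audit logs listed above rotate roughly every hour or two at about 8 MB each, it may help to see which commands account for the AVC records. The sketch below counts records containing type=AVC per comm= field. It assumes the standard audit.log record layout, in which comm is normally a quoted string. The function name count_avc_by_comm is hypothetical, and the script needs read access to the log files, which are readable by root only in the listing above.

```python
import collections
import re
import sys

# comm= is normally quoted; unquoted (hex-encoded) values are kept as-is.
COMM_RE = re.compile(r'\bcomm=("[^"]*"|\S+)')


def count_avc_by_comm(lines):
    """Count AVC records per comm= value in an iterable of log lines."""
    counts = collections.Counter()
    for line in lines:
        # Skip SYSCALL, PATH, and other non-AVC record types.
        if "type=AVC" not in line:
            continue
        match = COMM_RE.search(line)
        comm = match.group(1).strip('"') if match else "(no comm field)"
        counts[comm] += 1
    return counts


if __name__ == "__main__":
    # Pass one or more audit log paths as arguments.
    totals = collections.Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            totals.update(count_avc_by_comm(handle))
    # Print the ten most frequent commands, highest count first.
    for comm, count in totals.most_common(10):
        print(count, comm)
```

Saved under a name of your choosing, the script can be pointed at /var/log/audit/audit.log and its rotated copies. The output is only a starting point for deciding which denials to inspect in detail.
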
Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)