Ceph: RGW failing S3 transaction with HTTP 503 response
Issue
- RGW fails S3 transactions with HTTP 503 responses
- From the RGW logs:
2020-06-23 09:11:59.661 7fe7087d7700 10 req 118010 0.000s s3:list_buckets scheduling with dmclock client=3 cost=1
2020-06-23 09:11:59.661 7fe7087d7700 0 req 118010 0.000s s3:list_buckets Scheduling request failed with -2218 // #define ERR_RATE_LIMITED 2218
Nov 07 19:45:04 data-xx-08 ceph-fcd6677e-xx-yy-zz-e0d55e53cea4-rgw-ssl-data-xx-08-sxdklt[14380]: 2022-11-07T19:45:04.347+0000 7f26afa26700 1 beast: 0x7f268655c600: 172.31.100.2 - - [07/Nov/2022:19:45:04.346 +0000] "GET /data/804/6a558494-xx-yy-zz-9297eb9bfeb4/d/06883/794 HTTP/1.1" 503 185 - latency=0.001000011s
Nov 07 19:45:04 data-xx-08 ceph-fcd6677e-xx-yy-zz-e0d55e53cea4-rgw-ssl-data-xx-08-sxdklt[14380]: 2022-11-07T19:45:04.487+0000 7f26afa26700 1 beast: 0x7f268655c600: 172.31.100.2 - - [07/Nov/2022:19:45:04.486 +0000] "GET /data/346/6a558494-xx-yy-zz-9297eb9bfeb4/d/23834/713 HTTP/1.1" 503 185 - latency=0.001000011s
Nov 07 19:45:04 data-xx-08 ceph-fcd6677e-xx-yy-zz-e0d55e53cea4-rgw-ssl-data-xx-08-sxdklt[14380]: 2022-11-07T19:45:04.588+0000 7f26afa26700 1 beast: 0x7f268655c600: 172.31.100.2 - - [07/Nov/2022:19:45:04.587 +0000] "GET /data/804/6a558494-xx-yy-zz-9297eb9bfeb4/d/06883/794 HTTP/1.1" 503 185 - latency=0.000000000s
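To gauge how many requests are affected and which clients they come from, the beast access lines can be tallied. The following is a minimal sketch, not a supported tool: the function name count_503_by_client is hypothetical, and it assumes the log lines have the same layout as the excerpt above (the client address is the second field after the "beast:" token and the status code follows the quoted request).

```shell
# count_503_by_client: read RGW beast access-log lines on stdin and print
# "<count> <client-ip>" for every client that received HTTP 503, busiest first.
# Assumption: the line layout matches the excerpt in this article.
count_503_by_client() {
    awk -F'"' '
        / beast: / {
            # Text after the quoted request starts with the status code.
            split($3, after, " ")
            if (after[1] != "503") next
            # The client address is the second token after "beast:".
            n = split($1, before, " ")
            for (i = 1; i <= n - 2; i++)
                if (before[i] == "beast:") { hits[before[i + 2]]++; break }
        }
        END { for (ip in hits) print hits[ip], ip }
    ' | sort -rn
}

# Example (the log source is an assumption; use whatever holds the RGW log):
#   journalctl --since "1 hour ago" | count_503_by_client
```

In the excerpt above every 503 comes from 172.31.100.2. If that address belongs to the load balancer, the tally will show the load balancer rather than the application servers behind it.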
- An excessive number of connections to the RGW remain in the CLOSE-WAIT state:
[data-xx-08 ~]# ss -anp | grep radosgw | grep "CLOSE-WAIT" | head
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.20:48100 users:(("radosgw",pid=2406701,fd=1645))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.1:33928 users:(("radosgw",pid=2406701,fd=1292))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.20:34637 users:(("radosgw",pid=2406701,fd=1421))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.20:33513 users:(("radosgw",pid=2406701,fd=1206))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.20:47564 users:(("radosgw",pid=2406701,fd=2512))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.20:44240 users:(("radosgw",pid=2406701,fd=1715))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.1:40585 users:(("radosgw",pid=2406701,fd=1699))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.20:45868 users:(("radosgw",pid=2406701,fd=1175))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.20:45845 users:(("radosgw",pid=2406701,fd=934))
tcp CLOSE-WAIT 25 0 172.31.100.168:444 172.31.100.20:54824 users:(("radosgw",pid=2406701,fd=2421))
[data-xx-08 ~]# ss -anp | grep radosgw | grep "CLOSE-WAIT" -c
1019
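Grouping the CLOSE-WAIT sockets by peer address shows which hosts leave connections behind. The sketch below is illustrative only: the function name close_wait_by_peer is hypothetical, and it assumes the column layout of the ss output shown above (state in the second column, peer address:port in the sixth).

```shell
# close_wait_by_peer: read `ss -anp` output on stdin and print
# "<count> <peer-address>" for sockets in CLOSE-WAIT, busiest peer first.
# Assumption: columns are laid out as in the output shown in this article.
close_wait_by_peer() {
    awk '
        $2 == "CLOSE-WAIT" {
            peer = $6
            sub(/:[0-9]+$/, "", peer)   # drop the trailing :port
            count[peer]++
        }
        END { for (p in count) print count[p], p }
    ' | sort -rn
}

# Example, reusing the filter from the commands above:
#   ss -anp | grep radosgw | close_wait_by_peer
```

In the ten sample lines above, the peers are 172.31.100.20 and 172.31.100.1.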
- An HAProxy load balancer sits between the application server(s) and the Ceph RGWs (RADOS Gateways)
- The HAProxy configuration does NOT use the options timeout client and option http-server-close
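Whether a given HAProxy configuration matches this condition can be checked by looking for the two directives as uncommented lines. This is a rough sketch under stated assumptions: the function names has_directive and check_haproxy_opts are hypothetical, the path /etc/haproxy/haproxy.cfg is only the common default, and the check does not tell you which section (defaults, frontend, backend) a directive applies to.

```shell
# has_directive FILE REGEX: succeed if FILE contains an uncommented line that
# starts with the directive keywords matched by REGEX.
has_directive() {
    grep -Eq "^[[:space:]]*$2([[:space:]]|\$)" "$1"
}

# check_haproxy_opts FILE: report whether each of the two directives named in
# this article appears anywhere in FILE.
check_haproxy_opts() {
    cfg="$1"
    if has_directive "$cfg" 'timeout[[:space:]]+client'; then
        echo "timeout client: present"
    else
        echo "timeout client: missing"
    fi
    if has_directive "$cfg" 'option[[:space:]]+http-server-close'; then
        echo "option http-server-close: present"
    else
        echo "option http-server-close: missing"
    fi
}

# Example (path is an assumption; adjust to the actual configuration file):
#   check_haproxy_opts /etc/haproxy/haproxy.cfg
```

The pattern requires whitespace or end of line after the keyword, so a related directive such as timeout client-fin is not mistaken for timeout client.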
Environment
Red Hat Ceph Storage (RHCS) 5.x