OpenStack-managed Ceph: high number of slow requests
Issue
- We are seeing a very high number of slow requests on the Ceph cluster managed by OpenStack 13, and they are also causing the cluster health state to fluctuate.
- This is causing problems when provisioning clusters in the OpenStack environment. Could anyone please help investigate?
Every 2.0s: ceph -s                                    Sun May 10 10:11:01 2020

  cluster:
    id:     0508166a-302c-11e7-bf96-141877347430
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            1 slow requests are blocked > 32 sec. Implicated osds 10

  services:
    mon: 3 daemons, quorum overcloud-controller-0,overcloud-controller-2,overcloud-controller-1
    mgr: overcloud-controller-1(active), standbys: overcloud-controller-0, overcloud-controller-2
    osd: 265 osds: 264 up, 264 in
         flags noscrub,nodeep-scrub

  data:
    pools:   4 pools, 13344 pgs
    objects: 2.92M objects, 11.0TiB
    usage:   32.7TiB used, 254TiB / 287TiB avail
    pgs:     13344 active+clean

  io:
    client:   517MiB/s rd, 106MiB/s wr, 3.20kop/s rd, 10.27kop/s wr
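The health line above names the implicated OSD (osd.10 here). A minimal sketch, assuming the "Implicated osds N[,N...]" message format seen in this output, of a hypothetical shell helper (implicated_osds is not a Ceph command) that extracts the OSD ids so they can be checked one by one:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): read `ceph -s` or
# `ceph health detail` text on stdin and print each OSD id named in
# "Implicated osds ..." messages, one per line, sorted and de-duplicated.
# Assumes the ids are a comma-separated list, as in the output above.
implicated_osds() {
  grep -o 'Implicated osds [0-9,]*' \
    | sed 's/^Implicated osds //' \
    | tr ',' '\n' \
    | grep -v '^$' \
    | sort -n -u
}
```

For example: ceph health detail | implicated_osds. Each returned id can then be inspected with standard commands such as ceph osd find <id> (which host the OSD lives on) and ceph osd perf (per-OSD commit/apply latency). On the OSD host, the admin-socket command ceph daemon osd.<id> dump_historic_ops lists recently completed slow ops; in a containerized RHOSP 13 deployment it may need to be run inside the OSD container.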
- The following OSDs appear most often in slow sub-operation (subop) requests, that is, replica writes the primary OSD is waiting on:
[root@overcloud-controller-0 ~]# grep -r 'slow' /var/log/messages|awk '/subop/ {split($NF,a,","); for(i=1;b=a[i];i++) { print "osd."b}; next; } { print $10}' | grep -v mon|sort -g | uniq -c | sort -k1 -n -r | head
7150 0
138 osd.205
98 osd.253
98 osd.216
50 osd.51
40 osd.76
40 osd.171
32 osd.49
28 osd.28
20 osd.83
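The awk filter above prints osd.<id> for each id on lines containing "subop", but for every other line it prints field 10. In the journal format shown further down (for example "... 7faa0618d700 0 log_channel(cluster) ..."), field 10 is the log level, which appears to be where the leading "7150 0" entry comes from. Its loop condition b=a[i] also treats an id of "0" as false, so osd.0 would never be counted. A minimal sketch of a stricter hypothetical helper, count_subop_osds, that counts only the ids following "subops from", assuming slow-request lines end in a comma-separated list such as "waiting for subops from 205,253" (as the original awk implies by splitting the last field on commas):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): count how often each OSD id
# appears after "subops from" in slow-request log lines read on stdin.
# Lines without that phrase (health checks, other log output) are ignored.
count_subop_osds() {
  grep 'slow request' \
    | grep -o 'subops from [0-9,]*' \
    | sed 's/^subops from //' \
    | tr ',' '\n' \
    | grep -v '^$' \
    | sed 's/^/osd./' \
    | sort | uniq -c | sort -k1,1nr
}
```

For example: count_subop_osds < /var/log/messages | head. An OSD that dominates this list across many different primaries is a stronger suspect than one that appears only occasionally.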
- Slow requests are being logged in the following states:
[root@overcloud-controller-0 ~]# grep -r 'slow request' /var/log/messages|sed -e 's/^.*currently //' -e 's/from.*$//' | sort -g | uniq -c | sort -k1 -n -r | head
638 sub_op_commit_rec
36 op_applied
1 May 10 10:11:04 overcloud-controller-0 journal: debug 2020-05-10 10:11:04.089447 7faa0618d700 0 log_channel(cluster) log [INF] : Health check cleared: REQUEST_SLOW (was: 1 slow requests are blocked > 32 sec. Implicated osds 10)
1 May 10 10:11:04 overcloud-controller-0 journal: cluster 2020-05-10 10:11:04.089456 mon.overcloud-controller-0 mon.0 10.10.10.10:6789/0 217610 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 1 slow requests are blocked > 32 sec. Implicated osds 10)
1 May 10 10:11:04 overcloud-controller-0 docker: debug 2020-05-10 10:11:04.089447 7faa0618d700 0 log_channel(cluster) log [INF] : Health check cleared: REQUEST_SLOW (was: 1 slow requests are blocked > 32 sec. Implicated osds 10)
1 May 10 10:11:04 overcloud-controller-0 docker: cluster 2020-05-10 10:11:04.089456 mon.overcloud-controller-0 mon.0 10.10.10.10:6789/0 217610 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 1 slow requests are blocked > 32 sec. Implicated osds 10)
1 May 10 10:10:58 overcloud-controller-0 journal: debug 2020-05-10 10:10:58.030673 7faa0618d700 0 log_channel(cluster) log [WRN] : Health check failed: 1 slow requests are blocked > 32 sec. Implicated osds 10 (REQUEST_SLOW)
1 May 10 10:10:58 overcloud-controller-0 journal: cluster 2020-05-10 10:10:58.030683 mon.overcloud-controller-0 mon.0 10.10.10.10:6789/0 217609 : cluster [WRN] Health check failed: 1 slow requests are blocked > 32 sec. Implicated osds 10 (REQUEST_SLOW)
1 May 10 10:10:58 overcloud-controller-0 docker: debug 2020-05-10 10:10:58.030673 7faa0618d700 0 log_channel(cluster) log [WRN] : Health check failed: 1 slow requests are blocked > 32 sec. Implicated osds 10 (REQUEST_SLOW)
1 May 10 10:10:58 overcloud-controller-0 docker: cluster 2020-05-10 10:10:58.030683 mon.overcloud-controller-0 mon.0 10.10.10.10:6789/0 217609 : cluster [WRN] Health check failed: 1 slow requests are blocked > 32 sec. Implicated osds 10 (REQUEST_SLOW)
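The sed filter above leaves any line without "currently " unchanged, which is why the REQUEST_SLOW health-check messages show up in the tally with a count of 1 each. A minimal sketch of a hypothetical helper, count_slow_states, that keeps only the state text after "currently" (dropping a trailing "from <osd>" as the original does) and discards every other line:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): tally the op state recorded in
# slow-request lines ("... currently <state> [from <osd>]") read on stdin.
# sed -n with the p flag prints only lines where " currently " was found,
# so health-check messages are dropped instead of being counted.
count_slow_states() {
  grep 'slow request' \
    | sed -n 's/^.* currently \(.*\)$/\1/p' \
    | sed 's/ from .*$//' \
    | sort | uniq -c | sort -k1,1nr
}
```

For example: count_slow_states < /var/log/messages | head. Here a state like sub_op_commit_rec means the primary was still waiting on replica commits, which points back at the replica OSDs in the previous list.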
Environment
- Red Hat OpenStack Platform (RHOSP)
- Red Hat Ceph Storage 3.3 (RHCS)