OpenStack FFU from 10 to 13 times out when one or more Ceph OSDs have more than 250 PGs allocated
Environment
OpenStack FFU from OSP10 to OSP13.
Issue
When running the Fast Forward Upgrade for OpenStack from 10 to 13, the Ceph upgrade step might time out if the maximum PGs per OSD warning limit is exceeded.
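Once the Ceph monitors have been upgraded to Luminous, the same condition also surfaces in the cluster health output. The check below is a sketch; the exact health code and wording can vary between releases:
# Inspect cluster health on a monitor node; a cluster over the limit
# typically reports a "too many PGs per OSD" (TOO_MANY_PGS) warning.
ceph health detail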
Resolution
Create a Heat environment file with the following snippet:
parameter_defaults:
  CephConfigOverrides:
    global:
      mon_max_pg_per_osd: 400
The new limit needs to be higher than the highest number of PGs allocated on any given OSD. In this example, it was set to 400.
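Once the override has been applied by the upgrade commands described below, it is rendered into the generated Ceph configuration on the overcloud nodes. The following quick check is a sketch and assumes the default configuration path /etc/ceph/ceph.conf:
# The override should show up in the [global] section of the generated
# ceph.conf; /etc/ceph/ceph.conf is the default path and an assumption here.
grep mon_max_pg_per_osd /etc/ceph/ceph.conf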
The newly created environment file should be passed with -e to the overcloud ffwd-upgrade prepare, overcloud ceph-upgrade run, and overcloud ffwd-upgrade converge commands, as shown in the sketch below. It should also be reused for any subsequent overcloud deploy command run in the future, or its contents merged into one of the pre-existing custom environment files used to configure the Ceph deployment.
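The following is a sketch of how the file is passed to the three FFU commands; the file name ceph-pg-limit.yaml and the other environment file shown are placeholders for the templates and custom environment files actually used by the deployment:
# "ceph-pg-limit.yaml" and "overcloud-custom.yaml" are placeholder names;
# include the same -e arguments used for the original overcloud deployment.
openstack overcloud ffwd-upgrade prepare \
    --templates \
    -e /home/stack/templates/overcloud-custom.yaml \
    -e /home/stack/templates/ceph-pg-limit.yaml

openstack overcloud ceph-upgrade run \
    --templates \
    -e /home/stack/templates/overcloud-custom.yaml \
    -e /home/stack/templates/ceph-pg-limit.yaml

openstack overcloud ffwd-upgrade converge \
    --templates \
    -e /home/stack/templates/overcloud-custom.yaml \
    -e /home/stack/templates/ceph-pg-limit.yaml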
Root Cause
The number of PGs allocated on one or more of the OSDs in the Ceph Jewel cluster, initially deployed with OSP10, is higher than 250, which is the default mon_max_pg_per_osd limit in Ceph Luminous.
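As a rough cross-check of how close the cluster is to that limit, the average number of PG replicas per OSD can be estimated as the sum over all pools of pg_num multiplied by the replica size, divided by the number of OSDs. The commands below only gather the inputs for that calculation:
# Pool replica sizes and pg_num values:
ceph osd pool ls detail
# Total number of OSDs:
ceph osd stat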
Diagnostic Steps
The following command can be used to gather the exact number of PGs allocated on each OSD:
# ceph osd df tree
It will produce an output similar to the following:
# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE USE DATA OMAP META AVAIL %USE VAR PGS TYPE NAME
-1 0.17532 - 180GiB 73.5GiB 99.9MiB 0B 73.4GiB 106GiB 40.84 1.00 - root default
-7 0.05844 - 60.0GiB 24.3GiB 33.3MiB 0B 24.3GiB 35.6GiB 40.59 0.99 - host ceph-0
0 hdd 0.01169 1.00000 12.0GiB 3.82GiB 6.56MiB 0B 3.81GiB 8.18GiB 31.82 0.78 60 osd.0
4 hdd 0.01169 1.00000 12.0GiB 8.62GiB 6.75MiB 0B 8.61GiB 3.38GiB 71.84 1.76 71 osd.4
6 hdd 0.01169 1.00000 12.0GiB 3.06GiB 6.56MiB 0B 3.05GiB 8.94GiB 25.48 0.62 43 osd.6
10 hdd 0.01169 1.00000 12.0GiB 5.65GiB 6.69MiB 0B 5.64GiB 6.35GiB 47.08 1.15 53 osd.10
13 hdd 0.01169 1.00000 12.0GiB 3.20GiB 6.75MiB 0B 3.20GiB 8.79GiB 26.71 0.65 61 osd.13
-3 0.05844 - 60.0GiB 25.2GiB 33.3MiB 0B 25.1GiB 34.8GiB 41.97 1.03 - host ceph-1
2 hdd 0.01169 1.00000 12.0GiB 1.33GiB 6.56MiB 0B 1.32GiB 10.7GiB 11.09 0.27 41 osd.2
5 hdd 0.01169 1.00000 12.0GiB 5.18GiB 6.88MiB 0B 5.17GiB 6.82GiB 43.17 1.06 50 osd.5
8 hdd 0.01169 1.00000 12.0GiB 10.7GiB 6.56MiB 0B 10.7GiB 1.30GiB 89.20 2.18 73 osd.8
11 hdd 0.01169 1.00000 12.0GiB 3.65GiB 6.69MiB 0B 3.64GiB 8.34GiB 30.43 0.75 66 osd.11
14 hdd 0.01169 1.00000 12.0GiB 4.32GiB 6.62MiB 0B 4.31GiB 7.68GiB 35.99 0.88 58 osd.14
-5 0.05844 - 60.0GiB 24.0GiB 33.3MiB 0B 23.9GiB 36.0GiB 39.95 0.98 - host ceph-2
1 hdd 0.01169 1.00000 12.0GiB 5.02GiB 6.56MiB 0B 5.01GiB 6.98GiB 41.83 1.02 58 osd.1
3 hdd 0.01169 1.00000 12.0GiB 7.06GiB 6.62MiB 0B 7.05GiB 4.94GiB 58.85 1.44 65 osd.3
7 hdd 0.01169 1.00000 12.0GiB 3.81GiB 6.75MiB 0B 3.81GiB 8.18GiB 31.79 0.78 62 osd.7
9 hdd 0.01169 1.00000 12.0GiB 3.97GiB 6.62MiB 0B 3.96GiB 8.03GiB 33.06 0.81 58 osd.9
12 hdd 0.01169 1.00000 12.0GiB 4.10GiB 6.75MiB 0B 4.10GiB 7.89GiB 34.20 0.84 45 osd.12
TOTAL 180GiB 73.5GiB 99.9MiB 0B 73.4GiB 106GiB 40.84
MIN/MAX VAR: 0.27/2.18 STDDEV: 18.96
For each OSD, the number of PGs hosted is shown under the PGS column.
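To pick out the highest PG count across all OSDs directly, the JSON form of the same data can be filtered. This is a sketch that assumes jq is available on the node running the ceph CLI and that the JSON output exposes a per-OSD pgs field, as it does on Luminous:
# Per-OSD PG counts, and the cluster-wide maximum:
ceph osd df -f json | jq -r '.nodes[] | "\(.name) \(.pgs)"'
ceph osd df -f json | jq '[.nodes[].pgs] | max'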