Optimizing the ceilometer DB takes a long time (24+ hours)

Solution Verified - Updated -

Environment

Red Hat OpenStack

Issue

  • The ceilometer-expirer command hangs after more than 10 hours while trying to shrink the ceilometer DB (sample.ibd).

Resolution

  • As a first step, refer to KCS article 2497621, which describes the recommended configuration for ceilometer-expirer.
  • If the ceilometer-expirer command hangs after a long run, the sample table in the ceilometer DB has to be emptied manually:
$ mysql -e "truncate table sample" ceilometer

Note: If the sample table is very large, the truncate command may fail to erase its data.

Workaround to empty the sample table in the ceilometer DB when truncate does not help:

[1] Verify the current status of the ceilometer, gnocchi and aodh services on the node.

# systemctl list-unit-files | awk '/gnocchi|aodh|ceil/ {print $1}' | while read -r service; do echo "$service"; systemctl status --no-pager "$service"; done

[2] Stop and disable each running ceilometer, gnocchi and aodh service.

# systemctl stop <service-name>
# systemctl disable <service-name>
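
Step [2] has to be repeated for every unit found in step [1]. A minimal sketch of how that could be looped is shown here; the function names telemetry_units and run_on_units and the DRY_RUN variable are hypothetical helpers, not part of any Red Hat tooling. By default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Print the unit names (first column) that match the telemetry services,
# using the same awk filter as step [1]. Reads unit-file listings on stdin.
telemetry_units() {
    awk '/gnocchi|aodh|ceil/ {print $1}'
}

# Apply one or more systemctl actions to every unit name read from stdin.
# DRY_RUN (hypothetical variable) defaults to 1, which only prints commands.
run_on_units() {
    local unit action
    while read -r unit; do
        for action in "$@"; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "systemctl $action $unit"
            else
                # </dev/null keeps systemctl from consuming the unit list
                systemctl "$action" "$unit" </dev/null
            fi
        done
    done
}
```

Assuming the helpers above are defined in a root shell, `systemctl list-unit-files | telemetry_units | DRY_RUN=0 run_on_units stop disable` would stop and disable the matching units, and the same pipeline with `enable start` could be reused later to bring them back. Because the filter also matches units that were already disabled, it may be safer to save the list from step [1] to a file and feed that file to run_on_units instead.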

[3] Move the WSGI config files for these services out of the httpd configuration directory, then restart httpd.

# mkdir /root/httpd_conf_gnocchi
# mv /etc/httpd/conf.d/{10-aodh_wsgi.conf,10-ceilometer_wsgi.conf,10-gnocchi_wsgi.conf} /root/httpd_conf_gnocchi/
# systemctl restart httpd

[4] Rename the sample table to a temporary table (for example temp1), create a new, empty sample table with the same structure as temp1, and then drop temp1. If the SQL rename command takes a long time, restart the mariadb service and retry the steps below.

# mysql ceilometer
MariaDB [ceilometer]> rename table sample to temp1;
MariaDB [ceilometer]> create table sample like temp1;
MariaDB [ceilometer]> drop table temp1;
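
The same three statements can also be sent non-interactively. The following is a sketch only: swap_sql is a hypothetical helper that just prints the SQL from step [4] for a given table and temporary name, so the statements can be reviewed before they are piped to the mysql client.

```shell
#!/usr/bin/env bash
# Print the rename / create-like / drop sequence from step [4].
# $1 = table to empty, $2 = temporary table name (both chosen by the caller).
swap_sql() {
    printf 'RENAME TABLE %s TO %s;\n' "$1" "$2"
    printf 'CREATE TABLE %s LIKE %s;\n' "$1" "$2"
    printf 'DROP TABLE %s;\n' "$2"
}

# Review the statements first; nothing is sent to the database here.
swap_sql sample temp1
```

Once the output looks right, it could be applied with `swap_sql sample temp1 | mysql ceilometer`, assuming the mysql client can connect as it does in step [4].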

[5] Once the sample table has been emptied, start and enable the services that were stopped and disabled in step [2].

[6] After emptying the sample table, make sure the size of the InnoDB data file below has been reduced.

# ls -lh /var/lib/mysql/ceilometer/sample.ibd
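
The ls output only shows the size after the change, so a comparison needs the size recorded before step [4]. A small sketch of such a before/after check follows; size_of and report_shrink are hypothetical helper names, and the sizes are plain byte counts.

```shell
#!/usr/bin/env bash
# Print the size of a file in bytes (portable alternative to GNU stat).
size_of() {
    wc -c < "$1" | tr -d ' '
}

# Compare two byte counts: $1 = size before, $2 = size after.
# Returns non-zero when the file did not get smaller.
report_shrink() {
    if [ "$2" -lt "$1" ]; then
        echo "reduced by $(( $1 - $2 )) bytes"
    else
        echo "not reduced"
        return 1
    fi
}
```

For example, `before=$(size_of /var/lib/mysql/ceilometer/sample.ibd)` could be run before step [4], and `report_shrink "$before" "$(size_of /var/lib/mysql/ceilometer/sample.ibd)"` after step [5].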

Root Cause

  • In larger deployments, sample.ibd can grow beyond 25 GB if the cron job described in article 2497621 has not been set up properly.
  • The "ceilometer-expirer" operation then takes too long and sometimes even fails because the transaction is too large.
