Chapter 5. Capacity metering using the Telemetry service
The OpenStack Telemetry service provides usage metrics that can be leveraged for billing, charge-back, and show-back purposes. Such metrics data can also be used by third-party applications to plan for capacity on the cluster and can also be leveraged for auto-scaling virtual instances using OpenStack heat. For more information, see Auto Scaling for Instances.
The combination of ceilometer and Gnocchi can be used for monitoring and alarms. This is supported on small-size clusters and with known limitations. For real-time monitoring, Red Hat OpenStack Platform ships with agents that provide metrics data, and can be consumed by separate monitoring infrastructure and applications. For more information, see Monitoring Tools Configuration.
5.1. Viewing measures
To list all the measures for a particular resource:
# openstack metric measures show --resource-id UUID METER_NAME
To list only measures for a particular resource, within a range of timestamps:
# openstack metric measures show --aggregation mean --start START_TIME --end STOP_TIME --resource-id UUID METER_NAME
Where START_TIME and END_TIME are in the form iso-dateThh:mm:ss.
5.2. Creating measures
You can use measures to send data to the Telemetry service, and they do not need to correspond to a previously-defined meter. For example:
# gnocchi measures add -m 2015-01-12T17:56:23@42 --resource-id UUID METER_NAME
5.3. Example: Viewing cloud usage measures
This example shows the average memory usage of all instances for each project.
openstack metrics measures aggregation --resource-type instance --groupby project_id -m memoryView L3 --resource-id UUID
5.4. Example: Viewing L3 cache usage
If your Intel hardware and libvirt version supports Cache Monitoring Technology (CMT), you can use the cpu_l3_cache
meter to monitor the amount of L3 cache used by an instance.
Monitoring the L3 cache requires the following:
-
cmt
in theLibvirtEnabledPerfEvents
parameter. -
cpu_l3_cache
in thegnocchi_resources.yaml
file. -
cpu_l3_cache
in the Ceilometerpolling.yaml
file.
Enable L3 Cache Monitoring
To enable L3 cache monitoring:
Create a YAML file for telemetry (for example,
ceilometer-environment.yaml
) and addcmt
to theLibvirtEnabledPerfEvents
parameter.parameter_defaults: LibvirtEnabledPerfEvents: cmt
Launch the overcloud with this YAML file.
#!/bin/bash openstack overcloud deploy \ --templates \ <additional templates> \ -e /home/stack/ceilometer-environment.yaml
Verify that
cpu_l3_cache
is enabled in Gnocchi on the Compute node.$ sudo -i # podman exec -ti ceilometer_agent_compute cat /etc/ceilometer/gnocchi_resources.yaml | grep cpu_l3_cache
Verify that
cpu_l3_cache
is enabled for Telemetry polling.# podman exec -ti ceilometer_agent_compute cat /etc/ceilometer/polling.yaml | grep cpu_l3_cache
If
cpu_l3_cache
is not enabled for Telemetry, enable it, and restart the service.# podman exec -ti ceilometer_agent_compute echo " - cpu_l3_cache" >> /etc/ceilometer/polling.yaml # podman exec -ti ceilometer_agent_compute pkill -HUP -f "ceilometer.*master process"
NoteThis podman change will not persist over a reboot.
After you have launched a guest instance on this compute node, you can use the gnocchi measures show
command to monitor the CMT metrics.
(overcloud) [stack@undercloud-0 ~]$ gnocchi measures show --resource-id a6491d92-b2c8-4f6d-94ba-edc9dfde23ac cpu_l3_cache +---------------------------+-------------+-----------+ | timestamp | granularity | value | +---------------------------+-------------+-----------+ | 2017-10-25T09:40:00+00:00 | 300.0 | 1966080.0 | | 2017-10-25T09:45:00+00:00 | 300.0 | 1933312.0 | | 2017-10-25T09:50:00+00:00 | 300.0 | 2129920.0 | | 2017-10-25T09:55:00+00:00 | 300.0 | 1966080.0 | | 2017-10-25T10:00:00+00:00 | 300.0 | 1933312.0 | | 2017-10-25T10:05:00+00:00 | 300.0 | 2195456.0 | | 2017-10-25T10:10:00+00:00 | 300.0 | 1933312.0 | | 2017-10-25T10:15:00+00:00 | 300.0 | 1998848.0 | | 2017-10-25T10:20:00+00:00 | 300.0 | 2097152.0 | | 2017-10-25T10:25:00+00:00 | 300.0 | 1933312.0 | | 2017-10-25T10:30:00+00:00 | 300.0 | 1966080.0 | | 2017-10-25T10:35:00+00:00 | 300.0 | 1933312.0 | | 2017-10-25T10:40:00+00:00 | 300.0 | 1933312.0 | | 2017-10-25T10:45:00+00:00 | 300.0 | 1933312.0 | | 2017-10-25T10:50:00+00:00 | 300.0 | 2850816.0 | | 2017-10-25T10:55:00+00:00 | 300.0 | 2359296.0 | | 2017-10-25T11:00:00+00:00 | 300.0 | 2293760.0 | +---------------------------+-------------+-----------+
5.5. Viewing existing alarms
To list the existing Telemetry alarms, use the aodh
command. For example:
# aodh alarm list +--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+ | alarm_id | type | name | state | severity | enabled | +--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+ | 922f899c-27c8-4c7d-a2cf-107be51ca90a | gnocchi_aggregation_by_resources_threshold | iops-monitor-read-requests | insufficient data | low | True | +--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+
To list the meters assigned to a resource, specify the UUID of the resource (an instance, image, or volume, among others). For example:
# gnocchi resource show 5e3fcbe2-7aab-475d-b42c-a440aa42e5ad
5.6. Creating an alarm
You can use aodh
to create an alarm that activates when a threshold value is reached. In this example, the alarm activates and adds a log entry when the average CPU utilization for an individual instance exceeds 80%. A query is used to isolate the specific instance’s id (94619081-abf5-4f1f-81c7-9cedaa872403
) for monitoring purposes:
# aodh alarm create --type gnocchi_aggregation_by_resources_threshold --name cpu_usage_high --metric cpu_util --threshold 80 --aggregation-method sum --resource-type instance --query '{"=": {"id": "94619081-abf5-4f1f-81c7-9cedaa872403"}}' --alarm-action 'log://' +---------------------------+-------------------------------------------------------+ | Field | Value | +---------------------------+-------------------------------------------------------+ | aggregation_method | sum | | alarm_actions | [u'log://'] | | alarm_id | b794adc7-ed4f-4edb-ace4-88cbe4674a94 | | comparison_operator | eq | | description | gnocchi_aggregation_by_resources_threshold alarm rule | | enabled | True | | evaluation_periods | 1 | | granularity | 60 | | insufficient_data_actions | [] | | metric | cpu_util | | name | cpu_usage_high | | ok_actions | [] | | project_id | 13c52c41e0e543d9841a3e761f981c20 | | query | {"=": {"id": "94619081-abf5-4f1f-81c7-9cedaa872403"}} | | repeat_actions | False | | resource_type | instance | | severity | low | | state | insufficient data | | state_timestamp | 2016-12-09T05:18:53.326000 | | threshold | 80.0 | | time_constraints | [] | | timestamp | 2016-12-09T05:18:53.326000 | | type | gnocchi_aggregation_by_resources_threshold | | user_id | 32d3f2c9a234423cb52fb69d3741dbbc | +---------------------------+-------------------------------------------------------+
To edit an existing threshold alarm, use the aodh alarm update
command. For example, to increase the alarm threshold to 75%:
# aodh alarm update --name cpu_usage_high --threshold 75
5.7. Disabling or deleting an alarm
To disable an alarm:
# aodh alarm update --name cpu_usage_high --enabled=false
To delete an alarm:
# aodh alarm delete --name cpu_usage_high
5.8. Example: Monitoring the disk activity of instances
The following example demonstrates how to use an Aodh alarm to monitor the cumulative disk activity for all the instances contained within a particular project.
1. Review the existing projects, and select the appropriate UUID of the project you need to monitor. This example uses the admin
tenant:
$ openstack project list +----------------------------------+----------+ | ID | Name | +----------------------------------+----------+ | 745d33000ac74d30a77539f8920555e7 | admin | | 983739bb834a42ddb48124a38def8538 | services | | be9e767afd4c4b7ead1417c6dfedde2b | demo | +----------------------------------+----------+
2. Use the project’s UUID to create an alarm that analyses the sum()
of all read requests generated by the instances in the admin
tenant (the query can be further restrained with the --query
parameter).
# aodh alarm create --type gnocchi_aggregation_by_resources_threshold --name iops-monitor-read-requests --metric disk.read.requests.rate --threshold 42000 --aggregation-method sum --resource-type instance --query '{"=": {"project_id": "745d33000ac74d30a77539f8920555e7"}}' +---------------------------+-----------------------------------------------------------+ | Field | Value | +---------------------------+-----------------------------------------------------------+ | aggregation_method | sum | | alarm_actions | [] | | alarm_id | 192aba27-d823-4ede-a404-7f6b3cc12469 | | comparison_operator | eq | | description | gnocchi_aggregation_by_resources_threshold alarm rule | | enabled | True | | evaluation_periods | 1 | | granularity | 60 | | insufficient_data_actions | [] | | metric | disk.read.requests.rate | | name | iops-monitor-read-requests | | ok_actions | [] | | project_id | 745d33000ac74d30a77539f8920555e7 | | query | {"=": {"project_id": "745d33000ac74d30a77539f8920555e7"}} | | repeat_actions | False | | resource_type | instance | | severity | low | | state | insufficient data | | state_timestamp | 2016-11-08T23:41:22.919000 | | threshold | 42000.0 | | time_constraints | [] | | timestamp | 2016-11-08T23:41:22.919000 | | type | gnocchi_aggregation_by_resources_threshold | | user_id | 8c4aea738d774967b4ef388eb41fef5e | +---------------------------+-----------------------------------------------------------+
5.9. Example: Monitoring CPU usage
If you want to monitor an instance’s performance, you would start by examining the Gnocchi database to identify which metrics you can monitor, such as memory or CPU usage. For example, run gnocchi resource show
against an instance to identify which metrics can be monitored:
Query the available metrics for a particular instance UUID:
$ gnocchi resource show --type instance d71cdf9a-51dc-4bba-8170-9cd95edd3f66 +-----------------------+---------------------------------------------------------------------+ | Field | Value | +-----------------------+---------------------------------------------------------------------+ | created_by_project_id | 44adccdc32614688ae765ed4e484f389 | | created_by_user_id | c24fa60e46d14f8d847fca90531b43db | | creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 | | display_name | test-instance | | ended_at | None | | flavor_id | 14c7c918-df24-481c-b498-0d3ec57d2e51 | | flavor_name | m1.tiny | | host | overcloud-compute-0 | | id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | image_ref | e75dff7b-3408-45c2-9a02-61fbfbf054d7 | | metrics | compute.instance.booting.time: c739a70d-2d1e-45c1-8c1b-4d28ff2403ac | | | cpu.delta: 700ceb7c-4cff-4d92-be2f-6526321548d6 | | | cpu: 716d6128-1ea6-430d-aa9c-ceaff2a6bf32 | | | cpu_l3_cache: 3410955e-c724-48a5-ab77-c3050b8cbe6e | | | cpu_util: b148c392-37d6-4c8f-8609-e15fc15a4728 | | | disk.allocation: 9dd464a3-acf8-40fe-bd7e-3cb5fb12d7cc | | | disk.capacity: c183d0da-e5eb-4223-a42e-855675dd1ec6 | | | disk.ephemeral.size: 15d1d828-fbb4-4448-b0f2-2392dcfed5b6 | | | disk.iops: b8009e70-daee-403f-94ed-73853359a087 | | | disk.latency: 1c648176-18a6-4198-ac7f-33ee628b82a9 | | | disk.read.bytes.rate: eb35828f-312f-41ce-b0bc-cb6505e14ab7 | | | disk.read.bytes: de463be7-769b-433d-9f22-f3265e146ec8 | | | disk.read.requests.rate: 588ca440-bd73-4fa9-a00c-8af67262f4fd | | | disk.read.requests: 53e5d599-6cad-47de-b814-5cb23e8aaf24 | | | disk.root.size: cee9d8b1-181e-4974-9427-aa7adb3b96d9 | | | disk.usage: 4d724c99-7947-4c6d-9816-abbbc166f6f3 | | | disk.write.bytes.rate: 45b8da6e-0c89-4a6c-9cce-c95d49d9cc8b | | | disk.write.bytes: c7734f1b-b43a-48ee-8fe4-8a31b641b565 | | | disk.write.requests.rate: 96ba2f22-8dd6-4b89-b313-1e0882c4d0d6 | | | disk.write.requests: 553b7254-be2d-481b-9d31-b04c93dbb168 | | | memory.bandwidth.local: 187f29d4-7c70-4ae2-86d1-191d11490aad | | | memory.bandwidth.total: eb09a4fc-c202-4bc3-8c94-aa2076df7e39 | | | memory.resident: 97cfb849-2316-45a6-9545-21b1d48b0052 | | | memory.swap.in: f0378d8f-6927-4b76-8d34-a5931799a301 | | | memory.swap.out: c5fba193-1a1b-44c8-82e3-9fdc9ef21f69 | | | memory.usage: 7958d06d-7894-4ca1-8c7e-72ba572c1260 | | | memory: a35c7eab-f714-4582-aa6f-48c92d4b79cd | | | perf.cache.misses: da69636d-d210-4b7b-bea5-18d4959e95c1 | | | perf.cache.references: e1955a37-d7e4-4b12-8a2a-51de4ec59efd | | | perf.cpu.cycles: 5d325d44-b297-407a-b7db-cc9105549193 | | | perf.instructions: 973d6c6b-bbeb-4a13-96c2-390a63596bfc | | | vcpus: 646b53d0-0168-4851-b297-05d96cc03ab2 | | original_resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | project_id | 3cee262b907b4040b26b678d7180566b | | revision_end | None | | revision_start | 2017-11-16T04:00:27.081865+00:00 | | server_group | None | | started_at | 2017-11-16T01:09:20.668344+00:00 | | type | instance | | user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 | +-----------------------+---------------------------------------------------------------------+
In this result, the
metrics
value lists the components you can monitor using Aodh alarms, for examplecpu_util
.To monitor CPU usage, you will need the
cpu_util
metric. To see more information on this metric:$ gnocchi metric show --resource d71cdf9a-51dc-4bba-8170-9cd95edd3f66 cpu_util +------------------------------------+-------------------------------------------------------------------+ | Field | Value | +------------------------------------+-------------------------------------------------------------------+ | archive_policy/aggregation_methods | std, count, min, max, sum, mean | | archive_policy/back_window | 0 | | archive_policy/definition | - points: 8640, granularity: 0:05:00, timespan: 30 days, 0:00:00 | | archive_policy/name | low | | created_by_project_id | 44adccdc32614688ae765ed4e484f389 | | created_by_user_id | c24fa60e46d14f8d847fca90531b43db | | creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 | | id | b148c392-37d6-4c8f-8609-e15fc15a4728 | | name | cpu_util | | resource/created_by_project_id | 44adccdc32614688ae765ed4e484f389 | | resource/created_by_user_id | c24fa60e46d14f8d847fca90531b43db | | resource/creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 | | resource/ended_at | None | | resource/id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | resource/original_resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | resource/project_id | 3cee262b907b4040b26b678d7180566b | | resource/revision_end | None | | resource/revision_start | 2017-11-17T00:05:27.516421+00:00 | | resource/started_at | 2017-11-16T01:09:20.668344+00:00 | | resource/type | instance | | resource/user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 | | unit | None | +------------------------------------+-------------------------------------------------------------------+
-
archive_policy
- Defines the aggregation interval for calculating thestd, count, min, max, sum, mean
values.
-
Use Aodh to create a monitoring task that queries
cpu_util
. This task will trigger events based on the settings you specify. For example, to raise a log entry when an instance’s CPU spikes over 80% for an extended duration:aodh alarm create \ --project-id 3cee262b907b4040b26b678d7180566b \ --name high-cpu \ --type gnocchi_resources_threshold \ --description 'High CPU usage' \ --metric cpu_util \ --threshold 80.0 \ --comparison-operator ge \ --aggregation-method mean \ --granularity 300 \ --evaluation-periods 1 \ --alarm-action 'log://' \ --ok-action 'log://' \ --resource-type instance \ --resource-id d71cdf9a-51dc-4bba-8170-9cd95edd3f66 +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | aggregation_method | mean | | alarm_actions | [u'log://'] | | alarm_id | 1625015c-49b8-4e3f-9427-3c312a8615dd | | comparison_operator | ge | | description | High CPU usage | | enabled | True | | evaluation_periods | 1 | | granularity | 300 | | insufficient_data_actions | [] | | metric | cpu_util | | name | high-cpu | | ok_actions | [u'log://'] | | project_id | 3cee262b907b4040b26b678d7180566b | | repeat_actions | False | | resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | resource_type | instance | | severity | low | | state | insufficient data | | state_reason | Not evaluated yet | | state_timestamp | 2017-11-16T05:20:48.891365 | | threshold | 80.0 | | time_constraints | [] | | timestamp | 2017-11-16T05:20:48.891365 | | type | gnocchi_resources_threshold | | user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 | +---------------------------+--------------------------------------+
-
comparison-operator
- Thege
operator defines that the alarm will trigger if the CPU usage is greater than (or equal to) 80%. -
granularity
- Metrics have an archive policy associated with them; the policy can have various granularities (for example, 5 minutes aggregation for 1 hour + 1 hour aggregation over a month). Thegranularity
value must match the duration described in the archive policy. -
evaluation-periods
- Number ofgranularity
periods that need to pass before the alarm will trigger. For example, setting this value to2
will mean that the CPU usage will need to be over 80% for two polling periods before the alarm will trigger. [u’log://']
- This value will log events to your Aodh log file.NoteYou can define different actions to run when an alarm is triggered (
alarm_actions
), and when it returns to a normal state (ok_actions
), such as a webhook URL.
-
To check if your alarm has been triggered, query the alarm’s history:
aodh alarm-history show 1625015c-49b8-4e3f-9427-3c312a8615dd --fit-width +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ | timestamp | type | detail | event_id | +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ | 2017-11-16T05:21:47.850094 | state transition | {"transition_reason": "Transition to ok due to 1 samples inside threshold, most recent: 0.0366665763", "state": "ok"} | 3b51f09d-ded1-4807-b6bb-65fdc87669e4 | +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
5.10. Managing resource types
Telemetry resource types that were previously hardcoded can now be managed by the gnocchi client. You can use the Gnocchi client to create, view, and delete resource types, and you can use the Gnocchi API to update or delete attributes.
1. Create a new resource-type:
$ gnocchi resource-type create testResource01 -a bla:string:True:min_length=123 +----------------+------------------------------------------------------------+ | Field | Value | +----------------+------------------------------------------------------------+ | attributes/bla | max_length=255, min_length=123, required=True, type=string | | name | testResource01 | | state | active | +----------------+------------------------------------------------------------+
2. Review the configuration of the resource-type:
$ gnocchi resource-type show testResource01 +----------------+------------------------------------------------------------+ | Field | Value | +----------------+------------------------------------------------------------+ | attributes/bla | max_length=255, min_length=123, required=True, type=string | | name | testResource01 | | state | active | +----------------+------------------------------------------------------------+
3. Delete the resource-type:
$ gnocchi resource-type delete testResource01
You cannot delete a resource type if a resource is using it.