Chapter 12. Operational Tools (monitoring, logging & alarms)
Monitoring a traditional hardware-based EPC mostly involved Element Managers (EMs) provided by the EPC vendor, along with a set of generic monitoring tools and techniques, including SNMP monitoring (using snmpget and snmpwalk), SNMP traps and Syslog. With the advent of vEPC, however, the solution is multi-layered and much more complex to monitor. In addition to monitoring the vEPC at the application layer (using an EM or otherwise), we now have to monitor:
- Server Hardware: Power supply, Temperature, Fans, Disk, fabric errors etc.
- Host OS: Memory, CPU, Disk and I/O errors
- OpenStack: Service daemons, Instance reachability, Volumes, Hypervisor metrics, Nova/Compute metrics, Tenant metrics, Message Queues, Keystone Tokens and Notifications
It should be noted that the VNFM, if one is present, also contributes to lifecycle management of the VNFs, for example by restarting VNFs and by starting new VNFs when scaling is required.
12.1. Logging
OpenStack provides numerous log files for each component, and monitoring these log files is an important operational activity. As an example, let us look at /var/log/nova/nova-compute.log:
2016-12-09 17:51:51.025 44510 INFO nova.compute.resource_tracker [req-336c8469-3c29-4b55-8917-36db3303bb72 - - - - -] Auditing locally available compute resources for node overcloud-compute-0.localdomain
2016-12-09 17:51:51.307 44510 INFO nova.compute.resource_tracker [req-336c8469-3c29-4b55-8917-36db3303bb72 - - - - -] Total usable vcpus: 48, total allocated vcpus: 0
2016-12-09 17:51:51.307 44510 INFO nova.compute.resource_tracker [req-336c8469-3c29-4b55-8917-36db3303bb72 - - - - -] Final resource view: name=overcloud-compute-0.localdomain phys_ram=130950MB used_ram=2048MB phys_disk=372GB used_disk=0GB total_vcpus=48 used_vcpus=0 pci_stats=[]
2016-12-09 17:51:51.325 44510 INFO nova.compute.resource_tracker [req-336c8469-3c29-4b55-8917-36db3303bb72 - - - - -] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain
If the compute node is healthy and able to communicate with the controller node, we should see periodic “Auditing locally available compute resources for node…” log messages, each followed by a report that provides a snapshot of the resources available on that compute node. If for some reason the nova service is not healthy on that compute node, or communication between the controller node and this compute node has failed, these updates stop appearing in the nova-compute.log file. This can also be observed on the OpenStack Horizon dashboard, where VMs are no longer scheduled on compute nodes that the controller node considers “Out of service”.
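The check described above can be sketched in Python. This is a minimal illustration, not part of any OpenStack tooling: it assumes the log format shown in the excerpt, the function names are hypothetical, and the five-minute staleness threshold is an arbitrary assumption that should be matched to the audit interval actually configured on the compute node.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp that starts each nova-compute.log line in the excerpt.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+")
# Matches key=value pairs such as phys_ram=130950MB.
PAIR_RE = re.compile(r"(\w+)=(\S+)")

AUDIT_MARKER = "Auditing locally available compute resources"
VIEW_MARKER = "Final resource view:"


def parse_timestamp(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


def parse_resource_view(line):
    """Return the key=value pairs of a 'Final resource view' line as a dict."""
    if VIEW_MARKER not in line:
        return {}
    _, _, tail = line.partition(VIEW_MARKER)
    return dict(PAIR_RE.findall(tail))


def last_audit_time(lines):
    """Return the timestamp of the most recent audit message, or None."""
    latest = None
    for line in lines:
        if AUDIT_MARKER in line:
            stamp = parse_timestamp(line)
            if stamp is not None and (latest is None or stamp > latest):
                latest = stamp
    return latest


def audit_is_stale(lines, now, max_age=timedelta(minutes=5)):
    """Return True if no audit message was logged within max_age before now.

    max_age is an assumed threshold, not a value taken from OpenStack.
    """
    latest = last_audit_time(lines)
    return latest is None or now - latest > max_age
```

In practice the lines would be read from /var/log/nova/nova-compute.log and `now` would be `datetime.now()`. A stale result only indicates that the periodic audit has stopped being logged; the cause still has to be investigated on the node itself.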
A complete list of log files can be found on the Red Hat Customer Portal under each release. For Red Hat OpenStack Platform 10, it can be found at https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/paged/logging-monitoring-and-troubleshooting-guide/.
12.2. Monitoring
Apart from the log files, OpenStack also provides various KPIs that should be monitored.
Monitoring the metrics and events listed below gives mobile operators insight into the performance, availability and overall health of OpenStack.
The most useful metrics are the following:
- hypervisor_load
- running_vms
- free_disk_gb
- free_ram_mb
- queue memory
- queue consumers
- consumer_utilisation
Details and a more complete listing of OpenStack KPIs can be found at https://www.datadoghq.com/blog/openstack-monitoring-nova/?ref=wikipedia.
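As an illustration of how such metrics could feed an alarm, the following Python sketch compares one sample of metric values against lower and upper limits. The function name, the sample layout and every threshold value are hypothetical and chosen only for the example; how the values are collected (for example through a monitoring agent) is outside the scope of this sketch.

```python
# Hypothetical alarm thresholds; tune them for the actual deployment.
MIN_THRESHOLDS = {"free_disk_gb": 50, "free_ram_mb": 8192}
MAX_THRESHOLDS = {"hypervisor_load": 40.0, "running_vms": 40}


def evaluate_metrics(sample, minimums=MIN_THRESHOLDS, maximums=MAX_THRESHOLDS):
    """Return a sorted list of alarm strings for metrics outside their limits.

    Metrics missing from the sample are skipped rather than treated as alarms.
    """
    alarms = []
    for name, limit in minimums.items():
        value = sample.get(name)
        if value is not None and value < limit:
            alarms.append(f"{name}={value} below minimum {limit}")
    for name, limit in maximums.items():
        value = sample.get(name)
        if value is not None and value > limit:
            alarms.append(f"{name}={value} above maximum {limit}")
    return sorted(alarms)


# Example sample for one compute node; the values are made up.
sample = {
    "hypervisor_load": 3.5,
    "running_vms": 12,
    "free_disk_gb": 20,
    "free_ram_mb": 65536,
}
for alarm in evaluate_metrics(sample):
    print(alarm)
```

Skipping missing metrics keeps the check usable when only some values are collected, but a production alarm would probably also flag a metric that stops being reported.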
