Chapter 13. Hardening infrastructure and virtualization

You can harden your physical and virtual environment to better protect against internal and external threats.

13.1. Hardware for Red Hat OpenStack Platform

When you add hardware for use in your cloud environment, ensure that it is supported and that hardware virtualization is enabled. Disable hardware features that you do not use.

Procedure

  1. Check Certified hardware for Red Hat OpenStack to ensure your hardware is supported.
  2. Check that hardware virtualization is available and enabled:

    grep -E "vmx|svm" /proc/cpuinfo
  3. Ensure that all firmware is up-to-date on your hardware platform. See hardware vendor documentation for details.
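
The check in step 2 can be wrapped in a small function for use in scripts. The following is a minimal sketch: the name has_virt_flags and the optional file argument are illustrative assumptions, not part of RHOSP. It only reports whether the vmx (Intel) or svm (AMD) CPU flag is listed.

```shell
#!/bin/sh
# has_virt_flags is a hypothetical helper name, not an RHOSP tool.
# It succeeds when the given cpuinfo file lists the vmx or svm CPU flag.
has_virt_flags() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    # -w matches whole words only, so a flag such as "vmx_foo" would not count.
    grep -Eqw 'vmx|svm' "$cpuinfo_file"
}

# Usage on the host that you want to check:
#   has_virt_flags && echo "hardware virtualization flags present"
```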

13.2. Software updates in a cloud environment

Keep Red Hat OpenStack Platform (RHOSP) updated for security, performance and supportability.

  • If there are kernel updates included when you update, you must reboot the physical system or instance that you updated.
  • Update OpenStack Image (glance) images to ensure that newly created instances have the latest updates.
  • If you selectively update packages on RHOSP, ensure that all security updates are included. For more information about the latest vulnerabilities and security updates, see:

13.3. Updating SSH keys in your OpenStack environment

You must update your SSH keys if they are less than 2048 bits in the following scenarios:

  • You are upgrading your Red Hat OpenStack Platform (RHOSP) cluster to RHEL 9.2 during an upgrade from RHOSP 16.2 to 17.1
  • You are updating from a RHOSP 17.0 or 17.1 minor release to the latest RHOSP 17.1 minor release
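
To see whether a key falls below the 2048-bit threshold, you can inspect it with ssh-keygen -l, which prints the key length in bits as the first field of its output. The following is a minimal sketch: the helper name key_is_long_enough is hypothetical, and the key path in the usage comment is only an example. The threshold is meaningful for RSA keys; other key types, such as Ed25519, report much smaller bit lengths and need separate handling.

```shell
#!/bin/sh
# key_is_long_enough is a hypothetical helper, not part of RHOSP.
# It reads one line of `ssh-keygen -l` output on stdin, for example:
#   2048 SHA256:<fingerprint> stack@director (RSA)
# and succeeds when the first field (key length in bits) is at least 2048.
key_is_long_enough() {
    awk 'NR == 1 { ok = ($1 + 0 >= 2048) } END { exit ok ? 0 : 1 }'
}

# Usage (the key path is an example):
#   ssh-keygen -l -f ~/.ssh/id_rsa.pub | key_is_long_enough \
#       || echo "key is shorter than 2048 bits"
```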

Run the ssh_key_rotation.yaml Ansible Playbook to safely automate the process of rotating your SSH keys. On the overcloud, backup keys are stored in the following directory:

/home/{{ ansible_user_id }}/backup_keys/{{ ansible_date_time.epoch }}/authorized_keys

Prerequisites

  • You have a fully installed RHOSP environment.

Procedure

  1. Log in to RHOSP director:

    ssh stack@director
  2. Run the ssh_key_rotation.yaml playbook:

    $ ansible-playbook \
    -i /home/stack/overcloud-deploy/<stack_name>/tripleo-ansible-inventory.yaml \
    -e undercloud_backup_folder=/home/stack/overcloud_backup_keys \
    /usr/share/ansible/tripleo-playbooks/ssh_key_rotation.yaml

      • Replace <stack_name> with the name of your overcloud stack.
      • Set undercloud_backup_folder to the backup directory of your choice in the stack home directory.

    Note

    If you have a single-celled deployment, you have completed this procedure. If you have more than one cell, you must continue.

  3. Rerun the playbook for each additional cell that you have:

    ansible-playbook \
    -i /home/stack/overcloud-deploy/<stack_name>/tripleo-ansible-inventory.yaml \
    -e rotate_undercloud_key=false \
    -e ansible_ssh_private_key_file=/home/stack/overcloud_backup_keys/id_rsa \
    tripleo-ansible/playbooks/ssh_key_rotation.yaml

      • Replace <stack_name> with the stack name of the cell. Each cell has a different name.
      • Set the rotate_undercloud_key parameter to false so that the undercloud key is not rotated.
      • Set ansible_ssh_private_key_file to your SSH backup key so that you can authenticate to the Compute hosts in other cells after the old SSH key is rotated out.

13.4. Limiting hardware and software features

Enable only the hardware and software features that you use, so that less code is exposed to the possibility of attack. There are some features that you should only enable in trusted environments.

PCI passthrough
PCI passthrough allows an instance to have direct access to a PCI device on a node. An instance with PCI device access may allow a malicious actor to make modifications to the firmware. Additionally, some PCI devices have direct memory access (DMA). When you give an instance control over a device with DMA, it can gain arbitrary physical memory access.

Some use cases, such as Network Functions Virtualization (NFV), require PCI passthrough. Do not enable PCI passthrough unless it is necessary for your deployment.

Kernel same-page merging
Kernel same-page merging (KSM) is a feature that reduces memory use through the deduplication and sharing of memory pages. When two or more virtual machines have identical pages in memory, those pages can be shared, which allows for higher density. Memory deduplication strategies are vulnerable to side-channel attacks and should only be used in trusted environments. In Red Hat OpenStack Platform, KSM is disabled by default.
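
If you want to confirm the KSM state on a node, the Linux kernel exposes it in /sys/kernel/mm/ksm/run. The following is a minimal sketch: the helper name ksm_state is hypothetical, and the optional file argument exists only so that the function can be exercised against a test file.

```shell
#!/bin/sh
# ksm_state is a hypothetical helper name, not an RHOSP tool.
# It interprets the kernel's KSM control file:
#   0 = stopped, 1 = running, 2 = stopped and all merged pages unmerged.
ksm_state() {
    run_file="${1:-/sys/kernel/mm/ksm/run}"
    case "$(cat "$run_file")" in
        0) echo "stopped" ;;
        1) echo "running" ;;
        2) echo "stopped-unmerged" ;;
        *) echo "unknown" ;;
    esac
}

# Usage on a Compute node:
#   ksm_state
```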

13.5. SELinux on Red Hat OpenStack Platform

Security-Enhanced Linux (SELinux) is an implementation of mandatory access control (MAC). MAC limits the impact of an attack by restricting what a process or application is permitted to do on a system. For more information on SELinux, see What is SELinux?.

SELinux policies have been pre-configured for Red Hat OpenStack Platform (RHOSP) services. On RHOSP, SELinux is configured to run each QEMU process under a separate security context. In RHOSP, SELinux policies help protect hypervisor hosts and instances against the following threats:

Hypervisor threats
A compromised application running within an instance attacks the hypervisor to access underlying resources. If an instance is able to access the hypervisor OS, physical devices and other applications can become targets. This threat represents considerable risk. A compromise on a hypervisor can also compromise firmware, other instances, and network resources.
Instance threats
A compromised application running within an instance attacks the hypervisor to access or control another instance and its resources, or instance file images. The administrative strategies for protecting real networks do not apply directly to virtual environments. Because every instance is a process labeled by SELinux, there is a security boundary around each instance, enforced by the Linux kernel.

In RHOSP, instance image files on disk are labeled with SELinux data type svirt_image_t. When the instance is powered on, SELinux appends a random numerical identifier to the image. A random numerical identifier can prevent a compromised OpenStack instance from gaining unauthorized access to other instances. SELinux is capable of assigning up to 524,288 numeric identifiers on each hypervisor node.
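
On a Compute node, you can usually view these labels with ps -eZ for running processes and ls -Z for files. The following sketch assumes that the per-instance identifier appears as the category portion at the end of the SELinux label. The helper name label_categories is hypothetical, and the label in the usage comment is a made-up value.

```shell
#!/bin/sh
# label_categories is a hypothetical helper name.
# An SELinux label has the form user:role:type:level[:categories].
# This prints the categories part, which is assumed here to carry the
# per-instance identifier.
label_categories() {
    printf '%s\n' "$1" | cut -d: -f5-
}

# Usage with a made-up label:
#   label_categories 'system_u:system_r:svirt_t:s0:c87,c520'
```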

13.6. Investigating containerized services

The OpenStack services that come with Red Hat OpenStack Platform run within containers. Containerization allows for the development and upgrade of services without dependency related conflicts. When a service runs within a container, potential vulnerabilities to that service are also contained.

You can get information about the services that are running in your environment by using the following steps:

Procedure

  • Use podman inspect to get information, such as bind mounted host directories:

    Example:

    $ sudo podman inspect <container_name> | less

    Replace <container_name> with the name of your container. For example, nova_compute.

  • Check the logs for the service located in /var/log/containers:

    Example:

    sudo less /var/log/containers/nova/nova-compute.log
  • Run an interactive CLI session within the container:

    Example:

    sudo podman exec -it nova_compute /bin/bash
    Note

    You can make changes to the service for testing purposes directly within the container. All changes are lost when the container is restarted.

13.7. Making temporary changes to containerized services

You can make changes to containerized services that persist when the container is restarted, but that do not affect the permanent configuration of your Red Hat OpenStack Platform (RHOSP) cluster. This is useful for testing configuration changes, or enabling debug-level logs when troubleshooting. You can revert changes manually. Alternatively, running a redeploy on your RHOSP cluster resets all parameters to their permanent configurations.

Use configuration files that are located in /var/lib/config-data/puppet-generated/[service] to make temporary changes to a service. The following example enables debugging on the nova service:

Procedure

  1. Edit the nova.conf configuration file that is bind mounted to the nova_compute container. Set the value of the debug parameter to True:

    $ sudo sed -i 's/^debug=.*/debug=True/' \
    /var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf
    Warning

    Configuration files for OpenStack services are ini files with multiple sections, such as [DEFAULT] and [database]. Parameters that are unique to each section might not be unique across the entire file. Use sed with caution. You can check to see if a parameter appears more than once in a configuration file by running grep -Ev "^$|^#" [configuration_file] | grep [parameter].

  2. Restart the nova_compute container:

    sudo podman restart nova_compute
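
Because a plain sed substitution is not aware of ini sections, one possible alternative is to restrict the edit to a single section. The following is a minimal sketch, not an RHOSP tool: the helper name set_ini_value is hypothetical. It changes an existing key only inside the named section and does not add a missing key.

```shell
#!/bin/sh
# set_ini_value is a hypothetical helper, not an RHOSP tool.
# Usage: set_ini_value FILE SECTION KEY VALUE
# Rewrites "KEY=..." lines only inside [SECTION]; the same key in other
# sections is left untouched.
set_ini_value() {
    file="$1"; section="$2"; key="$3"; value="$4"
    tmp="$(mktemp)" || return 1
    awk -v section="$section" -v key="$key" -v value="$value" '
        # Track which section the current line belongs to.
        /^\[.*\]/ { in_section = ($0 == "[" section "]") }
        # Replace the key only while inside the requested section.
        in_section && ($0 ~ ("^" key "[ \t]*=")) { print key "=" value; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (run as a user that can write to the file):
#   set_ini_value /var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf \
#       DEFAULT debug True
```

The sketch writes the result back with cat rather than replacing the file, so that the original file's ownership and permissions are kept.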

13.8. Making permanent changes to containerized services

You can make permanent changes to containerized services in Red Hat OpenStack Platform (RHOSP) with heat. Use an existing template that you used when you first deployed RHOSP, or create a new template to add to your deployment script. In the following example, the private key size for libvirt is increased to 4096.

Procedure

  1. Create a new YAML template called libvirt-keysize.yaml, and use the LibvirtCertificateKeySize parameter to increase the default value from 2048 to 4096.

    cat > /home/stack/templates/libvirt-keysize.yaml <<EOF
    parameter_defaults:
      LibvirtCertificateKeySize: 4096
    EOF
  2. Add the libvirt-keysize.yaml configuration file to your deployment script:

    openstack overcloud deploy --templates \
    ...
    -e /home/stack/templates/libvirt-keysize.yaml
    ...
  3. Rerun the deployment script:

    ./deploy.sh

13.9. Firmware updates

Physical servers use complex firmware to enable and operate server hardware and lights-out management cards. This firmware can have its own security vulnerabilities, potentially allowing system access and interruption. To address these, hardware vendors issue firmware updates, which are installed separately from operating system updates. You need an operational security process that retrieves, tests, and implements these updates on a regular schedule. Firmware updates often require a reboot of physical hosts to become effective.

13.10. Use SSH banner text

You can set a banner that displays a console message to all users that connect over SSH. You can add banner text to /etc/issue using the following parameters in an environment file. Consider customizing this sample text to suit your requirements.

resource_registry:
  OS::TripleO::Services::Sshd:
    /usr/share/openstack-tripleo-heat-templates/deployment/sshd/sshd-baremetal-puppet.yaml

parameter_defaults:
  BannerText: |
   ******************************************************************
   * This system is for the use of authorized users only. Usage of  *
   * this system may be monitored and recorded by system personnel. *
   * Anyone using this system expressly consents to such monitoring *
   * and is advised that if such monitoring reveals possible        *
   * evidence of criminal activity, system personnel may provide    *
   * the evidence from such monitoring to law enforcement officials.*
   ******************************************************************

To apply this change to your deployment, save the settings as a file called ssh_banner.yaml, and then pass it to the overcloud deploy command as follows. The <full environment> indicates that you must still include all of your original deployment parameters. For example:

    openstack overcloud deploy --templates \
      -e <full environment> -e ssh_banner.yaml

13.11. Audit for system events

Maintaining a record of all audit events helps you establish a system baseline, perform troubleshooting, or analyze the sequence of events that led to a certain outcome. The audit system is capable of logging many types of events, such as changes to the system time, changes to Mandatory/Discretionary Access Control, and creating/deleting users or groups.

You can create rules in an environment file. Director then injects these rules into /etc/audit/audit.rules. For example:

    resource_registry:
      OS::TripleO::Services::AuditD: /usr/share/openstack-tripleo-heat-templates/deployment/auditd/auditd-baremetal-puppet.yaml
    parameter_defaults:
      AuditdRules:
        'Record Events that Modify User/Group Information':
          content: '-w /etc/group -p wa -k audit_rules_usergroup_modification'
          order  : 1
        'Collects System Administrator Actions':
          content: '-w /etc/sudoers -p wa -k actions'
          order  : 2
        'Record Events that Modify the Systems Mandatory Access Controls':
          content: '-w /etc/selinux/ -p wa -k MAC-policy'
          order  : 3
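
In each rule, -w sets the path to watch, -p wa limits the watch to writes and attribute changes, and -k attaches a key that you can later search for. After deployment, you might confirm that a rule reached the node by looking for its key in the rules file. The following is a minimal sketch: the helper name has_audit_key is hypothetical.

```shell
#!/bin/sh
# has_audit_key is a hypothetical helper, not an RHOSP tool.
# Usage: has_audit_key KEY [RULES_FILE]
# Succeeds when a rule in the file carries exactly the given -k key.
has_audit_key() {
    key="$1"
    rules_file="${2:-/etc/audit/audit.rules}"
    # "--" stops option parsing because the pattern starts with a dash.
    grep -Eq -- "-k[[:space:]]+$key([[:space:]]|\$)" "$rules_file"
}

# Usage on an overcloud node:
#   sudo sh -c '. ./audit_helpers.sh; has_audit_key MAC-policy' && echo "rule present"
# (audit_helpers.sh is an example file name for wherever you save this function.)
```

Once events are being recorded, the same key can be passed to ausearch -k to list the matching audit events.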

13.12. Manage firewall rules

Firewall rules are automatically applied on overcloud nodes during deployment, and are intended to only expose the ports required to get OpenStack working. You can specify additional firewall rules as needed. For example, to add rules for a Zabbix monitoring system:

parameter_defaults:
  ControllerExtraConfig:
    ExtraFirewallRules:
      '301 allow zabbix':
        dport: 10050
        proto: tcp
        source: 10.0.0.8
Note

When you do not set the action parameter, the result is accept. You can only set the action parameter to drop, insert, or append.

You can also add rules that restrict access. The number used in a rule definition determines the rule’s precedence. For example, RabbitMQ’s rule number is 109 by default. To restrict access to RabbitMQ, define rules that use a lower number:

parameter_defaults:
  ControllerParameters:
    ExtraFirewallRules:
      '098 allow rabbit from internalapi network':
        dport: [4369,5672,25672]
        proto: tcp
        source: 10.0.0.0/24
      '099 drop other rabbit access':
        dport: [4369,5672,25672]
        proto: tcp
        action: drop

In this example, 098 and 099 are arbitrarily chosen numbers that are lower than RabbitMQ’s rule number 109. To determine a rule’s number, you can inspect the iptables rule on the appropriate node; for RabbitMQ, you would check the controller:

iptables-save
[...]
-A INPUT -p tcp -m multiport --dports 4369,5672,25672 -m comment --comment "109 rabbitmq" -m state --state NEW -j ACCEPT
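
To pull the number out of that output without reading it by eye, you could filter on the rule comment. The following is a minimal sketch: the helper name rule_number is hypothetical, and it assumes that the comment has the form "<number> <name>" as in the output above.

```shell
#!/bin/sh
# rule_number is a hypothetical helper, not an RHOSP tool.
# It reads iptables-save output on stdin and prints the numeric prefix of
# every rule whose comment is "<number> NAME".
rule_number() {
    name="$1"
    sed -n "s/.*--comment \"\([0-9][0-9]*\) $name\".*/\1/p"
}

# Usage on a Controller node:
#   sudo iptables-save | rule_number rabbitmq
```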

Alternatively, you can extract the port requirements from the puppet definition. For example, RabbitMQ’s rules are stored in puppet/services/rabbitmq.yaml:

    ExtraFirewallRules:
      '109 rabbitmq':
        dport:
          - 4369
          - 5672
          - 25672

The following parameters can be set for a rule:

  • dport: The destination port associated with the rule.
  • sport: The source port associated with the rule.
  • proto: The protocol associated with the rule. Defaults to tcp.
  • action: The action policy associated with the rule. Defaults to INSERT and sets the jump to ACCEPT.
  • state: Array of states associated with the rule. Defaults to [NEW].
  • source: The source IP address associated with the rule.
  • interface: The network interface associated with the rule.
  • chain: The chain associated with the rule. Defaults to INPUT.
  • destination: The destination CIDR associated with the rule.

13.13. Intrusion detection with AIDE

AIDE (Advanced Intrusion Detection Environment) is a file and directory integrity checker. It is used to detect incidents of unauthorized file tampering or changes. For example, AIDE can alert you if system password files are changed.

AIDE works by analyzing system files and then compiling an integrity database of file hashes. The database then serves as a comparison point to verify the integrity of the files and directories and detect changes.
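
The baseline-and-compare idea can be illustrated with standard tools. The following is a conceptual sketch only and does not show how AIDE is implemented: the helper names build_baseline and verify_baseline are hypothetical, and the sketch records SHA-256 hashes only, whereas AIDE can also track attributes such as permissions.

```shell
#!/bin/sh
# Conceptual illustration only; this is not how AIDE is implemented.
# build_baseline and verify_baseline are hypothetical helper names.

# Record a SHA-256 hash for every regular file under a directory.
# Store the database outside the directory that is being hashed.
build_baseline() {
    dir="$1"; db="$2"
    find "$dir" -type f -exec sha256sum {} + > "$db"
}

# Re-hash the recorded files. Fails if any hash differs or a file is missing.
verify_baseline() {
    db="$1"
    sha256sum --check --quiet "$db"
}

# Usage (the paths are examples):
#   build_baseline /etc /root/etc-baseline.sha256
#   verify_baseline /root/etc-baseline.sha256 || echo "changes detected"
```

This sketch does not notice files that are created after the baseline is built, which is one reason to use a dedicated tool such as AIDE.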

The director includes the AIDE service, allowing you to add entries into an AIDE configuration, which is then used by the AIDE service to create an integrity database. For example:

  resource_registry:
    OS::TripleO::Services::Aide:
      /usr/share/openstack-tripleo-heat-templates/deployment/aide/aide-baremetal-ansible.yaml

  parameter_defaults:
    AideRules:
      'TripleORules':
        content: 'TripleORules = p+sha256'
        order: 1
      'etc':
        content: '/etc/ TripleORules'
        order: 2
      'boot':
        content: '/boot/ TripleORules'
        order: 3
      'sbin':
        content: '/sbin/ TripleORules'
        order: 4
      'var':
        content: '/var/ TripleORules'
        order: 5
      'not var/log':
        content: '!/var/log.*'
        order: 6
      'not var/spool':
        content: '!/var/spool.*'
        order: 7
      'not nova instances':
        content: '!/var/lib/nova/instances.*'
        order: 8
Note

The above example is not actively maintained or benchmarked, so you should select the AIDE values that suit your requirements.

  1. An alias named TripleORules is declared to avoid having to repeatedly write out the same attributes each time.
  2. The alias receives the attributes of p+sha256. In AIDE terms, this reads as the following instruction: monitor all file permissions p with an integrity checksum of sha256.

For a complete list of attributes available for AIDE’s configuration files, see the AIDE man page at https://aide.github.io/.

Complete the following to apply changes to your deployment:

  1. Save the settings as a file called aide.yaml in the /home/stack/templates/ directory.
  2. Edit the aide.yaml environment file to have the parameters and values suitable for your environment.
  3. Include the /home/stack/templates/aide.yaml environment file in the openstack overcloud deploy command, along with all other necessary heat templates and environment files specific to your environment:

    openstack overcloud deploy --templates \
    ...
    -e /home/stack/templates/aide.yaml

13.13.1. Using complex AIDE rules

Complex rules can be created using the format described previously. For example:

    MyAlias = p+i+n+u+g+s+b+m+c+sha512

The above would translate as the following instruction: monitor permissions, inodes, number of links, user, group, size, block count, mtime, ctime, using sha512 for checksum generation.

The alias should always have an order position of 1, which means that it is positioned at the top of the AIDE rules and is applied recursively to all values below.

After the alias, list the directories to monitor. You can use regular expressions. For example, the earlier configuration monitors the /var directory, but overrides this with a not clause by using ! in '!/var/log.*' and '!/var/spool.*'.

13.13.2. Additional AIDE values

The following AIDE values are also available:

AideConfPath: The full POSIX path to the AIDE configuration file. This defaults to /etc/aide.conf. If you have no requirement to change the file location, keep the default path.

AideDBPath: The full POSIX path to the AIDE integrity database. This value is configurable so that operators can declare their own full path, because AIDE database files are often stored off node, for example on a read-only file mount.

AideDBTempPath: The full POSIX path to the AIDE integrity temporary database. This temporary file is created when AIDE initializes a new database.

AideHour: This value is to set the hour attribute as part of AIDE cron configuration.

AideMinute: This value is to set the minute attribute as part of AIDE cron configuration.

AideCronUser: This value is to set the Linux user as part of AIDE cron configuration.

AideEmail: This value sets the email address that receives AIDE reports each time a cron run is made.

AideMuaPath: This value sets the path to the Mail User Agent that is used to send AIDE reports to the email address set within AideEmail.

13.13.3. Cron configuration for AIDE

The AIDE director service allows you to configure a cron job. By default, it will send reports to /var/log/audit/; if you want to use email alerts, then enable the AideEmail parameter to send the alerts to the configured email address. Note that a reliance on email for critical alerts can be vulnerable to system outages and unintentional message filtering.

13.13.4. Considering the effect of system upgrades

When you perform an upgrade, the AIDE service automatically regenerates the integrity database so that the checksums of all upgraded files are recomputed.

If openstack overcloud deploy is called as a subsequent run to an initial deployment, and the AIDE configuration rules are changed, the director AIDE service will rebuild the database to ensure the new config attributes are encapsulated in the integrity database.

13.14. Review SecureTTY

SecureTTY allows you to disable root access for any console device (tty). This behavior is managed by entries in the /etc/securetty file. For example:

  resource_registry:
    OS::TripleO::Services::Securetty: ../puppet/services/securetty.yaml

  parameter_defaults:
    TtyValues:
      - console
      - tty1
      - tty2
      - tty3
      - tty4
      - tty5
      - tty6
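
After deployment, you might check whether a given console device is listed in the file. The following is a minimal sketch: the helper name tty_is_listed is hypothetical, and the optional file argument exists only so that the function can be exercised against a test file.

```shell
#!/bin/sh
# tty_is_listed is a hypothetical helper, not an RHOSP tool.
# Usage: tty_is_listed TTY [SECURETTY_FILE]
# Succeeds when the device name appears on a line of its own in the file.
tty_is_listed() {
    tty_name="$1"
    securetty_file="${2:-/etc/securetty}"
    # -x matches the whole line, -F treats the name as a fixed string.
    grep -qxF -- "$tty_name" "$securetty_file"
}

# Usage on an overcloud node:
#   tty_is_listed tty1 && echo "tty1 is listed in /etc/securetty"
```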

13.15. CADF auditing for Identity Service

A thorough auditing process can help you review the ongoing security posture of your OpenStack deployment. This is especially important for keystone, due to its role in the security model.

Red Hat OpenStack Platform has adopted Cloud Auditing Data Federation (CADF) as the data format for audit events, with the keystone service generating CADF events for Identity and Token operations. You can enable CADF auditing for keystone using KeystoneNotificationFormat:

  parameter_defaults:
    KeystoneNotificationFormat: cadf

13.16. Review the login.defs values

To enforce password requirements for new system users (non-keystone), director can add entries to /etc/login.defs by following these example parameters:

  resource_registry:
    OS::TripleO::Services::LoginDefs: ../puppet/services/login-defs.yaml

  parameter_defaults:
    PasswordMaxDays: 60
    PasswordMinDays: 1
    PasswordMinLen: 5
    PasswordWarnAge: 7
    FailDelay: 4
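
To confirm the resulting values on a node, you could read them back from /etc/login.defs. The following is a minimal sketch: the helper name login_defs_value is hypothetical, and it assumes that the parameters above map to the standard login.defs settings PASS_MAX_DAYS, PASS_MIN_DAYS, PASS_MIN_LEN, PASS_WARN_AGE, and FAIL_DELAY.

```shell
#!/bin/sh
# login_defs_value is a hypothetical helper, not an RHOSP tool.
# Usage: login_defs_value KEY [DEFS_FILE]
# Prints the value of the first uncommented line that sets KEY.
login_defs_value() {
    key="$1"
    defs_file="${2:-/etc/login.defs}"
    # Commented lines never match because their first field starts with "#".
    awk -v key="$key" '$1 == key { print $2; exit }' "$defs_file"
}

# Usage on an overcloud node:
#   login_defs_value PASS_MAX_DAYS
```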