Best Practices / Deployment Guide SAP HANA on Red Hat Virtualization 4.4


The guide in this PDF attachment describes how to deploy SAP HANA as a supported workload on Red Hat Virtualization (RHV) version 4.4 with Intel Xeon Scalable Platform 2nd and 3rd Generation CPUs (Cascade Lake and Cooper Lake). The document contains information about SAP HANA hardware requirements and best practices, and includes examples of SAP HANA and RHV-specific configuration settings and deployment options to consider when using the two products together. You can also download the scripts associated with this guide.

For guidance on deploying SAP HANA as a supported workload on Red Hat Virtualization (RHV) versions 4.2 and 4.3 with Intel Xeon Scalable Platform 1st and 2nd Generation CPUs (Skylake and Cascade Lake), see this document. Download the associated scripts here.


For their application servers, SAP seems to suggest the noop scheduler for KVM VMs. Is that suggestion outdated, or does HANA behave better with deadline?

I don't know why SAP suggests this. In our experience, using deadline for both the hypervisor and the VM is usually the most sensible choice. That is why we run the SAP HANA certification this way as well.
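
To see which scheduler a hypervisor or VM is actually using, something like the following sketch may help. It reads the standard sysfs file /sys/block/<device>/queue/scheduler, where the kernel marks the active scheduler with square brackets. The function names and the optional directory argument are hypothetical and exist only so the logic can be tried against a test directory. On newer multi-queue kernels the deadline scheduler is listed as mq-deadline.

```shell
#!/bin/bash
# Sketch: report the active I/O scheduler of every block device.
# Function names and the optional root argument are illustrative assumptions.

active_scheduler() {
    # The kernel marks the active scheduler with square brackets,
    # e.g. "noop [deadline] cfq" yields "deadline".
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

report_schedulers() {
    # $1: sysfs block directory (defaults to /sys/block)
    local root="${1:-/sys/block}" f dev
    for f in "$root"/*/queue/scheduler; do
        [ -e "$f" ] || continue
        dev="${f%/queue/scheduler}"
        dev="${dev##*/}"
        printf '%s %s\n' "$dev" "$(active_scheduler "$f")"
    done
}
```

Calling report_schedulers without arguments prints one "device scheduler" pair per line for the running system.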

For the 4.1 virtualization we talked about the pinning, and someone from qemu/libvirtd suggested using 1<>1 CPU pinning instead of pinning each vCPU to both the real core and the HT core. Did you test that and find it inferior, or did you forget to change it for the 4.2 guide? Currently: 0#4,116_1#4,116; direct: 0#4_1#116

No, the script still works the same. What we do propose, though, is to disable HT completely due to security concerns; see chapter 5 in the guide, or the direct link: MDS - Microarchitectural Data Sampling

The doc itself shows the config used to reach the SAP performance goals.
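
For readers less familiar with the pinning strings quoted in the question above, here is a small sketch that spells them out. It assumes the format used in the thread: entries separated by "_", each of the form <vCPU>#<host CPU list>. The function name explain_pinning is hypothetical and not part of the guide's scripts.

```shell
#!/bin/bash
# Sketch: expand a pinning string such as "0#4,116_1#4,116"
# into one readable line per vCPU. explain_pinning is an illustrative name.

explain_pinning() {
    # Entries are separated by "_"; each entry is "<vCPU>#<host CPU list>".
    local entry
    local IFS='_'
    for entry in $1; do
        printf 'vCPU %s -> host CPU(s) %s\n' "${entry%%#*}" "${entry#*#}"
    done
}
```

With the "currently" string, explain_pinning '0#4,116_1#4,116' shows both vCPUs pinned to the pair 4,116, whereas the "direct" string 0#4_1#116 gives each vCPU exactly one host CPU.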

Hi, I still have the issue that booting with a large number of hugepages takes many minutes and just shows a blank screen. People actually thought the system was hanging and rebooted it during that phase. Recently I discovered that I can just have RHV dynamically allocate the hugepages. Would this be a good idea to add into this guide?

I would advise against that, mainly because dynamic allocation is not necessarily even across all NUMA nodes, which leads to a NUMA imbalance within the SAP HANA VM.

As such, defining the hugepages at boot is the most sensible approach.

Cheers, Martin
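
One way to check whether hugepages ended up evenly spread is to compare the per-node counters that the kernel exposes under /sys/devices/system/node. The sketch below does that. The default page size of 1048576kB (1 GiB) is an assumption, as are the function names and the optional directory argument, so adjust them to your setup.

```shell
#!/bin/bash
# Sketch: list hugepages per NUMA node and test whether the counts are even.
# Function names, the optional root argument, and the default page size
# (1048576kB) are assumptions made for illustration.

hugepages_per_node() {
    # $1: node directory (defaults to /sys/devices/system/node)
    # $2: hugepage size suffix as used in sysfs (defaults to 1048576kB)
    local root="${1:-/sys/devices/system/node}" size="${2:-1048576kB}" d f
    for d in "$root"/node[0-9]*; do
        f="$d/hugepages/hugepages-$size/nr_hugepages"
        [ -r "$f" ] || continue
        printf '%s %s\n' "${d##*/}" "$(cat "$f")"
    done
}

hugepages_balanced() {
    # Succeeds only if all nodes report the same number of hugepages.
    local distinct
    distinct=$(hugepages_per_node "$@" | awk '!seen[$2]++ {n++} END {print n+0}')
    [ "$distinct" -le 1 ]
}
```

For example, running hugepages_per_node on the hypervisor and then hugepages_balanced || echo "WARNING: uneven hugepage distribution" gives a quick indication of the imbalance described above.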

I am currently using dynamic hugepages for the SAP application servers after Red Hat support suggested using them because of performance issues. This led to several Bugzillas and issues on my end. I do not think that feature is mature enough for customer use. So I would say, yes, you are right, do not use dynamic hugepages :) One issue remains though: if I preallocate hugepages at boot, people who are not familiar with the systems think it's hanging. Allocation takes minutes, and during that time the console just shows a blank screen. A 'warning' on screen or something like that would be really helpful :)

I am not sure why but on kernel 3.10.0-957.41.1.el7.x86_64 either the modprobe or the cd command failed and it created the two files in /usr/lib/tuned/sap-hana-kvm-guest -- maybe use set -e or explicitly check return values

set -e

if [ "$1" == "start" ]; then
    modprobe cpuidle-haltpoll
    cd /sys/module/cpuidle_haltpoll/parameters/
    echo 800000 > guest_halt_poll_ns
    echo 200000 > guest_halt_poll_grow_start
fi

modprobe cpuidle-haltpoll seems to kill my VMs on kernel 3.10.0-957.41.1.el7.x86_64 -- I'll open a support case

Never mind... I only looked at the changelog of the kernel, which states

- [x86] cpuidle-haltpoll: vcpu hotplug support (Marcelo Tosatti) [1776288 1771849] 

and I couldn't open either of the bugs because they are private, but this is more specific:

Guest crash after load cpuidle-haltpoll driver (BZ#1776288)

I'll test the latest kernel

Please note that the interface for the haltpoll driver has changed slightly. Use the following script, which handles both the old and the new location:



if [ "$1" == "start" ]; then
    modprobe cpuidle-haltpoll
    if [ -e /sys/module/cpuidle_haltpoll/parameters/ ]; then
        echo "$guest_halt_poll_ns" > /sys/module/cpuidle_haltpoll/parameters/guest_halt_poll_ns
        echo "$guest_halt_poll_grow_start" > /sys/module/cpuidle_haltpoll/parameters/guest_halt_poll_grow_start
    elif [ -e /sys/module/haltpoll/parameters/ ]; then
        echo "$guest_halt_poll_ns" > /sys/module/haltpoll/parameters/guest_halt_poll_ns
        echo "$guest_halt_poll_grow_start" > /sys/module/haltpoll/parameters/guest_halt_poll_grow_start
    fi
fi

If your VM crashes when you load the driver, something else is wrong and opening a bug is the right way to go.

The script should still fail, or at least print a warning, if modprobe fails or if neither the if nor the elif condition matches. Otherwise issues could go unnoticed.
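
A minimal sketch of what such checks could look like follows. It is not the script shipped with the guide. The function names, the return codes, the optional directory argument, and the MODPROBE override are hypothetical and only there to make the logic easy to dry-run. The two values are the ones quoted earlier in this thread.

```shell
#!/bin/bash
# Sketch of a stricter "start" handler for the haltpoll settings.
# Not the guide's script: names, return codes, and the MODPROBE
# override are assumptions made for illustration.

guest_halt_poll_ns=800000
guest_halt_poll_grow_start=200000

find_haltpoll_params() {
    # Print the parameter directory (either known location); fail if none exists.
    local root="${1:-/sys/module}" d
    for d in "$root/cpuidle_haltpoll/parameters" "$root/haltpoll/parameters"; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

start_haltpoll() {
    local root="${1:-/sys/module}" params
    # Return 1 if the module cannot be loaded.
    if ! ${MODPROBE:-modprobe} cpuidle-haltpoll; then
        echo "ERROR: modprobe cpuidle-haltpoll failed" >&2
        return 1
    fi
    # Return 2 if neither parameter directory is present.
    if ! params=$(find_haltpoll_params "$root"); then
        echo "ERROR: no haltpoll parameter directory under $root" >&2
        return 2
    fi
    # Return 3 if a value cannot be written.
    echo "$guest_halt_poll_ns" > "$params/guest_halt_poll_ns" || return 3
    echo "$guest_halt_poll_grow_start" > "$params/guest_halt_poll_grow_start" || return 3
}
```

In a tuned script this could be wired up as: if [ "$1" == "start" ]; then start_haltpoll || exit $?; fi, so that a failed modprobe or a missing parameter directory is reported instead of silently ignored.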

Well, you are right. I'll consider that in a future version of that script.

Watch out for the 7.9 update: hypervisor usable memory goes down by about 2%, which led to some issues for us :)

Are there any plans to also get an HCI scenario certified?

Important note: While it's not specifically mentioned in the PDF, passing an HBA to a guest means that the HBA is no longer accessible to the hypervisor. Therefore, you must either have dedicated HBAs for passthrough, independent of those in use by the hypervisors to access FibreChannel storage domains, or you must use other forms of storage for the storage domains.

Thanks, Allison, for the feedback - our onsite SAP team has confirmed that this is indeed the case and we've added a statement to that effect in the PDF.

Any updates for RHV 4.4?

Hello Klaas - this is currently a WIP and the SAP Alliance team is working on it. Thanks.

Do you develop this in public? I would like to review the suggested changes.