Debugging a kernel in QEMU/libvirt - Part II

February 24, 2017 | 11-minute read

This article was originally published on the Red Hat Customer Portal. The information may no longer be current.

A previous post on this blog showed how to configure a Red Hat Enterprise Linux system for kernel debugging. This post assumes that the system has been configured, that the source code matching the installed kernel version is at hand, and that the reader is ready to follow along.

Do not run this on a production system, as interruption of system function is guaranteed.

The particular problem that will be investigated is CVE-2016-9793. As discussed on the oss-security list, this vulnerability was classified as an integer overflow, which must be addressed.

Eric Dumazet describes the patch as follows (taken from the commit that attempts to fix the flaw):

$ git show b98b0bc8c431e3ceb4b26b0dfc8db509518fb290
commit b98b0bc8c431e3ceb4b26b0dfc8db509518fb290
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Dec 2 09:44:53 2016 -0800

    net: avoid signed overflows for SO_{SND|RCV}BUFFORCE

    CAP_NET_ADMIN users should not be allowed to set negative
    sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
    corruptions, crashes, OOM...

    Note that before commit 82981930125a ("net: cleanups in
    sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
    and SO_RCVBUF were vulnerable.

    This needs to be backported to all known linux kernels.

    Again, many thanks to syzkaller team for discovering this gem.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Andrey Konovalov <andreyknvl@google.com>
    Signed-off-by: David S.  Miller <davem@davemloft.net>

diff --git a/net/core/sock.c b/net/core/sock.c
index 5e3ca41..00a074d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -715,7 +715,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
                val = min_t(u32, val, sysctl_wmem_max);
 set_sndbuf:
                sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
-               sk->sk_sndbuf = max_t(u32, val * 2, SOCK_MIN_SNDBUF);
+               sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
                /* Wake up sending tasks if we upped the value.  */
                sk->sk_write_space(sk);
                break;
@@ -751,7 +751,7 @@ set_rcvbuf:
                 * returning the value we actually used in getsockopt
                 * is the most desirable behavior.
                 */
-               sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
+               sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
                break;

        case SO_RCVBUFFORCE:

The purpose of the investigation is to determine if this flaw affects the shipped kernels.

User interaction with the kernel happens through syscalls and ioctls. In this case, the syscall at issue is setsockopt. This specific call ends up being handled in a function named sock_setsockopt, as shown in the patch.
Patches do not always document a flaw clearly, but in this case the area that the patch modifies is an ideal place to start looking.

Investigating sock_setsockopt function

The sock_setsockopt code shown below has the relevant parts highlighted in an attempt to explain key concepts that complicate the investigation of this flaw.

A capabilities check is the first hurdle when attempting to force the sk_sndbuf size. In sock_setsockopt, a capable() check enforces that the process has the CAP_NET_ADMIN capability before it may force buffer sizes. Requiring this capability reduces the attack vector but does not eliminate it. The root user has this capability by default, and it can be granted to binaries that are run by other users. The relevant section of code is:

        if (!capable(CAP_NET_ADMIN)) {
            ret = -EPERM;
            break;
        }

The reproducer needs the CAP_NET_ADMIN capability to call setsockopt() with the SO_SNDBUFFORCE or SO_RCVBUFFORCE option. To read more about Linux kernel capabilities, check out the setcap(8) and capabilities(7) man pages.

We can see from the patch and surrounding discussion that it is possible to set sk->sk_sndbuf to a negative value. Following the flow of the code, the value passes through the max_t macro before being assigned. The patch changes the type to which the max_t macro casts its arguments.

Using the GDB debugger and setting a breakpoint will show how various size values affect the final value of sk->sk_sndbuf.

Integer overflows

The patch shows that the type used in the max_t macro comparison was changed from u32 (unsigned 32-bit integer) to int (signed 32-bit integer). Before we make assumptions or do any kind of investigation, we can form the hypothesis that the problem lies in the outcome of max_t.

Here is the definition of max_t:

#define max_t(type, x, y) ({            \
    type __max1 = (x);            \
    type __max2 = (y);            \
    __max1 > __max2 ? __max1: __max2; })

My understanding of the max_t macro is that it casts both the second and third parameters to the type specified by the first parameter, returning __max1 if __max1 is greater than __max2 and __max2 otherwise. The unintended side effect is that, when casting to an unsigned type, negative values turn into large positive values before the comparison.

It may be tempting to program the relevant macro, type definitions, and operations on the variables into a small C program to test. Resist! Armed with your kernel debugging knowledge and a small C program to exercise the code, we can see how the tool-chain decided to create this code.

For the test case, we'll need values that test how the compiler and architecture deal with these kinds of overflows. Input that could create overflows or final negative values should be used as test cases.

Building the test case

To exercise this particular section of code (before the patch) we can build a small reproducer in C. Feel free to write your test code in any language that lets you set the socket options in the same way.

#include <stdio.h>
#include <limits.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
    int sockfd, sendbuff;
    socklen_t optlen;
    size_t i;

    /* Boundary values used to test our hypothesis */
    int val[] = {INT_MIN, INT_MIN + 100, INT_MIN + 200, -200, 0, 200, INT_MAX - 200, INT_MAX - 100, INT_MAX};

    sockfd = socket(AF_INET, SOCK_DGRAM, 0);

    if (sockfd == -1) {
        printf("Error: %s\n", strerror(errno));
        return 1;
    }

    /* Walk every entry in val[] */
    for (i = 0; i < sizeof(val) / sizeof(val[0]); i++) {

        /* The kernel doubles the requested size, so pass half of each boundary value */
        sendbuff = val[i] / 2;

        printf("== Setting the send buffer to %d\n", sendbuff);

        if (setsockopt(sockfd, SOL_SOCKET, SO_SNDBUFFORCE, &sendbuff, sizeof(sendbuff)) == -1) {
            printf("SETSOCKOPT ERROR: %s\n", strerror(errno));
        }

        /* getsockopt() reads optlen as the size of the buffer, so initialize it */
        optlen = sizeof(sendbuff);

        if (getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &sendbuff, &optlen) == -1) {
            printf("GETSOCKOPT ERROR: %s\n", strerror(errno));
        }
        else {
            printf("getsockopt returns buffer size: %d\n", sendbuff);
        }
    }

    printf("DONE\n");

    return 0;
}

Compile the reproducer:

[user@target /tmp/]# gcc setsockopt-integer-overflow-ver-001.c -o setsockopt-reproducer

And set the capability of CAP_NET_ADMIN on the binary:

[user@target /tmp/]# setcap CAP_NET_ADMIN+ep setsockopt-reproducer

If there are exploit creators (or flaw reporters) in the audience, understand that naming your files reproducer.c and reproducer.py ends up getting confusing. Please give each file a unique name. This can save time when searching through the 200 copies of reproducer.c lying around the file system.

Saving time

Virtual machines afford programmers the ability to save the system state for immediate restore. This allows the system to return to a "known good state" if it were to panic or become corrupted. Libvirt calls this kind of snapshot a "System Checkpoint" style snapshot.

The virt-manager GUI tool in Red Hat Enterprise Linux 7 did not support creating system checkpoints in the GUI. The command line interface is able to create system-checkpoint snapshots by:

# virsh snapshot-create-as RHEL-7.2-SERVER snapshot-name-1

To restore the system to the snapshot, run the command:

# virsh snapshot-revert RHEL-7.2-SERVER snapshot-name-1

If the system is running Fedora 20 or newer, and you prefer to use GUI tools, Cole Robinson has written an article which shows how to create system checkpoint style snapshots from within the virt-manager.

The advantage of snapshots is that you can restore your system back to a working state in case of file system corruption, which can otherwise force you to reinstall from scratch.

Debugging and inspecting

GDB contains a "Text User Interface" mode which allows for greater insights into the running code. Start GDB in the "Text User Interface Mode" and connect to the running qemu/kernel using gdb as shown below:

gdb -tui ~/kernel-debug/var/lib/kernel-3.10.0-327.el7/boot/vmlinux

<gdb prelude here>

(gdb) dir ./usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/
(gdb) set architecture i386:x86-64:intel
(gdb) set disassembly-flavor intel
(gdb) target remote localhost:1234

The extra line beginning with dir points GDB to the location of the source used in creating the binary. This allows GDB to show the current line of execution. This directory tree was created when extracting the kernel-debuginfo-package using rpm2cpio.

GDB should appear similar to the below screenshot:

The TUI mode will show the source code at the top and the command line interactive session at the bottom window. The TUI can be customized further and this is left as an exercise to the reader.

Inspecting the value

The plan was to inspect the value at the time of writing to the sk->sk_sndbuf to determine how different parameters would affect the final value.

We will set a breakpoint in GDB to stop at that position and print out the value of sk->sk_sndbuf.

set_sndbuf:
        sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
>>>>>>>>sk->sk_sndbuf = max_t(u32, val * 2, SOCK_MIN_SNDBUF);  
        /* Wake up sending tasks if we upped the value.  */
        sk->sk_write_space(sk);
        break;

    case SO_SNDBUFFORCE:
        if (!capable(CAP_NET_ADMIN)) {
            ret = -EPERM;
            break;
        }
        goto set_sndbuf;

The line which assigns the sk->sk_sndbuf value is line 704 in net/core/sock.c. To set a breakpoint on this line we can issue the "break" command to gdb with the parameters of where it should break.

Additional commands have been appended that will run every time the breakpoint has been hit. In this demonstration the breakpoint will print the value of sk->sk_sndbuf and resume running.

If you are not seeing the (gdb) prompt, hit Ctrl + C to interrupt and pause the system. While the system is suspended by gdb it will not take keyboard input or continue any processing.

(gdb) break net/core/sock.c:703
Breakpoint 1 at 0xffffffff81516ede: file net/core/sock.c, line 703.
(gdb) commands
Type commands for breakpoint(s) 4, one per line.
End with a line saying just "end".
>p sk->sk_sndbuf
>cont
>end
(gdb)

The "commands" directive is similar to a function that will be run each time the most recently set breakpoint is hit. Enter the 'continue' directive at the (gdb) prompt to resume processing on the target system.

The plan was to show a binary compare of val to inspect the comparison; however, this value was optimized out. GDB would allow us to inspect 'val' directly if we were to step through the assembly and inspect the registers at the time of comparison. Doing so, however, is beyond the scope of this document.

Let's give it a simple test, running the reproducer against the code with a predictable, commonly used value. Start another terminal, connect to the target node and run the command:

[user@target]# ./setsockopt-reproducer 
Setting the send buffer to -1073741824
getsockopt buffer size: -2147483648
Setting the send buffer to -1073741774
getsockopt buffer size: -2147483548
Setting the send buffer to -1073741724
getsockopt buffer size: -2147483448
Setting the send buffer to -100
getsockopt buffer size: -200
Setting the send buffer to 0
getsockopt buffer size: 4608
Setting the send buffer to 100
getsockopt buffer size: 4608
Setting the send buffer to 1073741723
getsockopt buffer size: 2147483446
Setting the send buffer to 1073741773
getsockopt buffer size: 2147483546

At this time there should be a breakpoint showing as executed in the gdb terminal printing out the value every time the function passes net/core/sock.c line 704.

Breakpoint 4, sock_setsockopt (sock=sock@entry=0xffff88003c57b680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>, optval@entry=0x7ffce597f1a0 "",
    optlen=optlen@entry=4) at net/core/sock.c:704
$9 = 212992

The above example shows $? = ______ as the output of the command that we have created. Each dollar value ($N) shown in the output corresponds to one of the values iterated through in the test-case code.

int val[] = {INT_MIN , INT_MIN + 1, -1 , 0 , 1 , INT_MAX - 1, INT_MAX};

Listed below is the complete output of the example script:

Breakpoint 1, sock_setsockopt (sock=sock@entry=0xffff8800366cf680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>, optval@entry=0x7ffe3ea373c4 "",
    optlen=optlen@entry=4) at net/core/sock.c:704
$1 = 212992

Breakpoint 1, sock_setsockopt (sock=sock@entry=0xffff8800366cf680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>, optval@entry=0x7ffe3ea373c4 "2",
    optlen=optlen@entry=4) at net/core/sock.c:704
$2 = -2147483648

Breakpoint 1, sock_setsockopt (sock=sock@entry=0xffff8800366cf680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>, optval@entry=0x7ffe3ea373c4 "d",
    optlen=optlen@entry=4) at net/core/sock.c:704
$3 = -2147483548

Breakpoint 1, sock_setsockopt (sock=sock@entry=0xffff8800366cf680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>,
    optval@entry=0x7ffe3ea373c4 "\234\377\377\377\003", optlen=optlen@entry=4) at net/core/sock.c:704
$4 = -2147483448

Breakpoint 1, sock_setsockopt (sock=sock@entry=0xffff8800366cf680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>, optval@entry=0x7ffe3ea373c4 "",
    optlen=optlen@entry=4) at net/core/sock.c:704
$5 = -200

Breakpoint 1, sock_setsockopt (sock=sock@entry=0xffff8800366cf680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>, optval@entry=0x7ffe3ea373c4 "d",
    optlen=optlen@entry=4) at net/core/sock.c:704
$6 = 4608

Breakpoint 1, sock_setsockopt (sock=sock@entry=0xffff8800366cf680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>, optval@entry=0x7ffe3ea373c4 "\233\377\377?\003",
    optlen=optlen@entry=4) at net/core/sock.c:704
$7 = 4608

Breakpoint 1, sock_setsockopt (sock=sock@entry=0xffff8800366cf680, level=level@entry=1, optname=optname@entry=32, optval=<optimized out>, optval@entry=0x7ffe3ea373c4 "\315\377\377?\003",
    optlen=optlen@entry=4) at net/core/sock.c:704
$8 = 2147483446

Conclusion

As we can see, the final value of sk->sk_sndbuf can be below zero if an application manages to set the value incorrectly. Many areas of the kernel use sk->sk_sndbuf; the most obvious is the tcp_sndbuf_expand function. This value is used and memory is allocated based on this size.

This is going to be marked as vulnerable in EL7. I leave this as an exercise for the reader to do their own confirmation on other exploits they may be interested in.

Troubleshooting:

Listed below are a number of problems that first time users have run into. Please leave problems in the comments and I may edit this article to aid others in finding the solution faster.

Problem: Can't connect to gdb?
Solution: Use netstat to check the port is open and listening on the host. Add a rule in the firewall to allow incoming connections to this port.

Problem: GDB doesn't allow me to type?
Solution: Hit Ctrl + C to interrupt the current system, enter your command, then type 'continue' to resume the host's execution.

Problem: Breakpoint is set, but it never gets hit?
Solution: It's likely that the booted kernel and the source code do not match. Check that the running kernel matches the source code and line number used for the breakpoint.

Problem: The ssh connection drops while running the code!
Solution: If the target system remains in GDB's interrupted mode for too long, networked connections to the system can be dropped. Try to connect to the host via "virsh console SOMENAME" to get a non-networked console. You may need to set up a serial console on the host if one is not present.

Additional thanks to:
- Doran Moppert (GDB assistance!)
- Prasad Pandit (Editing)
- Fabio Olive Leite (Editing)
