Array allocation in C++
This technical article covers a subtlety in C++ array allocation and how we changed the GNU C++ compiler to deal with it properly. When a programmer writes
T *p = new T[3];
the C++ compiler allocates room for at least three copies of objects of type T on the heap. These objects require 3 * sizeof(T) bytes. For this example, assume sizeof(T) is 12, then it is straightforward to allocate 36 bytes (for example, using malloc). But what happens if the array length is 3937053355 (or 16909515400900422315 on a 64-bit architecture)? Then 47244640260 bytes are required. This number cannot be expressed in 32-bits, so if 32-bit arithmetic is used to perform the multiplication, the result is a mere 4. Unless special care is taken, a C++ implementation will provide a pointer to a heap area that is much too small for holding the requested number of objects (4 bytes instead of 47244640260 bytes).
This defect has been present in the GNU C++ compiler since Michael Tiemann wrote it in 1987. It took a while until it was universally recognized as a security vulnerability, after I reported it in a 2002 RUS-CERT advisory. A similar vulnerability in the calloc function from libc was quickly fixed, but about a decade passed until my correction for the C++ variant of the vulnerability was accepted into GCC, 25 years after the bug was originally introduced into the compiler. (The std::vector template in the C++ standard library provided by GCC is not affected.)
What makes this vulnerability so difficult to address? Most systems use the Itanium C++ ABI to define the binary interface of C++-compiled code, and the runtime library part of the ABI provides array allocation functions, but GCC does not call them. (They cannot be used directly for multi-dimensional array allocations anyway.) Instead, GCC expands most of the code which performs the allocation inline, except for the actual heap allocation:
- The size is computed first: the object size and the array sizes from all dimensions are multiplied together.
- Depending on the type, array deallocation requires a size hint. This information is stored in a cookie, in front of the actual array, so the cookie size is added to the allocation size if necessary.
- The computed allocation size in bytes is passed to the low-level operator new[] implementations, called _Znam or _Znaj in their mangled form (depending on the architecture). These functions are not expanded inline, and take only a single parameter, the requested size in bytes. They usually allocate storage from the C heap using malloc. If the allocation fails, malloc returns NULL, and the C++ functions throw a std::bad_alloc exception.
- If the allocation was successful, more expanded code calls constructors to initialize the array elements. (If one of the constructor calls throws an exception, the objects constructed so far are destructed again, the allocated memory block is freed, and the exception is propagated.) The cookie is set at this point as well.
Ideally, we would like to change the _Znam/_Znaj runtime library functions (which reside in a dynamic shared object), but at this point, it is no longer possible to detect any overflow. Instead, we have to add yet another step to the generated code before the others:
- Compute a size limit for the allocation. Take the size of half of the address space minus one, subtract the cookie size, and divide it by the size of the type times all the inner array elements. Compare the requested size against this size limit. If it is exceeded, set the computed size (in bytes) to (size_t)-1 and jump to step 3.
For plain C++ as standardized in 1998, the size limit is always a compile-time constant. The GNU C++ compiler supports extensions which allows multi-dimensional array allocations with a variable in more than one dimension, so it was not always possible to reduce the limit to a constant. The GCC developers chose to remove the extensions as they are incompatible with this fix and not part of the C++ standard. Let's see how this works out in practice. Here is a simple, non-POD class and a function which just wraps the operator new[] call, so that we can see the generated code.
struct T { int a, b, c; T(); ~T(); }; T * alloc(unsigned long n) { return new T[n]; }
On ARM, GCC 4.7 generates the following instructions:
00000000 <_Z5allocm>: 0: push {r4, r5, r6, r7, r8, lr} 4: mov r5, r0 8: add r0, r0, r0, lsl #1 ; 1. multiply by 3 c: lsl r0, r0, #2 ; 1. multiply by 4 10: add r0, r0, #8 ; 2. add cookie size 14: bl 0 <_Znaj> ; 3. call allocator 18: sub r4, r5, #1 ; 4. initialize array 1c: mov r3, #12 ; 4. : 20: cmn r4, #1 ; 4. : (plus 30 more instructions not reproduced here)
In contrast, GCC 4.8 will generate:
00000000 <_Z5allocm>: 0: cmp r0, #0xaa00000 ; 0. compare with limit 4: push {r4, r5, r6, r7, r8, lr} 8: mov r4, r0 c: addls r0, r0, r0, lsl #1 ; 1. multiply by 3 10: lslls r0, r0, #2 ; 1. multiply by 4 14: addls r0, r0, #8 ; 2. add cookie 18: mvnhi r0, #0 ; 0. replace with -1 1c: bl 0 <_Znaj> ; 3. call allocator 20: sub r8, r4, #1 ; 4. initialize array 24: mov r3, #12 ; 4. : 28: cmn r8, #1 ; 4. : (additional instructions for step 4 follow)
In the GCC 4.8 example above, the first instruction (cmp) is the comparison against the size limit. The magic constant #0xaa00000 is the size of half of the address space, minus the cookie size, divided by element size. A better integer approximation would be the number #0xaaaaaaa, but #0xaa00000 is a more conservative choice, only retaining the top-most seven bits, which can be represented directly in the instruction stream, using the ARM barrel shifter. The additional instructions which implement the multiplication by twelve are unchanged, except that they are predicated and executed only if the comparison results in less-than-or-equal (indicated by the "ls" suffix). In the overflow case, the mvnhi instruction replaces the size with -1 ("hi" indicates a "greater-than" predicate, "mvn" means "move bit-wise complement", so 0 turns to -1).
Even when looking only at the code for steps 1 to 3, two additional instructions are not that bad. By clever use of the condition flags, it might be possible to use just a single additional instruction in some cases, but the GCC optimizers cannot do this yet.
This change is planned for the next GCC release, version 4.8, due in 2013. Red Hat Enterprise Linux 7 and Fedora 18 are scheduled to ship with a patched GCC which includes this change, and a few critical packages (such as Firefox) are expected to be built using this compiler. All Fedora 18 updates are expected to include this fix. For Fedora 19, a mass rebuild with the patched compiler is planned.
EDIT 2012-11-01: Update text based on Laszlo Ersek's comment.
Comments