Position Independent Executables (PIE) use randomization as an exploit mitigation technique against return-oriented programming (ROP) attacks. In my previous post I discussed the effects that PIE has on ELF binaries and how they are executed. In this entry I will discuss how I gathered information about program startup times and share some of my findings. The Linux loader has a great feature that gives you some insight into the actions taken when a program starts. I used this feature to measure the impact PIE has on application startup times. I chose startup time because the time spent in the linker resolving symbols is largely out of the programmer's control.
To collect statistics about the loader's performance, you can prefix the program invocation with LD_DEBUG=statistics. This prints detailed runtime statistics from the loader. Consider the following example:
$ LD_DEBUG=statistics ./pie-example
21180:
21180: runtime linker statistics:
21180:   total startup time in dynamic loader: 714700 clock cycles
21180:             time needed for relocation: 6958 clock cycles (.9%)
21180:                  number of relocations: 0
21180:       number of relocations from cache: 0
21180:         number of relative relocations: 1
21180:            time needed to load objects: 274946 clock cycles (38.4%)
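Because every statistics line has the same shape, the cycle counts are easy to pull out of captured output when you want to collect many samples. The following is only a minimal sketch of that idea, not the tooling used for the paper; the function name parse_startup_cycles is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the cycle count from one line of
 * LD_DEBUG=statistics output, for example
 *   "21180:   total startup time in dynamic loader: 714700 clock cycles"
 * Returns 1 and stores the count on success, 0 if the line does not
 * contain the expected label. */
int parse_startup_cycles(const char *line, unsigned long long *cycles)
{
    static const char label[] = "total startup time in dynamic loader:";
    const char *p = strstr(line, label);

    if (p == NULL)
        return 0;
    p += sizeof(label) - 1;                 /* skip past the label text */
    return sscanf(p, "%llu", cycles) == 1;  /* %llu skips leading spaces */
}
```

The loader writes this output to standard error by default, so a collection script would need to capture that stream (or set LD_DEBUG_OUTPUT to send it to a file) before feeding lines to a helper like this one.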
From this output you can gain an interesting insight into the differences between PIE and standard executables. In my original paper I examined the impact that PIE had on different types of applications, focusing specifically on the time spent in the loader during program startup. I did this by collecting thousands of samples of the statistics printed by LD_DEBUG while running in single-user mode. One of the commands I looked at was 'sudo'. It is executed regularly and has the setuid bit set, so it serves as a good example. The difference I found in the time spent in the loader, in clock cycles, is shown in the figures below:
The results indicate a clear time shift of approximately 0.5 µs between the standard and PIE versions of sudo. Another key difference is the number of relative relocations that occur in each application, as shown below:
In the testing I did, this resulted in an average overhead of 16% during the program's startup phase. Given that the test system runs at around 0.357 nanoseconds per clock cycle, the overhead converts to roughly 0.1985 milliseconds, a figure that I would not lose any sleep over, although some of you might. The key difference here is the number of relative relocations performed by each version of the program. To reduce the overhead we need to try to reduce this figure.
Revisiting the original sample application, if we want to reduce the number of relative relocations we need to figure out where they are occurring. Examining the ELF binary will give some indication as to the cause of this problem:
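The conversion between clock cycles and wall-clock time used above is simple arithmetic. A small sketch of it follows; the function names are my own for illustration, and 0.357 ns per cycle is the figure quoted for the test system.

```c
/* Convert a clock-cycle count to milliseconds.
 * ns_per_cycle is the duration of one cycle in nanoseconds
 * (0.357 for the test system described in the text). */
double cycles_to_ms(unsigned long long cycles, double ns_per_cycle)
{
    return (double)cycles * ns_per_cycle / 1e6;   /* 1 ms = 1e6 ns */
}

/* Inverse conversion: how many cycles correspond to a span of time. */
double ms_to_cycles(double ms, double ns_per_cycle)
{
    return ms * 1e6 / ns_per_cycle;
}
```

With these, ms_to_cycles(0.1985, 0.357) comes to roughly 556,000 cycles, which is the size of the overhead expressed in the unit the loader reports. Assuming the pie-example run shown earlier had been taken on a system with the same cycle time, its 714700 cycles would be about 0.255 milliseconds.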
$ readelf -r pie-example

Relocation section '.rela.dyn' at offset 0x370 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000002005f0  000000000008 R_X86_64_RELATIVE                    0000000000200620

Relocation section '.rela.plt' at offset 0x388 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200610  000200000007 R_X86_64_JUMP_SLO 0000000000000000 quit + 0
000000200618  000300000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
The R_X86_64_RELATIVE entry is a relative relocation, and its addend is 0x200620. The symbol at that address can be found by searching the disassembly of the binary:
$ objdump -D pie-example | grep 200620 -m 1
0000000000200620 <message>:
This relocation is for the message string that we are printing to the screen. The way that the message variable has been declared is actually important when it comes to relative relocations. So what happens if we declare the string differently?
const char *message = "Hello World";
int main(int argc, const char *argv[], const char *envp[])
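The above results in an additional relocation because the declaration creates a global pointer to a read-only string. The pointer itself is writable and must hold the absolute address of the string, which is only known once the loader has chosen where to place the program in memory. In addition to this, a relocation is needed to locate the pointer's content in the .data section.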
$ readelf -r pie-example

Relocation section '.rela.dyn' at offset 0x370 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200608  000000000008 R_X86_64_RELATIVE                    0000000000200638
000000200638  000000000008 R_X86_64_RELATIVE                    0000000000000440
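To see why each of these entries costs time at startup, it helps to look at what the loader has to do with one. For R_X86_64_RELATIVE the x86-64 ABI defines the stored value as the load base plus the addend. The sketch below is a simplified model of that step, not the glibc implementation; the struct and function names are hypothetical, and a real loader works on Elf64_Rela entries in the mapped image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a relocation entry of type R_X86_64_RELATIVE.
 * The names here are hypothetical; only the "base + addend" rule comes
 * from the x86-64 ABI. */
struct rel_entry {
    uint64_t offset;   /* where to write, relative to the load base */
    uint64_t addend;   /* link-time address of the target */
};

/* Apply each relocation to an image that was loaded at address `base`.
 * `image` points at the writable copy of the mapped bytes. Entries that
 * would write outside the image are skipped. */
void apply_relative(unsigned char *image, size_t image_size, uint64_t base,
                    const struct rel_entry *rel, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t value = base + rel[i].addend;

        if (rel[i].offset > image_size ||
            image_size - rel[i].offset < sizeof(value))
            continue;
        memcpy(image + rel[i].offset, &value, sizeof(value));
    }
}
```

Each entry is a single pointer-sized write, so the cost per relocation is small, but it is paid on every program start and grows with the number of entries in .rela.dyn.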
Changing the string declaration to static const char message[], a local variable, or a preprocessor macro will result in no relative relocations for the PIE implementation. This is because the string "Hello World" will be placed in the read-only data section (.rodata) of the binary. As the name suggests, this content cannot be changed or written to. As a result, the location of the message string can be decided at compile/link time rather than at runtime. Most programs are likely to have some room for optimization of this kind. If you are interested in learning about other optimizations in this area, I strongly recommend Uli Drepper's paper "How To Write Shared Libraries".
I would argue, however, that in most cases such tedious levels of optimization are not necessary. In the testing that I did, the performance overhead in program startup ranged from 0.1985 milliseconds to 11 milliseconds, which is minimal when compared with the benefits that PIE gives you against return-oriented programming attacks. I hope you enjoy my paper and consider using PIE in your project. In future posts I will investigate some of the other security features that GCC provides in the area of hardening executables.
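The declaration styles discussed above can be put side by side. This is an illustrative sketch only; the variable and function names are invented for the comparison and are not taken from the sample application.

```c
/* Writable global pointer: the pointer variable must hold the runtime
 * address of the string, so a PIE needs a relative relocation for it. */
const char *message_ptr = "Hello World";

/* Read-only array: the characters themselves are placed in .rodata and
 * there is no separate pointer to fix up at load time. */
static const char message_arr[] = "Hello World";

/* Preprocessor macro: the literal is used directly where it is needed. */
#define MESSAGE_MACRO "Hello World"

/* Hypothetical accessors so that each variant is actually referenced. */
const char *get_ptr(void)   { return message_ptr; }
const char *get_arr(void)   { return message_arr; }
const char *get_macro(void) { return MESSAGE_MACRO; }
```

Adding declarations like these to a program, building it as a PIE (for example with gcc -fPIE -pie) and running readelf -r on the result is one way to check which of them produce R_X86_64_RELATIVE entries. The exact output will depend on your toolchain.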