How to reproduce a condition which invokes the OOM killer?
Environment
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 6
Issue
- How to reproduce a condition which invokes the OOM killer?
Resolution
Free output:
# free -m
             total       used       free     shared    buffers     cached
Mem:          1999       1819        180          0         94        910
-/+ buffers/cache:        813       1186
Swap:         4095          0       4095
- The system has almost 2 GB of RAM, of which 910 MB (almost 50%) is cached. In total, about 91% of RAM (1819 MB of 1999 MB) is in use.
- The overcommit parameters are as follows:
$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio
50
The following program only allocates memory. It never writes to the allocated memory, so the pages are not actually used.
memtest1.c
#include <stdio.h>
#include <stdlib.h>

int main (void) {
    int n = 0;

    /* Allocate 1 MiB per iteration and never write to it. */
    while (1) {
        if (malloc(1<<20) == NULL) {
            printf("malloc failure after %d MiB\n", n);
            return 0;
        }
        printf ("got %d MiB\n", ++n);
    }
}
$ gcc memtest1.c
$ ./a.out
got 570528 MiB
got 570529 MiB
got 570530 MiB
got 570531 MiB
Killed
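- The kernel allowed the process to allocate 570531 MiB (about 557 GiB) of virtual memory before it was killed, far more than the 2 GB of RAM plus 4 GB of swap. The kernel overcommitted the memory because vm.overcommit_memory = 0 was in effect.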
The following are snipped log messages:
# less /var/log/messages
6792kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 all_unreclaimable? no
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827694] lowmem_reserve[]: 0 0 0 0
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827699] Node 0 DMA: 3*4kB 4*8kB 8*16kB 9*32kB 10*64kB 10*128kB 2*256kB 2*512kB 2*1024kB 1*2048kB 0*4096kB = 8012kB
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827711] Node 0 DMA32: 377*4kB 21*8kB 2*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 5740kB
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827723] 19644 total pagecache pages
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827725] 1378 pages in swap cache
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827728] Swap cache stats: add 1114112, delete 1112734, find 9660/15265
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827730] Free swap = 0kB
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827732] Total swap = 4194300kB
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836840] 521855 pages RAM
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836843] 9983 pages reserved
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836845] 17279 pages shared
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836847] 494732 pages non-shared
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836852] Out of memory: kill process 6299 (a.out) score 154833937 or a child
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836857] Killed process 6299 (a.out) vsz:619335748kB, anon-rss:535344kB, file-rss:92kB
- The system was running low on memory, and the OOM killer killed the a.out process.
Free output:
# free -m
             total       used       free     shared    buffers     cached
Mem:          1999        455       1543          0          9        126
-/+ buffers/cache:        319       1680
Swap:         4095        354       3741
# echo "2" > /proc/sys/vm/overcommit_memory
# echo "100" > /proc/sys/vm/overcommit_ratio    # <<< Here your system has failed.
- The following program allocates memory and also writes to it, so the memory is actually used:
memtest2.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main (void) {
    int n = 0;
    char *p;

    /* Allocate 1 MiB per iteration and write to every byte of it. */
    while (1) {
        if ((p = malloc(1<<20)) == NULL) {
            printf("malloc failure after %d MiB\n", n);
            return 0;
        }
        memset (p, 0, (1<<20));
        printf ("got %d MiB\n", ++n);
    }
}
# gcc memtest2.c
# ./a.out
got 4511 MiB
got 4512 MiB
malloc failure after 4512 MiB
- The system allowed the program to use up to about 4.5 GB of memory. This is because of overcommit_memory=2 and overcommit_ratio=100, which limit the total committed memory to swap plus 100% of RAM.
- After running this program, the system became very slow and sluggish, but it did not crash. The OOM killer then ran and killed the correct process.
# less /var/log/messages
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836847] 494732 pages non-shared
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836852] Out of memory: kill process 6299 (a.out) score 154833937 or a child
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836857] Killed process 6299 (a.out) vsz:619335748kB, anon-rss:535344kB, file-rss:92kB
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: The canary thread is apparently starving. Taking action.
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Demoting known real-time threads.
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2336 of process 2333 (/usr/bin/pulseaudio).
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2335 of process 2333 (/usr/bin/pulseaudio).
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2333 of process 2333 (/usr/bin/pulseaudio).
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Demoted 3 threads
- The system is still up and running.
Conclusion:
- Setting these overcommit parameters is safe and should not cause any issues.
- Also refer to the article "What is the logic behind killing processes during an Out of Memory situation?"
Reference
- http://www.win.tue.nl/~aeb/linux/lk/lk-9.html
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.