How to reproduce a condition that invokes the OOM killer?

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6

Issue

  • How to reproduce a condition that invokes the OOM killer?

Resolution

Free output:

# free -m
             total       used       free     shared    buffers     cached
Mem:          1999       1819        180          0         94        910
-/+ buffers/cache:        813       1186
Swap:         4095          0       4095
  • The system has almost 2 GB of RAM, of which 910 MB (roughly 45%) is page cache. About 91% of RAM (1819 MB of 1999 MB) shows as used, but only 813 MB of that is used by applications once buffers and cache are excluded.

  • The overcommit parameters are as follows.

 $ cat /proc/sys/vm/overcommit_memory
 0

 $ cat /proc/sys/vm/overcommit_ratio
 50
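
The same two tunables can also be read from a program. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names read_int_file and print_overcommit_settings are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Read a single integer from a one-line file such as
 * /proc/sys/vm/overcommit_memory.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_int_file(const char *path, long *value)
{
        FILE *fp = fopen(path, "r");
        int ok;

        if (fp == NULL)
                return -1;
        ok = (fscanf(fp, "%ld", value) == 1);
        fclose(fp);
        return ok ? 0 : -1;
}

/* Print both overcommit tunables; returns 0 only if both were read. */
int print_overcommit_settings(void)
{
        long mode, ratio;

        if (read_int_file("/proc/sys/vm/overcommit_memory", &mode) != 0 ||
            read_int_file("/proc/sys/vm/overcommit_ratio", &ratio) != 0)
                return -1;
        printf("overcommit_memory = %ld\n", mode);
        printf("overcommit_ratio  = %ld\n", ratio);
        return 0;
}
```

Calling print_overcommit_settings() from main() before and after changing the values is a quick way to confirm which overcommit mode a test actually ran under.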

The following program allocates memory in 1 MiB chunks until malloc fails, but never writes to it. The memory is only allocated, not used.

memtest1.c

 #include <stdio.h>
 #include <stdlib.h>

 int main (void) {
         int n = 0;

         /* Request 1 MiB at a time; the memory is never written to. */
         while (1) {
                 if (malloc(1<<20) == NULL) {
                         printf("malloc failure after %d MiB\n", n);
                         return 0;
                 }
                 printf ("got %d MiB\n", ++n);
         }
 }



 $ gcc memtest1.c  
 $ ./a.out  

 got 570528 MiB  
 got 570529 MiB  
 got 570530 MiB  
 got 570531 MiB
 Killed
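  • The kernel let the process allocate 570531 MiB (about 557 GiB) of virtual memory, far more than the 2 GB of RAM plus 4 GB of swap, before the process was killed. The kernel overcommitted memory because vm.overcommit_memory = 0 was in effect.
    The following are snipped log messages:
 # less /var/log/messages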

 6792kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 all_unreclaimable? no
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.827694] lowmem_reserve[]: 0 0 0 0  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.827699] Node 0 DMA: 3*4kB 4*8kB 8*16kB 9*32kB 10*64kB 10*128kB 2*256kB 2*512kB 2*1024kB 1*2048kB 0*4096kB = 8012kB  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.827711] Node 0 DMA32: 377*4kB 21*8kB 2*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 5740kB  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.827723] 19644 total pagecache pages  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.827725] 1378 pages in swap cache  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.827728] Swap cache stats: add 1114112, delete 1112734, find 9660/15265  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.827730] Free swap  = 0kB  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.827732] Total swap = 4194300kB  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836840] 521855 pages RAM  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836843] 9983 pages reserved  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836845] 17279 pages shared  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836847] 494732 pages non-shared  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836852] Out of memory: kill process 6299 (a.out) score 154833937 or a child  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836857] Killed process 6299 (a.out) vsz:619335748kB, anon-rss:535344kB, file-rss:92kB
  • The system was running low on memory, so the OOM killer killed the a.out process.
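
The fields of the "Killed process" line can be pulled out with sscanf when scanning logs for OOM events. This is a sketch that assumes exactly the message format shown in the log above; other kernel versions print different fields, and the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct oom_kill {
        int pid;
        char name[64];
        unsigned long long vsz_kb;
        unsigned long long anon_rss_kb;
        unsigned long long file_rss_kb;
};

/* Parse a "Killed process ..." line in the format shown above.
 * Returns 0 on success, -1 if the line does not match. */
int parse_oom_kill(const char *line, struct oom_kill *out)
{
        /* Skip the syslog prefix (timestamp, host, kernel tag). */
        const char *p = strstr(line, "Killed process ");

        if (p == NULL)
                return -1;
        if (sscanf(p,
                   "Killed process %d (%63[^)]) vsz:%llukB, "
                   "anon-rss:%llukB, file-rss:%llukB",
                   &out->pid, out->name, &out->vsz_kb,
                   &out->anon_rss_kb, &out->file_rss_kb) != 5)
                return -1;
        return 0;
}
```

Feeding each line of /var/log/messages to parse_oom_kill() and printing the matches gives a short summary of which processes were killed and how much resident memory they held.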

Free output:

 # free -m
              total       used       free     shared    buffers     cached
 Mem:          1999        455       1543          0          9        126
 -/+ buffers/cache:        319       1680
 Swap:         4095        354       3741
  • Next, switch the kernel to strict overcommit accounting:

 # echo "2" > /proc/sys/vm/overcommit_memory
 # echo "100" > /proc/sys/vm/overcommit_ratio    # <<< Here your system has failed.
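
With overcommit_memory set to 2, the kernel reports the enforced limit and the amount currently committed as CommitLimit and Committed_AS in /proc/meminfo. The sketch below looks up one such field from a meminfo-style stream; the function name meminfo_lookup is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Look up one "Key:   value kB" entry in a meminfo-style stream.
 * Returns 0 and stores the value (in kB) on success, -1 otherwise. */
int meminfo_lookup(FILE *fp, const char *key, unsigned long long *kb)
{
        char line[256];
        size_t klen = strlen(key);

        rewind(fp);     /* allow repeated lookups on the same stream */
        while (fgets(line, sizeof line, fp) != NULL) {
                /* Match the whole key, not just a prefix of a longer one. */
                if (strncmp(line, key, klen) == 0 && line[klen] == ':')
                        return sscanf(line + klen + 1, "%llu", kb) == 1 ? 0 : -1;
        }
        return -1;
}
```

Opening /proc/meminfo with fopen() and calling meminfo_lookup() for "CommitLimit" and "Committed_AS" shows how much headroom is left before malloc starts to fail in strict mode.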
  • The following program writes to the memory it allocates, so the pages are actually used:

memtest2.c

 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main (void) {
         int n = 0;
         char *p;

         while (1) {
                 if ((p = malloc(1<<20)) == NULL) {
                         printf("malloc failure after %d MiB\n", n);
                         return 0;
                 }
                 /* Writing to the block forces the kernel to back it with real pages. */
                 memset (p, 0, (1<<20));
                 printf ("got %d MiB\n", ++n);
         }
 }


 #gcc memtest2.c  
 #./a.out  
 got 4511 MiB  
 got 4512 MiB  
 malloc failure after 4512 MiB
  • This means the system allowed the program to use about 4.5 GB of memory before malloc failed. With overcommit_memory=2 and overcommit_ratio=100, the commit limit is swap plus 100% of RAM.
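
The limit in this mode follows the kernel's documented formula: swap plus RAM multiplied by overcommit_ratio / 100 (huge pages are ignored here for simplicity). A minimal sketch of that arithmetic, with a hypothetical function name:

```c
/* Commit limit in strict mode (overcommit_memory = 2):
 * swap + RAM * overcommit_ratio / 100.
 * Simplification: memory reserved for huge pages is not subtracted. */
unsigned long commit_limit_mb(unsigned long ram_mb, unsigned long swap_mb,
                              unsigned long ratio_percent)
{
        return swap_mb + ram_mb * ratio_percent / 100;
}
```

With the figures from the free output (1999 MB of RAM, 4095 MB of swap) and a ratio of 100, this gives 6094 MB for the whole system. The test program obtained 4512 MiB of that; the remainder was presumably already committed to other processes.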

  • After running this program the system became very slow and sluggish, but it did not crash. The OOM killer then stepped in and killed the correct process.

 #less /var/log/messages  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836847] 494732 pages non-shared  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836852] Out of memory: kill process 6299 (a.out) score 154833937 or a child  
 Dec  6 00:19:23 dhcp1-109 kernel: [15358.836857] Killed process 6299 (a.out) vsz:619335748kB, anon-rss:535344kB, file-rss:92kB  
 Dec  6 00:29:19 dhcp1-109 rtkit-daemon[2166]: The canary thread is apparently starving. Taking action.  
 Dec  6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Demoting known real-time threads.  
 Dec  6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2336 of process 2333 (/usr/bin/pulseaudio).  
 Dec  6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2335 of process 2333 (/usr/bin/pulseaudio).  
 Dec  6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2333 of process 2333 (/usr/bin/pulseaudio).  
 Dec  6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Demoted 3 threads
  • The system is still up and running.
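
Because the unbounded test makes the machine sluggish, a capped variant can be useful for a first trial. The sketch below is not part of the original test programs; touch_mib is a hypothetical helper that allocates and writes to at most max_mib blocks of 1 MiB, then frees them.

```c
#include <stdlib.h>
#include <string.h>

#define MIB ((size_t)1 << 20)

/* Allocate and touch up to max_mib blocks of 1 MiB, then free them all.
 * Returns the number of MiB actually obtained. */
size_t touch_mib(size_t max_mib)
{
        /* Keep the pointers so every block can be released afterwards. */
        char **blocks = calloc(max_mib ? max_mib : 1, sizeof *blocks);
        size_t n = 0, i;

        if (blocks == NULL)
                return 0;
        while (n < max_mib) {
                blocks[n] = malloc(MIB);
                if (blocks[n] == NULL)
                        break;
                memset(blocks[n], 0, MIB);      /* touching forces real pages */
                n++;
        }
        for (i = 0; i < n; i++)
                free(blocks[i]);
        free(blocks);
        return n;
}
```

Starting with a small cap and raising it step by step shows at what point allocations begin to fail, without driving the system all the way into swap exhaustion.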

Conclusion:

Reference

  • http://www.win.tue.nl/~aeb/linux/lk/lk-9.html

