What is the logic behind killing processes during an Out of Memory situation?


Environment

  • Red Hat Enterprise Linux (RHEL) 4, 5, 6

Issue

  • What is the logic behind killing processes during an Out of Memory situation?
  • Why are my Oracle processes killed in an Out of Memory situation?

Resolution

Per the kernel source code, a simplified explanation of the OOM-killer logic follows.

A function called badness() calculates badness points for each process. Points are added for:

  • Processes with high memory usage
  • Niced processes

Badness points are subtracted from:

  • Processes which have been running for a long time
  • Processes which were started by superusers
  • Processes with direct hardware access

The process with the highest number of badness points is killed, unless it is already in the midst of freeing up memory on its own. (Note that a process with 0 points can never be killed.)

The kernel will wait for some time to see if enough memory is freed by killing one process. If enough memory is not freed, the OOM-kills will continue until enough memory is freed or until there are no candidate processes left to kill. If the kernel is out of memory and is unable to find a candidate process to kill, it panics with a message like:

Kernel panic - not syncing: Out of memory and no killable processes...

Root Cause


The badness() function as defined in the 2.6-series kernel source:

static unsigned long badness(struct task_struct *p, unsigned long uptime)
{
        unsigned long points, cpu_time, run_time, s;

        if (!p->mm)
                return 0;

        if (p->flags & PF_MEMDIE)
                return 0;

        /*
         * The memory size of the process is the basis for the badness.
         */
        points = p->mm->total_vm;

        /*
         * CPU time is in tens of seconds and run time is in thousands
         * of seconds. There is no particular reason for this other than
         * that it turned out to work very well in practice.
         */
        cpu_time = (p->utime + p->stime) >> (SHIFT_HZ + 3);
        if (uptime >= p->start_time.tv_sec)
                run_time = (uptime - p->start_time.tv_sec) >> 10;
        else
                run_time = 0;

        s = int_sqrt(cpu_time);
        if (s)
                points /= s;
        s = int_sqrt(int_sqrt(run_time));
        if (s)
                points /= s;

        /*
         * Niced processes are most likely less important, so double
         * their badness points.
         */
        if (task_nice(p) > 0)
                points *= 2;

        /*
         * Superuser processes are usually more important, so we make it
         * less likely that we kill those.
         */
        if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) ||
                        p->uid == 0 || p->euid == 0)
                points /= 4;

        /*
         * We don't want to kill a process with direct hardware access.
         * Not only could that mess up the hardware, but usually users
         * tend to only have this flag set on applications they think
         * of as important.
         */
        if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
                points /= 4;

#ifdef DEBUG
        printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n",
                p->pid, p->comm, points);
#endif
        return points;
}

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.