Issue Summary - CPU Spike during DB2 database backup

Issue

DB2 Database server becomes unresponsive.

The CPU run queue reported by vmstat jumps from the normal handful of waiting processes into the hundreds or thousands.
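
One way to spot these jumps in saved vmstat output is sketched below. It is a minimal example, not part of the original investigation: the function name run_queue_spikes and the default threshold of 100 are assumptions chosen for illustration, and it relies only on the fact that the first column of vmstat output ("r") is the run queue length.

```python
def run_queue_spikes(vmstat_text, threshold=100):
    """Return (sample_index, r) for each vmstat sample whose run queue exceeds threshold.

    threshold=100 is an illustrative assumption, not a value from the article.
    """
    spikes = []
    sample = 0
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header lines ("procs ..." and "r b swpd ...").
        if not fields or not fields[0].isdigit():
            continue
        r = int(fields[0])  # first vmstat column: processes waiting for run time
        if r > threshold:
            spikes.append((sample, r))
        sample += 1
    return spikes
```

The text could come from a file written by something like "vmstat 5 > vmstat.log"; the returned sample index, multiplied by the sampling interval, gives a rough offset into the capture.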

CPU will eventually spike as well, but not necessarily right away.

During these events, strace output has shown more processes than normal spinning in busy-wait loops and repeatedly asking to be put back on the run queue:

18217 15:34:20.497739 select(0, NULL, NULL, NULL, {0, 1000} <unfinished ...>
18217 15:34:20.507766 <... select resumed> ) = 0 (Timeout)
18217 15:34:20.521358 sched_yield( <unfinished ...>
18217 15:34:20.531511 <... sched_yield resumed> ) = 0
18217 15:34:20.541386 sched_yield( <unfinished ...>
18217 15:34:20.551405 <... sched_yield resumed> ) = 0
18217 15:34:20.561107 sched_yield( <unfinished ...>
18217 15:34:20.571119 <... sched_yield resumed> ) = 0
18217 15:34:20.581060 sched_yield( <unfinished ...>
18217 15:34:20.590966 <... sched_yield resumed> ) = 0
18217 15:34:20.601051 sched_yield( <unfinished ...>
18217 15:34:20.610843 <... sched_yield resumed> ) = 0
18217 15:34:20.620418 select(0, NULL, NULL, NULL, {0, 1000} <unfinished ...>
18217 15:34:20.630127 <... select resumed> ) = 0 (Timeout)
18217 15:34:20.640086 sched_yield( <unfinished ...>
18217 15:34:20.658969 <... sched_yield resumed> ) = 0
18217 15:34:20.668837 sched_yield( <unfinished ...>
18217 15:34:20.678556 <... sched_yield resumed> ) = 0
18217 15:34:20.688694 sched_yield( <unfinished ...>
18217 15:34:20.698502 <... sched_yield resumed> ) = 0
18217 15:34:20.708221 sched_yield( <unfinished ...>
18217 15:34:20.718445 <... sched_yield resumed> ) = 0
18217 15:34:20.728176 sched_yield( <unfinished ...>
18217 15:34:20.738251 <... sched_yield resumed> ) = 0
18217 15:34:20.748080 select(0, NULL, NULL, NULL, {0, 1000} <unfinished ...>
18217 15:34:20.759658 <... select resumed> ) = 0 (Timeout)
18217 15:34:20.769284 sched_yield( <unfinished ...>
18217 15:34:20.788357 <... sched_yield resumed> ) = 0
18217 15:34:20.798316 sched_yield( <unfinished ...>
18217 15:34:20.807992 <... sched_yield resumed> ) = 0
18217 15:34:20.817618 sched_yield( <unfinished ...>
18217 15:34:20.827431 <... sched_yield resumed> ) = 0
18217 15:34:20.837477 sched_yield( <unfinished ...>
18217 15:34:20.851028 <... sched_yield resumed> ) = 0
18217 15:34:20.860788 sched_yield( <unfinished ...>
18217 15:34:20.870275 <... sched_yield resumed> ) = 0
18217 15:34:20.880015 select(0, NULL, NULL, NULL, {0, 1000} <unfinished ...>
18217 15:34:20.893136 <... select resumed> ) = 0 (Timeout)
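
To quantify how much of a process's activity is this spin pattern, a capture in the format above can be summarized per PID. The following is a rough sketch only: the function names count_syscalls and spin_fraction are hypothetical, and treating every select call as part of the spin is an assumption that fits the 1 ms timeout sleeps in this excerpt but would miscount select calls that wait on real file descriptors.

```python
import re
from collections import Counter

# Start of a call in "strace -f -tt" style output: "PID HH:MM:SS.micro name(".
# Lines of the form "<... name resumed>" do not match, so each call counts once.
CALL_START = re.compile(r"^(\d+)\s+\d{2}:\d{2}:\d{2}\.\d+\s+(\w+)\(")


def count_syscalls(lines):
    """Count started syscalls keyed by (pid, syscall_name)."""
    counts = Counter()
    for line in lines:
        m = CALL_START.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


def spin_fraction(counts, pid, spin_calls=("sched_yield", "select")):
    """Fraction of a PID's syscalls that belong to the assumed spin set."""
    total = sum(n for (p, _), n in counts.items() if p == pid)
    spin = sum(n for (p, name), n in counts.items()
               if p == pid and name in spin_calls)
    return spin / total if total else 0.0
```

A fraction close to 1.0 for many PIDs during an event, compared with a much lower value on a healthy capture, would support the busy-wait reading of the trace.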

Current Action Plan

This is looking very much like an application issue.

The issue can correct itself and go away without intervention. The customer's current off-hours effort is to gather an strace while the issue is clearing and the server is returning to normal.

The hope is to find out why these processes are spinning in a busy-wait loop, and what they do afterwards that gets them out of it.
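
Once such a capture exists, the first call a process makes after leaving the loop can be located mechanically. The sketch below is an illustration under stated assumptions: the function name exits_from_spin, the spin set of sched_yield and select, and the minimum run length of 5 are all hypothetical choices based on the pattern in the trace excerpt, not details confirmed by the investigation.

```python
import re

# Start of a call in "strace -f -tt" style output: "PID timestamp name(".
CALL_START = re.compile(r"^(\d+)\s+\S+\s+(\w+)\(")

# Assumption: these two calls make up the busy-wait pattern seen in the excerpt.
SPIN_CALLS = {"sched_yield", "select"}


def exits_from_spin(lines, pid, min_run=5):
    """Return (line_number, syscall) for each non-spin call by pid that
    directly follows a run of at least min_run consecutive spin calls."""
    exits = []
    run = 0
    for lineno, line in enumerate(lines, start=1):
        m = CALL_START.match(line)
        if not m or m.group(1) != pid:
            continue  # "resumed" lines and other PIDs do not affect the run
        name = m.group(2)
        if name in SPIN_CALLS:
            run += 1
        else:
            if run >= min_run:
                exits.append((lineno, name))
            run = 0
    return exits
```

The reported line numbers point back into the raw strace file, so the surrounding calls and their arguments can be read in context.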

Environment

RHEL 5.7 (2.6.18-274.el5)
DB2 UDB
DB2 filesystems are on EMC SAN storage with a Veritas VxFS filesystem. (... using 1k blocksize)
HP bl460g6 2-socket 4-core hyperthreaded server - Version: Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
