Kernel panic when drop_pagecache_sb() and prune_icache() run concurrently.

Environment

  • Red Hat Enterprise Linux 5.3
  • Kernel 2.6.18-128.el5

Issue

  • After running "echo 1 > /proc/sys/vm/drop_caches", the server panics.
  • The kernel panics with the following call trace:
BUG: unable to handle kernel paging request at virtual address 00100104
 printing eip:
c0487cfc
*pde = 32f52067
Oops: 0002 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipt_REJECT ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac ipv6 xfrm_nalgo crypto_api lp floppy sg snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm parport_pc tg3 parport ide_cd i2c_i801 snd_timer pcspkr cdrom i2c_core libphy serio_raw snd soundcore snd_page_alloc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<c0487cfc>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-128.el5 #1) 
EIP is at __iget+0x24/0x3a
eax: d02cde50   ebx: d02cde48   ecx: 00100100   edx: 00200200
esi: f7dbb200   edi: db37ba70   ebp: e479ab40   esp: dc0b1f34
ds: 007b   es: 007b   ss: 0068
Process sh (pid: 31131, ti=dc0b1000 task=e00a1000 task.ti=dc0b1000)
Stack: c049330b 00000001 c067f448 f7c89940 c04933c9 00000002 c0429be8 b7f5e000 
       dc0b1f64 dc0b1fa4 b7f5e000 00000001 00000002 e479ab40 c0429c2c b7f5e000 
       00000002 c0429c3f 00000002 dc0b1fa4 c0472c1f dc0b1fa4 e479ab40 fffffff7 
Call Trace:
 [<c049330b>] drop_pagecache+0x55/0xea
 [<c04933c9>] drop_caches_sysctl_handler+0x29/0x3c
 [<c0429be8>] do_rw_proc+0xaa/0xee
 [<c0429c2c>] proc_writesys+0x0/0x16
 [<c0429c3f>] proc_writesys+0x13/0x16
 [<c0472c1f>] vfs_write+0xa1/0x143
 [<c0473211>] sys_write+0x3c/0x63
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: cd f9 ff 59 5b 5b c3 89 c2 8b 40 24 85 c0 74 05 f0 ff 42 24 c3 f0 ff 42 24 f6 82 38 01 00 00 0f 75 18 8d 42 08 8b 4a 08 8b 50 04 <89> 51 04 89 0a ba 80 69 68 c0 e8 5f 35 06 00 ff 0d e4 8b 7b c0 
EIP: [<c0487cfc>] __iget+0x24/0x3a SS:ESP 0068:dc0b1f34

Resolution

  • This is a known issue. It was tracked in private Bugzilla
    500164 - Possible panic when drop_pagecache_sb() and prune_icache() run concurrently.
  • The issue has been resolved by the erratum RHSA-2009:1243-3. Update the kernel to
    the version provided by that erratum.

Root Cause

  • drop_pagecache_sb() and an inode release operation (for example, prune_icache())
    can operate on the same inode when they run concurrently.
  • As a result, drop_pagecache_sb() can reach an inode whose i_list pointers have
    already been set to LIST_POISON1 (0x00100100) and LIST_POISON2 (0x00200200) by
    the inode release operation.
  • When drop_pagecache_sb() then dereferences LIST_POISON1, which is not a valid
    pointer, the system panics. The oops shows this in __iget(): ECX and EDX hold the
    two poison values, and the faulting address 00100104 is 4 bytes past LIST_POISON1,
    which is the offset of the prev field in a 32-bit struct list_head.
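
The sequence described above can be sketched with a small user-space model. This is only an illustration, not the kernel code: the Memory class, the PagingFault exception and the addresses 0x1000 and 0x2000 are hypothetical, and the list helpers merely mimic the pointer writes a kernel list removal performs on a 32-bit system (next at offset 0, prev at offset 4). The model removes the same entry twice, as would happen if one path unlinked the inode while another still held a pointer to it.

```python
# User-space model of 32-bit list_head handling; not kernel code.
LIST_POISON1 = 0x00100100  # stored in entry->next after removal
LIST_POISON2 = 0x00200200  # stored in entry->prev after removal
NEXT_OFF, PREV_OFF = 0, 4  # field offsets in a 32-bit struct list_head


class PagingFault(Exception):
    """Raised when the model touches an address that is not mapped."""


class Memory:
    """Hypothetical word-addressed memory; only allocated words are mapped."""

    def __init__(self):
        self.words = {}  # address -> 32-bit value

    def alloc_list_head(self, addr):
        # An empty list head points at itself in both directions.
        self.words[addr + NEXT_OFF] = addr
        self.words[addr + PREV_OFF] = addr

    def read(self, addr):
        if addr not in self.words:
            raise PagingFault(addr)
        return self.words[addr]

    def write(self, addr, value):
        if addr not in self.words:
            raise PagingFault(addr)
        self.words[addr] = value


def list_add(mem, new, head):
    nxt = mem.read(head + NEXT_OFF)
    mem.write(nxt + PREV_OFF, new)
    mem.write(new + NEXT_OFF, nxt)
    mem.write(new + PREV_OFF, head)
    mem.write(head + NEXT_OFF, new)


def list_del(mem, entry):
    nxt = mem.read(entry + NEXT_OFF)
    prev = mem.read(entry + PREV_OFF)
    mem.write(nxt + PREV_OFF, prev)  # next->prev = prev (faults if next is poison)
    mem.write(prev + NEXT_OFF, nxt)  # prev->next = next
    mem.write(entry + NEXT_OFF, LIST_POISON1)
    mem.write(entry + PREV_OFF, LIST_POISON2)


def simulate():
    mem = Memory()
    head, inode_i_list = 0x1000, 0x2000  # hypothetical addresses
    mem.alloc_list_head(head)
    mem.alloc_list_head(inode_i_list)
    list_add(mem, inode_i_list, head)
    list_del(mem, inode_i_list)  # release path unlinks and poisons the entry
    try:
        list_del(mem, inode_i_list)  # second path unlinks the same entry again
    except PagingFault as fault:
        return fault.args[0]
    return None


fault_address = simulate()
print(f"{fault_address:08x}")
```

In this model the second removal faults on the write to LIST_POISON1 + 4, which is the same address (00100104) reported in the oops.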

Diagnostic Steps

Kernel Ring Buffer:

crash> log
[....]
BUG: unable to handle kernel paging request at virtual address 00100104
 printing eip:
c0487cfc
*pde = 32f52067
Oops: 0002 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipt_REJECT ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac ipv6 xfrm_nalgo crypto_api lp floppy sg snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm parport_pc tg3 parport ide_cd i2c_i801 snd_timer pcspkr cdrom i2c_core libphy serio_raw snd soundcore snd_page_alloc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<c0487cfc>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-128.el5 #1) 
EIP is at __iget+0x24/0x3a
eax: d02cde50   ebx: d02cde48   ecx: 00100100   edx: 00200200
esi: f7dbb200   edi: db37ba70   ebp: e479ab40   esp: dc0b1f34
ds: 007b   es: 007b   ss: 0068
Process sh (pid: 31131, ti=dc0b1000 task=e00a1000 task.ti=dc0b1000)
Stack: c049330b 00000001 c067f448 f7c89940 c04933c9 00000002 c0429be8 b7f5e000 
       dc0b1f64 dc0b1fa4 b7f5e000 00000001 00000002 e479ab40 c0429c2c b7f5e000 
       00000002 c0429c3f 00000002 dc0b1fa4 c0472c1f dc0b1fa4 e479ab40 fffffff7 
Call Trace:
 [<c049330b>] drop_pagecache+0x55/0xea
 [<c04933c9>] drop_caches_sysctl_handler+0x29/0x3c
 [<c0429be8>] do_rw_proc+0xaa/0xee
 [<c0429c2c>] proc_writesys+0x0/0x16
 [<c0429c3f>] proc_writesys+0x13/0x16
 [<c0472c1f>] vfs_write+0xa1/0x143
 [<c0473211>] sys_write+0x3c/0x63
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: cd f9 ff 59 5b 5b c3 89 c2 8b 40 24 85 c0 74 05 f0 ff 42 24 c3 f0 ff 42 24 f6 82 38 01 00 00 0f 75 18 8d 42 08 8b 4a 08 8b 50 04 <89> 51 04 89 0a ba 80 69 68 c0 e8 5f 35 06 00 ff 0d e4 8b 7b c0 
EIP: [<c0487cfc>] __iget+0x24/0x3a SS:ESP 0068:dc0b1f34

Backtraces:

crash> bt
PID: 31131  TASK: e00a1000  CPU: 0   COMMAND: "sh"
 #0 [dc0b1e50] crash_kexec at c0442d02
 #1 [dc0b1e94] die at c04064c6
 #2 [dc0b1ec4] do_page_fault at c0611187
 #3 [dc0b1efc] error_code (via page_fault) at c0405a87
    EAX: d02cde50  EBX: d02cde48  ECX: 00100100  EDX: 00200200  EBP: e479ab40 
    DS:  007b      ESI: f7dbb200  ES:  007b      EDI: db37ba70
    CS:  0060      EIP: c0487cfc  ERR: ffffffff  EFLAGS: 00010246 
 #4 [dc0b1f30] __iget at c0487cfc
 #5 [dc0b1f34] drop_pagecache at c0493306
 #6 [dc0b1f44] drop_caches_sysctl_handler at c04933c4
 #7 [dc0b1f4c] do_rw_proc at c0429be5
 #8 [dc0b1f78] proc_writesys at c0429c3a
 #9 [dc0b1f84] vfs_write at c0472c1d
#10 [dc0b1f9c] sys_write at c047320c
#11 [dc0b1fb8] system_call at c0404f10
    EAX: ffffffda  EBX: 00000001  ECX: b7f5e000  EDX: 00000002 
    DS:  007b      ESI: 00000002  ES:  007b      EDI: b7f5e000
    SS:  007b      ESP: bfb07d18  EBP: bfb07d38
    CS:  0073      EIP: 00706402  ERR: 00000004  EFLAGS: 00000246 
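
When triaging similar vmcores, it can help to check the register dump for list poison values automatically. The following is a minimal sketch, assuming the i386 register layout shown in the output above; the function names poisoned_registers() and describe_fault_address() are hypothetical helpers, not part of the crash utility, and the 8-byte window is an assumption based on the size of a 32-bit struct list_head.

```python
import re

LIST_POISON1 = 0x00100100
LIST_POISON2 = 0x00200200
POISON_NAMES = {LIST_POISON1: "LIST_POISON1", LIST_POISON2: "LIST_POISON2"}

# General-purpose i386 registers as printed in an oops or in crash "bt" output.
REGISTER_RE = re.compile(
    r"\b(e(?:[abcd]x|[sd]i|[bs]p))\s*:\s*([0-9a-f]{8})\b", re.IGNORECASE
)
FAULT_RE = re.compile(r"virtual address ([0-9a-f]{8})", re.IGNORECASE)


def poisoned_registers(text):
    """Map register name to poison name for registers holding a poison value."""
    hits = {}
    for name, value in REGISTER_RE.findall(text):
        poison = POISON_NAMES.get(int(value, 16))
        if poison:
            hits[name.lower()] = poison
    return hits


def describe_fault_address(text, window=8):
    """Describe the fault address as poison+offset if it lies within `window` bytes."""
    match = FAULT_RE.search(text)
    if match is None:
        return None
    address = int(match.group(1), 16)
    for base, name in POISON_NAMES.items():
        if base <= address < base + window:
            return f"{name}+{address - base}"
    return None


# Lines copied from the oops in this article.
sample = """\
BUG: unable to handle kernel paging request at virtual address 00100104
eax: d02cde50   ebx: d02cde48   ecx: 00100100   edx: 00200200
esi: f7dbb200   edi: db37ba70   ebp: e479ab40   esp: dc0b1f34
"""

print(poisoned_registers(sample))
print(describe_fault_address(sample))
```

For the oops in this article, the helpers report ECX as LIST_POISON1, EDX as LIST_POISON2, and the fault address as LIST_POISON1+4.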
