Race in sysfs while fast unloading/loading drivers is causing system to crash
Environment
- Red Hat Enterprise Linux 6
Issue
- System is crashing during online firmware update, using
hpsumutility - Crash during firmware update on a HP ProLiant DL585 G7
- System panic with following traces :
BUG: unable to handle kernel paging request at ffffffffa040ec5a
IP: [<ffffffff8129353b>] strcmp+0xb/0x30
PGD 1a87067 PUD 1a8b063 PMD 10394e2067 PTE 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 2
Modules linked in: nx_xport(+)(U) iptable_filter ip_tables ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase netconsole configfs autofs4 jbd dm_round_robin dm_multipath uinput e1000e serio_raw acpi_ipmi ipmi_si ipmi_msghandler hpilo ixgbe dca ptp pps_core mdio raid0 k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom qla2xxx scsi_transport_fc scsi_tgt hpsa ata_generic pata_acpi pata_atiixp ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: netxen_nic]
Pid: 28867, comm: insmod Not tainted 2.6.32-504.1.3.el6.x86_64 #1 HP ProLiant DL585 G7
RIP: 0010:[<ffffffff8129353b>] [<ffffffff8129353b>] strcmp+0xb/0x30
RSP: 0018:ffff880c3910fc58 EFLAGS: 00010282
RAX: 0000000000000064 RBX: ffff880436a890a0 RCX: ffff88083a75c250
RDX: 0000000000000065 RSI: ffffffffa040ec5a RDI: ffff881439fd8a60
RBP: ffff880c3910fc58 R08: 0000000000000001 R09: 0000000000000280
R10: 0000000000000010 R11: 0000000000000000 R12: ffff881439fd8a60
R13: ffff88143a5c5b50 R14: ffff880c3910fcf8 R15: 00000000fffffffe
FS: 00007f7dda6c5700(0000) GS:ffff88104e600000(0000) knlGS:00000000f777f8e0
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa040ec5a CR3: 0000001439e06000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 28867, threadinfo ffff880c3910e000, task ffff880c39a14aa0)
Stack:
ffff880c3910fc78 ffffffff8120b098 ffff880c3910fcf8 ffff88143a5c5b50
<d> ffff880c3910fca8 ffffffff8120c2bb ffff880c3910fcb8 ffff880c3910fcf8
<d> ffff88083a75c250 ffff88143a5c5b50 ffff880c3910fce8 ffffffff8120c37c
Call Trace:
[<ffffffff8120b098>] sysfs_find_dirent+0x28/0x50
[<ffffffff8120c2bb>] __sysfs_add_one+0x2b/0xc0
[<ffffffff8120c37c>] sysfs_add_one+0x2c/0xd0
[<ffffffff8120c89b>] sysfs_do_create_link+0x12b/0x170
[<ffffffff8120c913>] sysfs_create_link+0x13/0x20
[<ffffffff8136c442>] driver_sysfs_add+0x62/0x90
[<ffffffff8136c6c2>] driver_probe_device+0x82/0x3e0
[<ffffffff8136cacb>] __driver_attach+0xab/0xb0
[<ffffffff8136ca20>] ? __driver_attach+0x0/0xb0
[<ffffffff8136b8c4>] bus_for_each_dev+0x64/0x90
[<ffffffff8136c37e>] driver_attach+0x1e/0x20
[<ffffffff8136b0f8>] bus_add_driver+0x1e8/0x2b0
[<ffffffff8136cd26>] driver_register+0x76/0x140
[<ffffffff815293e5>] ? printk+0x41/0x44
[<ffffffff812b0316>] __pci_register_driver+0x56/0xd0
[<ffffffffa00360c6>] unm_init_module+0xc6/0x183 [nx_xport]
[<ffffffffa0036000>] ? unm_init_module+0x0/0x183 [nx_xport]
[<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
[<ffffffff810bfff1>] sys_init_module+0xe1/0x250
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 84 ff 40 88 39 74 0d 48 83 c1 01 48 83 ea 01 75 e7 c6 01 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 40 00 0f b6 07 <0f> b6 16 48 83 c7 01 48 83 c6 01 38 d0 75 0e 84 c0 75 ea 31 c0
RIP [<ffffffff8129353b>] strcmp+0xb/0x30
RSP <ffff880c3910fc58>
CR2: ffffffffa040ec5a
Resolution
- This is a known issue, which can cause the system to panic when fast unloading and loading of kernel modules is being performed.
- Red Hat is tracking this issue within internal bugzilla #880438
- Please contact Red Hat Technical Support to know more about the status of the BUG.
Root Cause
- A race in module unloading/loading code was detected, causing the system to panic when module unload has not released all
sysfsfiles when new module is being loaded, causing the system to panic.
Diagnostic Steps
- Vmocre analysis is as follows :
#4 [ffff886fd1f359f0] __bad_area_nosemaphore at ffffffff81046ea5
#5 [ffff886fd1f35a40] bad_area_nosemaphore at ffffffff81046f73
#6 [ffff886fd1f35a50] __do_page_fault at ffffffff810476d1
#7 [ffff886fd1f35b70] do_page_fault at ffffffff815144be
#8 [ffff886fd1f35ba0] page_fault at ffffffff81511875
[exception RIP: strcmp+11]
RIP: ffffffff8127eb2b RSP: ffff886fd1f35c58 RFLAGS: 00010286
RAX: 0000000000000064 RBX: ffff881062880490 RCX: ffff888fe99dab50
RDX: 0000000000000065 RSI: ffffffffa034b17a RDI: ffff88afe9e8c740
RBP: ffff886fd1f35c58 R8: 0000000000000001 R9: 0000000000000240
R10: 000000000000003b R11: 0000000000002000 R12: ffff88afe9e8c740
R13: ffff88afe885f490 R14: ffff886fd1f35cf8 R15: 00000000fffffffe
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff886fd1f35c60] sysfs_find_dirent at ffffffff811fa818
#10 [ffff886fd1f35c80] __sysfs_add_one at ffffffff811fba3b
#11 [ffff886fd1f35cb0] sysfs_add_one at ffffffff811fbafc
#12 [ffff886fd1f35cf0] sysfs_do_create_link at ffffffff811fc01b
#13 [ffff886fd1f35d50] sysfs_create_link at ffffffff811fc093
#14 [ffff886fd1f35d60] driver_sysfs_add at ffffffff8135f312
#15 [ffff886fd1f35d90] driver_probe_device at ffffffff8135f492
#16 [ffff886fd1f35dc0] __driver_attach at ffffffff8135f75b
#17 [ffff886fd1f35df0] bus_for_each_dev at ffffffff8135ea64
#18 [ffff886fd1f35e30] driver_attach at ffffffff8135f24e
#19 [ffff886fd1f35e40] bus_add_driver at ffffffff8135e298
#20 [ffff886fd1f35e80] driver_register at ffffffff8135faa6
#21 [ffff886fd1f35ec0] __pci_register_driver at ffffffff8129aa16
#22 [ffff886fd1f35ef0] init_module at ffffffffa01c80c6 [nx_xport]
#23 [ffff886fd1f35f20] do_one_initcall at ffffffff8100204c
#24 [ffff886fd1f35f50] sys_init_module at ffffffff810b7611
crash> struct sysfs_addrm_cxt ffff886fd1f35cf8
struct sysfs_addrm_cxt {
parent_sd = 0xffff888fe99dab50, <<------- (1)
parent_inode = 0xffff88106990f048,
removed = 0x0,
cnt = 0
}
struct sysfs_dirent {
s_count = {
counter = 0x1f <<---------- No. of references
},
s_active = {
counter = 0x0 <<-------- Non -ve.
},
s_parent = 0xffff888fe99cbf40,
s_name = 0xffff886fe9566380 "0000:04:00.0",
inode_node = {
rb_parent_color = 0xffff884fe958c539,
rb_right = 0x0,
rb_left = 0x0
},
name_node = {
rb_parent_color = 0xffff884fe958c551,
rb_right = 0x0,
rb_left = 0x0
},
u = {
completion = 0x0,
removed_list = 0x0
},
{
crash> rd ffff88afe9e8c740
ffff88afe9e8c740: ff00726576697264 driver
crash>
crash> px ((struct sysfs_addrm_cxt *)0xffff886fd1f35cf8)->parent_sd->s_dir.name_tree.rb_node
$7 = (struct rb_node *) 0xffff888fe99daca0
- So if reading through the tree one can see
crash> tree -t rbtree -s sysfs_dirent.s_name -o sysfs_dirent.name_node -N 0xffff888fe99daca0
ffff888fe99dac70
s_name = 0xffffffff817cbb72 "msi_bus"
ffff88afe958bd00
s_name = 0xffffffff8180ea45 "irq"
ffff882fe9580490
s_name = 0xffffffff817e0975 "device"
ffff888fe99dabe0
s_name = 0xffffffff817dd299 "class"
ffff886fe956a5b0
s_name = 0xffffffff817cbb5d "broken_parity_status"
ffff886fe959ac70
s_name = 0xffffffff817d7f02 "config"
ffff884fe958c400
s_name = 0xffffffff8179f795 "enable"
ffff881062880490
tree: invalid kernel virtual address: ffffffffa034b17a type: "gdb_readmem_callback" <<------------- (2)
Which probably
crash> dis -lr strcmp+0xb
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/lib/string.c: 248
0xffffffff8127eb20 <strcmp>: push %rbp
0xffffffff8127eb21 <strcmp+0x1>: mov %rsp,%rbp
0xffffffff8127eb24 <strcmp+0x4>: nopl 0x0(%rax)
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/lib/string.c: 252
0xffffffff8127eb28 <strcmp+0x8>: movzbl (%rdi),%eax
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/lib/string.c: 253
0xffffffff8127eb2b <strcmp+0xb>: movzbl (%rsi),%edx <<----------
RSI: ffffffffa034b17a <<--------- (2)
From (2) Looks created the issue.
tree: invalid kernel virtual address: ffffffffa034b17a type: "gdb_readmem_callback"
s_name = 0xffffffffa034b17a <Address 0xffffffffa034b17a out of bounds>
ffff886fe956a490
s_name = 0xffff884fe957b600 "firmware_node"
ffff88cfe99d72e0
s_name = 0xffffffff817cbb31 "local_cpus"
ffff88efe9a92880
s_name = 0xffffffff817cbb3c "local_cpulist"
ffff881068af5520
s_name = 0xffffffff817cbb4a "modalias"
ffff88efe9a927f0
s_name = 0xffffffff817de801 "resource"
ffff88afe958bd90
s_name = 0xffffffff817caab9 "remove"
ffff882fe9580520
s_name = 0xffffffff817cbb53 "numa_node"
ffff888fe99dad00
s_name = 0xffff886fe9d0db60 "power"
ffff88cfe99d7370
s_name = 0xffffffff817cbb08 "rescan"
ffff8810672cc520
s_name = 0xffffffff817da87c "reset"
ffff88cfe99d7250
s_name = 0xffffffff817caac8 "uevent"
ffff886fe956a520
s_name = 0xffffffff817cbb20 "subsystem_device"
ffff88efe9ac25b0
s_name = 0xffff88cfe88e7800 "resource4"
ffff88afe95f12e0
s_name = 0xffff888fe9a13480 "resource0"
ffff884fe958c490
s_name = 0xffff882fe957f420 "subsystem"
ffff884fe958c370
s_name = 0xffffffff817cbb0f "subsystem_vendor"
ffff881068af55b0
s_name = 0xffffffff817e2938 "vendor"
struct sysfs_dirent {
s_count = {
counter = 3
},
s_active = {
counter = 0
},
s_parent = 0xffff888fe99dab50, <<----------- Parent looks fine to me From (1) and rest seems corrupted to me.
s_name = 0xffffffffa034b17a tree: invalid kernel virtual address: ffffffffa034b17a type: "gdb_readmem_callback"
tree: invalid kernel virtual address: ffffffffa034b17a type: "gdb_readmem_callback"
<Address 0xffffffffa034b17a out of bounds>,
inode_node = {
rb_parent_color = 18446612202764682553,
rb_right = 0x0,
rb_left = 0x0
},
name_node = {
rb_parent_color = 18446612475531543601,
rb_right = 0x0,
rb_left = 0x0
},
u = {
completion = 0x0,
removed_list = 0x0
},
{
s_dir = {
kobj = 0xffffffffa034bd20,
subdirs = 0,
inode_tree = {
rb_node = 0x0
},
name_tree = {
rb_node = 0x0
}
},
s_symlink = {
target_sd = 0xffffffffa034bd20
},
s_attr = {
attr = 0xffffffffa034bd20,
open = 0x0
},
s_bin_attr = {
bin_attr = 0xffffffffa034bd20,
buffers = {
first = 0x0
}
}
- Looking at the logs, it seems the problem happened while installing the the driver
NX PCI driver: Major #246
BUG: unable to handle kernel paging request at ffffffffa034b17a
IP: [<ffffffff8127eb2b>] strcmp+0xb/0x30
PGD 1a87067 PUD 1a8b063 PMD afe8736067 PTE 0
struct module {
state = MODULE_STATE_COMING,
list = {
next = 0xffffffffa05416a8 <__this_module+8>,
prev = 0xffffffff81aa6c10 <modules>
},
name = "nx_xport\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
mkobj = {
kobj = {
name = 0xffff88afe8bb2920 "nx_xport",
entry = {
next = 0xffff882fe94b2cc0,
prev = 0xffffffffa05416f8 <__this_module+88>
},
parent = 0xffff882fe94b2cd8,
kset = 0xffff882fe94b2cc0,
ktype = 0xffffffff81aa3a20 <module_ktype>,
sd = 0xffff88afe8ae4f40,
kref = {
refcount = {
counter = 3
}
},
state_initialized = 1,
state_in_sysfs = 1,
state_add_uevent_sent = 1,
state_remove_uevent_sent = 0,
uevent_suppress = 0
},
mod = 0xffffffffa0615d80,
drivers_dir = 0x0,
mp = 0x0
},
modinfo_attrs = 0xffff88afe77dec00,
version = 0xffff88afe8bb2fa0 "4.0.593",
srcversion = 0xffff88afe8bb2780 "38FBBBED8DE4AA5A134AC9F",
holders_dir = 0xffff88afe9e038c0,
- Moving further, last analyze one can see following processes seem to be in
Dstate
crash> ps -l | grep UN
[15560255219756] [UN] PID: 43813 TASK: ffff888fda0e3540 CPU: 24 COMMAND: "ifdown-eth"
[15560253410836] [UN] PID: 43791 TASK: ffff886fcaafcaa0 CPU: 48 COMMAND: "ifdown-eth"
[15560253246721] [UN] PID: 43797 TASK: ffff88afdf708040 CPU: 56 COMMAND: "ifdown-eth"
[15560049632356] [UN] PID: 43841 TASK: ffff888fda0e2080 CPU: 33 COMMAND: "ifdown-eth" <<----------- Most oldest one
crash> bt 43841
PID: 43841 TASK: ffff888fda0e2080 CPU: 33 COMMAND: "ifdown-eth"
#0 [ffff888fe77d3aa8] schedule at ffffffff8150ea32
#1 [ffff888fe77d3b70] __mutex_lock_slowpath at ffffffff815100de
#2 [ffff888fe77d3be0] mutex_lock at ffffffff8150ff7b
#3 [ffff888fe77d3c00] sysfs_lookup at ffffffff811fb96a
#4 [ffff888fe77d3c40] do_lookup at ffffffff81190915
#5 [ffff888fe77d3ca0] __link_path_walk at ffffffff81190bad
#6 [ffff888fe77d3d60] path_walk at ffffffff81191c5a
#7 [ffff888fe77d3da0] do_path_lookup at ffffffff81191e2b
#8 [ffff888fe77d3dd0] user_path_at at ffffffff81192ab7
#9 [ffff888fe77d3ea0] vfs_fstatat at ffffffff81186e6c
#10 [ffff888fe77d3ee0] vfs_stat at ffffffff81186fdb
#11 [ffff888fe77d3ef0] sys_newstat at ffffffff81187004
#12 [ffff888fe77d3f80] system_call_fastpath at ffffffff8100b072
RIP: 0000003d376daf25 RSP: 00007fff7b0ab4f0 RFLAGS: 00010206
RAX: 0000000000000004 RBX: ffffffff8100b072 RCX: 0000000000c532b0
RDX: 00007fff7b0ab1c0 RSI: 00007fff7b0ab1c0 RDI: 0000000000c00730
RBP: 0000000000c60020 R8: 0000000000000001 R9: 0000000000000001
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
R13: 00007fff7b0ab1c0 R14: 0000000000000001 R15: 00000000006dece0
ORIG_RAX: 0000000000000004 CS: 0033 SS: 002b
crash> dis -lr ffffffff811fb96a
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 761
0xffffffff811fb930 <sysfs_lookup>: push %rbp
0xffffffff811fb931 <sysfs_lookup+1>: mov %rsp,%rbp
0xffffffff811fb934 <sysfs_lookup+4>: sub $0x30,%rsp
0xffffffff811fb938 <sysfs_lookup+8>: mov %rbx,-0x18(%rbp)
0xffffffff811fb93c <sysfs_lookup+12>: mov %r12,-0x10(%rbp)
0xffffffff811fb940 <sysfs_lookup+16>: mov %r13,-0x8(%rbp)
0xffffffff811fb944 <sysfs_lookup+20>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 763
0xffffffff811fb949 <sysfs_lookup+25>: mov 0x28(%rsi),%rax
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 767
0xffffffff811fb94d <sysfs_lookup+29>: mov $0xffffffff81adc080,%rdi
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 772
0xffffffff811fb954 <sysfs_lookup+36>: mov $0xfffffffffffffffe,%r13
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 761
0xffffffff811fb95b <sysfs_lookup+43>: mov %rsi,%rbx
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 763
0xffffffff811fb95e <sysfs_lookup+46>: mov 0x98(%rax),%r12
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 767
0xffffffff811fb965 <sysfs_lookup+53>: callq 0xffffffff8150ff50 <mutex_lock>
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 769
0xffffffff811fb96a <sysfs_lookup+58>: mov 0x38(%rbx),%rsi
static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
struct nameidata *nd)
{
struct dentry *ret = NULL;
struct sysfs_dirent *parent_sd = dentry->d_parent->d_fsdata;
struct sysfs_dirent *sd;
struct inode *inode;
mutex_lock(&sysfs_mutex); <<---------------
sd = sysfs_find_dirent(parent_sd, dentry->d_name.name);
crash> px sysfs_mutex
sysfs_mutex = $14 = {
count = {
counter = 0xffffffff
},
wait_lock = {
raw_lock = {
slock = 0x42b142b1
}
},
wait_list = {
next = 0xffff888fe77d3b88,
prev = 0xffff888fe77d3b88
},
owner = 0xffff886fd1f34000 <<------- Looks owner of the lock
}
crash> kmem 0xffff886fd1f34000
PID: 43867
COMMAND: "insmod"
TASK: ffff886fe8d63540 [THREAD_INFO: ffff886fd1f34000]
CPU: 40
STATE: TASK_RUNNING (PANIC)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea01875ed360 6fd1f34000 0 ffff882fe784f400 1 1c0000000000000
crash>
- Looks like this was acquired by
insmodprocess only - The above process is waiting for 679 mili seconds for the mutex
crash> eval (15560729425844 - 15560049632356)
hexadecimal: 2884d350
decimal: 679793488
octal: 5041151520
binary: 0000000000000000000000000000000000101000100001001101001101010000
crash> eval (679793488/1000000)
hexadecimal: 2a7
decimal: 679 mili seconds.
- Looks like the
insmodprocess is over CPU-40 for around 15 mili seconds
crash> eval (15560049675247 - 15560034495911)
hexadecimal: e79e48
decimal: 15179336
octal: 71717110
binary: 0000000000000000000000000000000000000000111001111001111001001000
crash> eval (15179336 /1000000)
hexadecimal: f
decimal: 15 mili seconds.
- Looks like the mutex was acquired in following function for
insmod
void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
struct sysfs_dirent *parent_sd)
{
struct inode *inode;
memset(acxt, 0, sizeof(*acxt));
acxt->parent_sd = parent_sd;
/* Lookup parent inode. inode initialization is protected by
* sysfs_mutex, so inode existence can be determined by
* looking up inode while holding sysfs_mutex.
*/
mutex_lock(&sysfs_mutex); <<--------------
inode = ilookup5(sysfs_sb, parent_sd->s_ino, sysfs_ilookup_test,
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Comments