Race in sysfs during fast unloading/loading of drivers causes the system to crash

Solution Verified

Environment

  • Red Hat Enterprise Linux 6

Issue

  • The system crashes during an online firmware update using the hpsum utility
  • The crash occurs during a firmware update on an HP ProLiant DL585 G7
  • The system panics with the following trace:
BUG: unable to handle kernel paging request at ffffffffa040ec5a
IP: [<ffffffff8129353b>] strcmp+0xb/0x30
PGD 1a87067 PUD 1a8b063 PMD 10394e2067 PTE 0
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/online
CPU 2 
Modules linked in: nx_xport(+)(U) iptable_filter ip_tables ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase netconsole configfs autofs4 jbd dm_round_robin dm_multipath uinput e1000e serio_raw acpi_ipmi ipmi_si ipmi_msghandler hpilo ixgbe dca ptp pps_core mdio raid0 k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom qla2xxx scsi_transport_fc scsi_tgt hpsa ata_generic pata_acpi pata_atiixp ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: netxen_nic]

Pid: 28867, comm: insmod Not tainted 2.6.32-504.1.3.el6.x86_64 #1 HP ProLiant DL585 G7
RIP: 0010:[<ffffffff8129353b>]  [<ffffffff8129353b>] strcmp+0xb/0x30
RSP: 0018:ffff880c3910fc58  EFLAGS: 00010282
RAX: 0000000000000064 RBX: ffff880436a890a0 RCX: ffff88083a75c250
RDX: 0000000000000065 RSI: ffffffffa040ec5a RDI: ffff881439fd8a60
RBP: ffff880c3910fc58 R08: 0000000000000001 R09: 0000000000000280
R10: 0000000000000010 R11: 0000000000000000 R12: ffff881439fd8a60
R13: ffff88143a5c5b50 R14: ffff880c3910fcf8 R15: 00000000fffffffe
FS:  00007f7dda6c5700(0000) GS:ffff88104e600000(0000) knlGS:00000000f777f8e0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa040ec5a CR3: 0000001439e06000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 28867, threadinfo ffff880c3910e000, task ffff880c39a14aa0)
Stack:
 ffff880c3910fc78 ffffffff8120b098 ffff880c3910fcf8 ffff88143a5c5b50
<d> ffff880c3910fca8 ffffffff8120c2bb ffff880c3910fcb8 ffff880c3910fcf8
<d> ffff88083a75c250 ffff88143a5c5b50 ffff880c3910fce8 ffffffff8120c37c
Call Trace:
 [<ffffffff8120b098>] sysfs_find_dirent+0x28/0x50
 [<ffffffff8120c2bb>] __sysfs_add_one+0x2b/0xc0
 [<ffffffff8120c37c>] sysfs_add_one+0x2c/0xd0
 [<ffffffff8120c89b>] sysfs_do_create_link+0x12b/0x170
 [<ffffffff8120c913>] sysfs_create_link+0x13/0x20
 [<ffffffff8136c442>] driver_sysfs_add+0x62/0x90
 [<ffffffff8136c6c2>] driver_probe_device+0x82/0x3e0
 [<ffffffff8136cacb>] __driver_attach+0xab/0xb0
 [<ffffffff8136ca20>] ? __driver_attach+0x0/0xb0
 [<ffffffff8136b8c4>] bus_for_each_dev+0x64/0x90
 [<ffffffff8136c37e>] driver_attach+0x1e/0x20
 [<ffffffff8136b0f8>] bus_add_driver+0x1e8/0x2b0
 [<ffffffff8136cd26>] driver_register+0x76/0x140
 [<ffffffff815293e5>] ? printk+0x41/0x44
 [<ffffffff812b0316>] __pci_register_driver+0x56/0xd0
 [<ffffffffa00360c6>] unm_init_module+0xc6/0x183 [nx_xport]
 [<ffffffffa0036000>] ? unm_init_module+0x0/0x183 [nx_xport]
 [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
 [<ffffffff810bfff1>] sys_init_module+0xe1/0x250
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 84 ff 40 88 39 74 0d 48 83 c1 01 48 83 ea 01 75 e7 c6 01 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 40 00 0f b6 07 <0f> b6 16 48 83 c7 01 48 83 c6 01 38 d0 75 0e 84 c0 75 ea 31 c0 
RIP  [<ffffffff8129353b>] strcmp+0xb/0x30
 RSP <ffff880c3910fc58>
CR2: ffffffffa040ec5a

Resolution

  • This is a known issue that can cause the system to panic when kernel modules are unloaded and reloaded in rapid succession.
  • Red Hat is tracking this issue in internal Bugzilla #880438.
  • Please contact Red Hat Technical Support for the current status of this bug.

Root Cause

  • A race in the module unloading/loading code was detected: if a module unload has not yet released all of its sysfs files when a new module is loaded, the new module's sysfs registration walks directory entries whose names point into the unloaded module's freed memory, and the system panics.

Diagnostic Steps

  • Vmcore analysis is as follows:
 #4 [ffff886fd1f359f0] __bad_area_nosemaphore at ffffffff81046ea5
 #5 [ffff886fd1f35a40] bad_area_nosemaphore at ffffffff81046f73
 #6 [ffff886fd1f35a50] __do_page_fault at ffffffff810476d1
 #7 [ffff886fd1f35b70] do_page_fault at ffffffff815144be
 #8 [ffff886fd1f35ba0] page_fault at ffffffff81511875
    [exception RIP: strcmp+11]
    RIP: ffffffff8127eb2b  RSP: ffff886fd1f35c58  RFLAGS: 00010286
    RAX: 0000000000000064  RBX: ffff881062880490  RCX: ffff888fe99dab50
    RDX: 0000000000000065  RSI: ffffffffa034b17a  RDI: ffff88afe9e8c740
    RBP: ffff886fd1f35c58   R8: 0000000000000001   R9: 0000000000000240
    R10: 000000000000003b  R11: 0000000000002000  R12: ffff88afe9e8c740
    R13: ffff88afe885f490  R14: ffff886fd1f35cf8  R15: 00000000fffffffe
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff886fd1f35c60] sysfs_find_dirent at ffffffff811fa818
#10 [ffff886fd1f35c80] __sysfs_add_one at ffffffff811fba3b
#11 [ffff886fd1f35cb0] sysfs_add_one at ffffffff811fbafc
#12 [ffff886fd1f35cf0] sysfs_do_create_link at ffffffff811fc01b
#13 [ffff886fd1f35d50] sysfs_create_link at ffffffff811fc093
#14 [ffff886fd1f35d60] driver_sysfs_add at ffffffff8135f312
#15 [ffff886fd1f35d90] driver_probe_device at ffffffff8135f492
#16 [ffff886fd1f35dc0] __driver_attach at ffffffff8135f75b
#17 [ffff886fd1f35df0] bus_for_each_dev at ffffffff8135ea64
#18 [ffff886fd1f35e30] driver_attach at ffffffff8135f24e
#19 [ffff886fd1f35e40] bus_add_driver at ffffffff8135e298
#20 [ffff886fd1f35e80] driver_register at ffffffff8135faa6
#21 [ffff886fd1f35ec0] __pci_register_driver at ffffffff8129aa16
#22 [ffff886fd1f35ef0] init_module at ffffffffa01c80c6 [nx_xport]
#23 [ffff886fd1f35f20] do_one_initcall at ffffffff8100204c
#24 [ffff886fd1f35f50] sys_init_module at ffffffff810b7611


crash> struct sysfs_addrm_cxt ffff886fd1f35cf8
struct sysfs_addrm_cxt {
  parent_sd = 0xffff888fe99dab50,  <<------- (1)
  parent_inode = 0xffff88106990f048, 
  removed = 0x0, 
  cnt = 0
}


struct sysfs_dirent {
  s_count = {
    counter = 0x1f  <<---------- No. of references 
  }, 
  s_active = {
    counter = 0x0 <<-------- Non-negative.
  }, 
  s_parent = 0xffff888fe99cbf40, 
  s_name = 0xffff886fe9566380 "0000:04:00.0", 
  inode_node = {
    rb_parent_color = 0xffff884fe958c539, 
    rb_right = 0x0, 
    rb_left = 0x0
  }, 
  name_node = {
    rb_parent_color = 0xffff884fe958c551, 
    rb_right = 0x0, 
    rb_left = 0x0
  }, 
  u = {
    completion = 0x0, 
    removed_list = 0x0
  }, 
  {

crash> rd ffff88afe9e8c740
ffff88afe9e8c740:  ff00726576697264                    driver

crash> 
crash> px ((struct sysfs_addrm_cxt *)0xffff886fd1f35cf8)->parent_sd->s_dir.name_tree.rb_node
$7 = (struct rb_node *) 0xffff888fe99daca0
  • Reading through the tree, one can see:
crash>  tree -t rbtree -s sysfs_dirent.s_name -o sysfs_dirent.name_node -N 0xffff888fe99daca0
ffff888fe99dac70
  s_name = 0xffffffff817cbb72 "msi_bus"
ffff88afe958bd00
  s_name = 0xffffffff8180ea45 "irq"
ffff882fe9580490
  s_name = 0xffffffff817e0975 "device"
ffff888fe99dabe0
  s_name = 0xffffffff817dd299 "class"
ffff886fe956a5b0
  s_name = 0xffffffff817cbb5d "broken_parity_status"
ffff886fe959ac70
  s_name = 0xffffffff817d7f02 "config"
ffff884fe958c400
  s_name = 0xffffffff8179f795 "enable"
ffff881062880490
tree: invalid kernel virtual address: ffffffffa034b17a  type: "gdb_readmem_callback" <<------------- (2)

The following disassembly shows where this address was dereferenced:

crash> dis -lr strcmp+0xb
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/lib/string.c: 248
0xffffffff8127eb20 <strcmp>:    push   %rbp
0xffffffff8127eb21 <strcmp+0x1>:        mov    %rsp,%rbp
0xffffffff8127eb24 <strcmp+0x4>:        nopl   0x0(%rax)
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/lib/string.c: 252
0xffffffff8127eb28 <strcmp+0x8>:        movzbl (%rdi),%eax
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/lib/string.c: 253
0xffffffff8127eb2b <strcmp+0xb>:        movzbl (%rsi),%edx <<----------

 RSI: ffffffffa034b17a <<--------- (2)

From (2), this stale s_name pointer is what triggered the fault:

tree: invalid kernel virtual address: ffffffffa034b17a  type: "gdb_readmem_callback"
  s_name = 0xffffffffa034b17a <Address 0xffffffffa034b17a out of bounds>
ffff886fe956a490
  s_name = 0xffff884fe957b600 "firmware_node"
ffff88cfe99d72e0
  s_name = 0xffffffff817cbb31 "local_cpus"
ffff88efe9a92880
  s_name = 0xffffffff817cbb3c "local_cpulist"
ffff881068af5520
  s_name = 0xffffffff817cbb4a "modalias"
ffff88efe9a927f0
  s_name = 0xffffffff817de801 "resource"
ffff88afe958bd90
  s_name = 0xffffffff817caab9 "remove"
ffff882fe9580520
  s_name = 0xffffffff817cbb53 "numa_node"
ffff888fe99dad00
  s_name = 0xffff886fe9d0db60 "power"
ffff88cfe99d7370
  s_name = 0xffffffff817cbb08 "rescan"
ffff8810672cc520
  s_name = 0xffffffff817da87c "reset"
ffff88cfe99d7250
  s_name = 0xffffffff817caac8 "uevent"
ffff886fe956a520
  s_name = 0xffffffff817cbb20 "subsystem_device"
ffff88efe9ac25b0
  s_name = 0xffff88cfe88e7800 "resource4"
ffff88afe95f12e0
  s_name = 0xffff888fe9a13480 "resource0"
ffff884fe958c490
  s_name = 0xffff882fe957f420 "subsystem"
ffff884fe958c370
  s_name = 0xffffffff817cbb0f "subsystem_vendor"
ffff881068af55b0
  s_name = 0xffffffff817e2938 "vendor"


struct sysfs_dirent {
  s_count = {
    counter = 3
  }, 
  s_active = {
    counter = 0
  }, 
  s_parent = 0xffff888fe99dab50,  <<----------- Parent matches (1); the rest of the entry appears corrupted.

  s_name = 0xffffffffa034b17a <Address 0xffffffffa034b17a out of bounds>, 
  inode_node = {
    rb_parent_color = 18446612202764682553, 
    rb_right = 0x0, 
    rb_left = 0x0
  }, 
  name_node = {
    rb_parent_color = 18446612475531543601, 
    rb_right = 0x0, 
    rb_left = 0x0
  }, 
  u = {
    completion = 0x0, 
    removed_list = 0x0
  }, 
  {
    s_dir = {
      kobj = 0xffffffffa034bd20, 
      subdirs = 0, 
      inode_tree = {
        rb_node = 0x0
      }, 
      name_tree = {
        rb_node = 0x0
      }
    }, 
    s_symlink = {
      target_sd = 0xffffffffa034bd20
    }, 
    s_attr = {
      attr = 0xffffffffa034bd20, 
      open = 0x0
    }, 
    s_bin_attr = {
      bin_attr = 0xffffffffa034bd20, 
      buffers = {
        first = 0x0
      }
    }
  • Looking at the logs, it seems the problem happened while installing the driver:
NX PCI driver: Major #246
BUG: unable to handle kernel paging request at ffffffffa034b17a
IP: [<ffffffff8127eb2b>] strcmp+0xb/0x30
PGD 1a87067 PUD 1a8b063 PMD afe8736067 PTE 0


struct module {
  state = MODULE_STATE_COMING, 
  list = {
    next = 0xffffffffa05416a8 <__this_module+8>, 
    prev = 0xffffffff81aa6c10 <modules>
  }, 
  name = "nx_xport\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
  mkobj = {
    kobj = {
      name = 0xffff88afe8bb2920 "nx_xport", 
      entry = {
        next = 0xffff882fe94b2cc0, 
        prev = 0xffffffffa05416f8 <__this_module+88>
      }, 
      parent = 0xffff882fe94b2cd8, 
      kset = 0xffff882fe94b2cc0, 
      ktype = 0xffffffff81aa3a20 <module_ktype>, 
      sd = 0xffff88afe8ae4f40, 
      kref = {
        refcount = {
          counter = 3
        }
      }, 
      state_initialized = 1, 
      state_in_sysfs = 1, 
      state_add_uevent_sent = 1, 
      state_remove_uevent_sent = 0, 
      uevent_suppress = 0
    }, 
    mod = 0xffffffffa0615d80, 
    drivers_dir = 0x0, 
    mp = 0x0
  }, 
  modinfo_attrs = 0xffff88afe77dec00, 
  version = 0xffff88afe8bb2fa0 "4.0.593", 
  srcversion = 0xffff88afe8bb2780 "38FBBBED8DE4AA5A134AC9F", 
  holders_dir = 0xffff88afe9e038c0, 
  • Moving further with the analysis, the following processes are seen in the uninterruptible (D) state:
crash> ps -l | grep UN
[15560255219756] [UN]  PID: 43813  TASK: ffff888fda0e3540  CPU: 24  COMMAND: "ifdown-eth"
[15560253410836] [UN]  PID: 43791  TASK: ffff886fcaafcaa0  CPU: 48  COMMAND: "ifdown-eth"
[15560253246721] [UN]  PID: 43797  TASK: ffff88afdf708040  CPU: 56  COMMAND: "ifdown-eth"
[15560049632356] [UN]  PID: 43841  TASK: ffff888fda0e2080  CPU: 33  COMMAND: "ifdown-eth" <<----------- Oldest one


crash> bt 43841
PID: 43841  TASK: ffff888fda0e2080  CPU: 33  COMMAND: "ifdown-eth"
 #0 [ffff888fe77d3aa8] schedule at ffffffff8150ea32
 #1 [ffff888fe77d3b70] __mutex_lock_slowpath at ffffffff815100de
 #2 [ffff888fe77d3be0] mutex_lock at ffffffff8150ff7b
 #3 [ffff888fe77d3c00] sysfs_lookup at ffffffff811fb96a
 #4 [ffff888fe77d3c40] do_lookup at ffffffff81190915
 #5 [ffff888fe77d3ca0] __link_path_walk at ffffffff81190bad
 #6 [ffff888fe77d3d60] path_walk at ffffffff81191c5a
 #7 [ffff888fe77d3da0] do_path_lookup at ffffffff81191e2b
 #8 [ffff888fe77d3dd0] user_path_at at ffffffff81192ab7
 #9 [ffff888fe77d3ea0] vfs_fstatat at ffffffff81186e6c
#10 [ffff888fe77d3ee0] vfs_stat at ffffffff81186fdb
#11 [ffff888fe77d3ef0] sys_newstat at ffffffff81187004
#12 [ffff888fe77d3f80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003d376daf25  RSP: 00007fff7b0ab4f0  RFLAGS: 00010206
    RAX: 0000000000000004  RBX: ffffffff8100b072  RCX: 0000000000c532b0
    RDX: 00007fff7b0ab1c0  RSI: 00007fff7b0ab1c0  RDI: 0000000000c00730
    RBP: 0000000000c60020   R8: 0000000000000001   R9: 0000000000000001
    R10: 0000000000000008  R11: 0000000000000246  R12: 0000000000000001
    R13: 00007fff7b0ab1c0  R14: 0000000000000001  R15: 00000000006dece0
    ORIG_RAX: 0000000000000004  CS: 0033  SS: 002b


crash> dis -lr ffffffff811fb96a
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 761
0xffffffff811fb930 <sysfs_lookup>:      push   %rbp
0xffffffff811fb931 <sysfs_lookup+1>:    mov    %rsp,%rbp
0xffffffff811fb934 <sysfs_lookup+4>:    sub    $0x30,%rsp
0xffffffff811fb938 <sysfs_lookup+8>:    mov    %rbx,-0x18(%rbp)
0xffffffff811fb93c <sysfs_lookup+12>:   mov    %r12,-0x10(%rbp)
0xffffffff811fb940 <sysfs_lookup+16>:   mov    %r13,-0x8(%rbp)
0xffffffff811fb944 <sysfs_lookup+20>:   nopl   0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 763
0xffffffff811fb949 <sysfs_lookup+25>:   mov    0x28(%rsi),%rax
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 767
0xffffffff811fb94d <sysfs_lookup+29>:   mov    $0xffffffff81adc080,%rdi
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 772
0xffffffff811fb954 <sysfs_lookup+36>:   mov    $0xfffffffffffffffe,%r13
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 761
0xffffffff811fb95b <sysfs_lookup+43>:   mov    %rsi,%rbx
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 763
0xffffffff811fb95e <sysfs_lookup+46>:   mov    0x98(%rax),%r12
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 767
0xffffffff811fb965 <sysfs_lookup+53>:   callq  0xffffffff8150ff50 <mutex_lock>
/usr/src/debug/kernel-2.6.32-358.41.1.el6/linux-2.6.32-358.41.1.el6.x86_64/fs/sysfs/dir.c: 769
0xffffffff811fb96a <sysfs_lookup+58>:   mov    0x38(%rbx),%rsi


static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
                                struct nameidata *nd)
{
        struct dentry *ret = NULL;
        struct sysfs_dirent *parent_sd = dentry->d_parent->d_fsdata;
        struct sysfs_dirent *sd;
        struct inode *inode;

        mutex_lock(&sysfs_mutex); <<---------------

        sd = sysfs_find_dirent(parent_sd, dentry->d_name.name);


crash> px sysfs_mutex
sysfs_mutex = $14 = {
  count = {
    counter = 0xffffffff
  }, 
  wait_lock = {
    raw_lock = {
      slock = 0x42b142b1
    }
  }, 
  wait_list = {
    next = 0xffff888fe77d3b88, 
    prev = 0xffff888fe77d3b88
  }, 
  owner = 0xffff886fd1f34000 <<------- Owner of the lock
}

crash> kmem 0xffff886fd1f34000
    PID: 43867
COMMAND: "insmod"
   TASK: ffff886fe8d63540  [THREAD_INFO: ffff886fd1f34000]
    CPU: 40
  STATE: TASK_RUNNING (PANIC)

      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea01875ed360 6fd1f34000                0 ffff882fe784f400  1 1c0000000000000
crash> 
  • The mutex was acquired by the insmod process itself.
  • The above ifdown-eth process had been waiting 679 milliseconds for the mutex:
crash> eval (15560729425844 - 15560049632356)
hexadecimal: 2884d350  
    decimal: 679793488  
      octal: 5041151520
     binary: 0000000000000000000000000000000000101000100001001101001101010000
crash> eval (679793488/1000000)
hexadecimal: 2a7  
    decimal: 679 milliseconds.
  • The insmod process had been running on CPU 40 for around 15 milliseconds:
crash> eval (15560049675247 - 15560034495911)
hexadecimal: e79e48  
    decimal: 15179336  
      octal: 71717110
     binary: 0000000000000000000000000000000000000000111001111001111001001000
crash> eval (15179336 /1000000)
hexadecimal: f  
    decimal: 15 milliseconds. 
  • The mutex was acquired by insmod in the following function:
void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
                       struct sysfs_dirent *parent_sd)
{
        struct inode *inode;

        memset(acxt, 0, sizeof(*acxt));
        acxt->parent_sd = parent_sd;

        /* Lookup parent inode.  inode initialization is protected by
         * sysfs_mutex, so inode existence can be determined by
         * looking up inode while holding sysfs_mutex.
         */
        mutex_lock(&sysfs_mutex); <<--------------

        inode = ilookup5(sysfs_sb, parent_sd->s_ino, sysfs_ilookup_test,
                         parent_sd);

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
