BUG: soft lockup - CPU#10 stuck for 67s! [mozStorage #5:23225]

Latest response

I'm experiencing multiple soft lockups on a RHEL 6.5 box (IBM x3650M3) since a while and can't find a solution.
Does anybody have an idea?

Kind regards

Nov 26 10:27:32 ltsp-1 kernel: BUG: soft lockup - CPU#10 stuck for 67s! [mozStorage #5:23225]
Nov 26 10:27:32 ltsp-1 kernel: Modules linked in: fuse mpt2sas scsi_transport_sas raid_class mptctl mptbase nfsd exportfs autofs4 coretemp nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding 8021q garp stp llc ipv6 vfat fat vhost_net macvtap macvlan tun kvm_intel kvm microcode iTCO_wdt iTCO_vendor_support serio_raw e1000e ptp pps_core bnx2 sg i2c_i801 i2c_core lpc_ich mfd_core ioatdma dca i7core_edac edac_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov 26 10:27:32 ltsp-1 kernel: CPU 10
Nov 26 10:27:32 ltsp-1 kernel: Modules linked in: fuse mpt2sas scsi_transport_sas raid_class mptctl mptbase nfsd exportfs autofs4 coretemp nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding 8021q garp stp llc ipv6 vfat fat vhost_net macvtap macvlan tun kvm_intel kvm microcode iTCO_wdt iTCO_vendor_support serio_raw e1000e ptp pps_core bnx2 sg i2c_i801 i2c_core lpc_ich mfd_core ioatdma dca i7core_edac edac_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov 26 10:27:32 ltsp-1 kernel:
Nov 26 10:27:32 ltsp-1 kernel: Pid: 23225, comm: mozStorage #5 Tainted: G I--------------- 2.6.32-431.el6.x86_64 #1 IBM System x3650 M3 -[7945M2G]-/69Y4438
Nov 26 10:27:32 ltsp-1 kernel: RIP: 0010:[] [] nfs_iocounter_wait+0xa8/0xe0 [nfs]
Nov 26 10:27:32 ltsp-1 kernel: RSP: 0018:ffff88049de4dd28 EFLAGS: 00000246
Nov 26 10:27:32 ltsp-1 kernel: RAX: 0000000000000246 RBX: ffff88049de4dd98 RCX: 000000000000406e
Nov 26 10:27:32 ltsp-1 kernel: RDX: ffff8800280347c0 RSI: 0000000000000246 RDI: 0000000000000246
Nov 26 10:27:32 ltsp-1 kernel: RBP: ffffffff8100bb8e R08: 0000000000000001 R09: 00000000ffffffff
Nov 26 10:27:32 ltsp-1 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff88049de4dd18
Nov 26 10:27:32 ltsp-1 kernel: R13: ffffffff8100b9ce R14: 000000000000f9d4 R15: 0000000000000246
Nov 26 10:27:32 ltsp-1 kernel: FS: 00007ff82d9ff700(0000) GS:ffff88003d740000(0000) knlGS:0000000000000000
Nov 26 10:27:32 ltsp-1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 10:27:32 ltsp-1 kernel: CR2: 00007f11f9c3c008 CR3: 0000000550b51000 CR4: 00000000000007e0
Nov 26 10:27:32 ltsp-1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 26 10:27:32 ltsp-1 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 26 10:27:32 ltsp-1 kernel: Process mozStorage #5 (pid: 23225, threadinfo ffff88049de4c000, task ffff8802f99da080)
Nov 26 10:27:32 ltsp-1 kernel: Stack:
Nov 26 10:27:32 ltsp-1 kernel: ffff88055815f2f0 0000000000000000 0000000000000000 ffff8802f99da080
Nov 26 10:27:32 ltsp-1 kernel: ffffffff8109b320 ffff8800280347c8 ffff8800280347c8 0000000000000000
Nov 26 10:27:32 ltsp-1 kernel: 0000000000000000 ffff880628a0aa40 0000000000000006 ffff88031b9b9ef8
Nov 26 10:27:32 ltsp-1 kernel: Call Trace:
Nov 26 10:27:32 ltsp-1 kernel: [] ? wake_bit_function+0x0/0x50
Nov 26 10:27:32 ltsp-1 kernel: [] ? do_unlk+0x60/0xd0 [nfs]
Nov 26 10:27:32 ltsp-1 kernel: [] ? nfs_lock+0xfe/0x1d0 [nfs]
Nov 26 10:27:32 ltsp-1 kernel: [] ? vfs_lock_file+0x23/0x40
Nov 26 10:27:32 ltsp-1 kernel: [] ? fcntl_setlk+0x177/0x320
Nov 26 10:27:32 ltsp-1 kernel: [] ? audit_syscall_entry+0x1d7/0x200
NoNov 26 10:27:32 ltsp-1 kernel: [] ? sys_fcntl+0x197/0x530
Nov 26 10:27:32 ltsp-1 kernel: [] ? tracesys+0xd9/0xde
Nov 26 10:27:32 ltsp-1 kernel: Code: b3 09 81 65 48 8b 14 25 c0 cb 00 00 48 89 45 b8 48 89 55 a8 48 89 45 c0 0f 1f 00 ba 82 00 00 00 4c 89 ee 4c 89 e7 e8 f8 b9 ce e0 80 0b 01 8b 43 08 85 c0 74 12 48 89 df e8 95 63 ff ff 41 89
Nov 26 10:27:32 ltsp-1 kernel: Call Trace:
Nov 26 10:27:32 ltsp-1 kernel: [] ? nfs_iocounter_wait+0xa8/0xe0 [nfs]
Nov 26 10:27:32 ltsp-1 kernel: [] ? wake_bit_function+0x0/0x50
Nov 26 10:27:32 ltsp-1 kernel: [] ? do_unlk+0x60/0xd0 [nfs]
Nov 26 10:27:32 ltsp-1 kernel: [] ? nfs_lock+0xfe/0x1d0 [nfs]
Nov 26 10:27:32 ltsp-1 kernel: [] ? vfs_lock_file+0x23/0x40
Nov 26 10:27:32 ltsp-1 kernel: [] ? fcntl_setlk+0x177/0x320
Nov 26 10:27:32 ltsp-1 kernel: [] ? audit_syscall_entry+0x1d7/0x200
Nov 26 10:27:32 ltsp-1 kernel: [] ? ftrace_raw_event_sys_enter+0xd9/0x130
Nov 26 10:27:32 ltsp-1 kernel: [] ? sys_fcntl+0x197/0x530
Nov 26 10:27:32 ltsp-1 kernel: [] ? tracesys+0xd9/0xde
Nov 26 10:27:32 ltsp-1 kernel: [] ? ftrace_raw_event_sys_enter+0xd9/0x130

Responses

I'm seeing this too. 'mozStorage' is a firefox/thunderbird thingy. On my server, the BUG message may be associated with a thunderbird user with a huge inbox, but I haven't confirmed that.

Hi Karl. Difficult to diagnose from the information here so the best option here is probably to open a support case.

David, I've just now checked this page and seen your reply to my comment of July 14! Thanks for responding. I'm still seeing this problem occassionally, but as a small player in a large institution I'n not authorized to open support cases with RedHat. In any case, my impression is that it's a problem caused by Mozilla's choice of sqlite for the mozStorage DB. Also, the last RHEL6 kernel update may have mitigited the severity of the problem: rather than locking up the system for good, the kernel triggers a hung_task_timeout and recovers, see sample log messages:

Oct  2 08:26:23 chinstrap kernel: INFO: task flush-0:37:14336 blocked for more than 120 seconds.
Oct  2 08:26:23 chinstrap kernel:      Not tainted 2.6.32-431.23.3.el6.x86_64 #1
Oct  2 08:26:23 chinstrap kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  2 08:26:23 chinstrap kernel: flush-0:37    D 0000000000000003     0 14336      2 0x00000080
Oct  2 08:26:23 chinstrap kernel: ffff88105e661c00 0000000000000046 0000000000000000 0000000000000000
Oct  2 08:26:23 chinstrap kernel: ffff88105e661ce0 ffffffff810598e4 ffff88105e661c80 ffff88105e661bf0
Oct  2 08:26:23 chinstrap kernel: ffff88105e4dfab8 ffff88105e661fd8 000000000000fbc8 ffff88105e4dfab8
Oct  2 08:26:23 chinstrap kernel: Call Trace:
Oct  2 08:26:23 chinstrap kernel: [<ffffffff810598e4>] ? find_busiest_group+0x244/0x9e0
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811a58b0>] ? inode_wait+0x0/0x20
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811a58be>] inode_wait+0xe/0x20
Oct  2 08:26:23 chinstrap kernel: [<ffffffff81529e5f>] __wait_on_bit+0x5f/0x90
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811b4cd8>] inode_wait_for_writeback+0x98/0xc0
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8109b020>] ? wake_bit_function+0x0/0x50
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811b6225>] wb_writeback+0x205/0x410
Oct  2 08:26:23 chinstrap kernel: [<ffffffff81528c0e>] ? thread_return+0x4e/0x760
Oct  2 08:26:23 chinstrap kernel: [<ffffffff81084aa2>] ? del_timer_sync+0x22/0x30
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811b65d5>] wb_do_writeback+0x1a5/0x240
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811b66d3>] bdi_writeback_task+0x63/0x1b0
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8109ae27>] ? bit_waitqueue+0x17/0xd0
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811437e0>] ? bdi_start_fn+0x0/0x100
Oct  2 08:26:23 chinstrap kernel: [<ffffffff81143866>] bdi_start_fn+0x86/0x100
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811437e0>] ? bdi_start_fn+0x0/0x100
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Oct  2 08:26:23 chinstrap kernel: INFO: task mozStorage #2:14621 blocked for more than 120 seconds.
Oct  2 08:26:23 chinstrap kernel:      Not tainted 2.6.32-431.23.3.el6.x86_64 #1
Oct  2 08:26:23 chinstrap kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  2 08:26:23 chinstrap kernel: mozStorage #2 D 0000000000000004     0 14621      1 0x00000080
Oct  2 08:26:23 chinstrap kernel: ffff880825045aa8 0000000000000082 ffff88087379de00 ffff880873519840
Oct  2 08:26:23 chinstrap kernel: ffffe8f7a8e69d40 0000000000000282 ffff880825f5a1b8 0000000000000282
Oct  2 08:26:23 chinstrap kernel: ffff880825c5dab8 ffff880825045fd8 000000000000fbc8 ffff880825c5dab8
Oct  2 08:26:23 chinstrap kernel: Call Trace:
Oct  2 08:26:23 chinstrap kernel: [<ffffffff810a6d31>] ? ktime_get_ts+0xb1/0xf0
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8111f7e0>] ? sync_page+0x0/0x50
Oct  2 08:26:23 chinstrap kernel: [<ffffffff81529393>] io_schedule+0x73/0xc0
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8111f81d>] sync_page+0x3d/0x50
Oct  2 08:26:23 chinstrap kernel: [<ffffffff81529e5f>] __wait_on_bit+0x5f/0x90
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8111fa53>] wait_on_page_bit+0x73/0x80
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8109b020>] ? wake_bit_function+0x0/0x50
Oct  2 08:26:23 chinstrap kernel: [<ffffffff81135ac5>] ? pagevec_lookup_tag+0x25/0x40
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8111fe7b>] wait_on_page_writeback_range+0xfb/0x190
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8111ff3f>] filemap_fdatawait+0x2f/0x40
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811b5b00>] writeback_single_inode+0x220/0x290
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811b5ba3>] sync_inode+0x33/0x50
Oct  2 08:26:23 chinstrap kernel: [<ffffffffa0422573>] nfs_wb_all+0x43/0x50 [nfs]
Oct  2 08:26:23 chinstrap kernel: [<ffffffffa0413e7d>] nfs_sync_mapping+0x2d/0x40 [nfs]
Oct  2 08:26:23 chinstrap kernel: [<ffffffffa0411990>] do_unlk+0x40/0xd0 [nfs]
Oct  2 08:26:23 chinstrap kernel: [<ffffffffa0411d1e>] nfs_lock+0xfe/0x1d0 [nfs]
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811dac33>] vfs_lock_file+0x23/0x40
Oct  2 08:26:23 chinstrap kernel: [<ffffffff811dae87>] fcntl_setlk+0x177/0x320
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8119dd17>] sys_fcntl+0x197/0x530
Oct  2 08:26:23 chinstrap kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Oct  2 08:26:36 chinstrap utauthd:...[mozstorage timeout no longer a fatal problem]...

Thanks again for your response, and since this problem is rare I guess we'll just muddle along as we have hitherto.

Thanks for the followup, Karl. Sorry to hear you're not in a position to open a case at the moment. Hopefully with the extra info here someone from the community will recognize the issue and be able to assist.

Close

Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.