BUG: soft lockup - CPU#10 stuck for 67s! [mozStorage #5:23225]
I've been experiencing repeated soft lockups on a RHEL 6.5 box (IBM x3650 M3) for a while now and can't find a solution.
Does anybody have an idea?
Kind regards
Nov 26 10:27:32 ltsp-1 kernel: BUG: soft lockup - CPU#10 stuck for 67s! [mozStorage #5:23225]
Nov 26 10:27:32 ltsp-1 kernel: Modules linked in: fuse mpt2sas scsi_transport_sas raid_class mptctl mptbase nfsd exportfs autofs4 coretemp nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding 8021q garp stp llc ipv6 vfat fat vhost_net macvtap macvlan tun kvm_intel kvm microcode iTCO_wdt iTCO_vendor_support serio_raw e1000e ptp pps_core bnx2 sg i2c_i801 i2c_core lpc_ich mfd_core ioatdma dca i7core_edac edac_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov 26 10:27:32 ltsp-1 kernel: CPU 10
Nov 26 10:27:32 ltsp-1 kernel: Modules linked in: fuse mpt2sas scsi_transport_sas raid_class mptctl mptbase nfsd exportfs autofs4 coretemp nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding 8021q garp stp llc ipv6 vfat fat vhost_net macvtap macvlan tun kvm_intel kvm microcode iTCO_wdt iTCO_vendor_support serio_raw e1000e ptp pps_core bnx2 sg i2c_i801 i2c_core lpc_ich mfd_core ioatdma dca i7core_edac edac_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov 26 10:27:32 ltsp-1 kernel:
Nov 26 10:27:32 ltsp-1 kernel: Pid: 23225, comm: mozStorage #5 Tainted: G I--------------- 2.6.32-431.el6.x86_64 #1 IBM System x3650 M3 -[7945M2G]-/69Y4438
Nov 26 10:27:32 ltsp-1 kernel: RIP: 0010:[... instruction pointer address/symbol lost in the original paste ...]
Nov 26 10:27:32 ltsp-1 kernel: RSP: 0018:ffff88049de4dd28 EFLAGS: 00000246
Nov 26 10:27:32 ltsp-1 kernel: RAX: 0000000000000246 RBX: ffff88049de4dd98 RCX: 000000000000406e
Nov 26 10:27:32 ltsp-1 kernel: RDX: ffff8800280347c0 RSI: 0000000000000246 RDI: 0000000000000246
Nov 26 10:27:32 ltsp-1 kernel: RBP: ffffffff8100bb8e R08: 0000000000000001 R09: 00000000ffffffff
Nov 26 10:27:32 ltsp-1 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff88049de4dd18
Nov 26 10:27:32 ltsp-1 kernel: R13: ffffffff8100b9ce R14: 000000000000f9d4 R15: 0000000000000246
Nov 26 10:27:32 ltsp-1 kernel: FS: 00007ff82d9ff700(0000) GS:ffff88003d740000(0000) knlGS:0000000000000000
Nov 26 10:27:32 ltsp-1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 10:27:32 ltsp-1 kernel: CR2: 00007f11f9c3c008 CR3: 0000000550b51000 CR4: 00000000000007e0
Nov 26 10:27:32 ltsp-1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 26 10:27:32 ltsp-1 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 26 10:27:32 ltsp-1 kernel: Process mozStorage #5 (pid: 23225, threadinfo ffff88049de4c000, task ffff8802f99da080)
Nov 26 10:27:32 ltsp-1 kernel: Stack:
Nov 26 10:27:32 ltsp-1 kernel: ffff88055815f2f0 0000000000000000 0000000000000000 ffff8802f99da080
Nov 26 10:27:32 ltsp-1 kernel: [... remaining stack dump lines lost in the original paste ...]
Nov 26 10:27:32 ltsp-1 kernel: Call Trace:
Nov 26 10:27:32 ltsp-1 kernel: [... call trace frames lost in the original paste ...]
Nov 26 10:27:32 ltsp-1 kernel: Code: b3 09 81 65 48 8b 14 25 c0 cb 00 00 48 89 45 b8 48 89 55 a8 48 89 45 c0 0f 1f 00 ba 82 00 00 00 4c 89 ee 4c 89 e7 e8 f8 b9 ce e0
Nov 26 10:27:32 ltsp-1 kernel: Call Trace:
Nov 26 10:27:32 ltsp-1 kernel: [... call trace frames lost in the original paste ...]
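To gauge how often this happens, and whether it always hits the same CPU, the lockup lines can be tallied from the saved log. The path /var/log/messages is an assumption; point it at wherever your syslog lands:

```shell
# Tally soft-lockup events per CPU from a saved syslog.
# (The log path is an assumption -- substitute your own file.)
grep -o 'soft lockup - CPU#[0-9]*' /var/log/messages | sort | uniq -c | sort -rn
```

If the count clusters on a few CPUs or correlates with particular times of day, that can help narrow down whether a specific workload (here, mozStorage threads) is the trigger.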
Responses
I'm seeing this too. 'mozStorage' is a Firefox/Thunderbird storage component. On my server, the BUG messages may be associated with a Thunderbird user who has a huge inbox, but I haven't confirmed that.
Hi Karl. This is difficult to diagnose from the information here, so the best option is probably to open a support case.
David, I've just now checked this page and seen your reply to my comment of July 14! Thanks for responding. I'm still seeing this problem occasionally, but as a small player in a large institution I'm not authorized to open support cases with Red Hat. In any case, my impression is that the problem is caused by Mozilla's choice of SQLite for the mozStorage DB. Also, the last RHEL 6 kernel update may have mitigated the severity of the problem: rather than locking up the system for good, the kernel now triggers a hung_task timeout and recovers. Sample log messages:
Oct 2 08:26:23 chinstrap kernel: INFO: task flush-0:37:14336 blocked for more than 120 seconds.
Oct 2 08:26:23 chinstrap kernel: Not tainted 2.6.32-431.23.3.el6.x86_64 #1
Oct 2 08:26:23 chinstrap kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 2 08:26:23 chinstrap kernel: flush-0:37 D 0000000000000003 0 14336 2 0x00000080
Oct 2 08:26:23 chinstrap kernel: ffff88105e661c00 0000000000000046 0000000000000000 0000000000000000
Oct 2 08:26:23 chinstrap kernel: ffff88105e661ce0 ffffffff810598e4 ffff88105e661c80 ffff88105e661bf0
Oct 2 08:26:23 chinstrap kernel: ffff88105e4dfab8 ffff88105e661fd8 000000000000fbc8 ffff88105e4dfab8
Oct 2 08:26:23 chinstrap kernel: Call Trace:
Oct 2 08:26:23 chinstrap kernel: [<ffffffff810598e4>] ? find_busiest_group+0x244/0x9e0
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811a58b0>] ? inode_wait+0x0/0x20
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811a58be>] inode_wait+0xe/0x20
Oct 2 08:26:23 chinstrap kernel: [<ffffffff81529e5f>] __wait_on_bit+0x5f/0x90
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811b4cd8>] inode_wait_for_writeback+0x98/0xc0
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8109b020>] ? wake_bit_function+0x0/0x50
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811b6225>] wb_writeback+0x205/0x410
Oct 2 08:26:23 chinstrap kernel: [<ffffffff81528c0e>] ? thread_return+0x4e/0x760
Oct 2 08:26:23 chinstrap kernel: [<ffffffff81084aa2>] ? del_timer_sync+0x22/0x30
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811b65d5>] wb_do_writeback+0x1a5/0x240
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811b66d3>] bdi_writeback_task+0x63/0x1b0
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8109ae27>] ? bit_waitqueue+0x17/0xd0
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811437e0>] ? bdi_start_fn+0x0/0x100
Oct 2 08:26:23 chinstrap kernel: [<ffffffff81143866>] bdi_start_fn+0x86/0x100
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811437e0>] ? bdi_start_fn+0x0/0x100
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Oct 2 08:26:23 chinstrap kernel: INFO: task mozStorage #2:14621 blocked for more than 120 seconds.
Oct 2 08:26:23 chinstrap kernel: Not tainted 2.6.32-431.23.3.el6.x86_64 #1
Oct 2 08:26:23 chinstrap kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 2 08:26:23 chinstrap kernel: mozStorage #2 D 0000000000000004 0 14621 1 0x00000080
Oct 2 08:26:23 chinstrap kernel: ffff880825045aa8 0000000000000082 ffff88087379de00 ffff880873519840
Oct 2 08:26:23 chinstrap kernel: ffffe8f7a8e69d40 0000000000000282 ffff880825f5a1b8 0000000000000282
Oct 2 08:26:23 chinstrap kernel: ffff880825c5dab8 ffff880825045fd8 000000000000fbc8 ffff880825c5dab8
Oct 2 08:26:23 chinstrap kernel: Call Trace:
Oct 2 08:26:23 chinstrap kernel: [<ffffffff810a6d31>] ? ktime_get_ts+0xb1/0xf0
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8111f7e0>] ? sync_page+0x0/0x50
Oct 2 08:26:23 chinstrap kernel: [<ffffffff81529393>] io_schedule+0x73/0xc0
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8111f81d>] sync_page+0x3d/0x50
Oct 2 08:26:23 chinstrap kernel: [<ffffffff81529e5f>] __wait_on_bit+0x5f/0x90
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8111fa53>] wait_on_page_bit+0x73/0x80
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8109b020>] ? wake_bit_function+0x0/0x50
Oct 2 08:26:23 chinstrap kernel: [<ffffffff81135ac5>] ? pagevec_lookup_tag+0x25/0x40
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8111fe7b>] wait_on_page_writeback_range+0xfb/0x190
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8111ff3f>] filemap_fdatawait+0x2f/0x40
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811b5b00>] writeback_single_inode+0x220/0x290
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811b5ba3>] sync_inode+0x33/0x50
Oct 2 08:26:23 chinstrap kernel: [<ffffffffa0422573>] nfs_wb_all+0x43/0x50 [nfs]
Oct 2 08:26:23 chinstrap kernel: [<ffffffffa0413e7d>] nfs_sync_mapping+0x2d/0x40 [nfs]
Oct 2 08:26:23 chinstrap kernel: [<ffffffffa0411990>] do_unlk+0x40/0xd0 [nfs]
Oct 2 08:26:23 chinstrap kernel: [<ffffffffa0411d1e>] nfs_lock+0xfe/0x1d0 [nfs]
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811dac33>] vfs_lock_file+0x23/0x40
Oct 2 08:26:23 chinstrap kernel: [<ffffffff811dae87>] fcntl_setlk+0x177/0x320
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8119dd17>] sys_fcntl+0x197/0x530
Oct 2 08:26:23 chinstrap kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Oct 2 08:26:36 chinstrap utauthd:...[mozstorage timeout no longer a fatal problem]...
Thanks again for your response, and since this problem is rare I guess we'll just muddle along as we have hitherto.
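For anyone triaging similar hung_task reports, the blocked task names can be pulled out of a saved log with a sed one-liner like the following (the log path is an assumption):

```shell
# List the unique task names the kernel reported as blocked.
# (The messages file path is an assumption -- use your own log.)
sed -n 's/.*INFO: task \(.*\):[0-9]\+ blocked.*/\1/p' /var/log/messages | sort -u
```

Against the excerpt above this would surface both the writeback flusher (flush-0:37) and the mozStorage threads, which is a quick way to confirm that the hangs involve NFS writeback plus SQLite's fcntl locking rather than some unrelated workload.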
