Thread overran stack, or stack corrupted while running XFS
Environment
- Red Hat Enterprise Linux 5.7 or later
- Red Hat Enterprise Linux 6
- XFS (Scalable File System Add-On)
- Sometimes with an NFS server (nfsd) exporting an XFS filesystem
Issue
- Kernel panic where nfsd crashes with XFS symbols in a very long backtrace. One of the following is then seen in the kernel ring buffer:
- Instruction pointer is print_context_stack and "Thread overran stack, or stack corrupted"
- Instruction pointer is __schedule_bug and "scheduling while atomic"
Resolution
Run one of the following kernels or later:
* RHEL 6.6.z: kernel-2.6.32-504.2.1.el6
* RHEL 6.5.z: kernel-2.6.32-431.46.1.el6
* RHEL 6.4.z: kernel-2.6.32-358.52.1.el6
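To check whether the running kernel already meets these minimums, compare the output of the following against the versions above (a quick sketch; update and reboot if the installed kernel is older):
# uname -r
# rpm -q kernel --last
# yum update kernel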
Root Cause
- It is possible for the XFS block allocation path to exceed the default kernel thread stack size. The fix splits the allocation stack usage by offloading extent allocation to a workqueue, so that it runs on a separate stack.
- Previous stack overrun issues in XFS were addressed in BZ#693280 and BZ#918359. Those fixes are available in:
  - kernel-2.6.32-358.28.1.el6 (EUS/AUS subscriptions only, i.e. RHEL 6.4.x) - Errata RHBA-2013:1770
  - kernel-2.6.32-431.el6 (RHEL 6.5 or later) - Errata RHSA-2013:1645
- Another stack overrun issue was addressed in BZ#1020574.
- Another stack overrun issue, tracked in BZ#1028831, has also been addressed.
- The fix for BZ#1028831 caused a regression, which was tracked in BZ#1133304 and addressed in:
  - kernel-2.6.32-504.2.1.el6
  - kernel-2.6.32-431.46.1.el6
  - kernel-2.6.32-358.52.1.el6
- Another case involving direct I/O was tracked in BZ#1085148 but was thought to be a duplicate of the earlier BZ#1028831.
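Red Hat kernel changelog entries cite these Bugzilla numbers, so you can check whether an installed kernel carries a given fix; for example (a sketch; adjust the BZ numbers to the fix of interest):
# rpm -q --changelog kernel | grep -E '693280|918359|1020574|1028831|1133304'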
Diagnostic Steps
- If you are running an affected kernel from the Environment section and can confirm a matching backtrace, no further confirmation is needed.
- Running the debug kernel may help, as it contains an additional kernel stack size check (see the install sketch below this list).
- Look for messages indicating that the stack has overflowed (see the grep example below this list).
- In one reported case there were two panics; shortly before each panic there were ~5 simultaneous NFS writes (reading from another machine and writing to this one).
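A sketch of installing the debug kernel variant (the standard kernel-debug package on RHEL 6; select the .debug entry from the GRUB menu on the next boot):
# yum install kernel-debug
# grep ^title /boot/grub/grub.conf
# reboot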
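To look for prior occurrences in the logs, grep for the message signatures shown in the examples in this article:
# grep -E 'Thread overran stack|scheduling while atomic' /var/log/messages*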
Turn on the stack depth checking functions to determine what is happening:
# mount -t debugfs nodev /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
and periodically grab the output of:
# cat /sys/kernel/debug/tracing/stack_max_size
# cat /sys/kernel/debug/tracing/stack_trace
That will report the highest stack usage to date. For example, leave this command running:
# while true ; do date ; cat /sys/kernel/debug/tracing/stack_max_size ; cat /sys/kernel/debug/tracing/stack_trace ; echo --- ; sleep 60 ; done | tee /var/log/stack_trace.log
If you see the stack max size value exceed roughly 7200 bytes then you may have found the culprit. If the system panics before you can capture this stack trace, the information will be contained within the vmcore, from which it can be retrieved.
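If a vmcore was captured by kdump, a sketch of retrieving the stack information from it with the crash utility (the vmlinux path assumes the matching kernel-debuginfo package is installed, and the dump location is the kdump default):
# crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/<host>-<timestamp>/vmcore
crash> bt
crash> bt -S <stack base address>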
Here is an example of the stack dumper running at 1-minute intervals; the following dump was taken just a minute or so before the crash. For about 4 days beforehand, the highest recorded stack usage had been 7192 bytes. This example shows a block allocation in the extent btree after allocating a data extent.
7256
------------
Depth Size Location (54 entries)
----- ---- --------
0) 7096 48 __call_rcu+0x62/0x160
1) 7048 16 call_rcu_sched+0x15/0x20
2) 7032 16 call_rcu+0xe/0x10
3) 7016 272 radix_tree_delete+0x150/0x2b0
4) 6744 32 __remove_from_page_cache+0x21/0xe0
5) 6712 64 __remove_mapping+0xa0/0x160
6) 6648 272 shrink_page_list.clone.0+0x37d/0x540
7) 6376 432 shrink_inactive_list+0x2f5/0x740
8) 5944 176 shrink_zone+0x38f/0x520
9) 5768 224 zone_reclaim+0x354/0x410
10) 5544 304 get_page_from_freelist+0x694/0x820
11) 5240 256 __alloc_pages_nodemask+0x111/0x850
12) 4984 48 kmem_getpages+0x62/0x170
13) 4936 112 cache_grow+0x2cf/0x320
14) 4824 112 cache_alloc_refill+0x202/0x240
15) 4712 64 kmem_cache_alloc+0x15f/0x190
16) 4648 64 kmem_zone_alloc+0x9a/0xe0 [xfs]
17) 4584 32 kmem_zone_zalloc+0x1e/0x50 [xfs]
18) 4552 80 xfs_allocbt_init_cursor+0x4c/0xc0 [xfs]
19) 4472 16 xfs_allocbt_dup_cursor+0x2c/0x30 [xfs]
20) 4456 128 xfs_btree_dup_cursor+0x33/0x180 [xfs]
21) 4328 192 xfs_alloc_ag_vextent_near+0x5fc/0xb70 [xfs]
22) 4136 32 xfs_alloc_ag_vextent+0xd5/0x130 [xfs]
23) 4104 96 xfs_alloc_vextent+0x45f/0x600 [xfs]
24) 4008 160 xfs_bmbt_alloc_block+0xc5/0x1d0 [xfs]
25) 3848 240 xfs_btree_split+0xbd/0x710 [xfs]
26) 3608 96 xfs_btree_make_block_unfull+0x12d/0x190 [xfs]
27) 3512 224 xfs_btree_insrec+0x3ef/0x5a0 [xfs]
28) 3288 144 xfs_btree_insert+0x93/0x180 [xfs]
29) 3144 272 xfs_bmap_add_extent_delay_real+0xe7e/0x18d0 [xfs]
30) 2872 208 xfs_bmap_add_extent+0x3ff/0x420 [xfs]
31) 2664 432 xfs_bmapi+0xb14/0x11a0 [xfs]
32) 2232 272 xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]
33) 1960 208 xfs_iomap+0x389/0x440 [xfs]
34) 1752 32 xfs_map_blocks+0x2d/0x40 [xfs]
35) 1720 272 xfs_page_state_convert+0x2f8/0x750 [xfs]
36) 1448 80 xfs_vm_writepage+0x86/0x170 [xfs]
37) 1368 32 __writepage+0x17/0x40
38) 1336 304 write_cache_pages+0x1c9/0x4a0
39) 1032 16 generic_writepages+0x24/0x30
40) 1016 48 xfs_vm_writepages+0x5e/0x80 [xfs]
41) 968 16 do_writepages+0x21/0x40
42) 952 128 __filemap_fdatawrite_range+0x5b/0x60
43) 824 48 filemap_write_and_wait_range+0x5a/0x90
44) 776 80 vfs_fsync_range+0x7e/0xe0
45) 696 16 vfs_fsync+0x1d/0x20
46) 680 64 nfsd_commit+0x6b/0xa0 [nfsd]
47) 616 64 nfsd3_proc_commit+0x9d/0x100 [nfsd]
48) 552 64 nfsd_dispatch+0xfe/0x240 [nfsd]
49) 488 128 svc_process_common+0x344/0x640 [sunrpc]
50) 360 32 svc_process+0x110/0x160 [sunrpc]
51) 328 48 nfsd+0xc2/0x160 [nfsd]
52) 280 96 kthread+0x96/0xa0
53) 184 184 child_rip+0xa/0x20
Another example (this is a user data extent allocation):
7272
------------
Depth Size Location (61 entries)
----- ---- --------
0) 7080 224 select_task_rq_fair+0x3be/0x980
1) 6856 112 try_to_wake_up+0x14a/0x400
2) 6744 16 wake_up_process+0x15/0x20
3) 6728 16 wakeup_softirqd+0x35/0x40
4) 6712 48 raise_softirq_irqoff+0x4f/0x90
5) 6664 48 __blk_complete_request+0x132/0x140
6) 6616 16 blk_complete_request+0x25/0x30
7) 6600 32 scsi_done+0x2f/0x60
8) 6568 48 megasas_queue_command+0xd1/0x140 [megaraid_sas]
9) 6520 48 scsi_dispatch_cmd+0x1ac/0x340
10) 6472 96 scsi_request_fn+0x415/0x590
11) 6376 32 __generic_unplug_device+0x32/0x40
12) 6344 112 __make_request+0x170/0x500
13) 6232 224 generic_make_request+0x21e/0x5b0
14) 6008 80 submit_bio+0x8f/0x120
15) 5928 112 _xfs_buf_ioapply+0x194/0x2f0 [xfs]
16) 5816 48 xfs_buf_iorequest+0x4f/0xe0 [xfs]
17) 5768 32 xlog_bdstrat+0x2a/0x60 [xfs]
18) 5736 80 xlog_sync+0x1e0/0x3f0 [xfs]
19) 5656 48 xlog_state_release_iclog+0xb3/0xf0 [xfs]
20) 5608 144 _xfs_log_force_lsn+0x1cc/0x270 [xfs]
21) 5464 32 xfs_log_force_lsn+0x18/0x40 [xfs]
22) 5432 80 xfs_alloc_search_busy+0x10c/0x160 [xfs]
23) 5352 112 xfs_alloc_get_freelist+0x113/0x170 [xfs]
24) 5240 48 xfs_allocbt_alloc_block+0x33/0x70 [xfs]
25) 5192 240 xfs_btree_split+0xbd/0x710 [xfs]
26) 4952 96 xfs_btree_make_block_unfull+0x12d/0x190 [xfs]
27) 4856 224 xfs_btree_insrec+0x3ef/0x5a0 [xfs]
28) 4632 144 xfs_btree_insert+0x93/0x180 [xfs]
29) 4488 176 xfs_free_ag_extent+0x414/0x7e0 [xfs]
30) 4312 224 xfs_alloc_fix_freelist+0xf4/0x480 [xfs]
31) 4088 96 xfs_alloc_vextent+0x173/0x600 [xfs]
32) 3992 240 xfs_bmap_btalloc+0x167/0x9d0 [xfs]
33) 3752 16 xfs_bmap_alloc+0xe/0x10 [xfs]
34) 3736 432 xfs_bmapi+0x9f6/0x11a0 [xfs]
35) 3304 272 xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]
36) 3032 208 xfs_iomap+0x389/0x440 [xfs]
37) 2824 32 xfs_map_blocks+0x2d/0x40 [xfs]
38) 2792 272 xfs_page_state_convert+0x2f8/0x750 [xfs]
39) 2520 80 xfs_vm_writepage+0x86/0x170 [xfs]
40) 2440 32 __writepage+0x17/0x40
41) 2408 304 write_cache_pages+0x1c9/0x4a0
42) 2104 16 generic_writepages+0x24/0x30
43) 2088 48 xfs_vm_writepages+0x5e/0x80 [xfs]
44) 2040 16 do_writepages+0x21/0x40
45) 2024 128 __filemap_fdatawrite_range+0x5b/0x60
46) 1896 48 filemap_write_and_wait_range+0x5a/0x90
47) 1848 320 xfs_write+0xa2f/0xb70 [xfs]
48) 1528 16 xfs_file_aio_write+0x61/0x70 [xfs]
49) 1512 304 do_sync_readv_writev+0xfb/0x140
50) 1208 224 do_readv_writev+0xcf/0x1f0
51) 984 16 vfs_writev+0x46/0x60
52) 968 208 nfsd_vfs_write+0x107/0x430 [nfsd]
53) 760 96 nfsd_write+0xe7/0x100 [nfsd]
54) 664 112 nfsd3_proc_write+0xaf/0x140 [nfsd]
55) 552 64 nfsd_dispatch+0xfe/0x240 [nfsd]
56) 488 128 svc_process_common+0x344/0x640 [sunrpc]
57) 360 32 svc_process+0x110/0x160 [sunrpc]
58) 328 48 nfsd+0xc2/0x160 [nfsd]
59) 280 96 kthread+0x96/0xa0
60) 184 184 child_rip+0xa/0x20
Another example shows an indirect consequence of the stack overrun problem. The thread_info structure sits at the base (lowest address) of the kernel stack region and holds details about a process, including the preemption count. Because the stack grows downward, an overrun overwrites this structure.
The first sign of trouble is the "scheduling while atomic" message, and given the preempt count of 0xffff885f it looks like it has been overwritten by the high 32 bits of a kernel address.
BUG: scheduling while atomic: nfsd/18811/0xffff885f
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 xfs exportfs uinput power_meter sg shpchp bnx2x libcrc32c mdio dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_1fec5c9402a17fee36a363cad0278e9f_23941]
Pid: 18811, comm: nfsd Not tainted 2.6.32-358.18.1.el6.x86_64 #1
Call Trace:
BUG: unable to handle kernel paging request at fffffffd0684e0a0
IP: [<ffffffff81056904>] update_curr+0x144/0x1f0
PGD 1a87067 PUD 0
Oops: 0000 [#1] SMP
Looking closely at the stack trace at the time of the panic, we see that the thread was scheduling out when it detected that the preemption count was wrong and began printing a message to the console. While printing the message it received a timer interrupt, which accessed the corrupted thread_info structure and caused the page fault that triggered the panic.
crash> bt
PID: 18811 TASK: ffff885fdb7c5500 CPU: 16 COMMAND: "nfsd"
#0 [ffff8800283039a0] machine_kexec at ffffffff81035d6b
#1 [ffff880028303a00] crash_kexec at ffffffff810c0e22
#2 [ffff880028303ad0] oops_end at ffffffff81511c20
#3 [ffff880028303b00] no_context at ffffffff81046c1b
#4 [ffff880028303b50] __bad_area_nosemaphore at ffffffff81046ea5
#5 [ffff880028303ba0] bad_area_nosemaphore at ffffffff81046f73
#6 [ffff880028303bb0] __do_page_fault at ffffffff810476d1
#7 [ffff880028303cd0] do_page_fault at ffffffff81513b6e
#8 [ffff880028303d00] page_fault at ffffffff81510f25
[exception RIP: update_curr+324]
RIP: ffffffff81056904 RSP: ffff880028303db8 RFLAGS: 00010082
RAX: ffff885fdb7c5500 RBX: ffffffffb098a040 RCX: ffff88302fef3240
RDX: 00000000000192d8 RSI: 0000000000000000 RDI: ffff885fdb7c5538
RBP: ffff880028303de8 R8: ffffffff8160bb65 R9: 0000000000000000
R10: 0000000000000010 R11: 0000000000000000 R12: ffff880028316768
R13: 00000000000643d2 R14: 000015e29f03c0bd R15: ffff885fdb7c5500
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff880028303df0] task_tick_fair at ffffffff81056ebb
#10 [ffff880028303e20] scheduler_tick at ffffffff8105ad01
#11 [ffff880028303e60] update_process_times at ffffffff810812fe
#12 [ffff880028303e90] tick_sched_timer at ffffffff810a80c6
#13 [ffff880028303ec0] __run_hrtimer at ffffffff8109b4ae
#14 [ffff880028303f10] hrtimer_interrupt at ffffffff8109b816
#15 [ffff880028303f90] smp_apic_timer_interrupt at ffffffff815177cb
#16 [ffff880028303fb0] apic_timer_interrupt at ffffffff8100bb93
--- <IRQ stack> ---
#17 [ffff885fb098a5d8] apic_timer_interrupt at ffffffff8100bb93
[exception RIP: vprintk+593]
RIP: ffffffff8106f341 RSP: ffff885fb098a680 RFLAGS: 00000246
RAX: 0000000000011480 RBX: ffff885fb098a710 RCX: 0000000000009f1f
RDX: ffff880028300000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffffffff8100bb8e R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000400
R13: 81eacf00e15c3280 R14: ffff885fe472eb00 R15: ffff885f00000001
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#18 [ffff885fb098a718] printk at ffffffff8150db21
#19 [ffff885fb098a778] print_trace_address at ffffffff8100f2b1
#20 [ffff885fb098a7a8] print_context_stack at ffffffff8100f4d1
#21 [ffff885fb098a818] dump_trace at ffffffff8100e4a0
#22 [ffff885fb098a8b8] show_trace_log_lvl at ffffffff8100f245
#23 [ffff885fb098a8e8] show_trace at ffffffff8100f275
#24 [ffff885fb098a8f8] dump_stack at ffffffff8150d96a
#25 [ffff885fb098a938] __schedule_bug at ffffffff8105ab56
#26 [ffff885fb098a958] thread_return at ffffffff8150e730
#27 [ffff885fb098aa18] schedule_timeout at ffffffff8150efa5
...
Looking at what is on the stack after the interrupt, we see various device-mapper, SCSI, and memory-allocation routines, suggesting that something consumed a large amount of stack space before issuing I/O requests.
crash> rd -S 0xffff885fb098a000 1024
ffff885fb098a000: [task_struct] default_exec_domain
ffff885fb098a010: 0000000000000000 0000885fb098a040
ffff885fb098a020: zone_statistics+153 0000000000000001
ffff885fb098a030: [size-128] ffff88000001c9c0
ffff885fb098a040: ffff885fb098a160 0000000000000082
ffff885fb098a050: 0000000000000000 0000000000000000
ffff885fb098a060: 0000000000000002 0000000057ac6e9d
ffff885fb098a070: 00000000ea11666a 00000040ffffffff
ffff885fb098a080: 0000000000000000 ffff880000036868
ffff885fb098a090: 0000000229b2c2cb 0000000000000000
ffff885fb098a0a0: ffff880000024a18 00000037ffffffc8
ffff885fb098a0b0: ffff880000036860 0000000000000000
ffff885fb098a0c0: 69f6fac800000041 96c18ad8a7ca600e
ffff885fb098a0d0: ffff880000036868 0000000000000001
ffff885fb098a0e0: ffff885fb098a000 0000000000013200
ffff885fb098a0f0: 0000000016652c27 0000000000000000
ffff885fb098a100: [size-128] 0007122000000000
ffff885fb098a110: 0000000000000082 0000000000000010
ffff885fb098a120: a7e3a8951504bd41 ffff88000001c9c0
ffff885fb098a130: 375764b5b640d7fb 0000000000000000
ffff885fb098a140: 0000000000000002 ffff880000036860
ffff885fb098a150: [task_struct] 0000000000051220
ffff885fb098a160: ffff885fb098a2a0 __alloc_pages_nodemask+275
ffff885fb098a170: ffff88000001c9c0 45af403c00000000
ffff885fb098a180: 8708161197db24a9 99853489020d2e4b
ffff885fb098a190: d5fba7562c5b444a 2c2ab5bc559b505b
ffff885fb098a1a0: 5e27845c4364ecdc 73eca0b509e6e0d0
ffff885fb098a1b0: 282df71b1acb23ba dea542689c3d826a
ffff885fb098a1c0: f616b2a9fa898d62 593750a7e13251f1
ffff885fb098a1d0: f64e793dc2787bc9 6db8c7746d0b8851
ffff885fb098a1e0: f543a24babb8c39d f8f8aa350544a140
ffff885fb098a1f0: 20757ed9744c25e7 6c8e40cb6b5fafe1
ffff885fb098a200: 5dcee9093609e831 3ac3f6f9075de3a4
ffff885fb098a210: ab18cc6b643531c6 28d163ac9e3b05f5
ffff885fb098a220: 00000000a008d448 0000000000071220
ffff885fb098a230: ffff880000036868 0000000000000000
ffff885fb098a240: 4d335349aaa63576 420b1d28a4b28bec
ffff885fb098a250: 4289172466de744e 5acd0a4b448b33b0
ffff885fb098a260: ffff88000001c9c0 f204ec17d10c4e82
ffff885fb098a270: 90578e9679aa2212 [kmem_cache]
ffff885fb098a280: 0000000000000000 0000000000000046
ffff885fb098a290: 0000000000000086 [kmem_cache]
ffff885fb098a2a0: ffff885fb098a2d0 [kmem_cache]
ffff885fb098a2b0: 0000000000000000 [kmem_cache]
ffff885fb098a2c0: [size-64] [size-128]
ffff885fb098a2d0: ffff885fb098a340 cache_grow+535
ffff885fb098a2e0: c8389a4904d5ed95 e99a32bb00000040
ffff885fb098a2f0: 5519bcc800000000 9a1d5a8e00000000
ffff885fb098a300: 00000000fc0aa124 [size-64]
ffff885fb098a310: a566f113b4447291 transfer_objects+92
ffff885fb098a320: [size-512] [kmem_cache]
ffff885fb098a330: [size-256] [size-128]
ffff885fb098a340: ffff885fb098a3b0 cache_alloc_refill+158
ffff885fb098a350: 82ba697e592d849c 000000005f427052
ffff885fb098a360: [size-128] 0005122006cc5098
ffff885fb098a370: [size-128] [size-128]
ffff885fb098a380: 467901bececa4954 [size-128]
ffff885fb098a390: 0000000000011220 [kmem_cache]
ffff885fb098a3a0: 0000000000011220 0000000000000046
ffff885fb098a3b0: ffff885fb098a3f0 0000000000000046
ffff885fb098a3c0: 060b5ea6e8f543ba [size-128]
ffff885fb098a3d0: 0000000000011220 ffff885fb098a430
ffff885fb098a3e0: ffff885fb098a448 [size-128]
ffff885fb098a3f0: ffff885fb098a400 mempool_alloc_slab+21
ffff885fb098a400: ffff885fb098a490 mempool_alloc+99
ffff885fb098a410: [size-1024] [size-128]
ffff885fb098a420: [task_struct] 0000000081167610
ffff885fb098a430: 4c132c5358c58623 00000000869bf4f9
ffff885fb098a440: [size-128] 000492200bd03c42
ffff885fb098a450: [size-128] [size-128]
ffff885fb098a460: f34a8d61b668f395 [scsi_cmd_cache]
ffff885fb098a470: 0000000000000080 0000000000000020
ffff885fb098a480: 0000000000000000 sg_init_table+48
ffff885fb098a490: [scsi_cmd_cache] 0000000000000080
ffff885fb098a4a0: ffff885fb098a510 __sg_alloc_table+126
ffff885fb098a4b0: [sgpool-8] scsi_sg_alloc
ffff885fb098a4c0: 0000000000000fe0 0000007fe73a8800
ffff885fb098a4d0: 0000000000000000 000000018137278a
ffff885fb098a4e0: ffff885fb098a540 [scsi_cmd_cache]
ffff885fb098a4f0: [dm_rq_target_io] [sgpool-8]
ffff885fb098a500: 01ff885fe75fe800 [blkdev_queue]
ffff885fb098a510: ffff885fb098a570 swiotlb_map_sg_attrs+121
ffff885fb098a520: [scsi_cmd_cache] 0000000000000000
ffff885fb098a530: [sgpool-8] 00000002e73a8800
ffff885fb098a540: ffff885fb098a560 [size-4096]
ffff885fb098a550: swiotlb_dma_ops [sgpool-8]
ffff885fb098a560: 0000000000000001 0000000000000002
ffff885fb098a570: ffff885fb098a5c0 scsi_dma_map+144
ffff885fb098a580: ffff885fb098a5c0 ffffffff00000001
ffff885fb098a590: [size-2048] [scsi_cmd_cache]
ffff885fb098a5a0: [size-8192] [sgpool-8]
ffff885fb098a5b0: [size-1024] ffff885fe472eb00
ffff885fb098a5c0: ffff885fb098a650 _scsih_qcmd+726
ffff885fb098a5d0: 0000000000011220 ffff885f00000001
ffff885fb098a5e0: ffff885fe472eb00 81eacf00e15c3280
...
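The corrupted thread_info can also be inspected directly, since it sits at the base of the stack dumped above (a sketch; the structure layout varies by kernel version, and the address is from this example):
crash> set 18811
crash> struct thread_info ffff885fb098a000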
Another example of a stack overrun in XFS:
WARNING: at kernel/sched_fair.c:1846 hrtick_start_fair+0x18b/0x190() (Not tainted)
Hardware name: PowerEdge R620
Modules linked in:
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:40/0000:40:01.0/0000:41:00.0/host7/port-7:1/end_device-7:1/target7:0:1/7:0:1:5/state
CPU 1
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc xfs exportfs uinput ipv6 power_meter sg shpchp bnx2x libcrc32c mdio microcode dcdbas sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 4326, comm: xfslogd/1 Not tainted 2.6.32-358.24.1.el6.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
RIP: 0010:[<ffffffff810522cc>] [<ffffffff810522cc>] check_preempt_curr+0x1c/0x90
RSP: 0018:ffff885fdffb7c10 EFLAGS: 00010082
RAX: ffffffff8160b6a0 RBX: ffffffff81c25700 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff885fe0551540 RDI: cccccccccccccccc
RBP: ffff885fdffb7c20 R08: 0000000000000001 R09: 000000000000001f
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c25700
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff8830d9000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fcec010d000 CR3: 0000005fe2b59000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process xfslogd/1 (pid: 4326, threadinfo ffff885fdffb6000, task ffff885fdf5ce040)
Stack:
0000000000000000 ffff885fe0551540 ffff885fdffb7c90 ffffffff810637c3
<d> 00000000ffffffff 0000000000000008 0000000000016700 0000000000000000
<d> ffff8830d900fec0 0000000000000086 0000000000016700 ffff885fdf5d7070
Call Trace:
[<ffffffff810637c3>] try_to_wake_up+0x213/0x3e0
[<ffffffff810639a2>] default_wake_function+0x12/0x20
[<ffffffff81051439>] __wake_up_common+0x59/0x90
[<ffffffff81055ac8>] __wake_up+0x48/0x70
[<ffffffffa02dc71c>] xlog_state_do_callback+0x1fc/0x2b0 [xfs]
[<ffffffffa02dc84e>] xlog_state_done_syncing+0x7e/0xb0 [xfs]
[<ffffffffa02dcfc9>] xlog_iodone+0x59/0xb0 [xfs]
[<ffffffffa02f7de0>] ? xfs_buf_iodone_work+0x0/0x50 [xfs]
[<ffffffffa02f7e06>] xfs_buf_iodone_work+0x26/0x50 [xfs]
[<ffffffff81090be0>] worker_thread+0x170/0x2a0
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81090a70>] ? worker_thread+0x0/0x2a0
[<ffffffff81096a36>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810969a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 6b ff ff ff c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 46 30 48 89 fb 48 8b bf 98 08 00 00 <48> 8b 4f 30 48 39 c8 74 51 48 81 f9 60 b8 60 81 74 24 48 c7 c2
RIP [<ffffffff810522cc>] check_preempt_curr+0x1c/0x90
RSP <ffff885fdffb7c10>
This worker thread was completing an I/O request and trying to wake up the initiator of that request. To do so it must access the initiating process's thread_info structure, which was found to be corrupt.
The initiator of the I/O request is:
crash> bt
PID: 10230 TASK: ffff885fe0551540 CPU: 134 COMMAND: "nfsd"
bt: invalid kernel virtual address: 0 type: "stack contents"
bt: read of stack at 0 failed
The stack is so badly corrupted that we cannot get a normal stack trace.
crash> task -R stack
PID: 10230 TASK: ffff885fe0551540 CPU: 134 COMMAND: "nfsd"
stack = 0xffff885fe0b8a000,
Dump out the entire stack space:
crash> bt -S 0xffff885fe0b8a000
PID: 10230 TASK: ffff885fe0551540 CPU: 134 COMMAND: "nfsd"
#0 [ffff885fe0b8a000] schedule at ffffffff8150e172
#1 [ffff885fe0b8a138] __alloc_pages_nodemask at ffffffff8112bc43
#2 [ffff885fe0b8a278] kmem_getpages at ffffffff81166e1a
#3 [ffff885fe0b8a2a8] cache_grow at ffffffff81167377
#4 [ffff885fe0b8a318] cache_alloc_refill at ffffffff81167640
#5 [ffff885fe0b8a388] vsnprintf at ffffffff81281550
#6 [ffff885fe0b8a428] sprintf at ffffffff81281720
#7 [ffff885fe0b8a488] string at ffffffff8127fed0
#8 [ffff885fe0b8a4c8] symbol_string at ffffffff81280001
#9 [ffff885fe0b8a618] pointer at ffffffff812806eb
#10 [ffff885fe0b8a718] __call_console_drivers at ffffffff8106e585
#11 [ffff885fe0b8a748] _call_console_drivers at ffffffff8106e5ea
#12 [ffff885fe0b8a768] release_console_sem at ffffffff8106ec38
#13 [ffff885fe0b8a7a8] vprintk at ffffffff8106f338
#14 [ffff885fe0b8a848] printk at ffffffff8150dbb5
#15 [ffff885fe0b8a8a8] print_modules at ffffffff810b2e20
#16 [ffff885fe0b8a8d8] warn_slowpath_common at ffffffff8106e3e2
#17 [ffff885fe0b8a918] warn_slowpath_null at ffffffff8106e43a
#18 [ffff885fe0b8a928] hrtick_start_fair at ffffffff8105781b
#19 [ffff885fe0b8a958] dequeue_task_fair at ffffffff8106690b
#20 [ffff885fe0b8a998] dequeue_task at ffffffff81055ede
#21 [ffff885fe0b8a9c8] deactivate_task at ffffffff81055f23
#22 [ffff885fe0b8a9d8] thread_return at ffffffff8150e299
The above entries don't make much sense, but they do show that at some point the memory was overwritten.
The rest of the stack is valid, makes sense, and is in the correct context with respect to the completing I/O above.
#23 [ffff885fe0b8aa98] xlog_state_get_iclog_space at ffffffffa02df048 [xfs]
#24 [ffff885fe0b8ab48] xlog_write at ffffffffa02df410 [xfs]
#25 [ffff885fe0b8ac18] xlog_cil_push at ffffffffa02e04a1 [xfs]
#26 [ffff885fe0b8ace8] xlog_cil_force_lsn at ffffffffa02e0c05 [xfs]
#27 [ffff885fe0b8ad68] _xfs_log_force at ffffffffa02deb68 [xfs]
#28 [ffff885fe0b8adb8] xfs_log_force at ffffffffa02def08 [xfs]
#29 [ffff885fe0b8ade8] xfs_buf_lock at ffffffffa02f7902 [xfs]
#30 [ffff885fe0b8ae18] _xfs_buf_find at ffffffffa02f7a45 [xfs]
#31 [ffff885fe0b8ae78] xfs_buf_get at ffffffffa02f7bc4 [xfs]
#32 [ffff885fe0b8aec8] xfs_trans_get_buf at ffffffffa02edf78 [xfs]
#33 [ffff885fe0b8af18] xfs_btree_get_bufs at ffffffffa02ba61e [xfs]
#34 [ffff885fe0b8af28] xfs_alloc_fix_freelist at ffffffffa02a5710 [xfs]
#35 [ffff885fe0b8b018] xfs_free_extent at ffffffffa02a5b48 [xfs]
#36 [ffff885fe0b8b0c8] xfs_bmap_finish at ffffffffa02af89d [xfs]
#37 [ffff885fe0b8b118] xfs_itruncate_finish at ffffffffa02d585f [xfs]
#38 [ffff885fe0b8b1c8] xfs_free_eofblocks at ffffffffa02f0c7e [xfs]
#39 [ffff885fe0b8b268] xfs_inactive at ffffffffa02f1260 [xfs]
#40 [ffff885fe0b8b2b8] xfs_fs_clear_inode at ffffffffa02feb00 [xfs]
#41 [ffff885fe0b8b2d8] clear_inode at ffffffff8119d31c
#42 [ffff885fe0b8b2f8] dispose_list at ffffffff8119d3f0
#43 [ffff885fe0b8b338] shrink_icache_memory at ffffffff8119d744
#44 [ffff885fe0b8b398] shrink_slab at ffffffff81131ffa
#45 [ffff885fe0b8b3f8] zone_reclaim at ffffffff81134ac9
#46 [ffff885fe0b8b4c8] get_page_from_freelist at ffffffff8112a6ac
#47 [ffff885fe0b8b5e8] __alloc_pages_nodemask at ffffffff8112bc43
#48 [ffff885fe0b8b728] alloc_pages_current at ffffffff81160c6a
#49 [ffff885fe0b8b758] __page_cache_alloc at ffffffff8111a237
#50 [ffff885fe0b8b788] __do_page_cache_readahead at ffffffff8112e9eb
#51 [ffff885fe0b8b818] ra_submit at ffffffff8112eb41
#52 [ffff885fe0b8b828] ondemand_readahead at ffffffff8112eeb5
#53 [ffff885fe0b8b888] page_cache_async_readahead at ffffffff8112f070
#54 [ffff885fe0b8b8d8] __generic_file_splice_read at ffffffff811b0c9f
#55 [ffff885fe0b8bb28] generic_file_splice_read at ffffffff811b0f2a
#56 [ffff885fe0b8bb58] xfs_file_splice_read at ffffffffa02fa8d0 [xfs]
#57 [ffff885fe0b8bbc8] do_splice_to at ffffffff811af18b
#58 [ffff885fe0b8bc08] splice_direct_to_actor at ffffffff811af48f
#59 [ffff885fe0b8bc78] nfsd_vfs_read at ffffffffa0544e70 [nfsd]
#60 [ffff885fe0b8bcf8] nfsd_read at ffffffffa05462f7 [nfsd]
#61 [ffff885fe0b8bd98] nfsd3_proc_read at ffffffffa054e7e5 [nfsd]
#62 [ffff885fe0b8bdd8] nfsd_dispatch at ffffffffa053f43e [nfsd]
#63 [ffff885fe0b8be18] svc_process_common at ffffffffa0425624 [sunrpc]
#64 [ffff885fe0b8be98] svc_process at ffffffffa0425c60 [sunrpc]
#65 [ffff885fe0b8beb8] nfsd at ffffffffa053fb62 [nfsd]
#66 [ffff885fe0b8bee8] kthread at ffffffff81096a36
#67 [ffff885fe0b8bf48] kernel_thread at ffffffff8100c0ca
Yet another example of a stack overflow:
------------[ cut here ]------------
WARNING: at kernel/sched_fair.c:1846 hrtick_start_fair+0x18b/0x190() (Not tainted)
Hardware name: PowerEdge R620
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc xfs exportfs uinput ipv6 power_meter dcdbas microcode sg shpchp bnx2x libcrc32c mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 5588, comm: nfsd Not tainted 2.6.32-358.24.1.el6.x86_64 #1
Call Trace:
[<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
BUG: unable to handle kernel NULL pointer dereference at 00000000000009e8
IP: [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
PGD 5fdb8ae067 PUD 5fdbee9067 PMD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/host8/port-8:1/end_device-8:1/target8:0:1/8:0:1:4/state
CPU 2
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc xfs exportfs uinput ipv6 power_meter dcdbas microcode sg shpchp bnx2x libcrc32c mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 5588, comm: nfsd Not tainted 2.6.32-358.24.1.el6.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
RIP: 0010:[<ffffffff8100f4dd>] [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
RSP: 0018:ffff885fe4060700 EFLAGS: 00010006
RAX: 0000000000000000 RBX: ffff885fe4060888 RCX: 000000000000892a
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff885fe4060760 R08: 00000000000270db R09: 00000000fffffffb
R10: 0000000000000001 R11: 000000000000000c R12: ffff885fe4060800
R13: ffff885fe4060000 R14: ffffffff81600460 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000009e8 CR3: 0000005fdb8b9000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 5588, threadinfo ffff885fe4060000, task ffff885fe73de080)
Stack:
ffff885fe40607a0 ffff885fe4061ff8 ffff885fe4060800 ffffffffffffe000
<d> ffffffff817c33b8 ffffffff8106e3e7 ffff885fe4060770 ffff885fe4060868
<d> 000000000000cbe0 ffffffff81600460 ffffffff817c33b8 ffff880028223fc0
Call Trace:
[<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
general protection fault: 0000 [#2] SMP
last sysfs file: /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/host8/port-8:1/end_device-8:1/target8:0:1/8:0:1:4/state
CPU 2
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc xfs exportfs uinput ipv6 power_meter dcdbas microcode sg shpchp bnx2x libcrc32c mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 5588, comm: nfsd Not tainted 2.6.32-358.24.1.el6.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
RIP: 0010:[<ffffffff8100f4dd>] [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
RSP: 0018:ffff885fe4060218 EFLAGS: 00010006
RAX: 01ffffff81ead340 RBX: ffff885fe4060728 RCX: 0000000000008a04
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff885fe4060278 R08: 0000000000000000 R09: ffffffff8163fde0
R10: 0000000000000001 R11: 0000000000000000 R12: ffff885fe4060760
R13: ffff885fe4060000 R14: ffffffff81600460 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000009e8 CR3: 0000005fdb8b9000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 5588, threadinfo ffff885fe4060000, task ffff885fe73de080)
Stack:
ffff885fe40602b8 ffff885fe4061ff8 ffff885fe4060760 ffffffffffffe000
<d> ffffffff8178ef53 ffffffff8106e3e7 ffff885fe40602e8 ffff885fe4060700
<d> 000000000000cbe0 ffffffff81600460 ffffffff8178ef53 ffff880028223fc0
Call Trace:
[<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff8100e4a0>] dump_trace+0x190/0x3b0
[<ffffffff8100f245>] show_trace_log_lvl+0x55/0x70
[<ffffffff8100dfba>] show_stack_log_lvl+0x9a/0x170
[<ffffffff8100e16f>] show_registers+0xdf/0x280
[<ffffffff81513d1a>] ? atomic_notifier_call_chain+0x1a/0x20
[<ffffffff81511bb3>] __die+0xb3/0xf0
[<ffffffff81046bf2>] no_context+0xd2/0x260
[<ffffffff81046ea5>] __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff81046f73>] bad_area_nosemaphore+0x13/0x20
[<ffffffff810476d1>] __do_page_fault+0x321/0x480
[<ffffffff8109ca9f>] ? up+0x2f/0x50
[<ffffffff8109ca9f>] ? up+0x2f/0x50
[<ffffffff8106ecff>] ? release_console_sem+0x1cf/0x220
[<ffffffff81513bfe>] do_page_fault+0x3e/0xa0
[<ffffffff81510fb5>] page_fault+0x25/0x30
[<ffffffff8100f4dd>] ? print_context_stack+0xad/0x140
[<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff8100e4a0>] dump_trace+0x190/0x3b0
[<ffffffff8105781b>] ? hrtick_start_fair+0x18b/0x190
[<ffffffff8100f245>] show_trace_log_lvl+0x55/0x70
[<ffffffff8100f275>] show_trace+0x15/0x20
[<ffffffff8150d9fe>] dump_stack+0x6f/0x76
[<ffffffff810b2e4a>] ? print_modules+0x5a/0xf0
[<ffffffff8106e3e7>] warn_slowpath_common+0x87/0xc0
[<ffffffff8106e43a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8105781b>] hrtick_start_fair+0x18b/0x190
[<ffffffff8106690b>] dequeue_task_fair+0x12b/0x130
[<ffffffff81055ede>] dequeue_task+0x8e/0xb0
[<ffffffff81055f23>] deactivate_task+0x23/0x30
[<ffffffff8150e299>] thread_return+0x127/0x76e
[<ffffffffa02f7352>] ? _xfs_buf_ioapply+0x162/0x1f0 [xfs]
[<ffffffffa02ddd9a>] ? xlog_bdstrat+0x2a/0x60 [xfs]
[<ffffffff8150f035>] schedule_timeout+0x215/0x2e0
[<ffffffffa02ddd9a>] ? xlog_bdstrat+0x2a/0x60 [xfs]
[<ffffffffa02df569>] ? xlog_sync+0x269/0x3e0 [xfs]
[<ffffffff8150ff52>] __down+0x72/0xb0
[<ffffffffa02f8a45>] ? _xfs_buf_find+0xe5/0x230 [xfs]
[<ffffffff8109cb61>] down+0x41/0x50
[<ffffffffa02f8a45>] ? _xfs_buf_find+0xe5/0x230 [xfs]
[<ffffffffa02f88b1>] xfs_buf_lock+0x51/0x100 [xfs]
[<ffffffffa02f8a45>] _xfs_buf_find+0xe5/0x230 [xfs]
[<ffffffffa02f8bc4>] xfs_buf_get+0x34/0x1b0 [xfs]
[<ffffffffa02eef78>] xfs_trans_get_buf+0xe8/0x180 [xfs]
[<ffffffffa02bb5a2>] xfs_btree_get_buf_block+0x52/0x90 [xfs]
[<ffffffffa02bedba>] xfs_btree_split+0x12a/0x710 [xfs]
[<ffffffffa02bf8cd>] xfs_btree_make_block_unfull+0x12d/0x190 [xfs]
[<ffffffffa02bfd1f>] xfs_btree_insrec+0x3ef/0x5a0 [xfs]
[<ffffffffa02f7f85>] ? xfs_buf_rele+0x55/0x100 [xfs]
[<ffffffffa02f92e2>] ? xfs_buf_read+0xc2/0x100 [xfs]
[<ffffffffa02bff63>] xfs_btree_insert+0x93/0x180 [xfs]
[<ffffffffa02f400a>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
[<ffffffffa02a4ad4>] xfs_free_ag_extent+0x434/0x750 [xfs]
[<ffffffffa02a6ba1>] xfs_free_extent+0x101/0x130 [xfs]
[<ffffffffa02b089d>] xfs_bmap_finish+0x15d/0x1a0 [xfs]
[<ffffffffa02d685f>] xfs_itruncate_finish+0x15f/0x320 [xfs]
[<ffffffffa02f1c7e>] xfs_free_eofblocks+0x1fe/0x2e0 [xfs]
[<ffffffffa02f2260>] xfs_inactive+0xc0/0x480 [xfs]
[<ffffffffa0302794>] ? xfs_inode_set_reclaim_tag+0x84/0xa0 [xfs]
[<ffffffffa02ffb00>] xfs_fs_clear_inode+0xa0/0xd0 [xfs]
[<ffffffff8119d31c>] clear_inode+0xac/0x140
[<ffffffff8119d3f0>] dispose_list+0x40/0x120
[<ffffffff8119d744>] shrink_icache_memory+0x274/0x2e0
[<ffffffff81131ffa>] shrink_slab+0x12a/0x1a0
[<ffffffff81134ac9>] zone_reclaim+0x279/0x400
[<ffffffff8112a6ac>] get_page_from_freelist+0x69c/0x830
[<ffffffff8109ca57>] ? down_trylock+0x37/0x50
[<ffffffff8112bc43>] __alloc_pages_nodemask+0x113/0x8d0
[<ffffffffa02c1741>] ? xfs_da_buf_make+0x121/0x170 [xfs]
[<ffffffffa02c2288>] ? xfs_da_do_buf+0x618/0x770 [xfs]
[<ffffffffa02f3f57>] ? kmem_zone_alloc+0x77/0xf0 [xfs]
[<ffffffff81166dc2>] kmem_getpages+0x62/0x170
[<ffffffff8116742f>] cache_grow+0x2cf/0x320
[<ffffffff81167682>] cache_alloc_refill+0x202/0x240
[<ffffffff8116871f>] kmem_cache_alloc+0x15f/0x190
[<ffffffffa02f3f57>] kmem_zone_alloc+0x77/0xf0 [xfs]
[<ffffffffa02d3be0>] xfs_inode_alloc+0x30/0xf0 [xfs]
[<ffffffffa02d4070>] xfs_iget+0x260/0x6e0 [xfs]
[<ffffffffa02d3820>] ? xfs_ilock_demote+0x60/0xa0 [xfs]
[<ffffffffa02f1a36>] xfs_lookup+0xc6/0x110 [xfs]
[<ffffffffa02fe6e4>] xfs_vn_lookup+0x54/0xa0 [xfs]
[<ffffffff8118e4f2>] __lookup_hash+0x102/0x160
[<ffffffff8118ee84>] lookup_one_len+0xb4/0x110
[<ffffffffa054565b>] ? fh_put+0x9b/0x100 [nfsd]
[<ffffffffa055318b>] encode_entryplus_baggage+0x18b/0x1b0 [nfsd]
[<ffffffffa05534d4>] encode_entry+0x324/0x390 [nfsd]
[<ffffffffa0553540>] ? nfs3svc_encode_entry_plus+0x0/0x20 [nfsd]
[<ffffffffa0553559>] nfs3svc_encode_entry_plus+0x19/0x20 [nfsd]
[<ffffffffa0548ead>] nfsd_readdir+0x15d/0x240 [nfsd]
[<ffffffffa0550681>] nfsd3_proc_readdirplus+0xc1/0x210 [nfsd]
[<ffffffffa054243e>] nfsd_dispatch+0xfe/0x240 [nfsd]
[<ffffffffa0428624>] svc_process_common+0x344/0x640 [sunrpc]
[<ffffffff81063990>] ? default_wake_function+0x0/0x20
[<ffffffffa0428c60>] svc_process+0x110/0x160 [sunrpc]
[<ffffffffa0542b62>] nfsd+0xc2/0x160 [nfsd]
[<ffffffffa0542aa0>] ? nfsd+0x0/0x160 [nfsd]
[<ffffffff81096a36>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810969a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 39 08 00 85 c0 74 2c 49 8d 44 24 08 48 39 c3 74 7d 31 d2 48 8b 75 c8 48 8b 7d c0 41 ff 56 10 48 81 7d c8 41 ae 00 81 49 8b 45 00 <8b> 90 e8 09 00 00 74 1b 48 83 c3 08 4d 85 ff 75 92 4c 39 eb 76
RIP [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
RSP <ffff885fe4060218>
This time the overrun was caused by memory reclaim re-entering the filesystem code, as seen at frame #48:
PID: 5588 TASK: ffff885fe73de080 CPU: 2 COMMAND: "nfsd"
#0 [ffff885fe4060000] crash_kexec at ffffffff810c0e22
#1 [ffff885fe40600d0] oops_end at ffffffff81511cb0
#2 [ffff885fe4060100] die at ffffffff8100f19b
#3 [ffff885fe4060130] do_general_protection at ffffffff815117b2
#4 [ffff885fe4060160] general_protection at ffffffff81510f85
[exception RIP: print_context_stack+173]
RIP: ffffffff8100f4dd RSP: ffff885fe4060218 RFLAGS: 00010006
RAX: 01ffffff81ead340 RBX: ffff885fe4060728 RCX: 0000000000008a04
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff885fe4060278 R8: 0000000000000000 R9: ffffffff8163fde0
R10: 0000000000000001 R11: 0000000000000000 R12: ffff885fe4060760
R13: ffff885fe4060000 R14: ffffffff81600460 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffff885fe4060210] print_context_stack at ffffffff8100f4d1
#6 [ffff885fe4060280] dump_trace at ffffffff8100e4a0
#7 [ffff885fe4060320] show_trace_log_lvl at ffffffff8100f245
#8 [ffff885fe4060350] show_stack_log_lvl at ffffffff8100dfba
#9 [ffff885fe40603b0] show_registers at ffffffff8100e16f
#10 [ffff885fe4060420] __die at ffffffff81511bb3
#11 [ffff885fe4060450] no_context at ffffffff81046bf2
#12 [ffff885fe40604a0] __bad_area_nosemaphore at ffffffff81046ea5
#13 [ffff885fe40604f0] bad_area_nosemaphore at ffffffff81046f73
#14 [ffff885fe4060500] __do_page_fault at ffffffff810476d1
#15 [ffff885fe4060620] do_page_fault at ffffffff81513bfe
#16 [ffff885fe4060650] page_fault at ffffffff81510fb5
[exception RIP: print_context_stack+173]
RIP: ffffffff8100f4dd RSP: ffff885fe4060700 RFLAGS: 00010006
RAX: 0000000000000000 RBX: ffff885fe4060888 RCX: 000000000000892a
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff885fe4060760 R8: 00000000000270db R9: 00000000fffffffb
R10: 0000000000000001 R11: 000000000000000c R12: ffff885fe4060800
R13: ffff885fe4060000 R14: ffffffff81600460 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#17 [ffff885fe4060728] warn_slowpath_common at ffffffff8106e3e7
#18 [ffff885fe4060768] dump_trace at ffffffff8100e4a0
#19 [ffff885fe4060808] show_trace_log_lvl at ffffffff8100f245
#20 [ffff885fe4060838] show_trace at ffffffff8100f275
#21 [ffff885fe4060848] dump_stack at ffffffff8150d9fe
#22 [ffff885fe4060888] warn_slowpath_common at ffffffff8106e3e7
#23 [ffff885fe40608c8] warn_slowpath_null at ffffffff8106e43a
#24 [ffff885fe40608d8] hrtick_start_fair at ffffffff8105781b
#25 [ffff885fe4060908] dequeue_task_fair at ffffffff8106690b
#26 [ffff885fe4060948] dequeue_task at ffffffff81055ede
#27 [ffff885fe4060978] deactivate_task at ffffffff81055f23
#28 [ffff885fe4060988] thread_return at ffffffff8150e299
#29 [ffff885fe4060a48] schedule_timeout at ffffffff8150f035
#30 [ffff885fe4060af8] __down at ffffffff8150ff52
#31 [ffff885fe4060b48] down at ffffffff8109cb61
#32 [ffff885fe4060b78] xfs_buf_lock at ffffffffa02f88b1 [xfs]
#33 [ffff885fe4060ba8] _xfs_buf_find at ffffffffa02f8a45 [xfs]
#34 [ffff885fe4060c08] xfs_buf_get at ffffffffa02f8bc4 [xfs]
#35 [ffff885fe4060c58] xfs_trans_get_buf at ffffffffa02eef78 [xfs]
#36 [ffff885fe4060ca8] xfs_btree_get_buf_block at ffffffffa02bb5a2 [xfs]
#37 [ffff885fe4060ce8] xfs_btree_split at ffffffffa02bedba [xfs]
#38 [ffff885fe4060dd8] xfs_btree_make_block_unfull at ffffffffa02bf8cd [xfs]
#39 [ffff885fe4060e38] xfs_btree_insrec at ffffffffa02bfd1f [xfs]
#40 [ffff885fe4060f18] xfs_btree_insert at ffffffffa02bff63 [xfs]
#41 [ffff885fe4060fa8] xfs_free_ag_extent at ffffffffa02a4ad4 [xfs]
#42 [ffff885fe4061048] xfs_free_extent at ffffffffa02a6ba1 [xfs]
#43 [ffff885fe40610f8] xfs_bmap_finish at ffffffffa02b089d [xfs]
#44 [ffff885fe4061148] xfs_itruncate_finish at ffffffffa02d685f [xfs]
#45 [ffff885fe40611f8] xfs_free_eofblocks at ffffffffa02f1c7e [xfs]
#46 [ffff885fe4061298] xfs_inactive at ffffffffa02f2260 [xfs]
#47 [ffff885fe40612e8] xfs_fs_clear_inode at ffffffffa02ffb00 [xfs]
#48 [ffff885fe4061308] clear_inode at ffffffff8119d31c
#49 [ffff885fe4061328] dispose_list at ffffffff8119d3f0
#50 [ffff885fe4061368] shrink_icache_memory at ffffffff8119d744
#51 [ffff885fe40613c8] shrink_slab at ffffffff81131ffa
#52 [ffff885fe4061428] zone_reclaim at ffffffff81134ac9
#53 [ffff885fe40614f8] get_page_from_freelist at ffffffff8112a6ac
#54 [ffff885fe4061618] __alloc_pages_nodemask at ffffffff8112bc43
#55 [ffff885fe4061758] kmem_getpages at ffffffff81166dc2
#56 [ffff885fe4061788] cache_grow at ffffffff8116742f
#57 [ffff885fe40617f8] cache_alloc_refill at ffffffff81167682
#58 [ffff885fe4061868] kmem_cache_alloc at ffffffff8116871f
#59 [ffff885fe40618a8] kmem_zone_alloc at ffffffffa02f3f57 [xfs]
#60 [ffff885fe40618e8] xfs_inode_alloc at ffffffffa02d3be0 [xfs]
#61 [ffff885fe4061918] xfs_iget at ffffffffa02d4070 [xfs]
#62 [ffff885fe40619d8] xfs_lookup at ffffffffa02f1a36 [xfs]
#63 [ffff885fe4061a38] xfs_vn_lookup at ffffffffa02fe6e4 [xfs]
#64 [ffff885fe4061a78] __lookup_hash at ffffffff8118e4f2
#65 [ffff885fe4061ac8] lookup_one_len at ffffffff8118ee84
#66 [ffff885fe4061b08] encode_entryplus_baggage at ffffffffa055318b [nfsd]
#67 [ffff885fe4061c98] encode_entry at ffffffffa05534d4 [nfsd]
#68 [ffff885fe4061ce8] nfs3svc_encode_entry_plus at ffffffffa0553559 [nfsd]
#69 [ffff885fe4061d08] nfsd_readdir at ffffffffa0548ead [nfsd]
#70 [ffff885fe4061d88] nfsd3_proc_readdirplus at ffffffffa0550681 [nfsd]
#71 [ffff885fe4061dd8] nfsd_dispatch at ffffffffa054243e [nfsd]
#72 [ffff885fe4061e18] svc_process_common at ffffffffa0428624 [sunrpc]
#73 [ffff885fe4061e98] svc_process at ffffffffa0428c60 [sunrpc]
#74 [ffff885fe4061eb8] nfsd at ffffffffa0542b62 [nfsd]
#75 [ffff885fe4061ee8] kthread at ffffffff81096a36
#76 [ffff885fe4061f48] kernel_thread at ffffffff8100c0ca