Thread overran stack, or stack corrupted while running XFS

Environment

  • Red Hat Enterprise Linux 5.7 or later
  • Red Hat Enterprise Linux 6
  • XFS (Scalable File System Add-On)
  • Sometimes with NFS server (nfsd) exporting XFS filesystem

Issue

  • Kernel panic where nfsd crashes with XFS symbols in a very long backtrace. One of the following is seen in the kernel ring buffer:
    1. Instruction pointer is print_context_stack and "Thread overran stack, or stack corrupted"
    2. Instruction pointer is __schedule_bug and "scheduling while atomic"

Resolution

Run one of the following kernels or later:
  • RHEL 6.6.z: kernel-2.6.32-504.2.1.el6
  • RHEL 6.5.z: kernel-2.6.32-431.46.1.el6
  • RHEL 6.4.z: kernel-2.6.32-358.52.1.el6
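
To confirm the running kernel and move to a fixed one, something like the following can be used. This is only a minimal sketch, assuming the relevant RHEL 6 update repositories are enabled; the exact kernel installed depends on the subscribed stream:

# uname -r              # confirm the currently running kernel version
# yum update kernel     # install the latest available kernel erratum
# reboot                # boot into the newly installed kernel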

Root Cause

  • It is possible for the XFS allocator call chain to exceed the default kernel thread stack size. The solution was to split the allocation path by offloading extent allocation into a workqueue so that it runs on a separate stack.
  • Previous stack overrun issues in XFS were addressed in BZ#693280 and BZ#918359.
  • Another stack overrun issue was addressed in BZ#1020574.
  • A further stack overrun issue tracked in BZ#1028831 has also been addressed.
  • The fix for BZ#1028831 caused a regression, which was tracked in BZ#1133304 and addressed in:
    • kernel-2.6.32-504.2.1.el6
    • kernel-2.6.32-431.46.1.el6
    • kernel-2.6.32-358.52.1.el6
  • Another case involving direct I/O was tracked in BZ#1085148 but was thought to be a duplicate of BZ#1028831.

Diagnostic Steps

  • If you have an affected kernel from the Environment section and can confirm a matching backtrace, no further confirmation is needed.
  • Running the debug kernel may help, as it contains an additional kernel stack size check.
  • Look for messages indicating that the stack has overflowed; the kernel ring buffer and system logs can be searched as shown in the example after this list.
  • In one reported case there were two panics; before each panic there were roughly 5 simultaneous NFS writes in progress (reading from another machine and writing to this one).
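
The kernel ring buffer and system logs can be searched for these messages with something like the following (the log path is the RHEL 6 default and may differ on your system):

# dmesg | grep -iE 'overran stack|stack corrupted|scheduling while atomic'
# grep -iE 'overran stack|stack corrupted|scheduling while atomic' /var/log/messages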

Turn on the stack depth checking functions to determine what is happening:

# mount -t debugfs nodev /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_tracer_enabled

and periodically grab the output of:

# cat /sys/kernel/debug/tracing/stack_max_size
# cat /sys/kernel/debug/tracing/stack_trace

That will report the highest stack usage to date. For example, leave this command running:

# while true ; do date ; cat /sys/kernel/debug/tracing/stack_max_size ; cat /sys/kernel/debug/tracing/stack_trace ; echo --- ; sleep 60 ; done | tee /var/log/stack_trace.log

If you see the stack max size value exceed roughly 7200 bytes then you may have found the culprit. If the system panics before you can capture this stack trace, the information will be contained within the vmcore and can be retrieved from it during analysis.
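
If only a vmcore is available, a rough sketch of pulling the same information out with the crash utility follows; the vmlinux and vmcore paths are placeholders and depend on where kdump stored the dump and where the matching kernel-debuginfo is installed. Here bt shows the backtrace of the panicking task, task -R stack shows the base address of its kernel stack, and bt -S walks the whole stack area, as in the examples later in this article.

# crash /usr/lib/debug/lib/modules/<kernel version>/vmlinux /var/crash/<timestamp>/vmcore
crash> bt
crash> task -R stack
crash> bt -S <stack address>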

Here is an example from the stack dumper running at 1-minute intervals, so the following dump was captured a minute or so before the crash. For roughly the preceding 4 days the highest recorded stack usage had been 7192 bytes. This example shows a block allocation in the extent tree after allocating a data extent.

7256
------------
        Depth    Size   Location    (54 entries)
        -----    ----   --------
  0)     7096      48   __call_rcu+0x62/0x160
  1)     7048      16   call_rcu_sched+0x15/0x20
  2)     7032      16   call_rcu+0xe/0x10
  3)     7016     272   radix_tree_delete+0x150/0x2b0
  4)     6744      32   __remove_from_page_cache+0x21/0xe0
  5)     6712      64   __remove_mapping+0xa0/0x160
  6)     6648     272   shrink_page_list.clone.0+0x37d/0x540
  7)     6376     432   shrink_inactive_list+0x2f5/0x740
  8)     5944     176   shrink_zone+0x38f/0x520
  9)     5768     224   zone_reclaim+0x354/0x410
 10)     5544     304   get_page_from_freelist+0x694/0x820
 11)     5240     256   __alloc_pages_nodemask+0x111/0x850
 12)     4984      48   kmem_getpages+0x62/0x170
 13)     4936     112   cache_grow+0x2cf/0x320
 14)     4824     112   cache_alloc_refill+0x202/0x240
 15)     4712      64   kmem_cache_alloc+0x15f/0x190
 16)     4648      64   kmem_zone_alloc+0x9a/0xe0 [xfs]
 17)     4584      32   kmem_zone_zalloc+0x1e/0x50 [xfs]
 18)     4552      80   xfs_allocbt_init_cursor+0x4c/0xc0 [xfs]
 19)     4472      16   xfs_allocbt_dup_cursor+0x2c/0x30 [xfs]
 20)     4456     128   xfs_btree_dup_cursor+0x33/0x180 [xfs]
 21)     4328     192   xfs_alloc_ag_vextent_near+0x5fc/0xb70 [xfs]
 22)     4136      32   xfs_alloc_ag_vextent+0xd5/0x130 [xfs]
 23)     4104      96   xfs_alloc_vextent+0x45f/0x600 [xfs]
 24)     4008     160   xfs_bmbt_alloc_block+0xc5/0x1d0 [xfs]
 25)     3848     240   xfs_btree_split+0xbd/0x710 [xfs]
 26)     3608      96   xfs_btree_make_block_unfull+0x12d/0x190 [xfs]
 27)     3512     224   xfs_btree_insrec+0x3ef/0x5a0 [xfs]
 28)     3288     144   xfs_btree_insert+0x93/0x180 [xfs]
 29)     3144     272   xfs_bmap_add_extent_delay_real+0xe7e/0x18d0 [xfs]
 30)     2872     208   xfs_bmap_add_extent+0x3ff/0x420 [xfs]
 31)     2664     432   xfs_bmapi+0xb14/0x11a0 [xfs]
 32)     2232     272   xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]
 33)     1960     208   xfs_iomap+0x389/0x440 [xfs]
 34)     1752      32   xfs_map_blocks+0x2d/0x40 [xfs]
 35)     1720     272   xfs_page_state_convert+0x2f8/0x750 [xfs]
 36)     1448      80   xfs_vm_writepage+0x86/0x170 [xfs]
 37)     1368      32   __writepage+0x17/0x40
 38)     1336     304   write_cache_pages+0x1c9/0x4a0
 39)     1032      16   generic_writepages+0x24/0x30
 40)     1016      48   xfs_vm_writepages+0x5e/0x80 [xfs]
 41)      968      16   do_writepages+0x21/0x40
 42)      952     128   __filemap_fdatawrite_range+0x5b/0x60
 43)      824      48   filemap_write_and_wait_range+0x5a/0x90
 44)      776      80   vfs_fsync_range+0x7e/0xe0
 45)      696      16   vfs_fsync+0x1d/0x20
 46)      680      64   nfsd_commit+0x6b/0xa0 [nfsd]
 47)      616      64   nfsd3_proc_commit+0x9d/0x100 [nfsd]
 48)      552      64   nfsd_dispatch+0xfe/0x240 [nfsd]
 49)      488     128   svc_process_common+0x344/0x640 [sunrpc]
 50)      360      32   svc_process+0x110/0x160 [sunrpc]
 51)      328      48   nfsd+0xc2/0x160 [nfsd]
 52)      280      96   kthread+0x96/0xa0
 53)      184     184   child_rip+0xa/0x20

Another example (this is a user data extent allocation):

7272
------------
        Depth    Size   Location    (61 entries)
        -----    ----   --------
  0)     7080     224   select_task_rq_fair+0x3be/0x980
  1)     6856     112   try_to_wake_up+0x14a/0x400
  2)     6744      16   wake_up_process+0x15/0x20
  3)     6728      16   wakeup_softirqd+0x35/0x40
  4)     6712      48   raise_softirq_irqoff+0x4f/0x90
  5)     6664      48   __blk_complete_request+0x132/0x140
  6)     6616      16   blk_complete_request+0x25/0x30
  7)     6600      32   scsi_done+0x2f/0x60
  8)     6568      48   megasas_queue_command+0xd1/0x140 [megaraid_sas]
  9)     6520      48   scsi_dispatch_cmd+0x1ac/0x340
 10)     6472      96   scsi_request_fn+0x415/0x590
 11)     6376      32   __generic_unplug_device+0x32/0x40
 12)     6344     112   __make_request+0x170/0x500
 13)     6232     224   generic_make_request+0x21e/0x5b0
 14)     6008      80   submit_bio+0x8f/0x120
 15)     5928     112   _xfs_buf_ioapply+0x194/0x2f0 [xfs]
 16)     5816      48   xfs_buf_iorequest+0x4f/0xe0 [xfs]
 17)     5768      32   xlog_bdstrat+0x2a/0x60 [xfs]
 18)     5736      80   xlog_sync+0x1e0/0x3f0 [xfs]
 19)     5656      48   xlog_state_release_iclog+0xb3/0xf0 [xfs]
 20)     5608     144   _xfs_log_force_lsn+0x1cc/0x270 [xfs]
 21)     5464      32   xfs_log_force_lsn+0x18/0x40 [xfs]
 22)     5432      80   xfs_alloc_search_busy+0x10c/0x160 [xfs]
 23)     5352     112   xfs_alloc_get_freelist+0x113/0x170 [xfs]
 24)     5240      48   xfs_allocbt_alloc_block+0x33/0x70 [xfs]
 25)     5192     240   xfs_btree_split+0xbd/0x710 [xfs]
 26)     4952      96   xfs_btree_make_block_unfull+0x12d/0x190 [xfs]
 27)     4856     224   xfs_btree_insrec+0x3ef/0x5a0 [xfs]
 28)     4632     144   xfs_btree_insert+0x93/0x180 [xfs]
 29)     4488     176   xfs_free_ag_extent+0x414/0x7e0 [xfs]
 30)     4312     224   xfs_alloc_fix_freelist+0xf4/0x480 [xfs]
 31)     4088      96   xfs_alloc_vextent+0x173/0x600 [xfs]
 32)     3992     240   xfs_bmap_btalloc+0x167/0x9d0 [xfs]
 33)     3752      16   xfs_bmap_alloc+0xe/0x10 [xfs]
 34)     3736     432   xfs_bmapi+0x9f6/0x11a0 [xfs]
 35)     3304     272   xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]
 36)     3032     208   xfs_iomap+0x389/0x440 [xfs]
 37)     2824      32   xfs_map_blocks+0x2d/0x40 [xfs]
 38)     2792     272   xfs_page_state_convert+0x2f8/0x750 [xfs]
 39)     2520      80   xfs_vm_writepage+0x86/0x170 [xfs]
 40)     2440      32   __writepage+0x17/0x40
 41)     2408     304   write_cache_pages+0x1c9/0x4a0
 42)     2104      16   generic_writepages+0x24/0x30
 43)     2088      48   xfs_vm_writepages+0x5e/0x80 [xfs]
 44)     2040      16   do_writepages+0x21/0x40
 45)     2024     128   __filemap_fdatawrite_range+0x5b/0x60
 46)     1896      48   filemap_write_and_wait_range+0x5a/0x90
 47)     1848     320   xfs_write+0xa2f/0xb70 [xfs]
 48)     1528      16   xfs_file_aio_write+0x61/0x70 [xfs]
 49)     1512     304   do_sync_readv_writev+0xfb/0x140
 50)     1208     224   do_readv_writev+0xcf/0x1f0
 51)      984      16   vfs_writev+0x46/0x60
 52)      968     208   nfsd_vfs_write+0x107/0x430 [nfsd]
 53)      760      96   nfsd_write+0xe7/0x100 [nfsd]
 54)      664     112   nfsd3_proc_write+0xaf/0x140 [nfsd]
 55)      552      64   nfsd_dispatch+0xfe/0x240 [nfsd]
 56)      488     128   svc_process_common+0x344/0x640 [sunrpc]
 57)      360      32   svc_process+0x110/0x160 [sunrpc]
 58)      328      48   nfsd+0xc2/0x160 [nfsd]
 59)      280      96   kthread+0x96/0xa0
 60)      184     184   child_rip+0xa/0x20

Another example shows an indirect consequence of the stack overrun problem. The thread_info structure sits at the base (lowest address) of the kernel stack area and holds details about the process, including the preempt count. Because the stack grows down toward it, a stack overrun corrupts this structure.

The first sign of trouble is the "scheduling while atomic" message; given the preempt count of 0xffff885f, it looks like it has been overwritten by the high 32 bits of a kernel address.

BUG: scheduling while atomic: nfsd/18811/0xffff885f
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 xfs exportfs uinput power_meter sg shpchp bnx2x libcrc32c mdio dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_1fec5c9402a17fee36a363cad0278e9f_23941]
Pid: 18811, comm: nfsd Not tainted 2.6.32-358.18.1.el6.x86_64 #1
Call Trace:
BUG: unable to handle kernel paging request at fffffffd0684e0a0
IP: [<ffffffff81056904>] update_curr+0x144/0x1f0
PGD 1a87067 PUD 0 
Oops: 0000 [#1] SMP 
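
The corruption can also be confirmed directly from the vmcore by displaying the thread_info structure at the base of the task's kernel stack with the crash utility. This is only a sketch; the address below is the stack base of the nfsd thread dumped later in this article, and the preempt_count member may or may not still hold the bogus value by the time the dump was taken:

crash> struct thread_info ffff885fb098a000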

Looking closely at the stack trace at the time of the panic, we see that the thread was scheduling out when it detected that the preempt count was wrong and printed a message to the console. While printing the message it received a timer interrupt that accessed the corrupted thread_info structure and caused the page fault which triggered the panic.

crash> bt
PID: 18811  TASK: ffff885fdb7c5500  CPU: 16  COMMAND: "nfsd"
 #0 [ffff8800283039a0] machine_kexec at ffffffff81035d6b
 #1 [ffff880028303a00] crash_kexec at ffffffff810c0e22
 #2 [ffff880028303ad0] oops_end at ffffffff81511c20
 #3 [ffff880028303b00] no_context at ffffffff81046c1b
 #4 [ffff880028303b50] __bad_area_nosemaphore at ffffffff81046ea5
 #5 [ffff880028303ba0] bad_area_nosemaphore at ffffffff81046f73
 #6 [ffff880028303bb0] __do_page_fault at ffffffff810476d1
 #7 [ffff880028303cd0] do_page_fault at ffffffff81513b6e
 #8 [ffff880028303d00] page_fault at ffffffff81510f25
    [exception RIP: update_curr+324]
    RIP: ffffffff81056904  RSP: ffff880028303db8  RFLAGS: 00010082
    RAX: ffff885fdb7c5500  RBX: ffffffffb098a040  RCX: ffff88302fef3240
    RDX: 00000000000192d8  RSI: 0000000000000000  RDI: ffff885fdb7c5538
    RBP: ffff880028303de8   R8: ffffffff8160bb65   R9: 0000000000000000
    R10: 0000000000000010  R11: 0000000000000000  R12: ffff880028316768
    R13: 00000000000643d2  R14: 000015e29f03c0bd  R15: ffff885fdb7c5500
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880028303df0] task_tick_fair at ffffffff81056ebb
#10 [ffff880028303e20] scheduler_tick at ffffffff8105ad01
#11 [ffff880028303e60] update_process_times at ffffffff810812fe
#12 [ffff880028303e90] tick_sched_timer at ffffffff810a80c6
#13 [ffff880028303ec0] __run_hrtimer at ffffffff8109b4ae
#14 [ffff880028303f10] hrtimer_interrupt at ffffffff8109b816
#15 [ffff880028303f90] smp_apic_timer_interrupt at ffffffff815177cb
#16 [ffff880028303fb0] apic_timer_interrupt at ffffffff8100bb93
--- <IRQ stack> ---
#17 [ffff885fb098a5d8] apic_timer_interrupt at ffffffff8100bb93
    [exception RIP: vprintk+593]
    RIP: ffffffff8106f341  RSP: ffff885fb098a680  RFLAGS: 00000246
    RAX: 0000000000011480  RBX: ffff885fb098a710  RCX: 0000000000009f1f
    RDX: ffff880028300000  RSI: 0000000000000046  RDI: 0000000000000246
    RBP: ffffffff8100bb8e   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000004  R11: 0000000000000000  R12: 0000000000000400
    R13: 81eacf00e15c3280  R14: ffff885fe472eb00  R15: ffff885f00000001
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#18 [ffff885fb098a718] printk at ffffffff8150db21
#19 [ffff885fb098a778] print_trace_address at ffffffff8100f2b1
#20 [ffff885fb098a7a8] print_context_stack at ffffffff8100f4d1
#21 [ffff885fb098a818] dump_trace at ffffffff8100e4a0
#22 [ffff885fb098a8b8] show_trace_log_lvl at ffffffff8100f245
#23 [ffff885fb098a8e8] show_trace at ffffffff8100f275
#24 [ffff885fb098a8f8] dump_stack at ffffffff8150d96a
#25 [ffff885fb098a938] __schedule_bug at ffffffff8105ab56
#26 [ffff885fb098a958] thread_return at ffffffff8150e730
#27 [ffff885fb098aa18] schedule_timeout at ffffffff8150efa5
...

Looking at what is on the stack after the interrupt, we see various device-mapper, SCSI, and memory allocation routines, suggesting that something used up a lot of stack space before issuing I/O requests.

crash> rd -S 0xffff885fb098a000 1024
ffff885fb098a000:  [task_struct]    default_exec_domain 
ffff885fb098a010:  0000000000000000 0000885fb098a040 
ffff885fb098a020:  zone_statistics+153 0000000000000001 
ffff885fb098a030:  [size-128]       ffff88000001c9c0 
ffff885fb098a040:  ffff885fb098a160 0000000000000082 
ffff885fb098a050:  0000000000000000 0000000000000000 
ffff885fb098a060:  0000000000000002 0000000057ac6e9d 
ffff885fb098a070:  00000000ea11666a 00000040ffffffff 
ffff885fb098a080:  0000000000000000 ffff880000036868 
ffff885fb098a090:  0000000229b2c2cb 0000000000000000 
ffff885fb098a0a0:  ffff880000024a18 00000037ffffffc8 
ffff885fb098a0b0:  ffff880000036860 0000000000000000 
ffff885fb098a0c0:  69f6fac800000041 96c18ad8a7ca600e 
ffff885fb098a0d0:  ffff880000036868 0000000000000001 
ffff885fb098a0e0:  ffff885fb098a000 0000000000013200 
ffff885fb098a0f0:  0000000016652c27 0000000000000000 
ffff885fb098a100:  [size-128]       0007122000000000 
ffff885fb098a110:  0000000000000082 0000000000000010 
ffff885fb098a120:  a7e3a8951504bd41 ffff88000001c9c0 
ffff885fb098a130:  375764b5b640d7fb 0000000000000000 
ffff885fb098a140:  0000000000000002 ffff880000036860 
ffff885fb098a150:  [task_struct]    0000000000051220 
ffff885fb098a160:  ffff885fb098a2a0 __alloc_pages_nodemask+275 
ffff885fb098a170:  ffff88000001c9c0 45af403c00000000 
ffff885fb098a180:  8708161197db24a9 99853489020d2e4b 
ffff885fb098a190:  d5fba7562c5b444a 2c2ab5bc559b505b 
ffff885fb098a1a0:  5e27845c4364ecdc 73eca0b509e6e0d0 
ffff885fb098a1b0:  282df71b1acb23ba dea542689c3d826a 
ffff885fb098a1c0:  f616b2a9fa898d62 593750a7e13251f1 
ffff885fb098a1d0:  f64e793dc2787bc9 6db8c7746d0b8851 
ffff885fb098a1e0:  f543a24babb8c39d f8f8aa350544a140 
ffff885fb098a1f0:  20757ed9744c25e7 6c8e40cb6b5fafe1 
ffff885fb098a200:  5dcee9093609e831 3ac3f6f9075de3a4 
ffff885fb098a210:  ab18cc6b643531c6 28d163ac9e3b05f5 
ffff885fb098a220:  00000000a008d448 0000000000071220 
ffff885fb098a230:  ffff880000036868 0000000000000000 
ffff885fb098a240:  4d335349aaa63576 420b1d28a4b28bec 
ffff885fb098a250:  4289172466de744e 5acd0a4b448b33b0 
ffff885fb098a260:  ffff88000001c9c0 f204ec17d10c4e82 
ffff885fb098a270:  90578e9679aa2212 [kmem_cache]     
ffff885fb098a280:  0000000000000000 0000000000000046 
ffff885fb098a290:  0000000000000086 [kmem_cache]     
ffff885fb098a2a0:  ffff885fb098a2d0 [kmem_cache]     
ffff885fb098a2b0:  0000000000000000 [kmem_cache]     
ffff885fb098a2c0:  [size-64]        [size-128]       
ffff885fb098a2d0:  ffff885fb098a340 cache_grow+535   
ffff885fb098a2e0:  c8389a4904d5ed95 e99a32bb00000040 
ffff885fb098a2f0:  5519bcc800000000 9a1d5a8e00000000 
ffff885fb098a300:  00000000fc0aa124 [size-64]        
ffff885fb098a310:  a566f113b4447291 transfer_objects+92 
ffff885fb098a320:  [size-512]       [kmem_cache]     
ffff885fb098a330:  [size-256]       [size-128]       
ffff885fb098a340:  ffff885fb098a3b0 cache_alloc_refill+158 
ffff885fb098a350:  82ba697e592d849c 000000005f427052 
ffff885fb098a360:  [size-128]       0005122006cc5098 
ffff885fb098a370:  [size-128]       [size-128]       
ffff885fb098a380:  467901bececa4954 [size-128]       
ffff885fb098a390:  0000000000011220 [kmem_cache]     
ffff885fb098a3a0:  0000000000011220 0000000000000046 
ffff885fb098a3b0:  ffff885fb098a3f0 0000000000000046 
ffff885fb098a3c0:  060b5ea6e8f543ba [size-128]       
ffff885fb098a3d0:  0000000000011220 ffff885fb098a430 
ffff885fb098a3e0:  ffff885fb098a448 [size-128]       
ffff885fb098a3f0:  ffff885fb098a400 mempool_alloc_slab+21 
ffff885fb098a400:  ffff885fb098a490 mempool_alloc+99 
ffff885fb098a410:  [size-1024]      [size-128]       
ffff885fb098a420:  [task_struct]    0000000081167610 
ffff885fb098a430:  4c132c5358c58623 00000000869bf4f9 
ffff885fb098a440:  [size-128]       000492200bd03c42 
ffff885fb098a450:  [size-128]       [size-128]       
ffff885fb098a460:  f34a8d61b668f395 [scsi_cmd_cache] 
ffff885fb098a470:  0000000000000080 0000000000000020 
ffff885fb098a480:  0000000000000000 sg_init_table+48 
ffff885fb098a490:  [scsi_cmd_cache] 0000000000000080 
ffff885fb098a4a0:  ffff885fb098a510 __sg_alloc_table+126 
ffff885fb098a4b0:  [sgpool-8]       scsi_sg_alloc    
ffff885fb098a4c0:  0000000000000fe0 0000007fe73a8800 
ffff885fb098a4d0:  0000000000000000 000000018137278a 
ffff885fb098a4e0:  ffff885fb098a540 [scsi_cmd_cache] 
ffff885fb098a4f0:  [dm_rq_target_io] [sgpool-8]       
ffff885fb098a500:  01ff885fe75fe800 [blkdev_queue]   
ffff885fb098a510:  ffff885fb098a570 swiotlb_map_sg_attrs+121 
ffff885fb098a520:  [scsi_cmd_cache] 0000000000000000 
ffff885fb098a530:  [sgpool-8]       00000002e73a8800 
ffff885fb098a540:  ffff885fb098a560 [size-4096]      
ffff885fb098a550:  swiotlb_dma_ops  [sgpool-8]       
ffff885fb098a560:  0000000000000001 0000000000000002 
ffff885fb098a570:  ffff885fb098a5c0 scsi_dma_map+144 
ffff885fb098a580:  ffff885fb098a5c0 ffffffff00000001 
ffff885fb098a590:  [size-2048]      [scsi_cmd_cache] 
ffff885fb098a5a0:  [size-8192]      [sgpool-8]       
ffff885fb098a5b0:  [size-1024]      ffff885fe472eb00 
ffff885fb098a5c0:  ffff885fb098a650 _scsih_qcmd+726  
ffff885fb098a5d0:  0000000000011220 ffff885f00000001 
ffff885fb098a5e0:  ffff885fe472eb00 81eacf00e15c3280 
...

Another example of a stack overrun in XFS:

WARNING: at kernel/sched_fair.c:1846 hrtick_start_fair+0x18b/0x190() (Not tainted)
Hardware name: PowerEdge R620
Modules linked in:
general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:40/0000:40:01.0/0000:41:00.0/host7/port-7:1/end_device-7:1/target7:0:1/7:0:1:5/state
CPU 1 
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc xfs exportfs uinput ipv6 power_meter sg shpchp bnx2x libcrc32c mdio microcode dcdbas sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 4326, comm: xfslogd/1 Not tainted 2.6.32-358.24.1.el6.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
RIP: 0010:[<ffffffff810522cc>]  [<ffffffff810522cc>] check_preempt_curr+0x1c/0x90
RSP: 0018:ffff885fdffb7c10  EFLAGS: 00010082
RAX: ffffffff8160b6a0 RBX: ffffffff81c25700 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff885fe0551540 RDI: cccccccccccccccc
RBP: ffff885fdffb7c20 R08: 0000000000000001 R09: 000000000000001f
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c25700
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff8830d9000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fcec010d000 CR3: 0000005fe2b59000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process xfslogd/1 (pid: 4326, threadinfo ffff885fdffb6000, task ffff885fdf5ce040)
Stack:
 0000000000000000 ffff885fe0551540 ffff885fdffb7c90 ffffffff810637c3
<d> 00000000ffffffff 0000000000000008 0000000000016700 0000000000000000
<d> ffff8830d900fec0 0000000000000086 0000000000016700 ffff885fdf5d7070
Call Trace:
 [<ffffffff810637c3>] try_to_wake_up+0x213/0x3e0
 [<ffffffff810639a2>] default_wake_function+0x12/0x20
 [<ffffffff81051439>] __wake_up_common+0x59/0x90
 [<ffffffff81055ac8>] __wake_up+0x48/0x70
 [<ffffffffa02dc71c>] xlog_state_do_callback+0x1fc/0x2b0 [xfs]
 [<ffffffffa02dc84e>] xlog_state_done_syncing+0x7e/0xb0 [xfs]
 [<ffffffffa02dcfc9>] xlog_iodone+0x59/0xb0 [xfs]
 [<ffffffffa02f7de0>] ? xfs_buf_iodone_work+0x0/0x50 [xfs]
 [<ffffffffa02f7e06>] xfs_buf_iodone_work+0x26/0x50 [xfs]
 [<ffffffff81090be0>] worker_thread+0x170/0x2a0
 [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81090a70>] ? worker_thread+0x0/0x2a0
 [<ffffffff81096a36>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 6b ff ff ff c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 46 30 48 89 fb 48 8b bf 98 08 00 00 <48> 8b 4f 30 48 39 c8 74 51 48 81 f9 60 b8 60 81 74 24 48 c7 c2 
RIP  [<ffffffff810522cc>] check_preempt_curr+0x1c/0x90
 RSP <ffff885fdffb7c10>

This worker thread was completing an I/O request and trying to wake up the initiator of that request. To do this it needs to access the thread_info structure of the initiating process, which was found to be corrupt.

The initiator of the I/O request is:

crash> bt
PID: 10230  TASK: ffff885fe0551540  CPU: 134  COMMAND: "nfsd"
bt: invalid kernel virtual address: 0  type: "stack contents"
bt: read of stack at 0 failed

The stack is so badly corrupted that a normal backtrace cannot be obtained.

crash> task -R stack
PID: 10230  TASK: ffff885fe0551540  CPU: 134  COMMAND: "nfsd"
  stack = 0xffff885fe0b8a000, 

Dump out the entire stack space:

crash> bt -S 0xffff885fe0b8a000
PID: 10230  TASK: ffff885fe0551540  CPU: 134  COMMAND: "nfsd"
 #0 [ffff885fe0b8a000] schedule at ffffffff8150e172
 #1 [ffff885fe0b8a138] __alloc_pages_nodemask at ffffffff8112bc43
 #2 [ffff885fe0b8a278] kmem_getpages at ffffffff81166e1a
 #3 [ffff885fe0b8a2a8] cache_grow at ffffffff81167377
 #4 [ffff885fe0b8a318] cache_alloc_refill at ffffffff81167640
 #5 [ffff885fe0b8a388] vsnprintf at ffffffff81281550
 #6 [ffff885fe0b8a428] sprintf at ffffffff81281720
 #7 [ffff885fe0b8a488] string at ffffffff8127fed0
 #8 [ffff885fe0b8a4c8] symbol_string at ffffffff81280001
 #9 [ffff885fe0b8a618] pointer at ffffffff812806eb
#10 [ffff885fe0b8a718] __call_console_drivers at ffffffff8106e585
#11 [ffff885fe0b8a748] _call_console_drivers at ffffffff8106e5ea
#12 [ffff885fe0b8a768] release_console_sem at ffffffff8106ec38
#13 [ffff885fe0b8a7a8] vprintk at ffffffff8106f338
#14 [ffff885fe0b8a848] printk at ffffffff8150dbb5
#15 [ffff885fe0b8a8a8] print_modules at ffffffff810b2e20
#16 [ffff885fe0b8a8d8] warn_slowpath_common at ffffffff8106e3e2
#17 [ffff885fe0b8a918] warn_slowpath_null at ffffffff8106e43a
#18 [ffff885fe0b8a928] hrtick_start_fair at ffffffff8105781b
#19 [ffff885fe0b8a958] dequeue_task_fair at ffffffff8106690b
#20 [ffff885fe0b8a998] dequeue_task at ffffffff81055ede
#21 [ffff885fe0b8a9c8] deactivate_task at ffffffff81055f23
#22 [ffff885fe0b8a9d8] thread_return at ffffffff8150e299

The above entries do not make much sense, but they do show that at some point this memory was overwritten.
The rest of the stack is valid and is in the correct context with respect to the I/O completion shown above.

#23 [ffff885fe0b8aa98] xlog_state_get_iclog_space at ffffffffa02df048 [xfs]
#24 [ffff885fe0b8ab48] xlog_write at ffffffffa02df410 [xfs]
#25 [ffff885fe0b8ac18] xlog_cil_push at ffffffffa02e04a1 [xfs]
#26 [ffff885fe0b8ace8] xlog_cil_force_lsn at ffffffffa02e0c05 [xfs]
#27 [ffff885fe0b8ad68] _xfs_log_force at ffffffffa02deb68 [xfs]
#28 [ffff885fe0b8adb8] xfs_log_force at ffffffffa02def08 [xfs]
#29 [ffff885fe0b8ade8] xfs_buf_lock at ffffffffa02f7902 [xfs]
#30 [ffff885fe0b8ae18] _xfs_buf_find at ffffffffa02f7a45 [xfs]
#31 [ffff885fe0b8ae78] xfs_buf_get at ffffffffa02f7bc4 [xfs]
#32 [ffff885fe0b8aec8] xfs_trans_get_buf at ffffffffa02edf78 [xfs]
#33 [ffff885fe0b8af18] xfs_btree_get_bufs at ffffffffa02ba61e [xfs]
#34 [ffff885fe0b8af28] xfs_alloc_fix_freelist at ffffffffa02a5710 [xfs]
#35 [ffff885fe0b8b018] xfs_free_extent at ffffffffa02a5b48 [xfs]
#36 [ffff885fe0b8b0c8] xfs_bmap_finish at ffffffffa02af89d [xfs]
#37 [ffff885fe0b8b118] xfs_itruncate_finish at ffffffffa02d585f [xfs]
#38 [ffff885fe0b8b1c8] xfs_free_eofblocks at ffffffffa02f0c7e [xfs]
#39 [ffff885fe0b8b268] xfs_inactive at ffffffffa02f1260 [xfs]
#40 [ffff885fe0b8b2b8] xfs_fs_clear_inode at ffffffffa02feb00 [xfs]
#41 [ffff885fe0b8b2d8] clear_inode at ffffffff8119d31c
#42 [ffff885fe0b8b2f8] dispose_list at ffffffff8119d3f0
#43 [ffff885fe0b8b338] shrink_icache_memory at ffffffff8119d744
#44 [ffff885fe0b8b398] shrink_slab at ffffffff81131ffa
#45 [ffff885fe0b8b3f8] zone_reclaim at ffffffff81134ac9
#46 [ffff885fe0b8b4c8] get_page_from_freelist at ffffffff8112a6ac
#47 [ffff885fe0b8b5e8] __alloc_pages_nodemask at ffffffff8112bc43
#48 [ffff885fe0b8b728] alloc_pages_current at ffffffff81160c6a
#49 [ffff885fe0b8b758] __page_cache_alloc at ffffffff8111a237
#50 [ffff885fe0b8b788] __do_page_cache_readahead at ffffffff8112e9eb
#51 [ffff885fe0b8b818] ra_submit at ffffffff8112eb41
#52 [ffff885fe0b8b828] ondemand_readahead at ffffffff8112eeb5
#53 [ffff885fe0b8b888] page_cache_async_readahead at ffffffff8112f070
#54 [ffff885fe0b8b8d8] __generic_file_splice_read at ffffffff811b0c9f
#55 [ffff885fe0b8bb28] generic_file_splice_read at ffffffff811b0f2a
#56 [ffff885fe0b8bb58] xfs_file_splice_read at ffffffffa02fa8d0 [xfs]
#57 [ffff885fe0b8bbc8] do_splice_to at ffffffff811af18b
#58 [ffff885fe0b8bc08] splice_direct_to_actor at ffffffff811af48f
#59 [ffff885fe0b8bc78] nfsd_vfs_read at ffffffffa0544e70 [nfsd]
#60 [ffff885fe0b8bcf8] nfsd_read at ffffffffa05462f7 [nfsd]
#61 [ffff885fe0b8bd98] nfsd3_proc_read at ffffffffa054e7e5 [nfsd]
#62 [ffff885fe0b8bdd8] nfsd_dispatch at ffffffffa053f43e [nfsd]
#63 [ffff885fe0b8be18] svc_process_common at ffffffffa0425624 [sunrpc]
#64 [ffff885fe0b8be98] svc_process at ffffffffa0425c60 [sunrpc]
#65 [ffff885fe0b8beb8] nfsd at ffffffffa053fb62 [nfsd]
#66 [ffff885fe0b8bee8] kthread at ffffffff81096a36
#67 [ffff885fe0b8bf48] kernel_thread at ffffffff8100c0ca

Yet another example of a stack overflow:

------------[ cut here ]------------
WARNING: at kernel/sched_fair.c:1846 hrtick_start_fair+0x18b/0x190() (Not tainted)
Hardware name: PowerEdge R620
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc xfs exportfs uinput ipv6 power_meter dcdbas microcode sg shpchp bnx2x libcrc32c mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 5588, comm: nfsd Not tainted 2.6.32-358.24.1.el6.x86_64 #1
Call Trace:
 [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
BUG: unable to handle kernel NULL pointer dereference at 00000000000009e8
IP: [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
PGD 5fdb8ae067 PUD 5fdbee9067 PMD 0 
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/host8/port-8:1/end_device-8:1/target8:0:1/8:0:1:4/state
CPU 2 
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc xfs exportfs uinput ipv6 power_meter dcdbas microcode sg shpchp bnx2x libcrc32c mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 5588, comm: nfsd Not tainted 2.6.32-358.24.1.el6.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
RIP: 0010:[<ffffffff8100f4dd>]  [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
RSP: 0018:ffff885fe4060700  EFLAGS: 00010006
RAX: 0000000000000000 RBX: ffff885fe4060888 RCX: 000000000000892a
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff885fe4060760 R08: 00000000000270db R09: 00000000fffffffb
R10: 0000000000000001 R11: 000000000000000c R12: ffff885fe4060800
R13: ffff885fe4060000 R14: ffffffff81600460 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000009e8 CR3: 0000005fdb8b9000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 5588, threadinfo ffff885fe4060000, task ffff885fe73de080)
Stack:
 ffff885fe40607a0 ffff885fe4061ff8 ffff885fe4060800 ffffffffffffe000
<d> ffffffff817c33b8 ffffffff8106e3e7 ffff885fe4060770 ffff885fe4060868
<d> 000000000000cbe0 ffffffff81600460 ffffffff817c33b8 ffff880028223fc0
Call Trace:
 [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
general protection fault: 0000 [#2] SMP 
last sysfs file: /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/host8/port-8:1/end_device-8:1/target8:0:1/8:0:1:4/state
CPU 2 
Modules linked in: mptctl mptbase ipmi_devintf dell_rbu nfsd autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc xfs exportfs uinput ipv6 power_meter dcdbas microcode sg shpchp bnx2x libcrc32c mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 5588, comm: nfsd Not tainted 2.6.32-358.24.1.el6.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
RIP: 0010:[<ffffffff8100f4dd>]  [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
RSP: 0018:ffff885fe4060218  EFLAGS: 00010006
RAX: 01ffffff81ead340 RBX: ffff885fe4060728 RCX: 0000000000008a04
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff885fe4060278 R08: 0000000000000000 R09: ffffffff8163fde0
R10: 0000000000000001 R11: 0000000000000000 R12: ffff885fe4060760
R13: ffff885fe4060000 R14: ffffffff81600460 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000009e8 CR3: 0000005fdb8b9000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 5588, threadinfo ffff885fe4060000, task ffff885fe73de080)
Stack:
 ffff885fe40602b8 ffff885fe4061ff8 ffff885fe4060760 ffffffffffffe000
<d> ffffffff8178ef53 ffffffff8106e3e7 ffff885fe40602e8 ffff885fe4060700
<d> 000000000000cbe0 ffffffff81600460 ffffffff8178ef53 ffff880028223fc0
Call Trace:
 [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8100e4a0>] dump_trace+0x190/0x3b0
 [<ffffffff8100f245>] show_trace_log_lvl+0x55/0x70
 [<ffffffff8100dfba>] show_stack_log_lvl+0x9a/0x170
 [<ffffffff8100e16f>] show_registers+0xdf/0x280
 [<ffffffff81513d1a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff81511bb3>] __die+0xb3/0xf0
 [<ffffffff81046bf2>] no_context+0xd2/0x260
 [<ffffffff81046ea5>] __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff81046f73>] bad_area_nosemaphore+0x13/0x20
 [<ffffffff810476d1>] __do_page_fault+0x321/0x480
 [<ffffffff8109ca9f>] ? up+0x2f/0x50
 [<ffffffff8109ca9f>] ? up+0x2f/0x50
 [<ffffffff8106ecff>] ? release_console_sem+0x1cf/0x220
 [<ffffffff81513bfe>] do_page_fault+0x3e/0xa0
 [<ffffffff81510fb5>] page_fault+0x25/0x30
 [<ffffffff8100f4dd>] ? print_context_stack+0xad/0x140
 [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8100e4a0>] dump_trace+0x190/0x3b0
 [<ffffffff8105781b>] ? hrtick_start_fair+0x18b/0x190
 [<ffffffff8100f245>] show_trace_log_lvl+0x55/0x70
 [<ffffffff8100f275>] show_trace+0x15/0x20
 [<ffffffff8150d9fe>] dump_stack+0x6f/0x76
 [<ffffffff810b2e4a>] ? print_modules+0x5a/0xf0
 [<ffffffff8106e3e7>] warn_slowpath_common+0x87/0xc0
 [<ffffffff8106e43a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8105781b>] hrtick_start_fair+0x18b/0x190
 [<ffffffff8106690b>] dequeue_task_fair+0x12b/0x130
 [<ffffffff81055ede>] dequeue_task+0x8e/0xb0
 [<ffffffff81055f23>] deactivate_task+0x23/0x30
 [<ffffffff8150e299>] thread_return+0x127/0x76e
 [<ffffffffa02f7352>] ? _xfs_buf_ioapply+0x162/0x1f0 [xfs]
 [<ffffffffa02ddd9a>] ? xlog_bdstrat+0x2a/0x60 [xfs]
 [<ffffffff8150f035>] schedule_timeout+0x215/0x2e0
 [<ffffffffa02ddd9a>] ? xlog_bdstrat+0x2a/0x60 [xfs]
 [<ffffffffa02df569>] ? xlog_sync+0x269/0x3e0 [xfs]
 [<ffffffff8150ff52>] __down+0x72/0xb0
 [<ffffffffa02f8a45>] ? _xfs_buf_find+0xe5/0x230 [xfs]
 [<ffffffff8109cb61>] down+0x41/0x50
 [<ffffffffa02f8a45>] ? _xfs_buf_find+0xe5/0x230 [xfs]
 [<ffffffffa02f88b1>] xfs_buf_lock+0x51/0x100 [xfs]
 [<ffffffffa02f8a45>] _xfs_buf_find+0xe5/0x230 [xfs]
 [<ffffffffa02f8bc4>] xfs_buf_get+0x34/0x1b0 [xfs]
 [<ffffffffa02eef78>] xfs_trans_get_buf+0xe8/0x180 [xfs]
 [<ffffffffa02bb5a2>] xfs_btree_get_buf_block+0x52/0x90 [xfs]
 [<ffffffffa02bedba>] xfs_btree_split+0x12a/0x710 [xfs]
 [<ffffffffa02bf8cd>] xfs_btree_make_block_unfull+0x12d/0x190 [xfs]
 [<ffffffffa02bfd1f>] xfs_btree_insrec+0x3ef/0x5a0 [xfs]
 [<ffffffffa02f7f85>] ? xfs_buf_rele+0x55/0x100 [xfs]
 [<ffffffffa02f92e2>] ? xfs_buf_read+0xc2/0x100 [xfs]
 [<ffffffffa02bff63>] xfs_btree_insert+0x93/0x180 [xfs]
 [<ffffffffa02f400a>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
 [<ffffffffa02a4ad4>] xfs_free_ag_extent+0x434/0x750 [xfs]
 [<ffffffffa02a6ba1>] xfs_free_extent+0x101/0x130 [xfs]
 [<ffffffffa02b089d>] xfs_bmap_finish+0x15d/0x1a0 [xfs]
 [<ffffffffa02d685f>] xfs_itruncate_finish+0x15f/0x320 [xfs]
 [<ffffffffa02f1c7e>] xfs_free_eofblocks+0x1fe/0x2e0 [xfs]
 [<ffffffffa02f2260>] xfs_inactive+0xc0/0x480 [xfs]
 [<ffffffffa0302794>] ? xfs_inode_set_reclaim_tag+0x84/0xa0 [xfs]
 [<ffffffffa02ffb00>] xfs_fs_clear_inode+0xa0/0xd0 [xfs]
 [<ffffffff8119d31c>] clear_inode+0xac/0x140
 [<ffffffff8119d3f0>] dispose_list+0x40/0x120
 [<ffffffff8119d744>] shrink_icache_memory+0x274/0x2e0
 [<ffffffff81131ffa>] shrink_slab+0x12a/0x1a0
 [<ffffffff81134ac9>] zone_reclaim+0x279/0x400
 [<ffffffff8112a6ac>] get_page_from_freelist+0x69c/0x830
 [<ffffffff8109ca57>] ? down_trylock+0x37/0x50
 [<ffffffff8112bc43>] __alloc_pages_nodemask+0x113/0x8d0
 [<ffffffffa02c1741>] ? xfs_da_buf_make+0x121/0x170 [xfs]
 [<ffffffffa02c2288>] ? xfs_da_do_buf+0x618/0x770 [xfs]
 [<ffffffffa02f3f57>] ? kmem_zone_alloc+0x77/0xf0 [xfs]
 [<ffffffff81166dc2>] kmem_getpages+0x62/0x170
 [<ffffffff8116742f>] cache_grow+0x2cf/0x320
 [<ffffffff81167682>] cache_alloc_refill+0x202/0x240
 [<ffffffff8116871f>] kmem_cache_alloc+0x15f/0x190
 [<ffffffffa02f3f57>] kmem_zone_alloc+0x77/0xf0 [xfs]
 [<ffffffffa02d3be0>] xfs_inode_alloc+0x30/0xf0 [xfs]
 [<ffffffffa02d4070>] xfs_iget+0x260/0x6e0 [xfs]
 [<ffffffffa02d3820>] ? xfs_ilock_demote+0x60/0xa0 [xfs]
 [<ffffffffa02f1a36>] xfs_lookup+0xc6/0x110 [xfs]
 [<ffffffffa02fe6e4>] xfs_vn_lookup+0x54/0xa0 [xfs]
 [<ffffffff8118e4f2>] __lookup_hash+0x102/0x160
 [<ffffffff8118ee84>] lookup_one_len+0xb4/0x110
 [<ffffffffa054565b>] ? fh_put+0x9b/0x100 [nfsd]
 [<ffffffffa055318b>] encode_entryplus_baggage+0x18b/0x1b0 [nfsd]
 [<ffffffffa05534d4>] encode_entry+0x324/0x390 [nfsd]
 [<ffffffffa0553540>] ? nfs3svc_encode_entry_plus+0x0/0x20 [nfsd]
 [<ffffffffa0553559>] nfs3svc_encode_entry_plus+0x19/0x20 [nfsd]
 [<ffffffffa0548ead>] nfsd_readdir+0x15d/0x240 [nfsd]
 [<ffffffffa0550681>] nfsd3_proc_readdirplus+0xc1/0x210 [nfsd]
 [<ffffffffa054243e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa0428624>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff81063990>] ? default_wake_function+0x0/0x20
 [<ffffffffa0428c60>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa0542b62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa0542aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81096a36>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 39 08 00 85 c0 74 2c 49 8d 44 24 08 48 39 c3 74 7d 31 d2 48 8b 75 c8 48 8b 7d c0 41 ff 56 10 48 81 7d c8 41 ae 00 81 49 8b 45 00 <8b> 90 e8 09 00 00 74 1b 48 83 c3 08 4d 85 ff 75 92 4c 39 eb 76 
RIP  [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
 RSP <ffff885fe4060218>

This time the overflow was caused by memory reclaim re-entering the filesystem code at frame #48:

PID: 5588   TASK: ffff885fe73de080  CPU: 2   COMMAND: "nfsd"
 #0 [ffff885fe4060000] crash_kexec at ffffffff810c0e22
 #1 [ffff885fe40600d0] oops_end at ffffffff81511cb0
 #2 [ffff885fe4060100] die at ffffffff8100f19b
 #3 [ffff885fe4060130] do_general_protection at ffffffff815117b2
 #4 [ffff885fe4060160] general_protection at ffffffff81510f85
    [exception RIP: print_context_stack+173]
    RIP: ffffffff8100f4dd  RSP: ffff885fe4060218  RFLAGS: 00010006
    RAX: 01ffffff81ead340  RBX: ffff885fe4060728  RCX: 0000000000008a04
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000046
    RBP: ffff885fe4060278   R8: 0000000000000000   R9: ffffffff8163fde0
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff885fe4060760
    R13: ffff885fe4060000  R14: ffffffff81600460  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff885fe4060210] print_context_stack at ffffffff8100f4d1
 #6 [ffff885fe4060280] dump_trace at ffffffff8100e4a0
 #7 [ffff885fe4060320] show_trace_log_lvl at ffffffff8100f245
 #8 [ffff885fe4060350] show_stack_log_lvl at ffffffff8100dfba
 #9 [ffff885fe40603b0] show_registers at ffffffff8100e16f
#10 [ffff885fe4060420] __die at ffffffff81511bb3
#11 [ffff885fe4060450] no_context at ffffffff81046bf2
#12 [ffff885fe40604a0] __bad_area_nosemaphore at ffffffff81046ea5
#13 [ffff885fe40604f0] bad_area_nosemaphore at ffffffff81046f73
#14 [ffff885fe4060500] __do_page_fault at ffffffff810476d1
#15 [ffff885fe4060620] do_page_fault at ffffffff81513bfe
#16 [ffff885fe4060650] page_fault at ffffffff81510fb5
    [exception RIP: print_context_stack+173]
    RIP: ffffffff8100f4dd  RSP: ffff885fe4060700  RFLAGS: 00010006
    RAX: 0000000000000000  RBX: ffff885fe4060888  RCX: 000000000000892a
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000046
    RBP: ffff885fe4060760   R8: 00000000000270db   R9: 00000000fffffffb
    R10: 0000000000000001  R11: 000000000000000c  R12: ffff885fe4060800
    R13: ffff885fe4060000  R14: ffffffff81600460  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#17 [ffff885fe4060728] warn_slowpath_common at ffffffff8106e3e7
#18 [ffff885fe4060768] dump_trace at ffffffff8100e4a0
#19 [ffff885fe4060808] show_trace_log_lvl at ffffffff8100f245
#20 [ffff885fe4060838] show_trace at ffffffff8100f275
#21 [ffff885fe4060848] dump_stack at ffffffff8150d9fe
#22 [ffff885fe4060888] warn_slowpath_common at ffffffff8106e3e7
#23 [ffff885fe40608c8] warn_slowpath_null at ffffffff8106e43a
#24 [ffff885fe40608d8] hrtick_start_fair at ffffffff8105781b
#25 [ffff885fe4060908] dequeue_task_fair at ffffffff8106690b
#26 [ffff885fe4060948] dequeue_task at ffffffff81055ede
#27 [ffff885fe4060978] deactivate_task at ffffffff81055f23
#28 [ffff885fe4060988] thread_return at ffffffff8150e299
#29 [ffff885fe4060a48] schedule_timeout at ffffffff8150f035
#30 [ffff885fe4060af8] __down at ffffffff8150ff52
#31 [ffff885fe4060b48] down at ffffffff8109cb61
#32 [ffff885fe4060b78] xfs_buf_lock at ffffffffa02f88b1 [xfs]
#33 [ffff885fe4060ba8] _xfs_buf_find at ffffffffa02f8a45 [xfs]
#34 [ffff885fe4060c08] xfs_buf_get at ffffffffa02f8bc4 [xfs]
#35 [ffff885fe4060c58] xfs_trans_get_buf at ffffffffa02eef78 [xfs]
#36 [ffff885fe4060ca8] xfs_btree_get_buf_block at ffffffffa02bb5a2 [xfs]
#37 [ffff885fe4060ce8] xfs_btree_split at ffffffffa02bedba [xfs]
#38 [ffff885fe4060dd8] xfs_btree_make_block_unfull at ffffffffa02bf8cd [xfs]
#39 [ffff885fe4060e38] xfs_btree_insrec at ffffffffa02bfd1f [xfs]
#40 [ffff885fe4060f18] xfs_btree_insert at ffffffffa02bff63 [xfs]
#41 [ffff885fe4060fa8] xfs_free_ag_extent at ffffffffa02a4ad4 [xfs]
#42 [ffff885fe4061048] xfs_free_extent at ffffffffa02a6ba1 [xfs]
#43 [ffff885fe40610f8] xfs_bmap_finish at ffffffffa02b089d [xfs]
#44 [ffff885fe4061148] xfs_itruncate_finish at ffffffffa02d685f [xfs]
#45 [ffff885fe40611f8] xfs_free_eofblocks at ffffffffa02f1c7e [xfs]
#46 [ffff885fe4061298] xfs_inactive at ffffffffa02f2260 [xfs]
#47 [ffff885fe40612e8] xfs_fs_clear_inode at ffffffffa02ffb00 [xfs]
#48 [ffff885fe4061308] clear_inode at ffffffff8119d31c
#49 [ffff885fe4061328] dispose_list at ffffffff8119d3f0
#50 [ffff885fe4061368] shrink_icache_memory at ffffffff8119d744
#51 [ffff885fe40613c8] shrink_slab at ffffffff81131ffa
#52 [ffff885fe4061428] zone_reclaim at ffffffff81134ac9
#53 [ffff885fe40614f8] get_page_from_freelist at ffffffff8112a6ac
#54 [ffff885fe4061618] __alloc_pages_nodemask at ffffffff8112bc43
#55 [ffff885fe4061758] kmem_getpages at ffffffff81166dc2
#56 [ffff885fe4061788] cache_grow at ffffffff8116742f
#57 [ffff885fe40617f8] cache_alloc_refill at ffffffff81167682
#58 [ffff885fe4061868] kmem_cache_alloc at ffffffff8116871f
#59 [ffff885fe40618a8] kmem_zone_alloc at ffffffffa02f3f57 [xfs]
#60 [ffff885fe40618e8] xfs_inode_alloc at ffffffffa02d3be0 [xfs]
#61 [ffff885fe4061918] xfs_iget at ffffffffa02d4070 [xfs]
#62 [ffff885fe40619d8] xfs_lookup at ffffffffa02f1a36 [xfs]
#63 [ffff885fe4061a38] xfs_vn_lookup at ffffffffa02fe6e4 [xfs]
#64 [ffff885fe4061a78] __lookup_hash at ffffffff8118e4f2
#65 [ffff885fe4061ac8] lookup_one_len at ffffffff8118ee84
#66 [ffff885fe4061b08] encode_entryplus_baggage at ffffffffa055318b [nfsd]
#67 [ffff885fe4061c98] encode_entry at ffffffffa05534d4 [nfsd]
#68 [ffff885fe4061ce8] nfs3svc_encode_entry_plus at ffffffffa0553559 [nfsd]
#69 [ffff885fe4061d08] nfsd_readdir at ffffffffa0548ead [nfsd]
#70 [ffff885fe4061d88] nfsd3_proc_readdirplus at ffffffffa0550681 [nfsd]
#71 [ffff885fe4061dd8] nfsd_dispatch at ffffffffa054243e [nfsd]
#72 [ffff885fe4061e18] svc_process_common at ffffffffa0428624 [sunrpc]
#73 [ffff885fe4061e98] svc_process at ffffffffa0428c60 [sunrpc]
#74 [ffff885fe4061eb8] nfsd at ffffffffa0542b62 [nfsd]
#75 [ffff885fe4061ee8] kthread at ffffffff81096a36
#76 [ffff885fe4061f48] kernel_thread at ffffffff8100c0ca

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
