RHEL7: With audit enabled, when many processes issue path-based syscalls in the same directory, a long-running shrink_dentry_list drives up CPU usage and causes a soft lockup or an rcu_sched self-detected stall.
Issue
Either a "BUG: soft lockup" message or an "rcu_sched self-detected stall" message is logged while shrink_dentry_list runs for an extended period:
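As a quick check for whether a system has hit either signature, the kernel log can be grepped. This is a minimal sketch: the sample lines are taken from the trace in this article, and in practice the output of dmesg or journalctl -k would be piped in instead.

```shell
# Sketch: count lines matching either stall signature from this issue.
# The sample input reproduces two lines from the trace below; replace the
# printf with `dmesg` or `journalctl -k` on a live system.
printf '%s\n' \
  'NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [hpsensor:98289]' \
  'INFO: rcu_sched self-detected stall on CPU { 8} (t=60001 jiffies)' |
  grep -cE 'BUG: soft lockup|rcu_sched self-detected stall'
# prints 2 (one matching line per signature)
```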
[909563.904619] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [hpsensor:98289]
[909563.904625] Modules linked in: btrfs zlib_deflate raid6_pq xor msdos bnx2i cnic uio fuse rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi ib_srpt target_core_mod ib_srp scsi_transport_srp rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm iw_cm ib_ipoib ib_cm mlx4_ib ib_core mlx4_en mlx4_core devlink binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache scsi_transport_iscsi nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 bonding ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter
[909563.904704] ext4 mbcache jbd2 vfat fat intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr osst iTCO_wdt iTCO_vendor_support ses enclosure scsi_transport_sas st ch sg sb_edac hpilo hpwdt edac_core ioatdma lpc_ich dca shpchp ipmi_devintf ipmi_si ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs dm_service_time sd_mod crc_t10dif crct10dif_generic dm_multipath crct10dif_pclmul crct10dif_common crc32c_intel mgag200 qla2xxx i2c_algo_bit bnx2x drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm scsi_transport_fc scsi_tgt mdio ptp i2c_core pps_core libcrc32c dm_mirror dm_region_hash dm_log dm_mod
[909563.904781] CPU: 8 PID: 98289 Comm: hpsensor Tainted: G W L ------------ 3.10.0-552.el7.wd2.x86_64 #1
[909563.904783] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 008.004.084 SFW: 043.025.000 08/16/2016
[909563.904785] task: ffff88a9aa21bec0 ti: ffff88a1e7aec000 task.ti: ffff88a1e7aec000
[909563.904787] RIP: 0010:[<ffffffff8168ce7e>] [<ffffffff8168ce7e>] _raw_spin_lock+0xe/0x50
[909563.904796] RSP: 0018:ffff88a1e7aef940 EFLAGS: 00000202
[909563.904798] RAX: 00000000232a232a RBX: ffff88daa8e8c540 RCX: 0000000000008f0a
[909563.904799] RDX: ffff88e592b72920 RSI: 00000000000094e2 RDI: ffff88daa8e8c598
[909563.904800] RBP: ffff88a1e7aef990 R08: ffff88de47500980 R09: 0000000000010360
[909563.904802] R10: ffff88daa8e8c540 R11: ffffea03fdd84f80 R12: ffff88a5eeba6e40
[909563.904803] R13: ffff88a1e7aef938 R14: 00000000b01bb6a5 R15: ffff88a1e7aef938
[909563.904805] FS: 00007f8573efe700(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
[909563.904806] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[909563.904807] CR2: 0000000000b4c6c0 CR3: 00000009ae203000 CR4: 00000000001407e0
[909563.904809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[909563.904810] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[909563.904811] Stack:
[909563.904812] ffffffff81213c1f ffff88daa8e8c540 ffff88e592b72920 ffff88daa8e8c5c0
[909563.904815] 00000000b01bb6a5 ffff890053b8c800 ffff88c05399bf18 0000000000000000
[909563.904817] ffff890053b8c8d0 ffff88c05399bf40 ffff88a1e7aef9f0 ffffffff812144ee
[909563.904820] Call Trace:
[909563.904825] [<ffffffff81213c1f>] ? shrink_dentry_list+0x4f/0x480
[909563.904828] [<ffffffff812144ee>] prune_dcache_sb+0x13e/0x180
[909563.904832] [<ffffffff811fd7c8>] prune_super+0x138/0x170
[909563.904837] [<ffffffff81191a33>] shrink_slab+0x163/0x330
[909563.904842] [<ffffffff811f21d7>] ? vmpressure+0x87/0x90
[909563.904845] [<ffffffff81194851>] zone_reclaim+0x1d1/0x2f0
[909563.904849] [<ffffffff81188034>] get_page_from_freelist+0x2c4/0x9f0
[909563.904853] [<ffffffff811888f7>] __alloc_pages_nodemask+0x197/0xbc0
[909563.904858] [<ffffffff811cf1da>] alloc_pages_vma+0x9a/0x150
[909563.904862] [<ffffffff811ae49f>] handle_mm_fault+0xc6f/0xfe0
[909563.904866] [<ffffffff816910b4>] __do_page_fault+0x154/0x450
[909563.904868] [<ffffffff816913e5>] do_page_fault+0x35/0x90
[909563.904871] [<ffffffff8168d5c8>] page_fault+0x28/0x30
[909563.904872] Code: e8 01 74 0a 0f b7 0f 66 39 ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb dc 66 0f 1f 44 00 00 0f 1f 44 00 00 b8 00 00 02 00 f0 0f c1 07 <89> c2 c1 ea 10 66 39 c2 75 01 c3 55 83 e2 fe 0f b7 f2 48 89 e5
[909568.756771] INFO: rcu_sched self-detected stall on CPU { 8} (t=60001 jiffies g=86912459 c=86912458 q=25025260)
[909568.756797] Task dump for CPU 8:
[909568.756800] hpsensor R running task 0 98289 95549 0x0000008a
[909568.756803] ffff88a9aa21bec0 00000000b01bb6a5 ffff881fffc03db0 ffffffff810c1158
[909568.756806] 0000000000000008 ffffffff81a24480 ffff881fffc03dc8 ffffffff810c4a99
[909568.756808] 0000000000000009 ffff881fffc03df8 ffffffff81134060 ffff881fffc10240
[909568.756810] Call Trace:
[909568.756812] <IRQ> [<ffffffff810c1158>] sched_show_task+0xa8/0x110
[909568.756829] [<ffffffff810c4a99>] dump_cpu_task+0x39/0x70
[909568.756834] [<ffffffff81134060>] rcu_dump_cpu_stacks+0x90/0xd0
[909568.756837] [<ffffffff811377b2>] rcu_check_callbacks+0x442/0x730
[909568.756845] [<ffffffff810eff40>] ? tick_sched_handle.isra.13+0x60/0x60
[909568.756852] [<ffffffff81097457>] update_process_times+0x47/0x80
[909568.756854] [<ffffffff810eff05>] tick_sched_handle.isra.13+0x25/0x60
[909568.756857] [<ffffffff810eff81>] tick_sched_timer+0x41/0x70
[909568.756861] [<ffffffff810b2592>] __hrtimer_run_queues+0xd2/0x260
[909568.756864] [<ffffffff810b2b30>] hrtimer_interrupt+0xb0/0x1e0
[909568.756868] [<ffffffff81697b5c>] ? call_softirq+0x1c/0x30
[909568.756872] [<ffffffff81050007>] local_apic_timer_interrupt+0x37/0x60
[909568.756875] [<ffffffff816987cf>] smp_apic_timer_interrupt+0x3f/0x60
[909568.756877] [<ffffffff81696d1d>] apic_timer_interrupt+0x6d/0x80
[909568.756879] <EOI> [<ffffffff8168cd0b>] ? _raw_spin_trylock+0xb/0x40
[909568.756889] [<ffffffff81213ca2>] shrink_dentry_list+0xd2/0x480
[909568.756891] [<ffffffff812144ee>] prune_dcache_sb+0x13e/0x180
[909568.756896] [<ffffffff811fd7c8>] prune_super+0x138/0x170
[909568.756900] [<ffffffff81191a33>] shrink_slab+0x163/0x330
[909568.756905] [<ffffffff811f21d7>] ? vmpressure+0x87/0x90
[909568.756907] [<ffffffff81194851>] zone_reclaim+0x1d1/0x2f0
[909568.756913] [<ffffffff81188034>] get_page_from_freelist+0x2c4/0x9f0
[909568.756915] [<ffffffff811888f7>] __alloc_pages_nodemask+0x197/0xbc0
[909568.756921] [<ffffffff811cf1da>] alloc_pages_vma+0x9a/0x150
[909568.756926] [<ffffffff811ae49f>] handle_mm_fault+0xc6f/0xfe0
[909568.756933] [<ffffffff816910b4>] __do_page_fault+0x154/0x450
[909568.756936] [<ffffffff816913e5>] do_page_fault+0x35/0x90
[909568.756938] [<ffffffff8168d5c8>] page_fault+0x28/0x30
Environment
- Red Hat Enterprise Linux 7
- Originally observed on 3.10.0-552.el7.wd2.x86_64
- Reproduced on 3.10.0-514.el7
- Hardware
- Observed on an HP Superdome2 (160 CPUs, 1+ TB RAM)
- Reproduced on a smaller system with 16 CPUs
- Workload as described in https://access.redhat.com/solutions/2972831
- Audit configured as described in https://access.redhat.com/solutions/2972831
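The exact audit rules for this issue are in the linked solution; as general context, the kind of configuration involved is a path-based (watch) audit rule, which makes every syscall that resolves the watched path pass through the audit path-matching code. The path and key below are hypothetical placeholders, not the configuration from the linked article.

```shell
# Hypothetical example of a path-based audit watch rule (placeholder path/key).
# -w adds a watch on the path, -p selects read/write/execute/attribute access,
# -k tags matching records with a search key. Requires root and auditd.
auditctl -w /shared/dir -p rwxa -k shared-dir-watch
```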