A race condition occurs in the hugetlb code on RHEL 6 kernels newer than kernel-2.6.32-504.23.4
Issue
The race condition shows up in the following ways:
- The system crashes in the region_* functions when HugePages are enabled.
crash> log | grep -e ^BUG -e ^IP:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
IP:[<ffffffff81166dd4>] region_chg+0xe4/0x100
crash> bt
PID:20463 TASK: ffff881566e4f520 CPU:5 COMMAND:"java"
#0 [ffff881566e7b7d0] machine_kexec at ffffffff8103b60b
#1 [ffff881566e7b830] crash_kexec at ffffffff810c99e2
#2 [ffff881566e7b900] oops_end at ffffffff8152e1c0
#3 [ffff881566e7b930] no_context at ffffffff8104c80b
#4 [ffff881566e7b980] __bad_area_nosemaphore at ffffffff8104ca95
#5 [ffff881566e7b9d0] bad_area at ffffffff8104cbbe
#6 [ffff881566e7ba00] __do_page_fault at ffffffff8104d3c3
#7 [ffff881566e7bb20] do_page_fault at ffffffff8153010e
#8 [ffff881566e7bb50] page_fault at ffffffff8152d4b5
[exception RIP: region_chg+228]
RIP: ffffffff8116a0a4 RSP: ffff8803c31b7ca8 RFLAGS:00010282
RAX: fffffffffffffffe RBX: fffffffffffffffe RCX:00000000000001c0
RDX: dead000000100100 RSI:00000000000001bd RDI: ffff8803f1cffc48
RBP: ffff8803c31b7cc8 R8: ffff8803f1cffc41 R9:00000006979ffff0
R10: ffff881073b05480 R11:0000000000000000 R12: 00000000000001c0
R13:00000000000001bd R14: ffffffff81fd19e0 R15:0000000000000000
ORIG_RAX: ffffffffffffffff CS:0010 SS:0000
#6 [ffff8803c31b7ca0] anon_vma_prepare at ffffffff8115c1a0
#7 [ffff8803c31b7ce0] hugetlb_fault at ffffffff8116bc23
#8 [ffff8803c31b7d90] handle_mm_fault at ffffffff81153285
#9 [ffff8803c31b7e00] __do_page_fault at ffffffff8104f156
#10 [ffff8803c31b7f20] do_page_fault at ffffffff8153eb7e
#11 [ffff8803c31b7f50] page_fault at ffffffff8153bf25
- List corruption is reported in the region_* functions.
[ 3043.345741] ------------[ cut here ]------------
[ 3043.345762] WARNING: at lib/list_debug.c:51 list_del+0x8d/0xa0() (Not tainted)
[ 3043.345766] Hardware name:PRIMERGY RX600 S5
[ 3043.345769] list_del corruption. next->prev should be ffff88153b50e460, but was ffff88153b50e7c0
[ 3043.345772] Modules linked in: mptctl mptbase autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc 8021q garp stp llc smbus(U) cpufreq_ondemand acpi_cpufreq freq_table mperf bonding ipv6 iTCO_wdt iTCO_vendor_support microcode ipmi_devintf power_meter acpi_ipmi ipmi_si ipmi_msghandler i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core e1000e ixgbe mdio sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[ 3043.345843] Pid:32950, comm: java Not tainted 2.6.32-504.30.3.el6.x86_64 #1
[ 3043.345847] Call Trace:
[ 3043.345860] [<ffffffff81074e47>] ? warn_slowpath_common+0x87/0xc0
[ 3043.345865] [<ffffffff81074f36>] ? warn_slowpath_fmt+0x46/0x50
[ 3043.345870] [<ffffffff8129edbd>] ? list_del+0x8d/0xa0
[ 3043.345878] [<ffffffff8116688a>] ? region_add+0x9a/0xe0
[ 3043.345882] [<ffffffff81167c9d>] ? alloc_huge_page+0x29d/0x3c0
[ 3043.345888] [<ffffffff811691bb>] ? hugetlb_fault+0x43b/0x7b0
[ 3043.345896] [<ffffffff8105872d>] ? check_preempt_curr+0x6d/0x90
[ 3043.345906] [<ffffffff81150115>] ? handle_mm_fault+0x395/0x3d0
[ 3043.345914] [<ffffffff81063c63>] ? perf_event_task_sched_out+0x33/0x70
[ 3043.345920] [<ffffffff8104d096>] ? __do_page_fault+0x146/0x500
[ 3043.345930] [<ffffffff81529afe>] ? thread_return+0x4e/0x7d0
[ 3043.345937] [<ffffffff8153010e>] ? do_page_fault+0x3e/0xa0
[ 3043.345941] [<ffffffff8152d4b5>] ? page_fault+0x25/0x30
[ 3043.345945] ---[ end trace 3171fe47b71fad99 ]---
- The system crashes at shm_close+0xd6/0xe0.
------------[ cut here ]------------
kernel BUG at ipc/shm.c:232!
invalid opcode:0000 [#1] SMP
last sysfs file:/sys/devices/pci0000:00/0000:00:08.0/0000:0a:00.1/host2/rport-2:0-4/target2:0:4/2:0:4:118/state
CPU 1
Modules linked in: mptctl mptbase ipmi_devintf cpufreq_ondemand acpi_cpufreq mperf 8021q garp stp llc bonding ipv6 ipt_REJECT iptable_filter ip_tables dm_round_robin dm_multipath cpufreq_stats freq_table joydev sg iTCO_wdt iTCO_vendor_support serio_raw hpilo hpwdt ses enclosure lpc_ich mfd_core i7core_edac edac_core power_meter acpi_ipmi ipmi_si ipmi_msghandler qlcnic shpchp ext4 jbd2 mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid:9797, comm: httpd Not tainted 2.6.32-573.8.1.el6.x86_64 #1 HP ProLiant DL380 G6
RIP:0010:[<ffffffff81224336>] [<ffffffff81224336>] shm_close+0xd6/0xe0
RSP:0018:ffff880176a43e98 EFLAGS:00010202
RAX: ffffffffffffffea RBX: ffffffff81aec340 RCX:0000000000000006
RDX: ffffffffffffffea RSI:0000000000000040 RDI:0000000000000000
RBP: ffff880176a43eb8 R08:0000000000000000 R09:0000000000000000
R10:0000000000000000 R11:0000000000000202 R12: ffffffff81aec3e0
R13: ffffffffffffffea R14: ffff8800e6bb0188 R15:00007fffe9031000
FS:00007ffff7f8a7e0(0000) GS:ffff880c42600000(0000) knlGS:0000000000000000
CS:0010 DS:0000 ES:0000 CR0:0000000080050033
CR2:00007fffe9775612 CR3:0000000176a39000 CR4:00000000000007e0
DR0:0000000000000000 DR1:0000000000000000 DR2:0000000000000000
DR3:0000000000000000 DR6:00000000ffff0ff0 DR7:0000000000000400
Process httpd (pid:9797, threadinfo ffff880176a40000, task ffff880bfff5a040)
Stack:
0000000000000000 0000000000000000 ffff8800e6bb0188 ffff880d7345de90
<d> ffff880176a43ed8 ffffffff811562e3 0000000000000000 ffff880bff459a00
<d> ffff880176a43f38 ffffffff811588d7 00007fffffffe0a0 ffff8800e6bb0188
Call Trace:
[<ffffffff811562e3>] remove_vma+0x33/0x90
[<ffffffff811588d7>] do_munmap+0x317/0x3b0
[<ffffffff8122327e>] sys_shmdt+0xce/0x170
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Code:0f 1f 44 00 00 41 f6 45 21 02 75 e6 4c 89 e8 66 ff 00 66 66 90 4c 89 e7 e8 a8 27 e8 ff 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d
RIP [<ffffffff81224336>] shm_close+0xd6/0xe0
RSP <ffff880176a43e98>
Environment
- Red Hat Enterprise Linux 6.7
- Red Hat Enterprise Linux 6.6
- HugePages are enabled and in use
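As a rough way to check whether a host matches the environment above, the following Python sketch compares the running kernel release against 2.6.32-504.23.4 and reads the HugePages counters from /proc/meminfo. The helper names (`parse_release`, `is_newer_el6_kernel`, `hugepages_in_use`) are hypothetical and exist only for this illustration, and treating "in use" as HugePages_Free being lower than HugePages_Total is an assumption. The sketch tests only the lower bound named in the title; it does not know which later kernel, if any, contains a fix.

```python
import platform
import re

# Lower bound taken from the article title: kernels newer than this are affected.
AFFECTED_AFTER = (2, 6, 32, 504, 23, 4)


def parse_release(release):
    """Return the leading numeric fields of a kernel release string as a tuple.

    Example input: "2.6.32-504.30.3.el6.x86_64".
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)", release)
    if not match:
        raise ValueError("unrecognized kernel release: %r" % release)
    base = [int(match.group(i)) for i in (1, 2, 3)]
    build = [int(part) for part in match.group(4).split(".")]
    return tuple(base + build)


def is_newer_el6_kernel(release):
    """True for an el6 kernel strictly newer than 2.6.32-504.23.4 (assumed check)."""
    return ".el6" in release and parse_release(release) > AFFECTED_AFTER


def hugepages_in_use(meminfo_text):
    """True if HugePages are configured and at least one page is not free.

    Assumption: "in use" means HugePages_Free < HugePages_Total.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("HugePages_"):
            fields[key] = int(value.split()[0])
    total = fields.get("HugePages_Total", 0)
    free = fields.get("HugePages_Free", 0)
    return total > 0 and free < total


if __name__ == "__main__":
    release = platform.release()
    with open("/proc/meminfo") as handle:
        meminfo = handle.read()
    print("kernel release:", release)
    print("newer than 2.6.32-504.23.4 (el6):", is_newer_el6_kernel(release))
    print("HugePages in use:", hugepages_in_use(meminfo))
```

If both checks print True, the host matches the conditions listed in this article. That alone does not confirm the race has occurred; the crash signatures in the Issue section remain the actual evidence.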