Race condition in the hugetlb code on RHEL 6 kernels newer than kernel-2.6.32-504.23.4

  • The system crashes in the region_* functions when HugePages is enabled.
crash> log | grep -e ^BUG -e ^IP:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
IP:[<ffffffff81166dd4>] region_chg+0xe4/0x100

crash> bt
PID:20463  TASK: ffff881566e4f520  CPU:5   COMMAND:"java"
 #0 [ffff881566e7b7d0] machine_kexec at ffffffff8103b60b
 #1 [ffff881566e7b830] crash_kexec at ffffffff810c99e2
 #2 [ffff881566e7b900] oops_end at ffffffff8152e1c0
 #3 [ffff881566e7b930] no_context at ffffffff8104c80b
 #4 [ffff881566e7b980] __bad_area_nosemaphore at ffffffff8104ca95
 #5 [ffff881566e7b9d0] bad_area at ffffffff8104cbbe
 #6 [ffff881566e7ba00] __do_page_fault at ffffffff8104d3c3
 #7 [ffff881566e7bb20] do_page_fault at ffffffff8153010e
 #8 [ffff881566e7bb50] page_fault at ffffffff8152d4b5
    [exception RIP: region_chg+228]
    RIP: ffffffff8116a0a4  RSP: ffff8803c31b7ca8  RFLAGS:00010282
    RAX: fffffffffffffffe  RBX: fffffffffffffffe  RCX:00000000000001c0
    RDX: dead000000100100  RSI:00000000000001bd  RDI: ffff8803f1cffc48
    RBP: ffff8803c31b7cc8   R8: ffff8803f1cffc41   R9:00000006979ffff0
    R10: ffff881073b05480  R11:0000000000000000  R12: 00000000000001c0
    R13:00000000000001bd  R14: ffffffff81fd19e0  R15:0000000000000000
    ORIG_RAX: ffffffffffffffff  CS:0010  SS:0000
 #6 [ffff8803c31b7ca0] anon_vma_prepare at ffffffff8115c1a0
 #7 [ffff8803c31b7ce0] hugetlb_fault at ffffffff8116bc23
 #8 [ffff8803c31b7d90] handle_mm_fault at ffffffff81153285
 #9 [ffff8803c31b7e00] __do_page_fault at ffffffff8104f156
#10 [ffff8803c31b7f20] do_page_fault at ffffffff8153eb7e
#11 [ffff8803c31b7f50] page_fault at ffffffff8153bf25
  • Corruption is reported in the region_* functions.
[ 3043.345741] ------------[ cut here ]------------
[ 3043.345762] WARNING: at lib/list_debug.c:51 list_del+0x8d/0xa0() (Not tainted)
[ 3043.345766] Hardware name:PRIMERGY RX600 S5
[ 3043.345769] list_del corruption. next->prev should be ffff88153b50e460, but was ffff88153b50
[ 3043.345772] Modules linked in: mptctl mptbase autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc 8021q garp stp llc smbus(U) cpufreq_ondemand acpi_cpufreq freq_table mperf bonding ipv6 iTCO_wdt iTCO_vendor_support microcode ipmi_devintf power_meter acpi_ipmi ipmi_si ipmi_msghandler i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core e1000e ixgbe mdio sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[ 3043.345843] Pid:32950, comm: java Not tainted 2.6.32-504.30.3.el6.x86_64 #1
[ 3043.345847] Call Trace:
[ 3043.345860]  [<ffffffff81074e47>] ? warn_slowpath_common+0x87/0xc0
[ 3043.345865]  [<ffffffff81074f36>] ? warn_slowpath_fmt+0x46/0x50
[ 3043.345870]  [<ffffffff8129edbd>] ? list_del+0x8d/0xa0
[ 3043.345878]  [<ffffffff8116688a>] ? region_add+0x9a/0xe0
[ 3043.345882]  [<ffffffff81167c9d>] ? alloc_huge_page+0x29d/0x3c0
[ 3043.345888]  [<ffffffff811691bb>] ? hugetlb_fault+0x43b/0x7b0
[ 3043.345896]  [<ffffffff8105872d>] ? check_preempt_curr+0x6d/0x90
[ 3043.345906]  [<ffffffff81150115>] ? handle_mm_fault+0x395/0x3d0
[ 3043.345914]  [<ffffffff81063c63>] ? perf_event_task_sched_out+0x33/0x70
[ 3043.345920]  [<ffffffff8104d096>] ? __do_page_fault+0x146/0x500
[ 3043.345930]  [<ffffffff81529afe>] ? thread_return+0x4e/0x7d0
[ 3043.345937]  [<ffffffff8153010e>] ? do_page_fault+0x3e/0xa0
[ 3043.345941]  [<ffffffff8152d4b5>] ? page_fault+0x25/0x30
[ 3043.345945] ---[ end trace 3171fe47b71fad99 ]---
  • The system crashes at shm_close+0xd6/0xe0.
------------[ cut here ]------------
kernel BUG at ipc/shm.c:232!
invalid opcode:0000 [#1] SMP 
last sysfs file:/sys/devices/pci0000:00/0000:00:08.0/0000:0a:00.1/host2/rport-2:0-4/target2:0:4/2:0:4:118/state
CPU 1 
Modules linked in: mptctl mptbase ipmi_devintf cpufreq_ondemand acpi_cpufreq mperf 8021q garp stp llc bonding ipv6 ipt_REJECT iptable_filter ip_tables dm_round_robin dm_multipath cpufreq_stats freq_table joydev sg iTCO_wdt iTCO_vendor_support serio_raw hpilo hpwdt ses enclosure lpc_ich mfd_core i7core_edac edac_core power_meter acpi_ipmi ipmi_si ipmi_msghandler qlcnic shpchp ext4 jbd2 mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid:9797, comm: httpd Not tainted 2.6.32-573.8.1.el6.x86_64 #1 HP ProLiant DL380 G6
RIP:0010:[<ffffffff81224336>]  [<ffffffff81224336>] shm_close+0xd6/0xe0
RSP:0018:ffff880176a43e98  EFLAGS:00010202
RAX: ffffffffffffffea RBX: ffffffff81aec340 RCX:0000000000000006
RDX: ffffffffffffffea RSI:0000000000000040 RDI:0000000000000000
RBP: ffff880176a43eb8 R08:0000000000000000 R09:0000000000000000
R10:0000000000000000 R11:0000000000000202 R12: ffffffff81aec3e0
R13: ffffffffffffffea R14: ffff8800e6bb0188 R15:00007fffe9031000
FS:00007ffff7f8a7e0(0000) GS:ffff880c42600000(0000) knlGS:0000000000000000
CS:0010 DS:0000 ES:0000 CR0:0000000080050033
CR2:00007fffe9775612 CR3:0000000176a39000 CR4:00000000000007e0
DR0:0000000000000000 DR1:0000000000000000 DR2:0000000000000000
DR3:0000000000000000 DR6:00000000ffff0ff0 DR7:0000000000000400
Process httpd (pid:9797, threadinfo ffff880176a40000, task ffff880bfff5a040)
 0000000000000000 0000000000000000 ffff8800e6bb0188 ffff880d7345de90
<d> ffff880176a43ed8 ffffffff811562e3 0000000000000000 ffff880bff459a00
<d> ffff880176a43f38 ffffffff811588d7 00007fffffffe0a0 ffff8800e6bb0188
Call Trace:
 [<ffffffff811562e3>] remove_vma+0x33/0x90
 [<ffffffff811588d7>] do_munmap+0x317/0x3b0
 [<ffffffff8122327e>] sys_shmdt+0xce/0x170
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Code:0f 1f 44 00 00 41 f6 45 21 02 75 e6 4c 89 e8 66 ff 00 66 66 90 4c 89 e7 e8 a8 27 e8 ff 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d 
RIP  [<ffffffff81224336>] shm_close+0xd6/0xe0
 RSP <ffff880176a43e98>


  • Red Hat Enterprise Linux 6.7
  • Red Hat Enterprise Linux 6.6
  • HugePages is enabled and in use
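
The conditions above can be checked with a rough sketch like the following. It is not part of the original article. The helper names (`hugepages_status`, `kernel_release_tuple`, `is_newer_than_baseline`) are hypothetical. It assumes that "in use" can be approximated as `HugePages_Total - HugePages_Free` from /proc/meminfo, and that the kernel release string has the usual `2.6.32-504.30.3.el6.x86_64` shape. It only tests whether a kernel is newer than kernel-2.6.32-504.23.4, as stated in the title, and says nothing about which later kernel may contain a fix.

```python
import re

# Baseline taken from the article title: kernels newer than this are affected.
BASELINE = (2, 6, 32, 504, 23, 4)


def hugepages_status(meminfo_text):
    """Return (total, free, in_use) huge page counts from /proc/meminfo text.

    Assumption: in_use is approximated as HugePages_Total - HugePages_Free.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"^(HugePages_Total|HugePages_Free):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    total = fields.get("HugePages_Total", 0)
    free = fields.get("HugePages_Free", 0)
    return total, free, total - free


def kernel_release_tuple(release):
    """Turn '2.6.32-504.30.3.el6.x86_64' into (2, 6, 32, 504, 30, 3)."""
    m = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", release)
    if not m:
        raise ValueError("unrecognized kernel release: %r" % release)
    parts = m.group(1).split(".") + m.group(2).split(".")
    return tuple(int(p) for p in parts)


def is_newer_than_baseline(release):
    """True if the release is strictly newer than kernel-2.6.32-504.23.4."""
    return kernel_release_tuple(release) > BASELINE
```

On a live system, the inputs could come from `open("/proc/meminfo").read()` and `platform.release()`. A system matches the environment described here when `is_newer_than_baseline(...)` is true and the third value returned by `hugepages_status(...)` is greater than zero.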

