RHEL6: race condition in hugetlb code in kernels newer than kernel-2.6.32-504.23.4

Solution Verified

Issue

The race condition can manifest itself in any of the following ways:

  • System crash in region_* functions when HugePages are enabled
crash> log | grep -e ^BUG -e ^IP:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
IP: [<ffffffff81166dd4>] region_chg+0xe4/0x100

crash> bt
PID: 20463  TASK: ffff881566e4f520  CPU: 5   COMMAND: "java"
 #0 [ffff881566e7b7d0] machine_kexec at ffffffff8103b60b
 #1 [ffff881566e7b830] crash_kexec at ffffffff810c99e2
 #2 [ffff881566e7b900] oops_end at ffffffff8152e1c0
 #3 [ffff881566e7b930] no_context at ffffffff8104c80b
 #4 [ffff881566e7b980] __bad_area_nosemaphore at ffffffff8104ca95
 #5 [ffff881566e7b9d0] bad_area at ffffffff8104cbbe
 #6 [ffff881566e7ba00] __do_page_fault at ffffffff8104d3c3
 #7 [ffff881566e7bb20] do_page_fault at ffffffff8153010e
 #8 [ffff881566e7bb50] page_fault at ffffffff8152d4b5
    [exception RIP: region_chg+228]
    RIP: ffffffff8116a0a4  RSP: ffff8803c31b7ca8  RFLAGS: 00010282
    RAX: fffffffffffffffe  RBX: fffffffffffffffe  RCX: 00000000000001c0
    RDX: dead000000100100  RSI: 00000000000001bd  RDI: ffff8803f1cffc48
    RBP: ffff8803c31b7cc8   R8: ffff8803f1cffc41   R9: 00000006979ffff0
    R10: ffff881073b05480  R11: 0000000000000000  R12: 00000000000001c0
    R13: 00000000000001bd  R14: ffffffff81fd19e0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #6 [ffff8803c31b7ca0] anon_vma_prepare at ffffffff8115c1a0
 #7 [ffff8803c31b7ce0] hugetlb_fault at ffffffff8116bc23
 #8 [ffff8803c31b7d90] handle_mm_fault at ffffffff81153285
 #9 [ffff8803c31b7e00] __do_page_fault at ffffffff8104f156
#10 [ffff8803c31b7f20] do_page_fault at ffffffff8153eb7e
#11 [ffff8803c31b7f50] page_fault at ffffffff8153bf25
  • List corruption in region_* functions
[ 3043.345741] ------------[ cut here ]------------
[ 3043.345762] WARNING: at lib/list_debug.c:51 list_del+0x8d/0xa0() (Not tainted)
[ 3043.345766] Hardware name: PRIMERGY RX600 S5
[ 3043.345769] list_del corruption. next->prev should be ffff88153b50e460, but was ffff88153b50e7c0
[ 3043.345772] Modules linked in: mptctl mptbase autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc 8021q garp stp llc smbus(U) cpufreq_ondemand acpi_cpufreq freq_table mperf bonding ipv6 iTCO_wdt iTCO_vendor_support microcode ipmi_devintf power_meter acpi_ipmi ipmi_si ipmi_msghandler i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core e1000e ixgbe mdio sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[ 3043.345843] Pid: 32950, comm: java Not tainted 2.6.32-504.30.3.el6.x86_64 #1
[ 3043.345847] Call Trace:
[ 3043.345860]  [<ffffffff81074e47>] ? warn_slowpath_common+0x87/0xc0
[ 3043.345865]  [<ffffffff81074f36>] ? warn_slowpath_fmt+0x46/0x50
[ 3043.345870]  [<ffffffff8129edbd>] ? list_del+0x8d/0xa0
[ 3043.345878]  [<ffffffff8116688a>] ? region_add+0x9a/0xe0
[ 3043.345882]  [<ffffffff81167c9d>] ? alloc_huge_page+0x29d/0x3c0
[ 3043.345888]  [<ffffffff811691bb>] ? hugetlb_fault+0x43b/0x7b0
[ 3043.345896]  [<ffffffff8105872d>] ? check_preempt_curr+0x6d/0x90
[ 3043.345906]  [<ffffffff81150115>] ? handle_mm_fault+0x395/0x3d0
[ 3043.345914]  [<ffffffff81063c63>] ? perf_event_task_sched_out+0x33/0x70
[ 3043.345920]  [<ffffffff8104d096>] ? __do_page_fault+0x146/0x500
[ 3043.345930]  [<ffffffff81529afe>] ? thread_return+0x4e/0x7d0
[ 3043.345937]  [<ffffffff8153010e>] ? do_page_fault+0x3e/0xa0
[ 3043.345941]  [<ffffffff8152d4b5>] ? page_fault+0x25/0x30
[ 3043.345945] ---[ end trace 3171fe47b71fad99 ]---
  • System crash at shm_close+0xd6/0xe0
------------[ cut here ]------------
kernel BUG at ipc/shm.c:232!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:08.0/0000:0a:00.1/host2/rport-2:0-4/target2:0:4/2:0:4:118/state
CPU 1 
Modules linked in: mptctl mptbase ipmi_devintf cpufreq_ondemand acpi_cpufreq mperf 8021q garp stp llc bonding ipv6 ipt_REJECT iptable_filter ip_tables dm_round_robin dm_multipath cpufreq_stats freq_table joydev sg iTCO_wdt iTCO_vendor_support serio_raw hpilo hpwdt ses enclosure lpc_ich mfd_core i7core_edac edac_core power_meter acpi_ipmi ipmi_si ipmi_msghandler qlcnic shpchp ext4 jbd2 mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 9797, comm: httpd Not tainted 2.6.32-573.8.1.el6.x86_64 #1 HP ProLiant DL380 G6
RIP: 0010:[<ffffffff81224336>]  [<ffffffff81224336>] shm_close+0xd6/0xe0
RSP: 0018:ffff880176a43e98  EFLAGS: 00010202
RAX: ffffffffffffffea RBX: ffffffff81aec340 RCX: 0000000000000006
RDX: ffffffffffffffea RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880176a43eb8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: ffffffff81aec3e0
R13: ffffffffffffffea R14: ffff8800e6bb0188 R15: 00007fffe9031000
FS:  00007ffff7f8a7e0(0000) GS:ffff880c42600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffe9775612 CR3: 0000000176a39000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process httpd (pid: 9797, threadinfo ffff880176a40000, task ffff880bfff5a040)
Stack:
 0000000000000000 0000000000000000 ffff8800e6bb0188 ffff880d7345de90
<d> ffff880176a43ed8 ffffffff811562e3 0000000000000000 ffff880bff459a00
<d> ffff880176a43f38 ffffffff811588d7 00007fffffffe0a0 ffff8800e6bb0188
Call Trace:
 [<ffffffff811562e3>] remove_vma+0x33/0x90
 [<ffffffff811588d7>] do_munmap+0x317/0x3b0
 [<ffffffff8122327e>] sys_shmdt+0xce/0x170
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Code: 0f 1f 44 00 00 41 f6 45 21 02 75 e6 4c 89 e8 66 ff 00 66 66 90 4c 89 e7 e8 a8 27 e8 ff 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d 
RIP  [<ffffffff81224336>] shm_close+0xd6/0xe0
 RSP <ffff880176a43e98>
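
The three signatures above can be searched for in a saved kernel log or in the output of `log` from a crash session. The following is a minimal sketch in Python and is not part of the original solution. The function name `find_hugetlb_race_signatures` and the labels it returns are hypothetical. A match only means the log resembles the traces shown here; it does not confirm that this race condition is the cause.

```python
import re

# Patterns derived from the example traces above. The labels are illustrative.
NULL_DEREF_IN_REGION = re.compile(r"IP: \[<[0-9a-f]+>\] region_\w+\+0x")
REGION_FRAME = re.compile(r"\] (\? )?region_\w+\+0x")
LIST_DEL_CORRUPTION = re.compile(r"list_del corruption")
SHM_BUG = re.compile(r"kernel BUG at ipc/shm\.c:\d+!")
SHM_CLOSE_FRAME = re.compile(r"shm_close\+0x")


def find_hugetlb_race_signatures(log_text):
    """Return labels for the crash signatures found in log_text."""
    found = []
    # 1. Oops with the faulting instruction inside a region_* function.
    if NULL_DEREF_IN_REGION.search(log_text):
        found.append("region_crash")
    # 2. list_del corruption warning with a region_* frame in the trace.
    if LIST_DEL_CORRUPTION.search(log_text) and REGION_FRAME.search(log_text):
        found.append("list_corruption")
    # 3. BUG in ipc/shm.c raised from shm_close.
    if SHM_BUG.search(log_text) and SHM_CLOSE_FRAME.search(log_text):
        found.append("shm_close_bug")
    return found
```

Pass the full text of the log as a single string. An empty list means none of the three patterns were seen.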

Environment

  • Red Hat Enterprise Linux 6.7
  • Red Hat Enterprise Linux 6.6
  • HugePages activated and in use.
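
Whether HugePages are configured and in use on a given system can be read from the `HugePages_*` counters in `/proc/meminfo`. The sketch below is an illustration only and is not part of the original solution. The helper names `parse_hugepages` and `hugepages_in_use` are hypothetical. Treating "in use" as "at least one page allocated or reserved" is an assumption.

```python
def parse_hugepages(meminfo_text):
    """Return the HugePages_* counters from /proc/meminfo-style text as ints."""
    counters = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the page counters are kept; Hugepagesize (in kB) is skipped.
        if sep and key.startswith("HugePages_"):
            counters[key] = int(value.split()[0])
    return counters


def hugepages_in_use(counters):
    """True if a HugePages pool exists and pages are allocated or reserved."""
    total = counters.get("HugePages_Total", 0)
    free = counters.get("HugePages_Free", 0)
    reserved = counters.get("HugePages_Rsvd", 0)
    return total > 0 and (total - free > 0 or reserved > 0)
```

To use it on a live system, read `/proc/meminfo` into a string and pass it to `parse_hugepages`, then pass the result to `hugepages_in_use`.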
