RHEL 6/7: system panic occurs when the mcelog daemon offlines a hugepage
Issue
- A system panic occurs when the mcelog daemon (or other software) offlines a hugepage. When the frequency of corrected memory errors exceeds the configured threshold, the mcelog daemon takes the affected memory offline. The problem is that page_check_address(), called by the offline handler, does not check the pte returned by huge_pte_offset() before using it. huge_pte_offset() can return NULL, so the unchecked pte can be dereferenced as a NULL pointer.
Environment
- Red Hat Enterprise Linux (RHEL) 6
- All kernels in releases prior to 6.2
- All 6.2 kernels prior to kernel-2.6.32-220.72.2.el6
- All 6.4 kernels prior to kernel-2.6.32-358.79.1.el6
- All 6.5 kernels prior to kernel-2.6.32-431.81.2.el6
- All 6.6 kernels prior to kernel-2.6.32-504.60.2.el6
- All 6.7 kernels prior to kernel-2.6.32-573.43.2.el6
- All 6.8 kernels
- All 6.9 kernels prior to kernel-2.6.32-696.6.3.el6
- Red Hat Enterprise Linux 7.0, 7.1 and 7.2
- The mcelog daemon is running, or other code is offlining hugepages