RHEL 6/7: System panic occurs when the mcelog daemon offlines a hugepage
Issue
- A system panic occurs when the mcelog daemon (or other software) offlines a hugepage. When the rate of corrected memory errors exceeds the configured threshold, the mcelog daemon offlines the affected memory. The problem is that page_check_address(), called by the offline handler, does not check the pte pointer returned by huge_pte_offset(), which can be NULL, before using it.
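The missing check described above can be illustrated with a small userspace sketch. This is not the kernel source or the actual Red Hat fix: `pte_t` is a simplified stand-in, and `mock_huge_pte_offset()` and `sketch_page_check_address()` are hypothetical helpers that only mimic the relevant behavior (the lookup may return NULL when no entry exists, and the caller must test for that before dereferencing).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's pte_t (hypothetical, not the real type). */
typedef struct { unsigned long val; } pte_t;

/* Hypothetical mock of huge_pte_offset(): like the kernel function,
 * it returns NULL when there is no page table entry for the address. */
static pte_t *mock_huge_pte_offset(pte_t *table, size_t nr, size_t idx)
{
    if (table == NULL || idx >= nr)
        return NULL;
    return &table[idx];
}

/* Hypothetical sketch of a page_check_address()-style lookup.
 * Returns the pte if it maps the expected value, otherwise NULL. */
static pte_t *sketch_page_check_address(pte_t *table, size_t nr, size_t idx,
                                        unsigned long expected)
{
    pte_t *pte = mock_huge_pte_offset(table, nr, idx);

    /* The affected kernels lacked this test; dereferencing a NULL
     * pte below is what turns a failed lookup into a panic. */
    if (pte == NULL)
        return NULL;

    return pte->val == expected ? pte : NULL;
}
```

With the guard in place, a lookup for an unmapped address simply reports "not found" instead of dereferencing a NULL pointer.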
Environment
- Red Hat Enterprise Linux (RHEL) 6
- All kernels from releases prior to 6.2
- All 6.2 kernels prior to kernel-2.6.32-220.72.2.el6
- All 6.4 kernels prior to kernel-2.6.32-358.79.1.el6
- All 6.5 kernels prior to kernel-2.6.32-431.81.2.el6
- All 6.6 kernels prior to kernel-2.6.32-504.60.2.el6
- All 6.7 kernels prior to kernel-2.6.32-573.43.2.el6
- All 6.8 kernels
- All 6.9 kernels prior to kernel-2.6.32-696.6.3.el6
- Red Hat Enterprise Linux 7.0, 7.1 and 7.2
- mcelog daemon running, or other code that offlines hugepages
