"Bad page state", "Bad page map", or "Bad rss-counter state" messages followed by a kernel crash or soft lockup in Red Hat Enterprise Linux

Solution Unverified - Updated -

Issue

  • The kernel crashes with the following log:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff81079d42>] dup_mm+0x272/0x520
PGD 404edde067 PUD 404f5b2067 PMD 0 
Oops: 0000 [#1] SMP 

Pid: 33631, comm: rhsmcertd-worke Not tainted 2.6.32-696.3.1.el6.x86_64 #1 HP ProLiant XL170r Gen9/ProLiant XL170r Gen9
RIP: 0010:[<ffffffff81079d42>]  [<ffffffff81079d42>] dup_mm+0x272/0x520
RSP: 0018:ffff88404c523d70  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88404c6c7700 RCX: 0000000000000000
RDX: ffff882050740e80 RSI: ffff88204c794c78 RDI: ffff88404ed2d530
RBP: ffff88404c523de0 R08: ffff882050d340c0 R09: 0000000000000000
R10: 00007fb777d01000 R11: ffff88404eca46e8 R12: ffff88404ed2dc50
R13: ffff88404ed2d530 R14: ffff88204c794c78 R15: ffff88404ed2dc38
FS:  00007fb782286700(0000) GS:ffff8820f0dc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 000000404d271000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Process rhsmcertd-worke (pid: 33631, threadinfo ffff88404c520000, task ffff88404eac4040)
Stack:
 00000000000000d0 0000000000000000 ffff88404c6c7768 ffff88404fb38c68
<d> ffff88404fb38c00 ffff88404ed2dc70 ffff88404ed2dc78 0000000000000000
<d> 00007fb7822869d0 0000000001200011 ffff88404cf89520 0000000000000000
Call Trace:
 [<ffffffff8107b112>] copy_process+0xe12/0x1520
 [<ffffffff8107b8b6>] do_fork+0x96/0x4c0
 [<ffffffff811ba222>] ? alloc_fd+0x92/0x160
 [<ffffffff81196997>] ? fd_install+0x47/0x90
 [<ffffffff81009598>] sys_clone+0x28/0x30
 [<ffffffff8100b3f3>] stub_clone+0x13/0x20
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
Code: 49 8b 45 30 49 8b 95 98 00 00 00 49 c7 45 20 00 00 00 00 49 c7 45 18 00 00 00 00 80 e4 df 48 85 d2 49 89 45 30 74 6c 48 8b 42 18 <48> 8b 48 10 48 8b 82 b8 00 00 00 f0 48 ff 42 30 41 f6 45 31 08 
RIP  [<ffffffff81079d42>] dup_mm+0x272/0x520
 RSP <ffff88404c523d70>
CR2: 0000000000000010
  • Another pattern, with the crash in free_pcppages_bulk():
RIP  [<ffffffff81140dd8>] free_pcppages_bulk+0x318/0x470
 RSP <ffff880434b9bb68>
CR2: ffffffff0042cdd8
  • Another pattern, with "Bad rss-counter state" messages followed by a paging request failure in dup_mm():
WARNING: CPU: 20 PID: 186113 at mm/mmap.c:3031 exit_mmap+0x196/0x1a0
BUG: Bad rss-counter state mm:ffff881fbe5a3200 idx:0 val:1209
BUG: Bad rss-counter state mm:ffff881fbe5a3200 idx:1 val:2865
BUG: unable to handle kernel paging request at 0000000000400000
IP: [<ffffffff810864c7>] dup_mm+0x237/0x6f0
PGD 8000005fbeae4067 PUD 5fbc697067 PMD 5fb863d067 PTE 5fb9c7c025
Oops: 0003 [#1] SMP
  • Another pattern, with a "Bad rss-counter state" message followed by a soft lockup:
 BUG: Bad rss-counter state mm:ffff8f06643b3e80 idx:1 val:3
 NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u128:1:76]
Call Trace:
 [<ffffffff903a4ce1>] pagevec_lookup_tag+0x21/0x30
 [<ffffffffc0103acc>] mpage_prepare_extent_to_map+0xfc/0x2e0 [ext4]
 [<ffffffff903faf82>] ? kmem_cache_alloc+0x1c2/0x1f0
 [<ffffffffc00d2a93>] ? jbd2__journal_start+0xf3/0x1f0 [jbd2]
 [<ffffffffc010857c>] ? ext4_writepages+0x42c/0xd40 [ext4]
 [<ffffffffc01366a9>] ? __ext4_journal_start_sb+0x69/0xe0 [ext4]
 [<ffffffffc01085a7>] ext4_writepages+0x457/0xd40 [ext4]
 [<ffffffff903a3b81>] do_writepages+0x21/0x50
 [<ffffffff9044cfc0>] __writeback_single_inode+0x40/0x260
 [<ffffffff9044da54>] writeback_sb_inodes+0x1c4/0x490
 [<ffffffff9044ddbf>] __writeback_inodes_wb+0x9f/0xd0
 [<ffffffff9044e5f3>] wb_writeback+0x263/0x2f0
 [<ffffffff9043ab0c>] ? get_nr_inodes+0x4c/0x70
 [<ffffffff9044ef7b>] bdi_writeback_workfn+0x2cb/0x460
 [<ffffffff902b613f>] process_one_work+0x17f/0x440
 [<ffffffff902b71d6>] worker_thread+0x126/0x3c0
 [<ffffffff902b70b0>] ? manage_workers.isra.24+0x2a0/0x2a0
 [<ffffffff902bdf21>] kthread+0xd1/0xe0
 [<ffffffff902bde50>] ? insert_kthread_work+0x40/0x40
 [<ffffffff909255f7>] ret_from_fork_nospec_begin+0x21/0x21
 [<ffffffff902bde50>] ? insert_kthread_work+0x40/0x40

  • Another pattern, with print_bad_pte() called from the munmap path:
CPU: 2 PID: 77019 Comm: vertica Kdump: loaded Tainted: G    B          ------------   3.10.0-862.14.4.el7.x86_64 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
 Call Trace:
  [<ffffffff91713754>] dump_stack+0x19/0x1b
  [<ffffffff911c3991>] print_bad_pte+0x1f1/0x290
  [<ffffffff911c65bf>] unmap_page_range+0xaff/0xc30
  [<ffffffff911c6771>] unmap_single_vma+0x81/0xf0
  [<ffffffff911c7a49>] unmap_vmas+0x49/0x90
  [<ffffffff911cdb3e>] unmap_region+0xbe/0x140
  [<ffffffff911ce131>] ? __vma_rb_erase+0x121/0x220
  [<ffffffff911d0145>] do_munmap+0x295/0x470
  [<ffffffff911d0385>] vm_munmap+0x65/0xb0
  [<ffffffff911d15a2>] SyS_munmap+0x22/0x30
  [<ffffffff9172579b>] system_call_fastpath+0x22/0x27
  • Another pattern, with "Bad page state" reported while a huge page is split during madvise:
 BUG: Bad page state in process vertica pfn:aafbff
 page:ffffe455eabeffc0 count:0 mapcount:-1 mapping:ffff8f05131bceb8 index:0x23fe
 page flags: 0x2fffff0002001c(referenced|uptodate|dirty|mappedtodisk)
 page dumped because: non-NULL mapping
Call Trace:
  [<ffffffff90913754>] dump_stack+0x19/0x1b
  [<ffffffff9090eba8>] bad_page.part.76+0xdc/0xf9
  [<ffffffff9039f5e0>] free_pages_prepare+0x170/0x190
  [<ffffffff903a0054>] free_hot_cold_page+0x74/0x160
  [<ffffffff903a4ef3>] __put_single_page+0x23/0x30
  [<ffffffff903a4f37>] put_page+0x37/0x50
  [<ffffffff9040ad27>] __split_huge_page+0x357/0x850
  [<ffffffff9040b296>] split_huge_page_to_list+0x76/0xf0
  [<ffffffff9040bd90>] __split_huge_page_pmd+0x1d0/0x5c0
  [<ffffffff903a0186>] ? free_hot_cold_page_list+0x46/0xa0
  [<ffffffff903c669d>] unmap_page_range+0xbdd/0xc30
  [<ffffffff903dddbd>] ? free_pages_and_swap_cache+0xad/0xd0
  [<ffffffff903c6771>] unmap_single_vma+0x81/0xf0
  [<ffffffff903c7bad>] zap_page_range+0x11d/0x190
  [<ffffffff9055f3d8>] ? call_rwsem_down_read_failed+0x18/0x30
  [<ffffffff903c2add>] SyS_madvise+0x3cd/0x9c0
  [<ffffffff902d0438>] ? task_sched_runtime+0xa8/0x110
  [<ffffffff9092579b>] system_call_fastpath+0x22/0x27
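
The patterns above share one shape: a page-accounting complaint (Bad page state, Bad page map, or Bad rss-counter state), then an oops or soft lockup. A minimal sketch of a scanner for that shape in a saved console or vmcore-dmesg log follows. The function names scan_log and corruption_precedes_crash are hypothetical, the regular expressions are derived only from the excerpts shown here, and the "Bad page map" wording is assumed from the message print_bad_pte() emits.

```python
import re

# Signatures derived from the log excerpts in this article.
# "bad_page_map" is an assumption based on the print_bad_pte() message text.
SIGNATURES = {
    "bad_page_state": re.compile(
        r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)"),
    "bad_page_map": re.compile(
        r"BUG: Bad page map in process (\S+)"),
    "bad_rss_counter": re.compile(
        r"BUG: Bad rss-counter state mm:([0-9a-f]+) idx:(\d+) val:(-?\d+)"),
    "soft_lockup": re.compile(
        r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!"),
    "oops": re.compile(
        r"BUG: unable to handle kernel "
        r"(NULL pointer dereference|paging request) at ([0-9a-f]+)"),
}

CORRUPTION = {"bad_page_state", "bad_page_map", "bad_rss_counter"}
FATAL = {"oops", "soft_lockup"}


def scan_log(text):
    """Return (line_number, signature_name, captured_groups) for each match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits.append((lineno, name, match.groups()))
    return hits


def corruption_precedes_crash(hits):
    """True if a corruption report is later followed by an oops or lockup."""
    seen_corruption = False
    for _, name, _ in hits:
        if name in CORRUPTION:
            seen_corruption = True
        elif name in FATAL and seen_corruption:
            return True
    return False


if __name__ == "__main__":
    sample = (
        " BUG: Bad rss-counter state mm:ffff8f06643b3e80 idx:1 val:3\n"
        " NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u128:1:76]\n"
    )
    found = scan_log(sample)
    for hit in found:
        print(hit)
    print("corruption precedes crash:", corruption_precedes_crash(found))
```

A match only means the log resembles the patterns listed here; it does not identify a root cause.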

Environment

  • Red Hat Enterprise Linux 6 and 7
  • Seen on 2.6.32-696.3.1.el6, 2.6.32-696.30.1.el6, and 2.6.32-754.el6
  • Seen on 3.10.0-693.11.6.el7 through 3.10.0-957.12.1.el7
  • VMware guest, Microsoft Hyper-V guest, or physical machine
  • Docker containers / Vertica database
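
To check whether a Red Hat Enterprise Linux 7 host runs a kernel inside the range listed above, the release strings can be compared numerically. The sketch below assumes the range endpoints are 3.10.0-693.11.6.el7 and 3.10.0-957.12.1.el7. It also assumes that comparing the numeric fields as integer tuples is an adequate stand-in for full RPM version ordering. The helper names release_key and in_seen_range are hypothetical. The range records where the problem was observed, so a kernel outside it is not necessarily unaffected.

```python
import re


def release_key(release):
    """Turn '3.10.0-693.11.6.el7.x86_64' into ((3, 10, 0), (693, 11, 6))."""
    version, _, rest = release.partition("-")
    version_part = tuple(int(field) for field in version.split("."))
    # Keep only the leading dotted-numeric part of the release field,
    # dropping suffixes such as '.el7' and '.x86_64'.
    match = re.match(r"\d+(?:\.\d+)*", rest)
    release_part = tuple(int(f) for f in match.group(0).split(".")) if match else ()
    return (version_part, release_part)


def in_seen_range(release,
                  low="3.10.0-693.11.6.el7",
                  high="3.10.0-957.12.1.el7"):
    """True if release sorts between low and high, inclusive."""
    return release_key(low) <= release_key(release) <= release_key(high)


if __name__ == "__main__":
    import platform

    running = platform.release()
    print(running, "in observed range:", in_seen_range(running))
```

The kernel from one of the traces above, 3.10.0-862.14.4.el7.x86_64, falls inside this range.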
