Why does the kernel panic repeatedly due to data corruption (slab)?

Solution Unverified - Updated -

Issue

  • Kernel panics caused by in-memory data corruption have occurred several times on a cluster system. The location of the panic differs each time, but the corrupted data looks similar in every case.
  • What is the root cause of this issue?
  • Is there a workaround or a fix?

BUG: unable to handle kernel paging request at 00000000fac310f0
IP: [<ffffffff81051f1d>] task_rq_lock+0x4d/0xa0
PGD 448aec067 PUD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/block/sddlmifh/uevent
CPU 31 
Modules linked in: fuse sunrpc bonding ipv6 sddlmfdrv(P)(U) sddlmadrv(P)(U) osst st microcode i2c_i801 i2c_core ioatdma i7core_edac edac_core sg igb(U) dca shpchp ext4 mbcache jbd2 sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class sr_mod cdrom hfcldd(U) scsi_transport_fc scsi_tgt hfcldd_conf(U) hraslog_link(U) megaraid_sas(U) pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Tainted: P           ----------------   2.6.32-220.45.1.el6.x86_64 #1 HITACHI HA8000/RS440/QSSC-S4R
RIP: 0010:[<ffffffff81051f1d>]  [<ffffffff81051f1d>] task_rq_lock+0x4d/0xa0
RSP: 0018:ffff88048e563e08  EFLAGS: 00010082
RAX: 000000002f207462 RBX: 0000000000015f40 RCX: 0000000000000000
RDX: 0000000000000082 RSI: ffff88048e563e60 RDI: ffff88086bc00ac0
RBP: ffff88048e563e28 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88086bc00ac0
R13: ffff88048e563e60 R14: 0000000000015f40 R15: 000000000000000f
FS:  0000000000000000(0000) GS:ffff88048e560000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000fac310f0 CR3: 0000000474415000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880875f6e000, task ffff880475fc0ac0)
Stack:
 ffff88086bc00ac0 ffff88048e570f08 0000000000000000 000000000000001f
<0> ffff88048e563e98 ffffffff8105ed4c 0000000000000008 ffff88048e563ed0
<0> 0000000000000286 ffff88048e571380 ffffffff81aaf680 0000000000000082
Call Trace:
 <IRQ> 
 [<ffffffff8105ed4c>] try_to_wake_up+0x3c/0x3e0
 [<ffffffff81095130>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff8105f145>] wake_up_process+0x15/0x20
 [<ffffffff81095152>] hrtimer_wakeup+0x22/0x30
 [<ffffffff8109574e>] __run_hrtimer+0x8e/0x1a0
 [<ffffffff81012ba9>] ? read_tsc+0x9/0x20
 [<ffffffff81095af6>] hrtimer_interrupt+0xe6/0x250
 [<ffffffff814f651b>] smp_apic_timer_interrupt+0x6b/0x9b
 [<ffffffff8100bc13>] apic_timer_interrupt+0x13/0x20
 <EOI> 
 [<ffffffff812c5fae>] ? intel_idle+0xde/0x170
 [<ffffffff812c5f91>] ? intel_idle+0xc1/0x170
 [<ffffffff8109808d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff813fb587>] cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e06>] cpu_idle+0xb6/0x110
 [<ffffffff814e7668>] start_secondary+0x202/0x245
Code: c3 40 5f 01 00 49 89 fc 49 89 f5 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 49 89 55 00 49 8b 44 24 08 49 89 de 8b 40 18 <4c> 03 34 c5 e0 6d bf 81 4c 89 f7 e8 13 eb 49 00 49 8b 44 24 08 
RIP  [<ffffffff81051f1d>] task_rq_lock+0x4d/0xa0
 RSP <ffff88048e563e08>
CR2: 00000000fac310f0
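
As a cross-check on the first trace, the faulting address in CR2 can be reproduced from the register dump. This is a minimal sketch, not a root-cause analysis. It assumes one reading of the bytes at the marked instruction in the `Code:` line (`4c 03 34 c5 e0 6d bf 81`), namely `add r14, [rax*8 + 0xffffffff81bf6de0]`, so that RAX is used as an index into a table of 8-byte entries. The names `table_base` and `faulting_address` are made up for this illustration.

```python
# Sketch: recompute the faulting address of the first oops from its registers.
# Assumption: the faulting instruction is "add r14, [rax*8 + disp32]", decoded
# by hand from the Code: line; table_base is the sign-extended disp32.

MASK64 = (1 << 64) - 1  # addresses wrap at 64 bits


def faulting_address(table_base, index, scale=8):
    """Effective address of table_base + index * scale, wrapped to 64 bits."""
    return (table_base + index * scale) & MASK64


rax = 0x000000002f207462          # RAX from the oops, used as the index
table_base = 0xffffffff81bf6de0   # displacement e0 6d bf 81, sign-extended
cr2 = 0x00000000fac310f0          # CR2 from the oops

addr = faulting_address(table_base, rax)
print(hex(addr), addr == cr2)

# The low 32 bits of RAX as they would sit in memory (little-endian).
index_bytes = rax.to_bytes(4, "little")
print(index_bytes)
```

Under that assumption the computed address equals CR2, which suggests the bad pointer comes from an out-of-range index in RAX rather than from the table itself. The index bytes are all printable ASCII, which may help when comparing the corrupted values across panics, but that alone does not identify what overwrote them.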
------------[ cut here ]------------
kernel BUG at mm/slab.c:1751!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/block/sddlmido/queue/rotational
CPU 10 
Modules linked in: bridge stp llc fuse sunrpc bonding ipv6 sddlmfdrv(P)(U) sddlmadrv(P)(U) osst st microcode i2c_i801 i2c_core ioatdma i7core_edac edac_core sg igb(U) dca shpchp ext4 mbcache jbd2 sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class sr_mod cdrom hfcldd(U) scsi_transport_fc scsi_tgt hfcldd_conf(U) hraslog_link(U) megaraid_sas(U) pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 7254, comm: .dlmmgr_exe Tainted: P        W  ----------------   2.6.32-220.45.1.el6.x86_64 #1 HITACHI HA8000/RS440/QSSC-S4R
RIP: 0010:[<ffffffff8115e8ac>]  [<ffffffff8115e8ac>] kmem_freepages+0x12c/0x130
RSP: 0018:ffff88046a805868  EFLAGS: 00010002
RAX: ffff880876786ac0 RBX: ffff880877d00340 RCX: ffffffffffffffa0
RDX: fffffffffffffffe RSI: 000000000000000c RDI: 0000000000000082
RBP: ffff88046a805888 R08: 0000000000000060 R09: fb969ce71902f002
R10: 0000000000000000 R11: 0000000000000003 R12: ffff880449cf7c64
R13: 0000000000000001 R14: ffffea000f025608 R15: ffffea0000000000
FS:  00007f6617c2e700(0000) GS:ffff88048e400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000085a874f88 CR3: 0000000874630000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process .dlmmgr_exe (pid: 7254, threadinfo ffff88046a804000, task ffff8804700a0b00)
Stack:
 ffff880877d00340 ffff88046a35e000 ffff88087fc1d4e8 0000000000000002
<0> ffff88046a8058a8 ffffffff81160a43 ffff880877d00340 000000000000000c
<0> ffff88046a805908 ffffffff81160b00 ffff88047fef0640 ffff88046a35e000
Call Trace:
 [<ffffffff81160a43>] slab_destroy+0x33/0x90
 [<ffffffff81160b00>] free_block+0x60/0x170
 [<ffffffff81160c99>] __drain_alien_cache+0x89/0xa0
 [<ffffffff811609c2>] kmem_cache_free+0x262/0x2b0
 [<ffffffff811a89e2>] free_buffer_head+0x22/0x50
 [<ffffffff811a8db9>] try_to_free_buffers+0x79/0xc0
 [<ffffffffa017d79e>] jbd2_journal_invalidatepage+0x10e/0x2c0 [jbd2]
 [<ffffffff81271462>] ? radix_tree_gang_lookup_slot+0x72/0xb0
 [<ffffffffa02002b2>] ext4_invalidatepage+0x42/0x60 [ext4]
 [<ffffffffa0202bbe>] ext4_da_invalidatepage+0x8e/0x1e0 [ext4]
 [<ffffffff81128ab5>] do_invalidatepage+0x25/0x30
 [<ffffffff81128cd2>] truncate_inode_page+0xa2/0xc0
 [<ffffffff81128fd0>] truncate_inode_pages_range+0x160/0x460
 [<ffffffff811292e5>] truncate_inode_pages+0x15/0x20
 [<ffffffff81129337>] truncate_pagecache+0x47/0x70
 [<ffffffff81129379>] truncate_setsize+0x19/0x20
 [<ffffffff811293be>] vmtruncate+0x3e/0x70
 [<ffffffff81193000>] inode_setattr+0x30/0x60
 [<ffffffffa020782c>] ext4_setattr+0x10c/0x360 [ext4]
 [<ffffffff811933e8>] notify_change+0x168/0x340
 [<ffffffff8118f8f7>] ? __d_lookup+0xa7/0x150
 [<ffffffff81175b14>] do_truncate+0x64/0xa0
 [<ffffffff8120e82f>] ? security_inode_permission+0x1f/0x30
 [<ffffffff81188469>] do_filp_open+0x829/0xd60
 [<ffffffff81194332>] ? alloc_fd+0x92/0x160
 [<ffffffff811748d9>] do_sys_open+0x69/0x140
 [<ffffffff811749f0>] sys_open+0x20/0x30
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: 00 00 f7 da 48 89 f8 48 c1 ef 35 83 e7 03 48 c1 e8 37 48 69 ff c0 86 00 00 48 03 3c c5 60 f2 bf 81 e8 19 51 fd ff e9 61 ff ff ff <0f> 0b eb fe 55 48 89 e5 0f 1f 44 00 00 48 89 f7 48 c7 c6 80 c4 
RIP  [<ffffffff8115e8ac>] kmem_freepages+0x12c/0x130
 RSP <ffff88046a805868>

Environment

  • Red Hat Enterprise Linux 6.2.
  • kernel-2.6.32-220.45.1.el6.x86_64.
