Why does the kernel panic multiple times due to data corruption (slab)?
Issue
- Kernel panics caused by memory corruption have occurred several times on a cluster system. The panic location differs each time, but the corrupted data looks similar across crashes.
- Please identify the root cause of this issue.
- Please provide a workaround or a fix
BUG: unable to handle kernel paging request at 00000000fac310f0
IP: [<ffffffff81051f1d>] task_rq_lock+0x4d/0xa0
PGD 448aec067 PUD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/sddlmifh/uevent
CPU 31
Modules linked in: fuse sunrpc bonding ipv6 sddlmfdrv(P)(U) sddlmadrv(P)(U) osst st microcode i2c_i801 i2c_core ioatdma i7core_edac edac_core sg igb(U) dca shpchp ext4 mbcache jbd2 sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class sr_mod cdrom hfcldd(U) scsi_transport_fc scsi_tgt hfcldd_conf(U) hraslog_link(U) megaraid_sas(U) pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Tainted: P ---------------- 2.6.32-220.45.1.el6.x86_64 #1 HITACHI HA8000/RS440/QSSC-S4R
RIP: 0010:[<ffffffff81051f1d>] [<ffffffff81051f1d>] task_rq_lock+0x4d/0xa0
RSP: 0018:ffff88048e563e08 EFLAGS: 00010082
RAX: 000000002f207462 RBX: 0000000000015f40 RCX: 0000000000000000
RDX: 0000000000000082 RSI: ffff88048e563e60 RDI: ffff88086bc00ac0
RBP: ffff88048e563e28 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88086bc00ac0
R13: ffff88048e563e60 R14: 0000000000015f40 R15: 000000000000000f
FS: 0000000000000000(0000) GS:ffff88048e560000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000fac310f0 CR3: 0000000474415000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880875f6e000, task ffff880475fc0ac0)
Stack:
ffff88086bc00ac0 ffff88048e570f08 0000000000000000 000000000000001f
<0> ffff88048e563e98 ffffffff8105ed4c 0000000000000008 ffff88048e563ed0
<0> 0000000000000286 ffff88048e571380 ffffffff81aaf680 0000000000000082
Call Trace:
<IRQ>
[<ffffffff8105ed4c>] try_to_wake_up+0x3c/0x3e0
[<ffffffff81095130>] ? hrtimer_wakeup+0x0/0x30
[<ffffffff8105f145>] wake_up_process+0x15/0x20
[<ffffffff81095152>] hrtimer_wakeup+0x22/0x30
[<ffffffff8109574e>] __run_hrtimer+0x8e/0x1a0
[<ffffffff81012ba9>] ? read_tsc+0x9/0x20
[<ffffffff81095af6>] hrtimer_interrupt+0xe6/0x250
[<ffffffff814f651b>] smp_apic_timer_interrupt+0x6b/0x9b
[<ffffffff8100bc13>] apic_timer_interrupt+0x13/0x20
<EOI>
[<ffffffff812c5fae>] ? intel_idle+0xde/0x170
[<ffffffff812c5f91>] ? intel_idle+0xc1/0x170
[<ffffffff8109808d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff813fb587>] cpuidle_idle_call+0xa7/0x140
[<ffffffff81009e06>] cpu_idle+0xb6/0x110
[<ffffffff814e7668>] start_secondary+0x202/0x245
Code: c3 40 5f 01 00 49 89 fc 49 89 f5 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 49 89 55 00 49 8b 44 24 08 49 89 de 8b 40 18 <4c> 03 34 c5 e0 6d bf 81 4c 89 f7 e8 13 eb 49 00 49 8b 44 24 08
RIP [<ffffffff81051f1d>] task_rq_lock+0x4d/0xa0
RSP <ffff88048e563e08>
CR2: 00000000fac310f0
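The register dump above can be cross-checked against the faulting address. The following is a minimal sketch, not a definitive analysis: it assumes the faulting instruction bytes `4c 03 34 c5 e0 6d bf 81` from the `Code:` line decode to `add -0x7e409220(,%rax,8),%r14`, that is, a table lookup at `base + RAX * 8`. The variable names are illustrative. Under that assumption, the bogus value in RAX reproduces CR2 exactly, which suggests the index read from memory was already corrupted before the lookup.

```python
MASK64 = (1 << 64) - 1

# Values copied from the oops above.
rax = 0x000000002F207462  # RAX at the time of the fault
cr2 = 0x00000000FAC310F0  # faulting address reported in CR2

# Assumption: the instruction is  add disp32(,%rax,8),%r14  and the
# 32-bit displacement 0x81bf6de0 is sign-extended to 64 bits.
disp32 = 0x81BF6DE0
base = (disp32 | 0xFFFFFFFF00000000) if (disp32 & 0x80000000) else disp32

# Effective address of the memory operand, wrapped to 64 bits.
fault = (base + rax * 8) & MASK64
print(hex(fault))  # 0xfac310f0

# The low four bytes of RAX, in memory (little-endian) order.
raw = (rax & 0xFFFFFFFF).to_bytes(4, "little")
print(raw)  # b'bt /'
```

The low bytes of RAX are all printable ASCII. That may be a coincidence, but it can be worth comparing against the corrupted data seen in the other crashes.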
------------[ cut here ]------------
kernel BUG at mm/slab.c:1751!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/sddlmido/queue/rotational
CPU 10
Modules linked in: bridge stp llc fuse sunrpc bonding ipv6 sddlmfdrv(P)(U) sddlmadrv(P)(U) osst st microcode i2c_i801 i2c_core ioatdma i7core_edac edac_core sg igb(U) dca shpchp ext4 mbcache jbd2 sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class sr_mod cdrom hfcldd(U) scsi_transport_fc scsi_tgt hfcldd_conf(U) hraslog_link(U) megaraid_sas(U) pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 7254, comm: .dlmmgr_exe Tainted: P W ---------------- 2.6.32-220.45.1.el6.x86_64 #1 HITACHI HA8000/RS440/QSSC-S4R
RIP: 0010:[<ffffffff8115e8ac>] [<ffffffff8115e8ac>] kmem_freepages+0x12c/0x130
RSP: 0018:ffff88046a805868 EFLAGS: 00010002
RAX: ffff880876786ac0 RBX: ffff880877d00340 RCX: ffffffffffffffa0
RDX: fffffffffffffffe RSI: 000000000000000c RDI: 0000000000000082
RBP: ffff88046a805888 R08: 0000000000000060 R09: fb969ce71902f002
R10: 0000000000000000 R11: 0000000000000003 R12: ffff880449cf7c64
R13: 0000000000000001 R14: ffffea000f025608 R15: ffffea0000000000
FS: 00007f6617c2e700(0000) GS:ffff88048e400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000085a874f88 CR3: 0000000874630000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process .dlmmgr_exe (pid: 7254, threadinfo ffff88046a804000, task ffff8804700a0b00)
Stack:
ffff880877d00340 ffff88046a35e000 ffff88087fc1d4e8 0000000000000002
<0> ffff88046a8058a8 ffffffff81160a43 ffff880877d00340 000000000000000c
<0> ffff88046a805908 ffffffff81160b00 ffff88047fef0640 ffff88046a35e000
Call Trace:
[<ffffffff81160a43>] slab_destroy+0x33/0x90
[<ffffffff81160b00>] free_block+0x60/0x170
[<ffffffff81160c99>] __drain_alien_cache+0x89/0xa0
[<ffffffff811609c2>] kmem_cache_free+0x262/0x2b0
[<ffffffff811a89e2>] free_buffer_head+0x22/0x50
[<ffffffff811a8db9>] try_to_free_buffers+0x79/0xc0
[<ffffffffa017d79e>] jbd2_journal_invalidatepage+0x10e/0x2c0 [jbd2]
[<ffffffff81271462>] ? radix_tree_gang_lookup_slot+0x72/0xb0
[<ffffffffa02002b2>] ext4_invalidatepage+0x42/0x60 [ext4]
[<ffffffffa0202bbe>] ext4_da_invalidatepage+0x8e/0x1e0 [ext4]
[<ffffffff81128ab5>] do_invalidatepage+0x25/0x30
[<ffffffff81128cd2>] truncate_inode_page+0xa2/0xc0
[<ffffffff81128fd0>] truncate_inode_pages_range+0x160/0x460
[<ffffffff811292e5>] truncate_inode_pages+0x15/0x20
[<ffffffff81129337>] truncate_pagecache+0x47/0x70
[<ffffffff81129379>] truncate_setsize+0x19/0x20
[<ffffffff811293be>] vmtruncate+0x3e/0x70
[<ffffffff81193000>] inode_setattr+0x30/0x60
[<ffffffffa020782c>] ext4_setattr+0x10c/0x360 [ext4]
[<ffffffff811933e8>] notify_change+0x168/0x340
[<ffffffff8118f8f7>] ? __d_lookup+0xa7/0x150
[<ffffffff81175b14>] do_truncate+0x64/0xa0
[<ffffffff8120e82f>] ? security_inode_permission+0x1f/0x30
[<ffffffff81188469>] do_filp_open+0x829/0xd60
[<ffffffff81194332>] ? alloc_fd+0x92/0x160
[<ffffffff811748d9>] do_sys_open+0x69/0x140
[<ffffffff811749f0>] sys_open+0x20/0x30
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: 00 00 f7 da 48 89 f8 48 c1 ef 35 83 e7 03 48 c1 e8 37 48 69 ff c0 86 00 00 48 03 3c c5 60 f2 bf 81 e8 19 51 fd ff e9 61 ff ff ff <0f> 0b eb fe 55 48 89 e5 0f 1f 44 00 00 48 89 f7 48 c7 c6 80 c4
RIP [<ffffffff8115e8ac>] kmem_freepages+0x12c/0x130
RSP <ffff88046a805868>
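Because the panic location differs each time, it can help to tabulate where each crash occurred before comparing the corrupted data. The sketch below is one possible way to do that, assuming the oops text has been saved as plain strings; the `summarize` helper and the sample excerpts are illustrative only and are not part of any Red Hat tool.

```python
import re

# Matches the final "RIP [<addr>] symbol+0xoff/0xsize" line of an oops.
# The "RIP: 0010:..." register line is skipped because of its colon.
RIP_RE = re.compile(
    r"^RIP\s+\[<([0-9a-f]+)>\]\s+(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+", re.M
)
# Matches "Pid: N, comm: NAME ..." at the start of a line.
PID_RE = re.compile(r"^Pid: (\d+), comm: (\S+)", re.M)


def summarize(oops_text):
    """Return the faulting function, PID, and command name from one oops."""
    rip = RIP_RE.search(oops_text)
    pid = PID_RE.search(oops_text)
    return {
        "function": rip.group(2) if rip else None,
        "pid": int(pid.group(1)) if pid else None,
        "comm": pid.group(2) if pid else None,
    }


# Short excerpts taken from the two oops messages above.
first = """Pid: 0, comm: swapper Tainted: P
RIP: 0010:[<ffffffff81051f1d>] [<ffffffff81051f1d>] task_rq_lock+0x4d/0xa0
RIP [<ffffffff81051f1d>] task_rq_lock+0x4d/0xa0
"""
second = """Pid: 7254, comm: .dlmmgr_exe Tainted: P W
RIP: 0010:[<ffffffff8115e8ac>] [<ffffffff8115e8ac>] kmem_freepages+0x12c/0x130
RIP [<ffffffff8115e8ac>] kmem_freepages+0x12c/0x130
"""

summaries = [summarize(first), summarize(second)]
for s in summaries:
    print(s["function"], s["pid"], s["comm"])
```

For the two crashes shown here, this reports `task_rq_lock` (swapper, in interrupt context) and `kmem_freepages` (`.dlmmgr_exe`, during an ext4 truncate), which matches the observation that the crash sites are unrelated code paths.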
Environment
- Red Hat Enterprise Linux 6.2.
- kernel-2.6.32-220.45.1.el6.x86_64.