RHEL6.3: kernel panic during fsync, RIP ext4_mb_good_group, after online resize of LUN on SAN from 500 to 800 GB
Issue
- On March 23 we performed an online resize of a SAN-backed LUN from 500 GB to 800 GB on two of our Notes servers (server1: grow job started 09:11, finished 09:13; server2: grow job started 09:13, finished 09:14).
- On March 28 both servers panicked and rebooted, in the same order in which their filesystems had been grown (server1: Thu Mar 28 18:05:26 2013; server2: Thu Mar 28 18:10:37 2013).
- After the online resize of the SAN-backed LUN, the system panicked with the following in the log:
sd 2:0:0:4: [sdf] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sdf: detected capacity change from 536871567360 to 858994114560
sd 1:0:0:4: [sdl] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sdl: detected capacity change from 536871567360 to 858994114560
sd 2:0:0:3: [sde] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sde: detected capacity change from 536871567360 to 858994114560
sd 1:0:0:3: [sdk] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sdk: detected capacity change from 536871567360 to 858994114560
sdf: unknown partition table
sdl: unknown partition table
sde: unknown partition table
sdk: unknown partition table
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0149b7b>] ext4_mb_good_group+0x5b/0x110 [ext4]
PGD c5dcd1067 PUD c5caeb067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:04:00.0/0000:05:00.0/0000:06:00.0/0000:07:02.0/0000:0a:00.0/host1/rport-1:0-1/target1:0:0/1:0:0:1/state
CPU 13
Modules linked in: mptctl autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 enic power_meter microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp sg dm_round_robin ext4 mbcache jbd2 fnic libfcoe libfc scsi_transport_fc scsi_tgt sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 8337, comm: update Not tainted 2.6.32-279.5.2.el6.x86_64 #1 Cisco Systems Inc N20-B6625-1/N20-B6625-1
RIP: 0010:[<ffffffffa0149b7b>] [<ffffffffa0149b7b>] ext4_mb_good_group+0x5b/0x110 [ext4]
RSP: 0018:ffff88069ae03868 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8806aaa886b8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88065f658000 RDI: ffff88065a4b7800
RBP: ffff88069ae03898 R08: 0000000000000032 R09: 000000000c7e263b
R10: 0000000000000000 R11: 000000000001f230 R12: 0000000000001900
R13: 0000000000000fa0 R14: 0000000000001900 R15: 00000000000006dd
FS: 0000000000000000(0000) GS:ffff8806854c0000(0063) knlGS:00000000e9c61b70
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006b69b9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process update (pid: 8337, threadinfo ffff88069ae02000, task ffff8806a1505500)
Stack:
0000000000000001 00000000aaa886b8 0000000000000fa0 ffff8806aaa886b8
<d> 0000000000000000 0000000000000fa0 ffff88069ae03948 ffffffffa014b38b
<d> 0000000000000001 000000007fe5f2a3 ffff88065ce84800 ffff88065ce84b18
Call Trace:
[<ffffffffa014b38b>] ext4_mb_regular_allocator+0x19b/0x410 [ext4]
[<ffffffff81114457>] ? unlock_page+0x27/0x30
[<ffffffff811aed10>] ? __block_write_full_page+0x1f0/0x3b0
[<ffffffff811ae640>] ? end_buffer_async_write+0x0/0x190
[<ffffffffa014d25d>] ext4_mb_new_blocks+0x38d/0x560 [ext4]
[<ffffffffa0120420>] ? noalloc_get_block_write+0x0/0x60 [ext4]
[<ffffffffa011be00>] ? ext4_bh_delay_or_unwritten+0x0/0x30 [ext4]
[<ffffffffa011ddc3>] ext4_alloc_branch+0x4a3/0x5a0 [ext4]
[<ffffffffa011c55e>] ? ext4_get_branch+0xfe/0x130 [ext4]
[<ffffffffa011f81d>] ext4_ind_get_blocks+0x1dd/0x600 [ext4]
[<ffffffffa011fe30>] ext4_get_blocks+0x1f0/0x2a0 [ext4]
[<ffffffff8112aab5>] ? pagevec_lookup_tag+0x25/0x40
[<ffffffffa0121be1>] mpage_da_map_and_submit+0xa1/0x450 [ext4]
[<ffffffffa00ff3c5>] ? jbd2_journal_start+0xb5/0x100 [jbd2]
[<ffffffffa01227de>] ext4_da_writepages+0x2ee/0x620 [ext4]
[<ffffffff81129c61>] do_writepages+0x21/0x40
[<ffffffff81114b7b>] __filemap_fdatawrite_range+0x5b/0x60
[<ffffffff81114bda>] filemap_write_and_wait_range+0x5a/0x90
[<ffffffff811aa1fe>] vfs_fsync_range+0x7e/0xe0
[<ffffffff811aa2cd>] vfs_fsync+0x1d/0x20
[<ffffffff811aa30e>] do_fsync+0x3e/0x60
[<ffffffff811aa343>] sys_fdatasync+0x13/0x20
[<ffffffff8104a820>] sysenter_dispatch+0x7/0x2e
Code: 03 00 00 89 4d dc 8b 88 90 00 00 00 48 8b b0 70 02 00 00 41 d3 e8 48 8b 48 30 45 89 c0 4a 8b 04 c6 48 83 e9 01 44 21 e1 83 fa 03 <4c> 8b 2c c8 0f 87 a4 00 00 00 41 f6 45 00 01 75 56 41 8b 45 14
RIP [<ffffffffa0149b7b>] ext4_mb_good_group+0x5b/0x110 [ext4]
RSP <ffff88069ae03868>
CR2: 0000000000000000
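The capacity figures in the sd messages above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only numbers taken from the log; the helper name `describe` is hypothetical and exists only for illustration.

```python
# Cross-check the capacity figures reported by the sd driver in the log.
SECTOR_SIZE = 512            # "512-byte logical blocks"
NEW_SECTORS = 1677722880     # sector count reported for sde, sdf, sdk and sdl
OLD_BYTES = 536871567360     # "detected capacity change from ..."
NEW_BYTES = 858994114560     # "... to ..."

GIB = 1024 ** 3


def describe(size_bytes):
    """Return (whole GiB, leftover bytes) for a byte count (illustrative helper)."""
    return divmod(size_bytes, GIB)


# The sector count times the sector size must equal the new capacity.
assert NEW_SECTORS * SECTOR_SIZE == NEW_BYTES

print(describe(OLD_BYTES))   # size before the grow
print(describe(NEW_BYTES))   # size after the grow
```

Both sizes come out as a whole number of GiB (500 and 800) plus the same 655360-byte remainder, which agrees with the 500 to 800 GB resize described in the report.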
Environment
- Red Hat Enterprise Linux 6.3
- kernel 2.6.32-279.5.2.el6.x86_64
- ext4 filesystem
- online resize of SAN volume underneath ext4 filesystem
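As a rough aid for reading the register dump in the Issue section, the number of ext4 block groups before and after the grow can be estimated from the device sizes. This is only a sketch under stated assumptions: the report does not give the filesystem geometry, so 4 KiB blocks and 32768 blocks per group are assumed here, as is a filesystem that spans the whole device. The function name `full_groups` is hypothetical.

```python
# Estimate ext4 block group counts before and after the LUN grow.
# ASSUMPTION: 4 KiB filesystem blocks and 32768 blocks per group; the
# report does not state the actual geometry of the affected filesystem.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
GROUP_BYTES = BLOCK_SIZE * BLOCKS_PER_GROUP   # 128 MiB per block group

OLD_BYTES = 536871567360   # device size before the grow (from the log)
NEW_BYTES = 858994114560   # device size after the grow (from the log)


def full_groups(device_bytes):
    """Number of complete block groups that fit in the device (illustrative)."""
    return device_bytes // GROUP_BYTES


old_groups = full_groups(OLD_BYTES)
new_groups = full_groups(NEW_BYTES)
print(old_groups, hex(old_groups))
print(new_groups, hex(new_groups))
```

Under these assumptions the counts are 4000 (0xfa0) and 6400 (0x1900). The same hexadecimal values appear in R13 and in R12/R14 of the register dump, which may indicate that the allocator was working with group numbers from the grown region, but the report does not say which variable each register held, so this should be treated as an observation and not as a diagnosis.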