RHEL6.3: kernel panic during fsync, RIP ext4_mb_good_group, after online resize of LUN on SAN from 500 to 800 GB

Solution Unverified

Issue

  • On March 23 we performed an online resize of a SAN-backed LUN from 500 GB to 800 GB on two of our Notes servers (server1: grow job started 09:11, finished 09:13; server2: grow job started 09:13, finished 09:14).
  • On March 28 both servers panicked and rebooted, in the same order in which their filesystems had been grown
    (server1: Thu Mar 28 18:05:26 2013; server2: Thu Mar 28 18:10:37 2013).
  • After the online resize of the SAN-backed LUN, the system panicked with the following in the kernel log:
sd 2:0:0:4: [sdf] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sdf: detected capacity change from 536871567360 to 858994114560
sd 1:0:0:4: [sdl] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sdl: detected capacity change from 536871567360 to 858994114560
sd 2:0:0:3: [sde] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sde: detected capacity change from 536871567360 to 858994114560
sd 1:0:0:3: [sdk] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sdk: detected capacity change from 536871567360 to 858994114560
 sdf: unknown partition table
 sdl: unknown partition table
 sde: unknown partition table
 sdk: unknown partition table
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0149b7b>] ext4_mb_good_group+0x5b/0x110 [ext4]
PGD c5dcd1067 PUD c5caeb067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:04:00.0/0000:05:00.0/0000:06:00.0/0000:07:02.0/0000:0a:00.0/host1/rport-1:0-1/target1:0:0/1:0:0:1/state
CPU 13 
Modules linked in: mptctl autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 enic power_meter microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp sg dm_round_robin ext4 mbcache jbd2 fnic libfcoe libfc scsi_transport_fc scsi_tgt sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 8337, comm: update Not tainted 2.6.32-279.5.2.el6.x86_64 #1 Cisco Systems Inc N20-B6625-1/N20-B6625-1
RIP: 0010:[<ffffffffa0149b7b>]  [<ffffffffa0149b7b>] ext4_mb_good_group+0x5b/0x110 [ext4]
RSP: 0018:ffff88069ae03868  EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8806aaa886b8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88065f658000 RDI: ffff88065a4b7800
RBP: ffff88069ae03898 R08: 0000000000000032 R09: 000000000c7e263b
R10: 0000000000000000 R11: 000000000001f230 R12: 0000000000001900
R13: 0000000000000fa0 R14: 0000000000001900 R15: 00000000000006dd
FS:  0000000000000000(0000) GS:ffff8806854c0000(0063) knlGS:00000000e9c61b70
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006b69b9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process update (pid: 8337, threadinfo ffff88069ae02000, task ffff8806a1505500)
Stack:
 0000000000000001 00000000aaa886b8 0000000000000fa0 ffff8806aaa886b8
<d> 0000000000000000 0000000000000fa0 ffff88069ae03948 ffffffffa014b38b
<d> 0000000000000001 000000007fe5f2a3 ffff88065ce84800 ffff88065ce84b18
Call Trace:
 [<ffffffffa014b38b>] ext4_mb_regular_allocator+0x19b/0x410 [ext4]
 [<ffffffff81114457>] ? unlock_page+0x27/0x30
 [<ffffffff811aed10>] ? __block_write_full_page+0x1f0/0x3b0
 [<ffffffff811ae640>] ? end_buffer_async_write+0x0/0x190
 [<ffffffffa014d25d>] ext4_mb_new_blocks+0x38d/0x560 [ext4]
 [<ffffffffa0120420>] ? noalloc_get_block_write+0x0/0x60 [ext4]
 [<ffffffffa011be00>] ? ext4_bh_delay_or_unwritten+0x0/0x30 [ext4]
 [<ffffffffa011ddc3>] ext4_alloc_branch+0x4a3/0x5a0 [ext4]
 [<ffffffffa011c55e>] ? ext4_get_branch+0xfe/0x130 [ext4]
 [<ffffffffa011f81d>] ext4_ind_get_blocks+0x1dd/0x600 [ext4]
 [<ffffffffa011fe30>] ext4_get_blocks+0x1f0/0x2a0 [ext4]
 [<ffffffff8112aab5>] ? pagevec_lookup_tag+0x25/0x40
 [<ffffffffa0121be1>] mpage_da_map_and_submit+0xa1/0x450 [ext4]
 [<ffffffffa00ff3c5>] ? jbd2_journal_start+0xb5/0x100 [jbd2]
 [<ffffffffa01227de>] ext4_da_writepages+0x2ee/0x620 [ext4]
 [<ffffffff81129c61>] do_writepages+0x21/0x40
 [<ffffffff81114b7b>] __filemap_fdatawrite_range+0x5b/0x60
 [<ffffffff81114bda>] filemap_write_and_wait_range+0x5a/0x90
 [<ffffffff811aa1fe>] vfs_fsync_range+0x7e/0xe0
 [<ffffffff811aa2cd>] vfs_fsync+0x1d/0x20
 [<ffffffff811aa30e>] do_fsync+0x3e/0x60
 [<ffffffff811aa343>] sys_fdatasync+0x13/0x20
 [<ffffffff8104a820>] sysenter_dispatch+0x7/0x2e
Code: 03 00 00 89 4d dc 8b 88 90 00 00 00 48 8b b0 70 02 00 00 41 d3 e8 48 8b 48 30 45 89 c0 4a 8b 04 c6 48 83 e9 01 44 21 e1 83 fa 03 <4c> 8b 2c c8 0f 87 a4 00 00 00 41 f6 45 00 01 75 56 41 8b 45 14 
RIP  [<ffffffffa0149b7b>] ext4_mb_good_group+0x5b/0x110 [ext4]
 RSP <ffff88069ae03868>
CR2: 0000000000000000
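
When reading the oops above, note that the byte marked <4c> in the Code: line starts the faulting instruction and that CR2 holds the faulting address. The following is a minimal Python sketch, not a general disassembler: it decodes only the `REX 8B ModRM SIB` load form that appears here and combines the result with the register dump. The helper names (`decode_load`, `effective_address`, `REGS`) are hypothetical. Reading the two loads as a two-level table lookup of ext4 group information is an assumption that this article does not confirm.

```python
# Names of the 64-bit general-purpose registers in x86-64 encoding order.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_load(code):
    """Decode 'REX 8B ModRM SIB' (mov r64, [base + index*scale]) only."""
    rex, opcode, modrm, sib = code
    if rex & 0xF0 != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed 8B load")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm != 4:
        raise ValueError("only the mod=00, SIB-addressed form is handled")
    rex_r, rex_x, rex_b = (rex >> 2) & 1, (rex >> 1) & 1, rex & 1
    scale = 1 << (sib >> 6)
    index, base = (sib >> 3) & 7, sib & 7
    if base == 5 or (index == 4 and not rex_x):
        raise ValueError("disp32-only or no-index forms are not handled")
    return (REG64[reg + 8 * rex_r], REG64[base + 8 * rex_b],
            REG64[index + 8 * rex_x], scale)


def effective_address(decoded, regs):
    """Compute base + index*scale from a decoded load and register values."""
    _, base, index, scale = decoded
    return regs[base] + regs[index] * scale


# Register values copied from the oops (state at the time of the fault).
REGS = {"rax": 0x0, "rcx": 0x0, "rsi": 0xFFFF88065F658000, "r8": 0x32}

# The load a few instructions before the fault, and the faulting load itself.
first = decode_load(bytes.fromhex("4a8b04c6"))
fault = decode_load(bytes.fromhex("4c8b2cc8"))
fault_addr = effective_address(fault, REGS)

for name, insn in (("first load", first), ("faulting load", fault)):
    dest, base, index, scale = insn
    print(f"{name}: mov {dest}, [{base} + {index}*{scale}]")
print(f"faulting address: {fault_addr:#x}")
```

Both loads index 8-byte slots. The first load left 0 in RAX, so the second load used a NULL base with index 0, which is consistent with CR2 being 0 and with the "NULL pointer dereference at (null)" message.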

Environment

  • Red Hat Enterprise Linux 6.3
    • 2.6.32-279.5.2.el6
  • ext4 filesystem
  • online resize of SAN volume underneath ext4 filesystem
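
The capacity-change messages in the log can be cross-checked with a little arithmetic. The sketch below parses two of the logged lines, confirms that the sector count matches the new byte capacity, and converts both capacities to GiB. It then counts full ext4 block groups under an assumed geometry of 4 KiB blocks and 32768 blocks per group (common defaults, but not stated in this article) and compares the counts with the R13 and R12 values from the oops. The comparison is an observation only, not a confirmed root cause, and the function names are hypothetical.

```python
import re

# Two lines copied verbatim from the kernel log in the Issue section.
LOG = """\
sd 2:0:0:4: [sdf] 1677722880 512-byte logical blocks: (858 GB/800 GiB)
sdf: detected capacity change from 536871567360 to 858994114560
"""

GIB = 1024 ** 3


def parse_capacity_change(log):
    """Return (old_bytes, new_bytes) from a 'detected capacity change' line."""
    m = re.search(r"detected capacity change from (\d+) to (\d+)", log)
    return int(m.group(1)), int(m.group(2))


def parse_sector_count(log):
    """Return (sector_count, sector_size) from the 'logical blocks' line."""
    m = re.search(r"\] (\d+) (\d+)-byte logical blocks", log)
    return int(m.group(1)), int(m.group(2))


old_bytes, new_bytes = parse_capacity_change(LOG)
sectors, sector_size = parse_sector_count(LOG)

# ASSUMPTION: ext4 geometry of 4 KiB blocks and 8 * block_size blocks per group.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE
GROUP_BYTES = BLOCK_SIZE * BLOCKS_PER_GROUP  # 128 MiB per block group


def full_groups(nbytes):
    """Number of complete block groups that fit in nbytes."""
    return nbytes // GROUP_BYTES


old_groups = full_groups(old_bytes)
new_groups = full_groups(new_bytes)

# Register values copied from the oops in the Issue section.
R13, R12 = 0x0FA0, 0x1900

print(f"sectors * sector size == new capacity: {sectors * sector_size == new_bytes}")
print(f"old: {old_bytes / GIB:.4f} GiB, full groups: {old_groups}")
print(f"new: {new_bytes / GIB:.4f} GiB, full groups: {new_groups}")
print(f"R13 == old group count: {R13 == old_groups}")
print(f"R12 == new group count: {R12 == new_groups}")
```

Under the assumed geometry, the old and new group counts equal the values in R13 and R12. This suggests, without proving, that the allocator was examining a group index beyond the range that existed before the LUN was grown.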
