Mounting an ext4 file system causes a kernel panic
Environment
- Red Hat Enterprise Linux 7
- 3.10.0-1160.11.1.el7.x86_64
- RHEL 7 and earlier RHEL versions
- EXT4 File System
Issue
- Mounting an ext4 file system causes a crash
- The ext4 file system is corrupted and the mount command triggers a kernel panic
crash> bt
PID: 31814 TASK: ffff9d986dfcb180 CPU: 5 COMMAND: "mount"
#0 [ffff9d9a727eb890] machine_kexec at ffffffff9a6662c4
#1 [ffff9d9a727eb8f0] __crash_kexec at ffffffff9a722802
#2 [ffff9d9a727eb9c0] crash_kexec at ffffffff9a7228f0
#3 [ffff9d9a727eb9d8] oops_end at ffffffff9ad8b798
#4 [ffff9d9a727eba00] die at ffffffff9a630a7b
#5 [ffff9d9a727eba30] do_trap at ffffffff9ad8aee0
#6 [ffff9d9a727eba80] do_invalid_op at ffffffff9a62d2a4
#7 [ffff9d9a727ebb30] invalid_op at ffffffff9ad972ee
[exception RIP: ext4_clear_journal_err+230]
RIP: ffffffffc0b3eb66 RSP: ffff9d9a727ebbe0 RFLAGS: 00010246
RAX: ffff9d9f15234000 RBX: ffff9d9f15230000 RCX: 00000000026448fe
RDX: ffff9d9f0c034400 RSI: ffff9d9f0c03443a RDI: ffff9d9f15230000
RBP: ffff9d9a727ebc10 R8: 000000000001f0e0 R9: ffffffffc0b69f65
R10: ffff9d9f2f09f0e0 R11: fffffd55205a9640 R12: ffff9d9f15230000
R13: ffff9d9f16a59e80 R14: ffff9d9f15236800 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff9d9a727ebc18] ext4_load_journal at ffffffffc0b69faa [ext4]
#9 [ffff9d9a727ebca0] ext4_fill_super at ffffffffc0b4200e [ext4]
#10 [ffff9d9a727ebd90] mount_bdev at ffffffff9a851e53
#11 [ffff9d9a727ebe00] ext4_mount at ffffffffc0b3a595 [ext4]
#12 [ffff9d9a727ebe10] mount_fs at ffffffff9a8527be
#13 [ffff9d9a727ebe58] vfs_kern_mount at ffffffff9a871467
#14 [ffff9d9a727ebe90] do_mount at ffffffff9a873b9f
#15 [ffff9d9a727ebf18] sys_mount at ffffffff9a8749f3
#16 [ffff9d9a727ebf50] system_call_fastpath at ffffffff9ad93f92
Resolution
A Bugzilla has been opened to address the crash during ext4 mount:
https://bugzilla.redhat.com/show_bug.cgi?id=1933975
The issue has also been reported upstream, and a fix has been identified: https://lore.kernel.org/linux-ext4/20200710140759.18031-1-jack@suse.cz/
Root Cause
The ext4 file system being mounted appears to have been corrupted by an underlying storage issue (this was noted in one case). The vmcore analysis indicated an invalid ext4_super_block * reference, which caused the BUG_ON() check in ext4_clear_journal_err() to trigger the crash:
/*
 * If we are mounting (or read-write remounting) a filesystem whose journal
 * has recorded an error from a previous lifetime, move that error to the
 * main filesystem now.
 */
static void ext4_clear_journal_err(struct super_block *sb,
                                   struct ext4_super_block *es)
{
        journal_t *journal;
        int j_errno;
        const char *errstr;

        BUG_ON(!EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL)); << R[1]

        journal = EXT4_SB(sb)->s_journal;

        /*
         * Now check for any error status which may have been recorded in the
         * journal by a prior ext4_error() or ext4_abort()
         */
        j_errno = jbd2_journal_errno(journal);
        if (j_errno) {
                char nbuf[16];

                errstr = ext4_decode_error(sb, j_errno, nbuf);
                ext4_warning(sb, "Filesystem error recorded "
                             "from previous mount: %s", errstr);
                ext4_warning(sb, "Marking fs in need of filesystem check.");

                EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
                es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
                ext4_commit_super(sb, 1);

                jbd2_journal_clear_err(journal);
                jbd2_journal_update_sb_errno(journal);
        }
}
#define EXT4_HAS_COMPAT_FEATURE(sb,mask) \
        ((EXT4_SB(sb)->s_es->s_feature_compat & cpu_to_le32(mask)) != 0)
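The feature test performed by the macro above can be reproduced in user space against a raw copy of the superblock, which may help when checking a suspect device or image before attempting to mount it. The following is only a minimal sketch, not kernel code: the helper name sb_has_journal() is hypothetical, and the offsets assume the standard ext4 on-disk layout (s_magic at byte 0x38 and s_feature_compat at byte 0x5C of the superblock, both little-endian, with the has_journal bit being 0x0004). The caller is assumed to have already read the 1024-byte primary superblock, which normally starts at byte offset 1024 of the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ext4 on-disk layout values (not taken from the article). */
#define SB_SIZE                          1024   /* bytes in the superblock copy */
#define SB_MAGIC_OFFSET                  0x38   /* s_magic, little-endian 16 bit */
#define SB_FEATURE_COMPAT_OFFSET         0x5C   /* s_feature_compat, little-endian 32 bit */
#define EXT4_SUPER_MAGIC                 0xEF53
#define EXT4_FEATURE_COMPAT_HAS_JOURNAL  0x0004

/* Decode a little-endian 16-bit value regardless of host byte order. */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: inspect a SB_SIZE-byte superblock copy.
 * Returns  1 if the has_journal compat feature bit is set,
 *          0 if it is clear,
 *         -1 if the magic number does not match (buffer is not a
 *            plausible ext4 superblock, so the flag is meaningless).
 */
int sb_has_journal(const unsigned char *sb)
{
    assert(sb != NULL);

    if (get_le16(sb + SB_MAGIC_OFFSET) != EXT4_SUPER_MAGIC)
        return -1;

    return (get_le32(sb + SB_FEATURE_COMPAT_OFFSET) &
            EXT4_FEATURE_COMPAT_HAS_JOURNAL) != 0;
}
```

A result of 0 or -1 on a file system that is expected to have a journal would suggest that the superblock contents are not what the mount path expects, which is the kind of inconsistency the BUG_ON() above reacts to.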
The vmcore analysis shows an invalid ext4_super_block * reference, which caused the BUG_ON() condition to trigger:
crash> ext4_sb_info.s_es,s_es_shrinker,s_sb,journal_bdev 0xffff9d9f15234000
  s_es = 0xffff9d9f0c034400 << X[1]
  s_es_shrinker = {
    shrink = 0xffffffffc0b60aa0 <ext4_es_shrink>,
    seeks = 0x2,
    batch = 0x0,
    list = {
      next = 0xffffffff9b295940,
      prev = 0xffff9d9f152303c8
    },
    nr_in_batch = {
      counter = 0x0
    }
  }
  s_sb = 0xffff9d9f15230000
  journal_bdev = 0x0
As shown below, the address at X[1] is not a valid kernel virtual address reference for struct ext4_super_block *:
crash> struct ext4_super_block.s_feature_compat 0xffff9d9f0c034400
struct: page excluded: kernel virtual address: ffff9d9f0c034400 type: "gdb_readmem_callback" <<
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.