System panicked due to Lustre file system errors
Environment
- Red Hat Enterprise Linux 5.2
- Lustre file system
Issue
- System panicked due to Lustre file system errors
Resolution
Contact the Lustre file system vendor (Oracle).
Root Cause
The vmcore log (see the Diagnostic Steps section of this article) shows the following:
1) The system is using the Lustre file system, which is reporting errors (refer to line numbers 9 to 31).
2) The Lustre file system is implemented by the lustre(U) kernel module (refer to line number 38).
3) Line number 71 shows that the fault occurred in the kernel function __wake_up_common(), and line numbers 54 to 67 show the call trace that led to it.
The __wake_up_common() function wakes up a specified number of tasks on a wait queue.
A wait queue is a list of tasks waiting for an event.
4) Line number 39 shows that the fault occurred in the ptlrpcd process and reports the 2.6.18-92.1.10.el5 kernel as tainted.
The call trace passes through Lustre code (the lustre, mdc and ptlrpc modules).
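To make the wait queue description above more concrete, here is a minimal Python model of the wake-up loop. It is only an illustration of the general idea (every non-exclusive waiter is woken, and the walk stops once the requested number of exclusive waiters has been woken). The names Waiter and wake_up_common are hypothetical and are not kernel APIs, and the model ignores locking, task states and the per-entry callback that the real kernel code uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Waiter:
    """Hypothetical stand-in for one entry on a wait queue."""
    name: str
    exclusive: bool = False
    awake: bool = False

    def wake(self) -> bool:
        """Mark the waiter runnable; return True if this call woke it."""
        if self.awake:
            return False
        self.awake = True
        return True


def wake_up_common(queue: List[Waiter], nr_exclusive: int) -> List[str]:
    """Walk the queue in order and wake waiters.

    Non-exclusive waiters are always woken. The walk stops after
    nr_exclusive exclusive waiters have been woken. Passing 0 means
    "no limit", so every waiter on the queue is woken.
    """
    woken = []
    for waiter in list(queue):
        if waiter.wake():
            woken.append(waiter.name)
            if waiter.exclusive:
                nr_exclusive -= 1
                if nr_exclusive == 0:
                    break
    return woken


# Illustrative queue: one ordinary waiter followed by two exclusive ones.
example_queue = [
    Waiter("reader"),
    Waiter("worker-1", exclusive=True),
    Waiter("worker-2", exclusive=True),
]
example_woken = wake_up_common(example_queue, nr_exclusive=1)
```

In this model the loop dereferences every queue entry it visits, which is why a wait queue whose memory is no longer valid can lead to a fault inside the wake-up function itself. Whether that is what happened here has to be confirmed by the Lustre vendor from the vmcore.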
Diagnostic Steps
- The vmcore log is as follows (truncated output):
1 Lustre: lfs002-MDT0000-mdc-ffff811014717400: Connection restored to service lfs002-MDT0000 using nid 141.128.90.150@tcp.
2 Lustre: Skipped 4 previous similar messages
3 Lustre: MGC141.128.90.150@tcp: Reactivating import
4 Lustre: Skipped 6 previous similar messages
5 Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
6 Uhhuh. NMI received for unknown reason b0.
7 You probably have a hardware problem with your RAM chips
8 Dazed and confused, but trying to continue
9 LustreError: 7940:0:(dir.c:388:ll_readdir()) error reading dir 172752899/911826523 page 0: rc -43
10 LustreError: 22331:0:(dir.c:388:ll_readdir()) error reading dir 172752899/911826523 page 0: rc -43
11 LustreError: 22331:0:(dir.c:388:ll_readdir()) error reading dir 172752899/911826523 page 0: rc -43
12 LustreError: 22331:0:(dir.c:388:ll_readdir()) error reading dir 172752899/911826523 page 0: rc -43
13 LustreError: 24859:0:(dir.c:388:ll_readdir()) error reading dir 172752899/911826523 page 0: rc -43
14 LustreError: 24864:0:(dir.c:388:ll_readdir()) error reading dir 172752899/911826523 page 0: rc -43
15 LustreError: 22331:0:(dir.c:388:ll_readdir()) error reading dir 172752899/911826523 page 0: rc -43
16 LustreError: 23869:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161305
17 LustreError: 23869:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161305
18 LustreError: 23873:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161321
19 LustreError: 23873:0:(file.c:2925:ll_inode_revalidate_fini()) Skipped 2 previous similar messages
20 LustreError: 24041:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161321
21 LustreError: 24041:0:(file.c:2925:ll_inode_revalidate_fini()) Skipped 27 previous similar messages
22 LustreError: 24065:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161321
23 LustreError: 24065:0:(file.c:2925:ll_inode_revalidate_fini()) Skipped 15 previous similar messages
24 LustreError: 24119:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161305
25 LustreError: 24119:0:(file.c:2925:ll_inode_revalidate_fini()) Skipped 83 previous similar messages
26 LustreError: 24211:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161327
27 LustreError: 24211:0:(file.c:2925:ll_inode_revalidate_fini()) Skipped 11 previous similar messages
28 LustreError: 24295:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161315
29 LustreError: 24295:0:(file.c:2925:ll_inode_revalidate_fini()) Skipped 36 previous similar messages
30 LustreError: 24347:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161305
31 LustreError: 24347:0:(file.c:2925:ll_inode_revalidate_fini()) Skipped 18 previous similar messages
32 nfs: server etrnyv2 not responding, still trying
33 nfs: server etrnyv2 OK
34 bats[5323]: segfault at 00002aabac6e6980 rip 000000000083b3ed rsp 00007fffb7d855e0 error 4
35 general protection fault: 0000 [1] SMP
36 last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
37 CPU 6
38 Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) sg ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) autofs4 hp_ilo(U) nfs lockd fscache nfs_acl sunrpc bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ide_cd serio_raw cdrom bnx2 shpchp pcspkr e1000e mptctl mptbase usb_storage ata_piix libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
39 Pid: 7890, comm: ptlrpcd Tainted: G 2.6.18-92.1.10.el5 #1
40 RIP: 0010:[<ffffffff800893a1>] [<ffffffff800893a1>] __wake_up_common+0x24/0x68
41 RSP: 0018:ffff81100ec2dc30 EFLAGS: 00010086
42 RAX: 0000000000000282 RBX: ffff810ba3564f00 RCX: 0000000000000000
43 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff810ba3564f00
44 RBP: ffff81100ec2dc60 R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
45 R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: 0000000000000001
46 R13: 0000000000000001 R14: ffff810ba3564f00 R15: 0000000000000000
47 FS: 00002b957b538240(0000) GS:ffff81102fea7b40(0000) knlGS:0000000000000000
48 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
49 CR2: 00000000ef4c1a90 CR3: 000000100ad09000 CR4: 00000000000006e0
50 Process ptlrpcd (pid: 7890, threadinfo ffff81100ec2c000, task ffff81102b950820)
51 Stack: 0000000300000000 ffff810ba3564f00 0000000000000000 0000000000000001
52 0000000000000282 0000000000000003 ffff81100ec2dca0 ffffffff8002e28d
53 ffff810d4d35a280 ffff810d1bf02380 ffff810d4d35a280 0000000000000001
54 Call Trace:
55 [<ffffffff8002e28d>] __wake_up+0x38/0x4f
56 [<ffffffff8871b4ef>] :lustre:ll_statahead_interpret+0x4ef/0x5b0
57 [<ffffffff8866d885>] :mdc:mdc_intent_getattr_async_interpret+0x465/0x490
58 [<ffffffff88553a91>] :ptlrpc:ptlrpc_check_set+0x9a1/0xb60
59 [<ffffffff8004a9e9>] try_to_del_timer_sync+0x51/0x5a
60 [<ffffffff88582e2d>] :ptlrpc:ptlrpcd_check+0x16d/0x290
61 [<ffffffff800955b8>] process_timeout+0x0/0x5
62 [<ffffffff88583456>] :ptlrpc:ptlrpcd+0x1a6/0x21e
63 [<ffffffff8008ad7d>] default_wake_function+0x0/0xe
64 [<ffffffff800b45f9>] audit_syscall_exit+0x31b/0x336
65 [<ffffffff8005dfb1>] child_rip+0xa/0x11
66 [<ffffffff885832b0>] :ptlrpc:ptlrpcd+0x0/0x21e
67 [<ffffffff8005dfa7>] child_rip+0x0/0x11
68
69
70 Code: 49 8b 18 eb 2a 49 8d 78 e8 45 8b 68 e8 4c 89 f9 8b 55 d0 8b
71 RIP [<ffffffff800893a1>] __wake_up_common+0x24/0x68
72 RSP <ffff81100ec2dc30>
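When a log like the one above is long, it can help to summarize it before reading it line by line. The following Python sketch is one possible way to do that, assuming the log has been saved as text in the format shown in this article. The function name summarize, the regular expressions and the small errno table are illustrative assumptions and are not part of any Red Hat or Lustre tool. The errno names are the standard Linux values for the two return codes that appear in this log (2 is ENOENT, 43 is EIDRM).

```python
import re
from collections import Counter

# Linux errno names for the return codes seen in this log.
ERRNO_NAMES = {2: "ENOENT", 43: "EIDRM"}

# "LustreError: <pid>:0:(<file>:<line>:<func>()) ... rc -43" or "... failure -2"
LUSTRE_ERR = re.compile(
    r"LustreError: \d+:\d+:\([\w.]+:\d+:(\w+)\(\)\).*?(?:rc|failure) (-\d+)"
)
PID_LINE = re.compile(r"Pid: (\d+), comm: (\S+)")
RIP_LINE = re.compile(r"RIP\b.*\] (\w+)\+0x")
# "[<address>] :module:symbol+0x..." or "[<address>] symbol+0x..."
FRAME = re.compile(r"\[<[0-9a-f]+>\] (?::(\w+):)?(\w+)\+0x")


def summarize(log_text):
    """Return (error counts, oops summary) for a kernel log in this format."""
    errors = Counter()
    oops = {"pid": None, "comm": None, "rip": None, "frames": []}
    in_trace = False
    for line in log_text.splitlines():
        m = LUSTRE_ERR.search(line)
        if m:
            rc = abs(int(m.group(2)))
            errors[(m.group(1), ERRNO_NAMES.get(rc, str(rc)))] += 1
            continue
        m = PID_LINE.search(line)
        if m:
            oops["pid"], oops["comm"] = int(m.group(1)), m.group(2)
        if "Call Trace:" in line:
            in_trace = True
            continue
        if "Code:" in line:
            in_trace = False
        m = RIP_LINE.search(line)
        if m:
            oops["rip"] = m.group(1)
            continue
        if in_trace:
            m = FRAME.search(line)
            if m:
                # Frames without a ":module:" prefix belong to the core kernel.
                oops["frames"].append((m.group(1) or "kernel", m.group(2)))
    return errors, oops


# A few lines copied from the log in this article, used as sample input.
SAMPLE = """\
9 LustreError: 7940:0:(dir.c:388:ll_readdir()) error reading dir 172752899/911826523 page 0: rc -43
16 LustreError: 23869:0:(file.c:2925:ll_inode_revalidate_fini()) failure -2 inode 182161305
39 Pid: 7890, comm: ptlrpcd Tainted: G 2.6.18-92.1.10.el5 #1
40 RIP: 0010:[<ffffffff800893a1>] [<ffffffff800893a1>] __wake_up_common+0x24/0x68
54 Call Trace:
55 [<ffffffff8002e28d>] __wake_up+0x38/0x4f
56 [<ffffffff8871b4ef>] :lustre:ll_statahead_interpret+0x4ef/0x5b0
57 [<ffffffff8866d885>] :mdc:mdc_intent_getattr_async_interpret+0x465/0x490
58 [<ffffffff88553a91>] :ptlrpc:ptlrpc_check_set+0x9a1/0xb60
70 Code: 49 8b 18 eb 2a 49 8d 78 e8 45 8b 68 e8 4c 89 f9 8b 55 d0 8b
71 RIP [<ffffffff800893a1>] __wake_up_common+0x24/0x68
"""

sample_errors, sample_oops = summarize(SAMPLE)
```

Listing the module name next to each frame makes it easy to see which third-party modules the call trace passes through, which is the information the vendor will need.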
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
