RHEL 7.3 ppc64 LPARs running GPFS (Spectrum Scale) are crashing

Solution In Progress - Updated -

Issue

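  • Frequent crashes on ppc64 LPAR systems running GPFS (Spectrum Scale).
  • In both captured vmcores, the panicking task is umount.nfs (a descendant of mmfsd, via mmcommon and umount). It takes a Program Check exception in shrink_dcache_for_umount_subtree() while unmounting an NFS file system.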

crash instance 1:

crash> ps -p 6318 
PID: 0      TASK: c000000001358180  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: c000001ef9d00000  CPU: 31  COMMAND: "systemd"
  PID: 41045  TASK: c000000e9e8d5300  CPU: 10  COMMAND: "runmmfs"
   PID: 68294  TASK: c000000e9f766670  CPU: 12  COMMAND: "mmfsd"        <
    PID: 6296   TASK: c000000f4da56830  CPU: 66  COMMAND: "mmcommon"
     PID: 6317   TASK: c000001ee9befac0  CPU: 41  COMMAND: "umount"
      PID: 6318   TASK: c000001dc63779e0  CPU: 57  COMMAND: "umount.nfs"     < 

crash> 

crash> bt 6318 
PID: 6318   TASK: c000001dc63779e0  CPU: 57  COMMAND: "umount.nfs"
 #0 [c000001dca6633a0] .crash_kexec at c000000000196284
 #1 [c000001dca663420] .die at c000000000020cd8
 #2 [c000001dca6634d0] ._exception at c000000000020ff4
 #3 [c000001dca663670] .program_check_exception at c00000000097d748
 #4 [c000001dca663700] program_check_common at c000000000006308
 Program Check [700] exception frame:
 R0:  c000000000341868    R1:  c000001dca6639f0    R2:  c000000001429b00   
 R3:  c00000051bc5b280    R4:  f0000000002bf480    R5:  c0000000c8f0e400   
 R6:  c0000000c8f0e400    R7:  0000000000000001    R8:  0000000000000000   
 R9:  0000000000000001    R10: 0000000000000000    R11: 0000000000000000   
 R12: c000000000346cc0    R13: c000000007b40100    R14: 0000000000000000   
 R15: 0000000000000000    R16: 0000000000000000    R17: 0000000000000000   
 R18: 0000000000000000    R19: 0000000000000000    R20: 0000000000000000   
 R21: c0000000012fdf88    R22: 0000000000000000    R23: c0000000017df938   
 R24: 0000000000000000    R25: c000001ef5ed9000    R26: 0000000000000000   
 R27: 00000100290a01e0    R28: d00000001dfae7c0    R29: 0000000000000043   
 R30: c000001ef5ed8800    R31: c00000051bc5b280   
 NIP: c00000000033cd90    MSR: 8000000002029032    OR3: c000000000341864
 CTR: d00000001df75470    LR:  c000000000341868    XER: 0000000000000000
 CCR: 0000000084002822    MQ:  0000000000000001    DAR: 0000000000000000
 DSISR: f0000000002bf480     Syscall Result: 0000000000000000
 #5 [c000001dca6639f0] .shrink_dcache_for_umount_subtree at c00000000033cd90
 [Link Register] [c000001dca6639f0] .shrink_dcache_for_umount at c000000000341868
 #6 [c000001dca663aa0] .shrink_dcache_for_umount at c000000000341868  (unreliable)
 #7 [c000001dca663b20] .kill_anon_super at c00000000031a1f8
 #8 [c000001dca663bb0] fscache_n_op_requeue at d00000001df83bb0 [nfs]
 #9 [c000001dca663c30] .deactivate_locked_super at c00000000031af30
#10 [c000001dca663cb0] .mntput_no_expire at c00000000034e340
#11 [c000001dca663d40] .sys_umount at c0000000003506d4
#12 [c000001dca663e30] system_call at c00000000000a17c
 System Call [c00] exception frame:
 R0:  0000000000000034    R1:  00003fffe9994bc0    R2:  00003fffae454700   

crash instance 2:

crash> ps -p 13272 
PID: 0      TASK: c000000001358180  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: c000001ef9980000  CPU: 62  COMMAND: "systemd"
  PID: 60123  TASK: c0000017109226e0  CPU: 74  COMMAND: "runmmfs"
   PID: 64877  TASK: c00000008aea6670  CPU: 21  COMMAND: "mmfsd"           < 
    PID: 13167  TASK: c000000f6475e750  CPU: 4   COMMAND: "mmcommon"
     PID: 13270  TASK: c000001eeccee590  CPU: 2   COMMAND: "umount"
      PID: 13272  TASK: c000000ef12dfc80  CPU: 2   COMMAND: "umount.nfs"     < 

crash> bt 
PID: 13272  TASK: c000000ef12dfc80  CPU: 2   COMMAND: "umount.nfs"
 #0 [c000000eba60b3a0] .crash_kexec at c000000000196284
 #1 [c000000eba60b420] .die at c000000000020cd8
 #2 [c000000eba60b4d0] ._exception at c000000000020ff4
 #3 [c000000eba60b670] .program_check_exception at c00000000097d748
 #4 [c000000eba60b700] program_check_common at c000000000006308
 Program Check [700] exception frame:
 R0:  c000000000341868    R1:  c000000eba60b9f0    R2:  c000000001429b00   
 R3:  c000000fc9ac8940    R4:  f00000000473c618    R5:  c00000145a65c000   
 R6:  c00000145a65c000    R7:  0000000000000001    R8:  0000000000000000   
 R9:  0000000000000001    R10: 0000000000000000    R11: 0000000000000000   
 R12: c000000000346cc0    R13: c000000007b21200    R14: 0000000000000000   
 R15: 0000000000000000    R16: 0000000000000000    R17: 0000000000000000   
 R18: 0000000000000000    R19: 0000000000000000    R20: 0000000000000000   
 R21: c0000000012fdf88    R22: 0000000000000000    R23: c0000000017df938   
 R24: 0000000000000000    R25: c000000ed39a3800    R26: 0000000000000000   
 R27: 000001003c9a01e0    R28: d00000002df1e7c0    R29: 000000000000009e   
 R30: c000000ed39a2000    R31: c000000fc9ac8940   
 NIP: c00000000033cd90    MSR: 8000000000029032    OR3: c000000000341864
 CTR: d00000002dee5470    LR:  c000000000341868    XER: 0000000000000000
 CCR: 0000000088002822    MQ:  0000000000000001    DAR: 0000000000000000
 DSISR: f00000000473c618     Syscall Result: 0000000000000000
 #5 [c000000eba60b9f0] .shrink_dcache_for_umount_subtree at c00000000033cd90
 [Link Register] [c000000eba60b9f0] .shrink_dcache_for_umount at c000000000341868
 #6 [c000000eba60baa0] .shrink_dcache_for_umount at c000000000341868  (unreliable)
 #7 [c000000eba60bb20] .kill_anon_super at c00000000031a1f8
 #8 [c000000eba60bbb0] fscache_n_op_requeue at d00000002def3bb0 [nfs]
 #9 [c000000eba60bc30] .deactivate_locked_super at c00000000031af30
#10 [c000000eba60bcb0] .mntput_no_expire at c00000000034e340
#11 [c000000eba60bd40] .sys_umount at c0000000003506d4
#12 [c000000eba60be30] system_call at c00000000000a17c
 System Call [c00] exception frame:
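
Both instances fail at the same kernel text address (NIP c00000000033cd90 in shrink_dcache_for_umount_subtree) with the same call chain. Only the stack pointers and the nfs module load address in frame #8 differ between the two vmcores. When more vmcores are collected, one quick check for whether a new crash matches this signature is to compare function names frame by frame. The following is a minimal Python sketch under that assumption. The helpers `call_chain` and `same_signature` are hypothetical and are not part of the crash utility.

```python
import re

# Matches numbered frame lines from `crash> bt`, for example:
#  #5 [c000001dca6639f0] .shrink_dcache_for_umount_subtree at c00000000033cd90
# Group 1 is the frame number; group 2 is the function name with the
# ppc64 leading dot stripped. Lines without "#N", such as
# "[Link Register] ..." or register dumps, are ignored.
FRAME_RE = re.compile(r"#\s*(\d+)\s+\[[0-9a-f]+\]\s+\.?(\S+)\s+at\s+[0-9a-f]+")


def call_chain(bt_text):
    """Return the function names of the numbered frames in a `bt` dump.

    Stack pointers and module load addresses differ between vmcores,
    so only function names are kept (hypothetical helper).
    """
    names = []
    for line in bt_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            names.append(match.group(2))
    return names


def same_signature(bt_a, bt_b):
    """Return True if two `bt` dumps share an identical, non-empty call chain."""
    chain_a = call_chain(bt_a)
    return bool(chain_a) and chain_a == call_chain(bt_b)
```

To use it, pass the `bt` output of each vmcore as a string to `same_signature()`. A match suggests the same umount path is involved, but it does not by itself prove a common root cause.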

Environment

  • Red Hat Enterprise Linux 7
  • GPFS (General Parallel File System, now IBM Spectrum Scale) in a clustered environment
    Kernel modules: mmfs26 and mmfslinux (part of GPFS)
