glusterd crashed several times on a node due to a race condition
Issue
- The glusterd service has crashed several times with a SIGSEGV signal:

      Jun 15 12:13:38 node03 abrt-hook-ccpp: Process 8260 (glusterfsd) of user 0 killed by SIGSEGV - dumping core
      Jun 15 12:13:47 node03 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV

- A backtrace is also found in /var/log/messages:
      [2020-06-15 12:14:03.409818] E [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] (-->/lib64/libgfrpc.so.0(+0xf0f1) [0x7f40cb0a90f1] -->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x7c5aa) [0x7f40bf4765aa] -->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x7a47b) [0x7f40bf47447b] ) 0-: Assertion failed: rsp.op == txn_op_info.op
      pending frames:
      frame : type(0) op(0)
      patchset: git://git.gluster.org/glusterfs.git
      signal received: 11
      time of crash:
      2020-06-15 12:14:03
      configuration details:
      argp 1
      backtrace 1
      dlfcn 1
      libpthread 1
      llistxattr 1
      setfsid 1
      spinlock 1
      epoll.h 1
      xattr.h 1
      st_atim.tv_nsec 1
      package-string: glusterfs 6.0
      /lib64/libglusterfs.so.0(+0x271e0)[0x7f40cb3001e0]
      /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f40cb30ac04]
      /lib64/libc.so.6(+0x363f0)[0x7f40c993c3f0]
      /lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f40ca140d00]
      /lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f40cb32c42c]
      /lib64/libglusterfs.so.0(data_destroy+0x4b)[0x7f40cb2f3c3b]
      /lib64/libglusterfs.so.0(dict_get_int32n+0x11e)[0x7f40cb2f754e]
      /usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x64008)[0x7f40bf45e008]
      /usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x7a4b6)[0x7f40bf4744b6]
      /usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x7c5aa)[0x7f40bf4765aa]
      /lib64/libgfrpc.so.0(+0xf0f1)[0x7f40cb0a90f1]
      /lib64/libgfrpc.so.0(+0xf457)[0x7f40cb0a9457]
      /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f40cb0a5af3]
      /usr/lib64/glusterfs/6.0/rpc-transport/socket.so(+0xaaf5)[0x7f40be644af5]
      /lib64/libglusterfs.so.0(+0x8b806)[0x7f40cb364806]
      /lib64/libpthread.so.0(+0x7ea5)[0x7f40ca13eea5]
      /lib64/libc.so.6(clone+0x6d)[0x7f40c9a048cd]
      ---------
      [2020-06-15 12:20:43.467202] W [MSGID: 103071] [rdma.c:4472:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
- In /var/log/glusterfs/glusterd.log there are a large number of entries like the following:
      [2020-06-14 14:18:23.533011] E [MSGID: 106150] [glusterd-syncop.c:1931:gd_sync_task_begin] 0-management: Locking Peers Failed.
      [2020-06-14 14:18:23.533962] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on gfs01.dkv.altemista.cloud.novalocal. Please check log file for details.
      [2020-06-14 14:18:23.534002] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on gfs02.dkv.altemista.cloud.novalocal. Please check log file for details.
      [2020-06-14 14:18:23.534035] E [MSGID: 106151] [glusterd-syncop.c:1616:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
      [2020-06-14 14:19:01.859085] E [MSGID: 106275] [glusterd-rpc-ops.c:862:glusterd_mgmt_v3_lock_peers_cbk_fn] 0-management: Received mgmt_v3 lock RJT from uuid: 695e99df-1a71-4e9c-a3f4-05ca272f4bc2
      [2020-06-14 14:19:01.860073] E [MSGID: 106275] [glusterd-rpc-ops.c:862:glusterd_mgmt_v3_lock_peers_cbk_fn] 0-management: Received mgmt_v3 lock RJT from uuid: cdf0a57f-9d85-44ad-a054-5b56428bd729
      [2020-06-14 14:19:01.860778] E [MSGID: 106278] [glusterd-rpc-ops.c:970:glusterd_mgmt_v3_unlock_peers_cbk_fn] 0-management: Received mgmt_v3 unlock RJT from uuid: 695e99df-1a71-4e9c-a3f4-05ca272f4bc2
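The failed assertion `rsp.op == txn_op_info.op` suggests two management transactions racing on shared transaction state: the commit reply for one transaction is validated against op info that a concurrent transaction has already overwritten or freed (the frames through `__gf_free` and `pthread_mutex_lock` point at the use-after-free variant). The following is a minimal single-threaded C sketch of that interleaving; the names `txn_op`, `begin_txn`, and `commit_cbk` are purely illustrative and are not glusterd's actual data structures:

```c
#include <stdio.h>

/* Illustrative stand-in for glusterd's per-transaction op info.
 * A single shared slot is what makes the interleaving possible. */
static int txn_op;

/* A transaction begins: record which op it is committing. */
static void begin_txn(int op)
{
    txn_op = op;
}

/* Commit callback: the reply's op must match the recorded op.
 * Returns 0 on match, -1 when another transaction has already
 * overwritten the shared state (the failed assertion's case).  */
static int commit_cbk(int rsp_op)
{
    if (rsp_op != txn_op) {
        printf("Assertion failed: rsp.op (%d) != txn_op (%d)\n",
               rsp_op, txn_op);
        return -1;
    }
    printf("commit ok: op %d\n", rsp_op);
    return 0;
}

int main(void)
{
    begin_txn(5);   /* transaction 1 starts, committing op 5      */
    begin_txn(7);   /* racing transaction 2 overwrites the state  */
    commit_cbk(5);  /* reply for transaction 1 arrives: mismatch  */
    return 0;
}
```

In the real daemon the shared state lives in reference-counted dicts, so losing the race can also mean reading memory another thread already released, which matches the SIGSEGV in the trace rather than a clean assertion failure.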
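To gauge how widespread the lock contention is, the mgmt_v3 rejections can be tallied per peer UUID. A rough sketch, assuming the default glusterd log location (pass another path as the first argument):

```shell
#!/bin/sh
# Count mgmt_v3 lock/unlock rejections per peer UUID in glusterd's log.
LOG="${1:-/var/log/glusterfs/glusterd.log}"

if [ -f "$LOG" ]; then
    grep -o 'Received mgmt_v3 \(lock\|unlock\) RJT from uuid: [0-9a-f-]*' "$LOG" \
        | sort | uniq -c | sort -rn
fi
```

A heavily skewed count toward one UUID would point at a single peer repeatedly losing the lock race, rather than cluster-wide contention.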
Environment
- RHGS 3.5