gnocchi-metricd keeps crashing with segfaults
Issue
- We have deployed Gnocchi with a Ceph storage backend and MariaDB. gnocchi-metricd keeps crashing and continuously creates crash dumps in the /var/crash directory, even though there is very little metric activity. This starts right after installation and the creation of a single VM.
- In /var/log/messages we see the following:
Aug 20 16:38:04 overcloud-controller-0 kernel: gnocchi-metricd[455879]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:04 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:07 overcloud-controller-0 kernel: gnocchi-metricd[456112]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:07 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456224]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456245]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456249]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:14 overcloud-controller-0 kernel: gnocchi-metricd[456435]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:14 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456493]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456497]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456502]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:16 overcloud-controller-0 kernel: gnocchi-metricd[456508]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:16 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:17 overcloud-controller-0 kernel: gnocchi-metricd[456565]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:17 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:19 overcloud-controller-0 kernel: gnocchi-metricd[456604]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:19 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:19 overcloud-controller-0 kernel: gnocchi-metricd[456647]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:19 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:21 overcloud-controller-0 kernel: gnocchi-metricd[456808]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:21 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:22 overcloud-controller-0 kernel: gnocchi-metricd[456820]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:22 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:23 overcloud-controller-0 kernel: gnocchi-metricd[456889]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:23 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:24 overcloud-controller-0 kernel: gnocchi-metricd[456971]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:24 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.175 7f6c88326700 0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " d f " , " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.175 7f6c88326700 0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",d,f,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.175 7f6c88326700 0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " d f " , " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.175 7f6c88326700 0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",d,f,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.176 7f6c88326700 0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " o s d p o o l g e t - q u o t a " , " p o o l " : " v o l u m e s " , " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.176 7f6c88326700 0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",v,o,l,u,m,e,s,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.176 7f6c88326700 0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " o s d p o o l g e t - q u o t a " , " p o o l " : " v o l u m e s " , " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.176 7f6c88326700 0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",v,o,l,u,m,e,s,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 kernel: gnocchi-metricd[457181]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:29 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:29 overcloud-controller-0 kernel: gnocchi-metricd[457189]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
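The kernel lines can be decoded mechanically. The Python sketch below assumes an x86-64 host and that the kernel prints the `error` field in hexadecimal (as mainline kernels do), so `error 14` means 0x14: a user-mode instruction fetch from a page that is not present. Because the fault address equals `ip`, execution jumped to the invalid address 0x5413ba87, which lies below the start of the named platform-python3.6 mapping (0x5588e1e04000), so that mapping does not contain the faulting `ip`. The function and key names are illustrative only.

```python
import re

# Matches the kernel's segfault report as logged in /var/log/messages.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<mapping>\S+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# x86 page-fault error-code bits: bit -> (meaning when set, meaning when clear).
# Assumption: the host is x86-64.
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write", "read"),
    2: ("user mode", "kernel mode"),
    3: ("reserved bit set", None),
    4: ("instruction fetch", None),
}


def decode_error(code_hex):
    """Decode the 'error' field; the kernel prints it in hexadecimal."""
    value = int(code_hex, 16)
    flags = []
    for bit, (when_set, when_clear) in PF_BITS.items():
        if value & (1 << bit):
            flags.append(when_set)
        elif when_clear is not None:
            flags.append(when_clear)
    return flags


def parse_segfault(line):
    """Return the interesting fields of one segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr, ip = int(m["addr"], 16), int(m["ip"], 16)
    start, size = int(m["start"], 16), int(m["size"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "addr": addr,
        # Fault address == ip: fetching the instruction at ip itself faulted,
        # i.e. execution jumped to a bad address.
        "jumped_to_fault_addr": addr == ip,
        # Whether ip lies inside the mapping the kernel names after "in".
        "ip_in_named_mapping": start <= ip < start + size,
        "error_flags": decode_error(m["error"]),
    }
```

To scan the whole log, pass each line of /var/log/messages to `parse_segfault()` and count the results that have `jumped_to_fault_addr` set.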
- Ceph seems fine:
cephmon_28073 [ceph@overcloud-controller-0 /]$ ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
    hdd       12 TiB     12 TiB     3.3 GiB    18 GiB       0.15
    TOTAL     12 TiB     12 TiB     3.3 GiB    18 GiB       0.15

POOLS:
    POOL        ID   STORED     OBJECTS   USED       %USED   MAX AVAIL
    images      1    1.1 GiB    143       3.2 GiB    0.03    3.9 TiB
    volumes     2    0 B        0         0 B        0       3.9 TiB
    metrics     3    14 B       108       192 KiB    0       3.9 TiB    <<< This is the one that gnocchi uses.
- The Gnocchi tables are present in the MariaDB database:
MariaDB [gnocchi]> select * from archive_policy;
+----------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+
| name | back_window | definition | aggregation_methods |
+----------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+
| ap_5m_2d | 0 | [{"timespan": 172800.0, "granularity": 300.0, "points": 576}] | ["min", "max", "sum", "std", "count", "mean"] |
| bool | 3600 | [{"timespan": 31536000.0, "granularity": 1.0, "points": 31536000}] | ["last"] |
| ceilometer-high | 0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 86400.0, "granularity": 60.0, "points": 1440}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}] | ["mean"] |
| ceilometer-high-rate | 0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 86400.0, "granularity": 60.0, "points": 1440}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}] | ["mean", "rate:mean"] |
| ceilometer-low | 0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}] | ["mean"] |
| ceilometer-low-rate | 0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}] | ["mean", "rate:mean"] |
| high | 0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 604800.0, "granularity": 60.0, "points": 10080}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}] | ["min", "sum", "std", "max", "mean", "count"] |
| low | 0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}] | ["min", "sum", "std", "max", "mean", "count"] |
| medium | 0 | [{"timespan": 604800.0, "granularity": 60.0, "points": 10080}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}] | ["min", "sum", "std", "max", "mean", "count"] |
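Each archive in a policy definition obeys points = timespan / granularity; for instance, `ceilometer-low` keeps 2592000 s / 300 s = 8640 points, i.e. 30 days at 5-minute resolution. The short Python check below recomputes this for a few rows copied from the table above. It only validates the stored definitions and does not by itself explain the crash, although it makes visible that `bool` retains 31,536,000 one-second points per metric, far more than any other policy listed. The helper names are illustrative.

```python
import json

# A few "definition" values copied verbatim from the archive_policy table above.
DEFINITIONS = {
    "ap_5m_2d": '[{"timespan": 172800.0, "granularity": 300.0, "points": 576}]',
    "bool": '[{"timespan": 31536000.0, "granularity": 1.0, "points": 31536000}]',
    "medium": '[{"timespan": 604800.0, "granularity": 60.0, "points": 10080}, '
              '{"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]',
}


def check_definition(definition_json):
    """Return (granularity, points, points == timespan / granularity) per archive."""
    rows = []
    for archive in json.loads(definition_json):
        expected = archive["timespan"] / archive["granularity"]
        rows.append((archive["granularity"], archive["points"], expected == archive["points"]))
    return rows


def total_points(definition_json):
    """Points kept per metric and per aggregation method, summed over archives."""
    return sum(archive["points"] for archive in json.loads(definition_json))


for name, definition in DEFINITIONS.items():
    print(name, check_definition(definition), total_points(definition))
```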
- The gnocchi-metricd logs show nothing unusual, only routine sack processing:
2020-08-20 16:47:17,758 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-0
2020-08-20 16:47:17,768 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-2
2020-08-20 16:47:17,775 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-4
2020-08-20 16:47:17,779 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-5
2020-08-20 16:47:17,786 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-7
2020-08-20 16:47:17,790 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-8
2020-08-20 16:47:17,793 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-9
2020-08-20 16:47:17,796 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-10
2020-08-20 16:47:17,799 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-11
2020-08-20 16:47:17,802 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-12
2020-08-20 16:47:17,806 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-13
2020-08-20 16:47:17,809 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-14
2020-08-20 16:47:17,814 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-16
2020-08-20 16:47:17,818 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-17
2020-08-20 16:47:17,823 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-18
2020-08-20 16:47:17,855 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-21
2020-08-20 16:47:17,862 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-22
2020-08-20 16:47:17,900 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-25
2020-08-20 16:47:17,914 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-27
2020-08-20 16:47:17,929 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-29
2020-08-20 16:47:17,934 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-30
2020-08-20 16:47:17,961 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-32
2020-08-20 16:47:17,975 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-34
2020-08-20 16:47:17,985 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-35
2020-08-20 16:47:17,989 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-36
2020-08-20 16:47:18,020 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-42
2020-08-20 16:47:18,066 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-45
2020-08-20 16:47:18,080 [67671] DEBUG gnocchi.chef: Processing measures for sack incoming128-47
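The sack names in this log encode the total number of sacks and the sack index (`incoming128-7` is sack 7 of 128). To work out which sack holds the incoming measures of a particular metric, for example one suspected of triggering the crash, the Python sketch below assumes upstream Gnocchi 4.x behaviour, where a metric is assigned to sack `metric_id.int % NUM_SACKS` and sacks are named `incoming{total}-{number}`. Verify this against the Gnocchi version shipped with RHOSP 16.1 before relying on it; `sack_name_for` and `parse_sack_name` are hypothetical helpers, not Gnocchi APIs.

```python
import uuid

# Assumption: mirrors upstream Gnocchi 4.x sack assignment and naming.
NUM_SACKS = 128  # taken from the "incoming128-N" names in the log above


def sack_name_for(metric_id: uuid.UUID, num_sacks: int = NUM_SACKS) -> str:
    """Name of the sack that would hold incoming measures for metric_id."""
    number = metric_id.int % num_sacks
    return "incoming{total}-{number}".format(total=num_sacks, number=number)


def parse_sack_name(name: str):
    """Split a name such as 'incoming128-7' into (total, number)."""
    prefix, number = name.rsplit("-", 1)
    total = int(prefix[len("incoming"):])
    return total, int(number)
```

For instance, a metric whose UUID has integer value 130 would land in `incoming128-2`, one of the sacks listed above.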
- Crash files keep accumulating in /var/crash; eight more appeared between the two listings below:
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | wc -l
2699
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | tail -2
-rw-rw-rw-. 1 root root 12320768 Aug 20 16:49 gnocchi-metricd.1597967338.488923.gz
-rw-rw-rw-. 1 root root 15728640 Aug 20 16:49 gnocchi-metricd.1597967327.489436.gz
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | wc -l
2707
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | tail -2
-rw-rw-rw-. 1 root root 15204352 Aug 20 16:49 gnocchi-metricd.1597967345.489925.gz
-rw-rw-rw-. 1 root root 19398656 Aug 20 16:49 gnocchi-metricd.1597967392.492815.gz
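To estimate how quickly dumps accumulate, the file names can be parsed. This Python sketch assumes the names follow `<executable>.<unix-epoch>.<pid>.gz` (the last number is presumably the PID); that matches the listing above, where 1597967338 is 2020-08-20 23:48:58 UTC, i.e. 16:48:58 at UTC-7, consistent with the `Aug 20 16:49` timestamp. Note that `ls -l | wc -l` also counts the leading `total` line, so the counts above are one higher than the number of files. The helper names are illustrative.

```python
import re
from datetime import datetime, timezone

# Assumed naming scheme: <executable>.<unix-epoch>.<pid>.gz
CRASH_RE = re.compile(r"^(?P<exe>.+)\.(?P<epoch>\d{10})\.(?P<pid>\d+)\.gz$")


def parse_crash_name(name):
    """Return (executable, UTC timestamp, pid), or None if the name does not match."""
    m = CRASH_RE.match(name)
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m["epoch"]), tz=timezone.utc)
    return m["exe"], when, int(m["pid"])


def crashes_per_minute(names):
    """Average crash rate between the oldest and newest dump in `names`."""
    times = sorted(parsed[1] for parsed in map(parse_crash_name, names) if parsed)
    if len(times) < 2:
        return 0.0
    span = (times[-1] - times[0]).total_seconds()
    if span == 0:
        return float("inf")
    return (len(times) - 1) / span * 60
```

On the controller, `crashes_per_minute(os.listdir("/var/crash"))` gives a rough rate; the four names shown above are only a sample, so use the full directory listing.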
- Decoding a crash dump with eu-unstrip yields little information:
[root@overcloud-controller-0 crash]# gunzip gnocchi-metricd.1597962184.253089.gz
[root@overcloud-controller-0 crash]# eu-unstrip -n --core=/var/crash/gnocchi-metricd.1597962184.253089
0x5579f2883000+0x203000 306d113ca7e9506136be43fa77097c7670f2206e@0x5579f2883284 . - /usr/libexec/platform-python3.6
0x7f76e517e000+0x8e81000 07dc1cdecd938e94c7823c062fd76555170dcac9@0x7f76e517e210 - - /usr/lib64/ceph/libceph-common.so.0
0x7f77b8df7000+0x206000 1b97596bda8fd7f32c6a28d1ad0c050a43bde798@0x7f77b8df71d8 . - /usr/lib64/python3.6/lib-dynload/termios.cpython-36m-x86_64-linux-gnu.so
0x7f77c01e2000+0x20b000 82b165a0855d4ed7967192a249eed5a0ac2b9e68@0x7f77c01e2210 - - /usr/lib64/python3.6/site-packages/simplejson/_speedups.cpython-36m-x86_64-linux-gnu.so
0x7f77c03ed000+0x205000 440e3191933e4c588f1de5a7212b031395ad69f6@0x7f77c03ed210 . - /usr/lib64/liburcu-common.so.6.0.0
0x7f77c05f2000+0x20a000 603651bfa53acf00cddf71a11de2caf06bfb4501@0x7f77c05f2210 . - /usr/lib64/liburcu-cds.so.6.0.0
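One quick check is whether the faulting address 0x5413ba87 from the kernel log falls inside any module in the `eu-unstrip -n` listing, whose lines have the form `START+SIZE BUILDID@ADDR FILE DEBUGFILE MODULENAME`. The Python sketch below uses only the first and last fields. Two caveats: this core comes from an earlier crash than the kernel messages, so module base addresses differ (the kernel log, however, reports the same fault address every time), and the excerpt above is probably only part of the module list, since a Python process normally maps libc and many other libraries. The helper names are illustrative.

```python
def parse_unstrip(text):
    """Parse `eu-unstrip -n` lines into (start, end, module_name) tuples."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        # Expected: START+SIZE BUILDID[@ADDR] FILE DEBUGFILE MODULENAME
        if len(fields) < 5 or "+" not in fields[0]:
            continue
        start_hex, size_hex = fields[0].split("+", 1)
        start, size = int(start_hex, 16), int(size_hex, 16)
        modules.append((start, start + size, fields[-1]))
    return modules


def module_for(address, modules):
    """Return the module whose address range contains `address`, else None."""
    for start, end, name in modules:
        if start <= address < end:
            return name
    return None


# Two lines copied from the eu-unstrip output above.
SAMPLE = """\
0x5579f2883000+0x203000 306d113ca7e9506136be43fa77097c7670f2206e@0x5579f2883284 . - /usr/libexec/platform-python3.6
0x7f76e517e000+0x8e81000 07dc1cdecd938e94c7823c062fd76555170dcac9@0x7f76e517e210 - - /usr/lib64/ceph/libceph-common.so.0
"""

if __name__ == "__main__":
    mods = parse_unstrip(SAMPLE)
    print(module_for(0x5413ba87, mods))  # None: not inside any listed module
```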
Environment
- Red Hat OpenStack Platform 16.1 (RHOSP)