Gnocchi keeps segfaulting

Solution In Progress - Updated -

Issue

  • We have deployed Gnocchi with Ceph as the storage backend and MariaDB as the indexer. gnocchi-metricd keeps crashing and continuously writes crash dumps to the /var/crash directory, even though there is very little metric activity. This starts right after installation, once a single VM has been created.

  • In /var/log/messages we see the following:

Aug 20 16:38:04 overcloud-controller-0 kernel: gnocchi-metricd[455879]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:04 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:07 overcloud-controller-0 kernel: gnocchi-metricd[456112]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:07 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456224]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456245]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456249]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:14 overcloud-controller-0 kernel: gnocchi-metricd[456435]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:14 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456493]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456497]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456502]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:16 overcloud-controller-0 kernel: gnocchi-metricd[456508]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:16 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:17 overcloud-controller-0 kernel: gnocchi-metricd[456565]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:17 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:19 overcloud-controller-0 kernel: gnocchi-metricd[456604]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:19 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:19 overcloud-controller-0 kernel: gnocchi-metricd[456647]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:19 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:21 overcloud-controller-0 kernel: gnocchi-metricd[456808]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:21 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:22 overcloud-controller-0 kernel: gnocchi-metricd[456820]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:22 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:23 overcloud-controller-0 kernel: gnocchi-metricd[456889]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:23 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:24 overcloud-controller-0 kernel: gnocchi-metricd[456971]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:24 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.175 7f6c88326700  0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " d f " ,   " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.175 7f6c88326700  0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",d,f,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.175 7f6c88326700  0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " d f " ,   " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.175 7f6c88326700  0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",d,f,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.176 7f6c88326700  0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " o s d   p o o l   g e t - q u o t a " ,   " p o o l " :   " v o l u m e s " ,   " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.176 7f6c88326700  0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",v,o,l,u,m,e,s,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.176 7f6c88326700  0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " o s d   p o o l   g e t - q u o t a " ,   " p o o l " :   " v o l u m e s " ,   " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.176 7f6c88326700  0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",v,o,l,u,m,e,s,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 kernel: gnocchi-metricd[457181]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:29 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:29 overcloud-controller-0 kernel: gnocchi-metricd[457189]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
  • Ceph seems fine:
cephmon_28073 [ceph@overcloud-controller-0 /]$ ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED        RAW USED     %RAW USED 
    hdd       12 TiB     12 TiB     3.3 GiB       18 GiB          0.15 
    TOTAL     12 TiB     12 TiB     3.3 GiB       18 GiB          0.15 

POOLS:
    POOL        ID     STORED      OBJECTS     USED        %USED     MAX AVAIL 
    images       1     1.1 GiB         143     3.2 GiB      0.03       3.9 TiB 
    volumes      2         0 B           0         0 B         0       3.9 TiB 
    metrics      3        14 B         108     192 KiB         0       3.9 TiB  <<< This is the one that gnocchi uses.
  • The Gnocchi tables are present in the MariaDB database, for example:
MariaDB [gnocchi]> select * from archive_policy;
+----------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+
| name                 | back_window | definition                                                                                                                                                                                | aggregation_methods                           |
+----------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+
| ap_5m_2d             |           0 | [{"timespan": 172800.0, "granularity": 300.0, "points": 576}]                                                                                                                             | ["min", "max", "sum", "std", "count", "mean"] |
| bool                 |        3600 | [{"timespan": 31536000.0, "granularity": 1.0, "points": 31536000}]                                                                                                                        | ["last"]                                      |
| ceilometer-high      |           0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 86400.0, "granularity": 60.0, "points": 1440}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]   | ["mean"]                                      |
| ceilometer-high-rate |           0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 86400.0, "granularity": 60.0, "points": 1440}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]   | ["mean", "rate:mean"]                         |
| ceilometer-low       |           0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}]                                                                                                                           | ["mean"]                                      |
| ceilometer-low-rate  |           0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}]                                                                                                                           | ["mean", "rate:mean"]                         |
| high                 |           0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 604800.0, "granularity": 60.0, "points": 10080}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}] | ["min", "sum", "std", "max", "mean", "count"] |
| low                  |           0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}]                                                                                                                           | ["min", "sum", "std", "max", "mean", "count"] |
| medium               |           0 | [{"timespan": 604800.0, "granularity": 60.0, "points": 10080}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]                                                           | ["min", "sum", "std", "max", "mean", "count"] |
  • There is not much in the gnocchi-metricd log:
2020-08-20 16:47:17,758 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-0
2020-08-20 16:47:17,768 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-2
2020-08-20 16:47:17,775 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-4
2020-08-20 16:47:17,779 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-5
2020-08-20 16:47:17,786 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-7
2020-08-20 16:47:17,790 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-8
2020-08-20 16:47:17,793 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-9
2020-08-20 16:47:17,796 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-10
2020-08-20 16:47:17,799 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-11
2020-08-20 16:47:17,802 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-12
2020-08-20 16:47:17,806 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-13
2020-08-20 16:47:17,809 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-14
2020-08-20 16:47:17,814 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-16
2020-08-20 16:47:17,818 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-17
2020-08-20 16:47:17,823 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-18
2020-08-20 16:47:17,855 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-21
2020-08-20 16:47:17,862 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-22
2020-08-20 16:47:17,900 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-25
2020-08-20 16:47:17,914 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-27
2020-08-20 16:47:17,929 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-29
2020-08-20 16:47:17,934 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-30
2020-08-20 16:47:17,961 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-32
2020-08-20 16:47:17,975 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-34
2020-08-20 16:47:17,985 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-35
2020-08-20 16:47:17,989 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-36
2020-08-20 16:47:18,020 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-42
2020-08-20 16:47:18,066 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-45
2020-08-20 16:47:18,080 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-47
  • Crash files keep piling up in /var/crash:
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | wc -l
2699
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | tail -2
-rw-rw-rw-. 1 root root  12320768 Aug 20 16:49 gnocchi-metricd.1597967338.488923.gz
-rw-rw-rw-. 1 root root  15728640 Aug 20 16:49 gnocchi-metricd.1597967327.489436.gz
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | wc -l
2707
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | tail -2
-rw-rw-rw-. 1 root root  15204352 Aug 20 16:49 gnocchi-metricd.1597967345.489925.gz
-rw-rw-rw-. 1 root root  19398656 Aug 20 16:49 gnocchi-metricd.1597967392.492815.gz
  • Decoding a crash dump does not reveal much either:
[root@overcloud-controller-0 crash]# gunzip gnocchi-metricd.1597962184.253089.gz 
[root@overcloud-controller-0 crash]# eu-unstrip -n --core=/var/crash/gnocchi-metricd.1597962184.253089
0x5579f2883000+0x203000 306d113ca7e9506136be43fa77097c7670f2206e@0x5579f2883284 . - /usr/libexec/platform-python3.6
0x7f76e517e000+0x8e81000 07dc1cdecd938e94c7823c062fd76555170dcac9@0x7f76e517e210 - - /usr/lib64/ceph/libceph-common.so.0
0x7f77b8df7000+0x206000 1b97596bda8fd7f32c6a28d1ad0c050a43bde798@0x7f77b8df71d8 . - /usr/lib64/python3.6/lib-dynload/termios.cpython-36m-x86_64-linux-gnu.so
0x7f77c01e2000+0x20b000 82b165a0855d4ed7967192a249eed5a0ac2b9e68@0x7f77c01e2210 - - /usr/lib64/python3.6/site-packages/simplejson/_speedups.cpython-36m-x86_64-linux-gnu.so
0x7f77c03ed000+0x205000 440e3191933e4c588f1de5a7212b031395ad69f6@0x7f77c03ed210 . - /usr/lib64/liburcu-common.so.6.0.0
0x7f77c05f2000+0x20a000 603651bfa53acf00cddf71a11de2caf06bfb4501@0x7f77c05f2210 . - /usr/lib64/liburcu-cds.so.6.0.0
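To read the repeated kernel messages from /var/log/messages in a structured way, here is a minimal Python sketch (the function and field names are ours, not from any tool). It relies on two points we are fairly confident of: the x86 kernel prints the page-fault error code in hexadecimal, so `error 14` means 0x14, and the low bits of that code mean present/write/user/reserved/instruction-fetch as defined by the x86 architecture. Decoded that way, each crash is a user-mode instruction fetch from a page that is not present, and `ip` equals the fault address, which is consistent with the process jumping to a bad address (for example through a corrupted function pointer) rather than reading bad data. As we understand the kernel's reporting, the trailing `in platform-python3.6[...]` names the first mapping at or above `ip`, which need not contain it; the sketch checks that, and here `ip` lies outside the reported range.

```python
import re
from typing import List, NamedTuple, Optional

# Matches the x86 kernel "segfault at" message as logged in /var/log/messages.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<error>[0-9a-f]+) "
    r"in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


class Segfault(NamedTuple):
    comm: str
    pid: int
    addr: int
    ip: int
    error: int
    module: str
    base: int
    size: int


def parse_segfault(line: str) -> Optional[Segfault]:
    """Extract the fields of a kernel segfault line, or None if it is not one."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None

    def hexval(group: str) -> int:
        return int(m.group(group), 16)

    return Segfault(m.group("comm"), int(m.group("pid")), hexval("addr"),
                    hexval("ip"), hexval("error"), m.group("module"),
                    hexval("base"), hexval("size"))


def decode_error(code: int) -> List[str]:
    """Translate the page-fault error code (already parsed from hex) into words."""
    reasons = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            reasons.append(if_set)
        elif if_clear is not None:
            reasons.append(if_clear)
    return reasons


def ip_in_module(fault: Segfault) -> bool:
    """True if the faulting instruction pointer lies inside the reported mapping."""
    return fault.base <= fault.ip < fault.base + fault.size


line = ("Aug 20 16:38:04 overcloud-controller-0 kernel: gnocchi-metricd[455879]: "
        "segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 "
        "in platform-python3.6[5588e1e04000+2000]")
fault = parse_segfault(line)
print(fault.pid, hex(fault.ip), decode_error(fault.error))
print("ip == fault address:", fault.ip == fault.addr)
print("ip inside reported mapping:", ip_in_module(fault))
```

On the first logged line this reports PID 455879, the reasons `page not present`, `read access`, `user mode` and `instruction fetch`, and that `ip` is outside `platform-python3.6[5588e1e04000+2000]`.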
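As a sanity check on the archive_policy table: in Gnocchi, each rule of a policy stores `timespan / granularity` points, and each aggregation method gets its own series. The sketch below (helper names are ours) re-checks that relationship on a few definitions copied from the table and sums the points per aggregation method. Every row in the table is consistent; the `bool` policy stands out by keeping 31,536,000 one-second points. This is only arithmetic on the listed definitions and does not by itself point to a cause of the crashes.

```python
import json
from typing import Dict, List, Tuple

# A few "definition" values copied verbatim from the archive_policy table.
POLICIES: Dict[str, str] = {
    "ap_5m_2d": '[{"timespan": 172800.0, "granularity": 300.0, "points": 576}]',
    "bool": '[{"timespan": 31536000.0, "granularity": 1.0, "points": 31536000}]',
    "ceilometer-high": (
        '[{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, '
        '{"timespan": 86400.0, "granularity": 60.0, "points": 1440}, '
        '{"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]'
    ),
    "medium": (
        '[{"timespan": 604800.0, "granularity": 60.0, "points": 10080}, '
        '{"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]'
    ),
}


def rules(definition: str) -> List[Tuple[float, float, int]]:
    """Return (timespan, granularity, points) for each rule of a policy."""
    return [(r["timespan"], r["granularity"], r["points"]) for r in json.loads(definition)]


def is_consistent(definition: str) -> bool:
    """Check that points == timespan / granularity for every rule."""
    return all(points == timespan / granularity
               for timespan, granularity, points in rules(definition))


def points_per_method(definition: str) -> int:
    """Maximum number of stored points per metric, per aggregation method."""
    return sum(points for _, _, points in rules(definition))


for name, definition in POLICIES.items():
    print(name, is_consistent(definition), points_per_method(definition))
```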
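The gnocchi-metricd log names sacks as `incoming<total>-<number>`, so this deployment appears to use 128 sacks. To tie a particular metric to the sack it is processed from, the sketch below parses those log lines and computes a metric's sack, assuming upstream Gnocchi's scheme of taking the metric UUID as a 128-bit integer modulo the number of sacks. Treat that mapping as an assumption to verify against the installed Gnocchi version; the function names and the UUID are hypothetical.

```python
import re
import uuid
from typing import Optional, Tuple

# Sack names in the metricd log look like "incoming128-47": incoming<total>-<number>.
SACK_RE = re.compile(r"sack incoming(?P<total>\d+)-(?P<number>\d+)")


def parse_sack(line: str) -> Optional[Tuple[int, int]]:
    """Return (total_sacks, sack_number) from a metricd log line, or None."""
    m = SACK_RE.search(line)
    if m is None:
        return None
    return int(m.group("total")), int(m.group("number"))


def sack_for_metric(metric_id: uuid.UUID, num_sacks: int) -> str:
    """ASSUMED scheme: metric UUID as an integer, modulo the number of sacks."""
    return "incoming{}-{}".format(num_sacks, metric_id.int % num_sacks)


total, number = parse_sack(
    "2020-08-20 16:47:18,080 [67671] DEBUG    gnocchi.chef: "
    "Processing measures for sack incoming128-47")
# Hypothetical metric ID, for illustration only.
metric = uuid.UUID("5b0c4e7a-0000-4000-8000-00000000002f")
print(total, number, sack_for_metric(metric, total))
```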
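The crash files listed in /var/crash appear to be named `<process>.<epoch>.<pid>.gz`: the middle number decodes to a time on 2020-08-20 and the last one looks like a PID in the same range as the segfault messages. That naming is our inference from the listing, not something stated here. The sketch below (names are ours) splits the file names and counts dumps per minute to show the crash rate. It reports times in UTC, so convert them to the host's timezone before comparing with /var/log/messages. Also note that `ls -l` prints a leading `total` line, so `ls -ltr /var/crash | wc -l` reports one more than the number of files.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple

# ASSUMED naming, inferred from the listing: <process>.<unix epoch>.<pid>[.gz]
CRASH_RE = re.compile(r"(?P<comm>.+?)\.(?P<epoch>\d{9,})\.(?P<pid>\d+)(?:\.gz)?")


def parse_crash_name(name: str) -> Optional[Tuple[str, datetime, int]]:
    """Split a crash file name into (process, UTC timestamp, pid)."""
    m = CRASH_RE.fullmatch(name)
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m.group("epoch")), tz=timezone.utc)
    return m.group("comm"), when, int(m.group("pid"))


def crashes_per_minute(names: Iterable[str]) -> Counter:
    """Count crash files per UTC minute, ignoring names that do not match."""
    counts = Counter()
    for name in names:
        parsed = parse_crash_name(name)
        if parsed is not None:
            counts[parsed[1].strftime("%Y-%m-%d %H:%M")] += 1
    return counts


names = [
    "gnocchi-metricd.1597967338.488923.gz",
    "gnocchi-metricd.1597967327.489436.gz",
    "gnocchi-metricd.1597967345.489925.gz",
    "gnocchi-metricd.1597967392.492815.gz",
]
for name in names:
    print(parse_crash_name(name))
print(crashes_per_minute(names))
```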
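`eu-unstrip -n` prints one line per module: the address range as START+SIZE, the build ID (with the address where it was found), the ELF file, the debug file and the module name. As we read the elfutils list format, `.` means the data was found without a separate file (for example, in the core itself) and `-` means nothing was found. A `-` in the debug-file column for every module is therefore consistent with no debuginfo being available, which would explain why the dump says so little. The parser below (names are ours) flags such modules and maps an address to the module containing it; none of the six modules pasted here covers 0x5413ba87, the `ip` reported in every kernel message.

```python
import re
from typing import List, NamedTuple, Optional

# One "eu-unstrip -n" line: START+SIZE BUILDID[@ADDR] FILE DEBUGFILE MODULENAME.
# Our reading of FILE/DEBUGFILE: "." = found without a separate file, "-" = not found.
UNSTRIP_RE = re.compile(
    r"0x(?P<start>[0-9a-f]+)\+0x(?P<size>[0-9a-f]+) "
    r"(?P<build_id>[0-9a-f]+|-)(?:@0x[0-9a-f]+)? "
    r"(?P<file>\S+) (?P<debug>\S+) (?P<name>.+)"
)


class Module(NamedTuple):
    start: int
    size: int
    build_id: str
    file: str
    debug: str
    name: str


def parse_unstrip(text: str) -> List[Module]:
    """Parse the module list printed by eu-unstrip -n, skipping other lines."""
    modules = []
    for line in text.splitlines():
        m = UNSTRIP_RE.fullmatch(line.strip())
        if m is not None:
            modules.append(Module(int(m.group("start"), 16), int(m.group("size"), 16),
                                  m.group("build_id"), m.group("file"),
                                  m.group("debug"), m.group("name")))
    return modules


def missing_debuginfo(modules: List[Module]) -> List[str]:
    """Modules for which no debug information was located."""
    return [mod.name for mod in modules if mod.debug == "-"]


def module_for(address: int, modules: List[Module]) -> Optional[str]:
    """Name of the module whose address range contains `address`, if any."""
    for mod in modules:
        if mod.start <= address < mod.start + mod.size:
            return mod.name
    return None


output = """\
0x5579f2883000+0x203000 306d113ca7e9506136be43fa77097c7670f2206e@0x5579f2883284 . - /usr/libexec/platform-python3.6
0x7f76e517e000+0x8e81000 07dc1cdecd938e94c7823c062fd76555170dcac9@0x7f76e517e210 - - /usr/lib64/ceph/libceph-common.so.0
"""
modules = parse_unstrip(output)
print(missing_debuginfo(modules))
print(module_for(0x5413ba87, modules))  # None: not inside either module
```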

Environment

  • Red Hat OpenStack Platform 16.1 (RHOSP)
