Gnocchi (gnocchi-metricd) keeps segfaulting

Solution In Progress - Updated -

Issue

  • We have deployed Gnocchi with Ceph and MariaDB as backends. The gnocchi-metricd process keeps crashing and continuously creates crash dumps in the /var/crash directory. Very few metrics are being processed. This happens right after installation, with only one VM created.

  • In /var/log/messages we see the following:

Aug 20 16:38:04 overcloud-controller-0 kernel: gnocchi-metricd[455879]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:04 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:07 overcloud-controller-0 kernel: gnocchi-metricd[456112]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:07 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456224]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456245]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:10 overcloud-controller-0 kernel: gnocchi-metricd[456249]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:10 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:14 overcloud-controller-0 kernel: gnocchi-metricd[456435]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:14 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456493]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456497]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:15 overcloud-controller-0 kernel: gnocchi-metricd[456502]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:15 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:16 overcloud-controller-0 kernel: gnocchi-metricd[456508]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:16 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:17 overcloud-controller-0 kernel: gnocchi-metricd[456565]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:17 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:19 overcloud-controller-0 kernel: gnocchi-metricd[456604]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:19 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:19 overcloud-controller-0 kernel: gnocchi-metricd[456647]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:19 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:21 overcloud-controller-0 kernel: gnocchi-metricd[456808]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:21 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:22 overcloud-controller-0 kernel: gnocchi-metricd[456820]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:22 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:23 overcloud-controller-0 kernel: gnocchi-metricd[456889]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:23 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:24 overcloud-controller-0 kernel: gnocchi-metricd[456971]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:24 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.175 7f6c88326700  0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " d f " ,   " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.175 7f6c88326700  0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",d,f,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.175 7f6c88326700  0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " d f " ,   " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.175 7f6c88326700  0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",d,f,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.176 7f6c88326700  0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " o s d   p o o l   g e t - q u o t a " ,   " p o o l " :   " v o l u m e s " ,   " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 d0ed37659573[18799]: 2020-08-20 16:38:29.176 7f6c88326700  0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",v,o,l,u,m,e,s,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.176 7f6c88326700  0 mon.ceph-overcloud-controller-0@0(leader) e2 handle_command mon_command({ " p r e f i x " : " o s d   p o o l   g e t - q u o t a " ,   " p o o l " :   " v o l u m e s " ,   " f o r m a t " : " j s o n " } v 0) v1
Aug 20 16:38:29 overcloud-controller-0 docker[45554]: 2020-08-20 16:38:29.176 7f6c88326700  0 log_channel(audit) log [DBG] : from='client.? 28.2.1.13:0/2880104761' entity='client.cinder' cmd=[{,",p,r,e,f,i,x,",:,",o,s,d, ,p,o,o,l, ,g,e,t,-,q,u,o,t,a,",,, ,",p,o,o,l,",:, ,",v,o,l,u,m,e,s,",,, ,",f,o,r,m,a,t,",:,",j,s,o,n,",}]: dispatch
Aug 20 16:38:29 overcloud-controller-0 kernel: gnocchi-metricd[457181]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
Aug 20 16:38:29 overcloud-controller-0 kernel: Code: Bad RIP value.
Aug 20 16:38:29 overcloud-controller-0 kernel: gnocchi-metricd[457189]: segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3fffda10 error 14 in platform-python3.6[5588e1e04000+2000]
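The kernel lines above can be read a little more closely. The following is a minimal sketch, not part of the original report, that parses one segfault line and interprets the "error" field as the x86 page-fault error code, which the kernel prints in hexadecimal. The helper names parse_segfault and decode_error are hypothetical and introduced only for illustration.

```python
import re

# Kernel "segfault at ..." message as it appears in /var/log/messages.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<module>[^\[]+)\[)?"
)

# x86 page-fault error code bits above bit 0.
ERROR_BITS = {
    0x02: "write access",
    0x04: "user mode",
    0x08: "reserved bit set",
    0x10: "instruction fetch",
}


def parse_segfault(line):
    """Return the fields of a kernel segfault line, or None if it is not one."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "error": int(m.group("err"), 16),  # printed by the kernel in hex
        "module": m.group("module"),
    }


def decode_error(code):
    """Describe the page-fault error code bits in plain words."""
    # Bit 0 distinguishes a missing page from a protection violation.
    flags = ["protection violation" if code & 0x01 else "page not present"]
    flags += [text for bit, text in ERROR_BITS.items() if code & bit]
    return flags


if __name__ == "__main__":
    sample = (
        "Aug 20 16:38:04 overcloud-controller-0 kernel: gnocchi-metricd[455879]: "
        "segfault at 5413ba87 ip 000000005413ba87 sp 00007fcb3f7fca10 error 14 "
        "in platform-python3.6[5588e1e04000+2000]"
    )
    info = parse_segfault(sample)
    print(info["pid"], hex(info["addr"]), decode_error(info["error"]))
    print("fault address equals ip:", info["addr"] == info["ip"])
```

Read this way, "error 14" is 0x14: a user-mode instruction fetch from a page that is not present. The faulting address is identical to ip in every line, which matches the "Bad RIP value" message: the process tried to execute code at an address that is not mapped. This narrows down the kind of crash but does not by itself identify the faulty component.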
  • Ceph seems fine:
cephmon_28073 [ceph@overcloud-controller-0 /]$ ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED        RAW USED     %RAW USED 
    hdd       12 TiB     12 TiB     3.3 GiB       18 GiB          0.15 
    TOTAL     12 TiB     12 TiB     3.3 GiB       18 GiB          0.15 

POOLS:
    POOL        ID     STORED      OBJECTS     USED        %USED     MAX AVAIL 
    images       1     1.1 GiB         143     3.2 GiB      0.03       3.9 TiB 
    volumes      2         0 B           0         0 B         0       3.9 TiB 
    metrics      3        14 B         108     192 KiB         0       3.9 TiB  <<< This is the pool that Gnocchi uses.
  • The Gnocchi tables are present in the MariaDB database:
MariaDB [gnocchi]> select * from archive_policy;
+----------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+
| name                 | back_window | definition                                                                                                                                                                                | aggregation_methods                           |
+----------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+
| ap_5m_2d             |           0 | [{"timespan": 172800.0, "granularity": 300.0, "points": 576}]                                                                                                                             | ["min", "max", "sum", "std", "count", "mean"] |
| bool                 |        3600 | [{"timespan": 31536000.0, "granularity": 1.0, "points": 31536000}]                                                                                                                        | ["last"]                                      |
| ceilometer-high      |           0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 86400.0, "granularity": 60.0, "points": 1440}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]   | ["mean"]                                      |
| ceilometer-high-rate |           0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 86400.0, "granularity": 60.0, "points": 1440}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]   | ["mean", "rate:mean"]                         |
| ceilometer-low       |           0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}]                                                                                                                           | ["mean"]                                      |
| ceilometer-low-rate  |           0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}]                                                                                                                           | ["mean", "rate:mean"]                         |
| high                 |           0 | [{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, {"timespan": 604800.0, "granularity": 60.0, "points": 10080}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}] | ["min", "sum", "std", "max", "mean", "count"] |
| low                  |           0 | [{"timespan": 2592000.0, "granularity": 300.0, "points": 8640}]                                                                                                                           | ["min", "sum", "std", "max", "mean", "count"] |
| medium               |           0 | [{"timespan": 604800.0, "granularity": 60.0, "points": 10080}, {"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]                                                           | ["min", "sum", "std", "max", "mean", "count"] |
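As a quick sanity check on the archive policies listed above, each definition entry should satisfy timespan = granularity × points. The sketch below is an illustration only; check_definition is a hypothetical helper, and the JSON string is copied from the "high" row of the table.

```python
import json
import math


def check_definition(definition_json):
    """Return the entries whose timespan differs from granularity * points."""
    inconsistent = []
    for entry in json.loads(definition_json):
        expected = entry["granularity"] * entry["points"]
        if not math.isclose(entry["timespan"], expected):
            inconsistent.append(entry)
    return inconsistent


if __name__ == "__main__":
    # Definition of the "high" archive policy, copied from the table.
    high = (
        '[{"timespan": 3600.0, "granularity": 1.0, "points": 3600}, '
        '{"timespan": 604800.0, "granularity": 60.0, "points": 10080}, '
        '{"timespan": 31536000.0, "granularity": 3600.0, "points": 8760}]'
    )
    print(check_definition(high))
```

An empty list means the definition is internally consistent. The same check can be run against the other rows.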
  • The gnocchi-metricd logs show nothing unusual:
2020-08-20 16:47:17,758 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-0
2020-08-20 16:47:17,768 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-2
2020-08-20 16:47:17,775 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-4
2020-08-20 16:47:17,779 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-5
2020-08-20 16:47:17,786 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-7
2020-08-20 16:47:17,790 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-8
2020-08-20 16:47:17,793 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-9
2020-08-20 16:47:17,796 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-10
2020-08-20 16:47:17,799 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-11
2020-08-20 16:47:17,802 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-12
2020-08-20 16:47:17,806 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-13
2020-08-20 16:47:17,809 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-14
2020-08-20 16:47:17,814 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-16
2020-08-20 16:47:17,818 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-17
2020-08-20 16:47:17,823 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-18
2020-08-20 16:47:17,855 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-21
2020-08-20 16:47:17,862 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-22
2020-08-20 16:47:17,900 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-25
2020-08-20 16:47:17,914 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-27
2020-08-20 16:47:17,929 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-29
2020-08-20 16:47:17,934 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-30
2020-08-20 16:47:17,961 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-32
2020-08-20 16:47:17,975 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-34
2020-08-20 16:47:17,985 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-35
2020-08-20 16:47:17,989 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-36
2020-08-20 16:47:18,020 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-42
2020-08-20 16:47:18,066 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-45
2020-08-20 16:47:18,080 [67671] DEBUG    gnocchi.chef: Processing measures for sack incoming128-47
  • There are thousands of crash files in /var/crash, and the count keeps growing:
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | wc -l
2699
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | tail -2
-rw-rw-rw-. 1 root root  12320768 Aug 20 16:49 gnocchi-metricd.1597967338.488923.gz
-rw-rw-rw-. 1 root root  15728640 Aug 20 16:49 gnocchi-metricd.1597967327.489436.gz
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | wc -l
2707
[root@overcloud-controller-0 ~]# ls -ltr /var/crash | tail -2
-rw-rw-rw-. 1 root root  15204352 Aug 20 16:49 gnocchi-metricd.1597967345.489925.gz
-rw-rw-rw-. 1 root root  19398656 Aug 20 16:49 gnocchi-metricd.1597967392.492815.gz
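The crash file names in this listing appear to follow the pattern <command>.<epoch seconds>.<pid>.gz. That reading is an assumption based on the names shown here, since the actual pattern depends on how core dumps are configured on the host. Under that assumption, a small sketch such as the following (parse_crash_name is a hypothetical helper) recovers when each crash happened:

```python
from datetime import datetime, timezone


def parse_crash_name(name):
    """Split '<command>.<epoch>.<pid>[.gz]' into (command, UTC time, pid).

    Returns None if the name does not match the assumed pattern.
    """
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    parts = name.rsplit(".", 2)
    if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
        return None
    command, epoch, pid = parts
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return command, when, int(pid)


if __name__ == "__main__":
    names = [
        "gnocchi-metricd.1597967338.488923.gz",
        "gnocchi-metricd.1597967327.489436.gz",
        "gnocchi-metricd.1597967345.489925.gz",
        "gnocchi-metricd.1597967392.492815.gz",
    ]
    for name in names:
        command, when, pid = parse_crash_name(name)
        print(command, when.isoformat(), pid)
```

The times are printed in UTC, while ls shows the host's local time. Note also that "ls -l | wc -l" counts the leading "total" line, so the number of files is one less than the figure printed.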
  • Decoding a crash dump does not yield much information:
[root@overcloud-controller-0 crash]# gunzip gnocchi-metricd.1597962184.253089.gz 
[root@overcloud-controller-0 crash]# eu-unstrip -n --core=/var/crash/gnocchi-metricd.1597962184.253089
0x5579f2883000+0x203000 306d113ca7e9506136be43fa77097c7670f2206e@0x5579f2883284 . - /usr/libexec/platform-python3.6
0x7f76e517e000+0x8e81000 07dc1cdecd938e94c7823c062fd76555170dcac9@0x7f76e517e210 - - /usr/lib64/ceph/libceph-common.so.0
0x7f77b8df7000+0x206000 1b97596bda8fd7f32c6a28d1ad0c050a43bde798@0x7f77b8df71d8 . - /usr/lib64/python3.6/lib-dynload/termios.cpython-36m-x86_64-linux-gnu.so
0x7f77c01e2000+0x20b000 82b165a0855d4ed7967192a249eed5a0ac2b9e68@0x7f77c01e2210 - - /usr/lib64/python3.6/site-packages/simplejson/_speedups.cpython-36m-x86_64-linux-gnu.so
0x7f77c03ed000+0x205000 440e3191933e4c588f1de5a7212b031395ad69f6@0x7f77c03ed210 . - /usr/lib64/liburcu-common.so.6.0.0
0x7f77c05f2000+0x20a000 603651bfa53acf00cddf71a11de2caf06bfb4501@0x7f77c05f2210 . - /usr/lib64/liburcu-cds.so.6.0.0
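Each line of the eu-unstrip -n output above gives a module's load address and size, its build ID, the file and debug file columns, and the module name. A minimal sketch, assuming that five-column layout, parses the module map and looks up which module (if any) contains a given address. parse_module_line and find_module are hypothetical helpers written for this illustration.

```python
def parse_module_line(line):
    """Parse one 'START+SIZE BUILDID@ADDR FILE DEBUGFILE NAME' line."""
    span, build, _file, _debugfile, name = line.split(None, 4)
    start_text, size_text = span.split("+")
    start = int(start_text, 16)
    return {
        "start": start,
        "end": start + int(size_text, 16),  # exclusive upper bound
        "build_id": build.split("@")[0],
        "name": name.strip(),
    }


def find_module(modules, address):
    """Return the name of the module whose range contains address, else None."""
    for module in modules:
        if module["start"] <= address < module["end"]:
            return module["name"]
    return None


if __name__ == "__main__":
    # Two lines copied from the eu-unstrip output in this report.
    output = [
        "0x5579f2883000+0x203000 306d113ca7e9506136be43fa77097c7670f2206e@0x5579f2883284 . - /usr/libexec/platform-python3.6",
        "0x7f76e517e000+0x8e81000 07dc1cdecd938e94c7823c062fd76555170dcac9@0x7f76e517e210 - - /usr/lib64/ceph/libceph-common.so.0",
    ]
    modules = [parse_module_line(line) for line in output]
    print(find_module(modules, 0x7F76E5200000))
    print(find_module(modules, 0x5413BA87))
```

An address that falls outside every listed range, such as the ip value from the kernel log lines, yields None. Load addresses differ from process to process, so a lookup is only meaningful against the module map of the same crashed process.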

Environment

  • Red Hat OpenStack Platform 16.1 (RHOSP)
