We updated to latest RHEL 7.5 kernel and now the PCP metric "hotproc.psinfo.cpusallowed" is broken

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 7.5
  • Kernel 3.10.0-862.6.3
  • Performance Co-Pilot (PCP) 3.12.2-5.el7.x86_64

Issue

  • PCP metric hotproc.psinfo.cpusallowed broken after updating to RHEL 7.5
  • pminfo -f hotproc.psinfo.cpusallowed returns Error: Metric not supported by this version of monitored application

Resolution

Please update to
- pcp-4.1.0-5.el7_6 or newer released with Advisory RHBA-2018:3345 for RHEL 7.6.
- pcp-3.12.2-7.el7_5 or newer released with Advisory RHBA-2018:3512 for RHEL 7.5 EUS.

Root Cause

Previously, the Performance Co-Pilot (PCP) toolkit failed to report certain process details on very large systems in some cases. The code reading the process details files was changed so that it can read data of arbitrary length, instead of only the first 1024 bytes. As a result, the described PCP error can no longer happen.

Diagnostic Steps

When trying to read the metric:

[root@localhost ~]# pminfo -f hotproc.psinfo.cpusallowed
hotproc.psinfo.cpusallowed
Error: Metric not supported by this version of monitored application

The exsiting instrumentation yields the following:

fetch_proc_pid_status: read raw:Cpus_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff,ffffffff
fetch_proc_pid_status: skip Cpus_allowed:       00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff,ffffffff
[Wed Jul 11 11:33:22] pmdaproc(42544) Error: pmdaFetch: Unavailable metric PMID 3.56.31[42841]

Per the above, the read is getting to the Cpus_allowed line, and then stopping. Meaning that the following is occurring:

                case 'C':                       
                    if (strncmp(curline, "Cpus_allowed_list:", 18) == 0)                             
                        ep->status_lines.cpusallowed = strsep(&curline, "\n");                       
                    else                        
                        goto nomatch;           
                    break;                      
                case 'e':                       

The above is exercised, but only the C case match is matched. The strncmp() operation doesn't match due to the following contents:

Current curline = "Cpus_allowed:   00"
  Match curline = "Cpus_allowed_list:"

This is exceptionally odd in that, if we widen the scope of review to just any instance of fetch_proc_pid_status, we see the expected type of behavior:

fetch_proc_pid_status: read raw:Name:       kworker/8:0H
fetch_proc_pid_status: skip Name:   kworker/8:0H
fetch_proc_pid_status: read raw:Umask:      0000
fetch_proc_pid_status: skip Umask:  0000
fetch_proc_pid_status: read raw:State:      S (sleeping)
fetch_proc_pid_status: skip State:  S (sleeping)
fetch_proc_pid_status: read raw:Tgid:       53
fetch_proc_pid_status: read raw:Ngid:       0
fetch_proc_pid_status: read raw:Pid:        53
fetch_proc_pid_status: skip Pid:    53
fetch_proc_pid_status: read raw:PPid:       2
fetch_proc_pid_status: skip PPid:   2
fetch_proc_pid_status: read raw:TracerPid:  0
fetch_proc_pid_status: skip TracerPid:      0
fetch_proc_pid_status: read raw:Uid:        0       0       0       0
fetch_proc_pid_status: read raw:Gid:        0       0       0       0
fetch_proc_pid_status: read raw:FDSize:     64
fetch_proc_pid_status: skip FDSize: 64
fetch_proc_pid_status: read raw:Groups:
fetch_proc_pid_status: skip Groups: 
fetch_proc_pid_status: read raw:Threads:    1
fetch_proc_pid_status: read raw:SigQ:       4/1542231
fetch_proc_pid_status: skip SigQ:   4/1542231
fetch_proc_pid_status: read raw:SigPnd:     0000000000000000
fetch_proc_pid_status: read raw:ShdPnd:     0000000000000000
fetch_proc_pid_status: skip ShdPnd: 0000000000000000
fetch_proc_pid_status: read raw:SigBlk:     0000000000000000
fetch_proc_pid_status: read raw:SigIgn:     ffffffffffffffff
fetch_proc_pid_status: read raw:SigCgt:     0000000000000000
fetch_proc_pid_status: read raw:CapInh:     0000000000000000
fetch_proc_pid_status: skip CapInh: 0000000000000000
fetch_proc_pid_status: read raw:CapPrm:     0000001fffffffff
fetch_proc_pid_status: skip CapPrm: 0000001fffffffff
fetch_proc_pid_status: read raw:CapEff:     0000001fffffffff
fetch_proc_pid_status: skip CapEff: 0000001fffffffff
fetch_proc_pid_status: read raw:CapBnd:     0000001fffffffff
fetch_proc_pid_status: skip CapBnd: 0000001fffffffff
fetch_proc_pid_status: read raw:CapAmb:     0000000000000000
fetch_proc_pid_status: skip CapAmb: 0000000000000000
fetch_proc_pid_status: read raw:Seccomp:    0
fetch_proc_pid_status: skip Seccomp:        0
fetch_proc_pid_status: read raw:Speculation_Store_Bypass:   vulnerable
fetch_proc_pid_status: skip Speculation_Store_Bypass:       vulnerable
fetch_proc_pid_status: read raw:Cpus_allowed:       00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
fetch_proc_pid_status: skip Cpus_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
fetch_proc_pid_status: read raw:Cpus_allowed_list:  8
fetch_proc_pid_status: read raw:Mems_allowed:       00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00
fetch_proc_pid_status: skip Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
fetch_proc_pid_status: read raw:Mems_allowed_list:  0-1
fetch_proc_pid_status: skip Mems_allowed_list:      0-1
fetch_proc_pid_status: read raw:voluntary_ctxt_switches:    8
fetch_proc_pid_status: read raw:nonvoluntary_ctxt_switches: 0

When I pair this with an strace dataset, I believe we might have an idea of why the problem is occurring. Seeking through the file, and formatting in such a was as to retain newline breaks, we can find entries similar to the following:

72685 16:06:10.731397 read(10, "Name:   xhpcg_skx
Umask:  0077
State:  R (running)
Tgid:   95583
Ngid:   95614
Pid:    95583
PPid:   95581
TracerPid:      0
Uid:    20821   20821   20821   20821
Gid:    5509    5509    5509    5509
FDSize: 64
Groups: 1010 1021 2399 3285 3309 4391 4496 4547 4548 4837 5301 5325 5353 5356 5358 5509 5527
VmPeak: 19694676 kB
VmSize: 13805096 kB
VmLck:   0 kB
VmPin:   0 kB
VmHWM:  19282624 kB
VmRSS:  13394256 kB
RssAnon:        13378592 kB
RssFile:         12724 kB
RssShmem:        2940 kB
VmData: 13562172 kB
VmStk:   144 kB
VmExe:   396 kB
VmLib:   150592 kB
VmPTE:   26480 kB
VmSwap:  0 kB
Threads:        20
SigQ:   0/768120
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000100000
SigCgt: 00000001800004e8
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
Seccomp:        0
Speculation_Store_Bypass:       thread vulnerable
Cpus_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff", 1024) = 1024 <0.000032>

In the above, the Cpus_allowed entry is clipped! In fact, it is clipped at the above point and there isn't a further read() operation to get the rest of the file. If I follow the codepath correctly, it seems almost as if the read is capped at 1024 bytes and doesn't retry...

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments