Using node_exporter to expose Pod metrics to Prometheus in OpenShift

Environment

  • Red Hat OpenShift Container Platform
    • 3.x

Issue

  • How do I monitor a persistent volume's disk usage in a Pod using Prometheus?

Resolution

See also: OpenShift Volume metrics in Prometheus

There is currently no fully supported solution, but you can follow these steps to configure monitoring:

  1. Run node_exporter as a side-car container in your application's Pod. Disable all collectors except the filesystem collector (re-enable others if you need additional metrics collected).
  2. Mount the PersistentVolumeClaim you want to monitor in the side-car container (in addition to the main application container).
  3. Configure a new or existing Service to expose the metrics, and point Prometheus at it (see the ServiceMonitor sketch after this list).
  4. See the References section below for a complete example application using the side-car container.

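How Prometheus discovers the Service depends on how Prometheus is deployed in your cluster. If the Prometheus Operator is in use, a ServiceMonitor along the lines of the sketch below can scrape the side-car. The metadata labels here are assumptions and must match the serviceMonitorSelector of your Prometheus instance:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rhel-tools-metrics
  labels:
    k8s-app: rhel-tools-metrics  # assumed label; match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      name: rhel-tools           # matches the labels on the Service in the example below
  endpoints:
  - port: port-1                 # the named port on the Service
    interval: 30s
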
Example output from the side-car showing the PersistentVolumeClaim's filesystem metrics:

sh-4.2$ curl -s http://rhel-tools-metrics:9100/metrics | grep /data/test
node_filesystem_avail_bytes{device="10.29.67.106:op_pv_c3_small153",fstype="fuse.glusterfs",mountpoint="/data/test"} 1.016111104e+09
node_filesystem_device_error{device="10.29.67.106:op_pv_c3_small153",fstype="fuse.glusterfs",mountpoint="/data/test"} 0
node_filesystem_files{device="10.29.67.106:op_pv_c3_small153",fstype="fuse.glusterfs",mountpoint="/data/test"} 524288
node_filesystem_files_free{device="10.29.67.106:op_pv_c3_small153",fstype="fuse.glusterfs",mountpoint="/data/test"} 521042
node_filesystem_free_bytes{device="10.29.67.106:op_pv_c3_small153",fstype="fuse.glusterfs",mountpoint="/data/test"} 1.016111104e+09
node_filesystem_readonly{device="10.29.67.106:op_pv_c3_small153",fstype="fuse.glusterfs",mountpoint="/data/test"} 0
node_filesystem_size_bytes{device="10.29.67.106:op_pv_c3_small153",fstype="fuse.glusterfs",mountpoint="/data/test"} 1.063256064e+09
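
Once Prometheus scrapes these metrics, the volume's used-space percentage can be computed with a PromQL expression such as the following (a sketch; adjust the mountpoint label to match your volumeMount path):

100 * (1 - node_filesystem_avail_bytes{mountpoint="/data/test"} / node_filesystem_size_bytes{mountpoint="/data/test"})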

Comparing this output to df output from the main application container, the values match: 1.016111104e+09 bytes / 1024 = 992296, the available space df reports in 1K blocks.

sh-4.2$ df | grep /data/test 
10.29.67.106:op_pv_c3_small153   1038336    46040    992296   5% /data/test

References:
- The example below creates a Pod, Service, and PersistentVolumeClaim. The metrics are exposed via the Service at http://rhel-tools-metrics.project.svc.cluster.local:9100/metrics (where project is the namespace the Service is created in):

apiVersion: v1
kind: Pod
metadata:
  name: rhel-tools
  labels:
    name: rhel-tools
spec:
  volumes:
  - name: test-storage
    persistentVolumeClaim:
      claimName: my-volume-claim
  containers:
  - name: rhel-tools
    image: registry.access.redhat.com/rhel7/rhel-tools
    args: ["sleep", "infinity"]
    volumeMounts:
    - name: test-storage
      mountPath: /data/test
  - name: prometheus-sidecar
    image: quay.io/prometheus/node-exporter
    volumeMounts:
    - mountPath: /data/test
      name: test-storage
    ports:
    - containerPort: 9100
      protocol: TCP
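    # Disable every collector except the filesystem collector (which stays
    # enabled by default); filesystem metrics are all that is needed here.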
    args:
      - --no-collector.arp
      - --no-collector.bcache
      - --no-collector.bonding
      - --no-collector.buddyinfo
      - --no-collector.conntrack
      - --no-collector.cpu
      - --no-collector.cpufreq
      - --no-collector.diskstats
      - --no-collector.drbd
      - --no-collector.edac
      - --no-collector.entropy
      - --no-collector.filefd
      - --no-collector.hwmon
      - --no-collector.infiniband
      - --no-collector.interrupts
      - --no-collector.ipvs
      - --no-collector.ksmd
      - --no-collector.loadavg
      - --no-collector.logind
      - --no-collector.mdadm
      - --no-collector.meminfo
      - --no-collector.meminfo_numa
      - --no-collector.mountstats
      - --no-collector.netclass
      - --no-collector.netdev
      - --no-collector.netstat
      - --no-collector.nfs
      - --no-collector.nfsd
      - --no-collector.ntp
      - --no-collector.perf
      - --no-collector.pressure
      - --no-collector.processes
      - --no-collector.qdisc
      - --no-collector.runit
      - --no-collector.sockstat
      - --no-collector.stat
      - --no-collector.supervisord
      - --no-collector.systemd
      - --no-collector.tcpstat
      - --no-collector.textfile
      - --no-collector.time
      - --no-collector.timex
      - --no-collector.uname
      - --no-collector.vmstat
      - --no-collector.wifi
      - --no-collector.xfs
      - --no-collector.zfs
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: rhel-tools
  name: rhel-tools-metrics
spec:
  ports:
  - name: port-1
    port: 9100
    protocol: TCP
    targetPort: 9100
  selector:
    name: rhel-tools
  type: ClusterIP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
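
To try the example, save the manifests to a file and create the resources with oc; the file name below is illustrative:

$ oc create -f rhel-tools-example.yaml
$ oc get pod rhel-tools                   # wait until the Pod reports 2/2 Running
$ oc exec rhel-tools -c rhel-tools -- curl -s http://rhel-tools-metrics:9100/metrics | grep /data/test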
