How to Use 'fio' to Check Etcd Disk Performance in OCP

Solution Verified - Updated -

Issue

  • etcd is sensitive to disk latency, so it is often necessary to confirm that writes to its backing storage are fast enough for production workloads.
  • etcd alerts in the web console, or frequent log messages such as the following, may suggest that writes are taking too long:

    2020-10-21T09:56:00.246667768Z 2020-10-21 09:56:00.246542 W | etcdserver: read-only range request "key:\"/kubernetes.io/serviceaccounts/openshift-kube-scheduler/localhost-recovery-client\" " with result  "range_response_count:1 size:407" took too long (113.372697ms) to execute
    
  • The etcd performance documentation suggests that, for production workloads, the 99th percentile (p99) of wal_fsync_duration_seconds should be less than 10ms for the disk to be considered reasonably fast.

  • Depending on the severity of disk speed issues, impact can range from frequent alerting to overall cluster instability.
  • For more general information regarding infrastructure requirements, please see etcd backend performance requirements.
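
The 10ms p99 guideline above can be checked against an fio run. The following is a minimal sketch, not the procedure from this article: the fio options are a commonly used way to approximate etcd WAL writes (small sequential writes, each followed by fdatasync), and the job name, the size and block-size values, and the JSON layout read by sync_p99_ms (fio 3.x) are assumptions to verify against your fio version.

```python
import json

# Guideline quoted above: p99 of WAL fsync duration should be under 10 ms.
P99_LIMIT_MS = 10.0


def fio_command(test_dir):
    """Build an fio argv that issues small sequential writes with fdatasync.

    The size and block-size values are assumptions chosen to resemble
    etcd WAL writes; they are not taken from this article.
    """
    return [
        "fio",
        "--name=etcd-wal-check",      # hypothetical job name
        "--directory=" + test_dir,    # directory on the disk that backs etcd
        "--rw=write",                 # sequential writes
        "--ioengine=sync",            # plain write() calls
        "--fdatasync=1",              # fdatasync after every write
        "--size=22m",                 # assumed total amount written
        "--bs=2300",                  # assumed write size in bytes
        "--output-format=json",       # machine-readable report
    ]


def sync_p99_ms(fio_json_text):
    """Return the p99 sync latency in milliseconds from fio JSON output.

    Assumes the fio 3.x layout:
    jobs[0]["sync"]["lat_ns"]["percentile"]["99.000000"] (nanoseconds).
    """
    report = json.loads(fio_json_text)
    p99_ns = report["jobs"][0]["sync"]["lat_ns"]["percentile"]["99.000000"]
    return p99_ns / 1_000_000


def disk_fast_enough(p99_ms, limit_ms=P99_LIMIT_MS):
    """True when the measured p99 is strictly below the limit."""
    return p99_ms < limit_ms
```

One way to use this is to pass fio_command("<test directory>") to subprocess.run with capture_output=True and text=True, then feed the captured stdout to sync_p99_ms and the result to disk_fast_enough. The test directory should sit on the same device as the etcd data, and the benchmark adds I/O load, so it is best run outside peak hours.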

Environment

  • Red Hat OpenShift Container Platform (RHOCP, OCP)
    • 3.11
    • 4
