Red Hat Training

A Red Hat training course is available for OpenShift Container Platform

第31章 sysctl

31.1. 概要

Sysctl settings are exposed via Kubernetes, allowing users to modify certain kernel parameters at runtime for namespaces within a container. Only sysctls that are namespaced can be set independently on pods; if a sysctl is not namespaced (called node-level), it cannot be set within OpenShift Container Platform. Moreover, only those sysctls considered safe are whitelisted by default; other unsafe sysctls can be manually enabled on the node to be available to the user.

31.2. Understanding Sysctls

In Linux, the sysctl interface allows an administrator to modify kernel parameters at runtime. Parameters are available via the /proc/sys/ virtual process file system. The parameters cover various subsystems such as:

  • カーネル (共通のプレフィックス: kernel.)
  • ネットワーク (共通のプレフィックス: net.)
  • 仮想メモリー (共通のプレフィックス: vm.)
  • MDADM (共通のプレフィックス: dev.)

追加のサブシステムについては、カーネルのドキュメントで説明されています。すてのパラメーターの一覧を取得するには、以下を実行できます。

$ sudo sysctl -a

31.3. Namespaced vs Node-Level Sysctls

A number of sysctls are namespaced in today’s Linux kernels. This means that they can be set independently for each pod on a node. Being namespaced is a requirement for sysctls to be accessible in a pod context within Kubernetes.

以下の sysctl は namespace を使用するものとして知られている sysctl です。

  • kernel.shm*
  • kernel.msg*
  • kernel.sem
  • fs.mqueue.*
  • net.*

Sysctls that are not namespaced are called node-level and must be set manually by the cluster administrator, either by means of the underlying Linux distribution of the nodes (e.g., via /etc/sysctls.conf) or using a DaemonSet with privileged containers.

注記

Consider marking nodes with special sysctls as tainted. Only schedule pods onto them that need those sysctl settings. Use the Kubernetes taints and toleration feature to implement this.

31.4. Safe vs Unsafe Sysctls

Sysctls are grouped into safe and unsafe sysctls. In addition to proper namespacing, a safe sysctl must be properly isolated between pods on the same node. This means that setting a safe sysctl for one pod:

  • must not have any influence on any other pod on the node,
  • must not allow to harm the node’s health, and
  • must not allow to gain CPU or memory resources outside of the resource limits of a pod.

namespace を使用した sysctl は必ずしも常に安全であると見なされる訳ではありません。

For OpenShift Container Platform 3.3.1, the following sysctls are supported (whitelisted) in the safe set:

  • kernel.shm_rmid_forced
  • net.ipv4.ip_local_port_range

This list will be extended in future versions when the kubelet supports better isolation mechanisms.

All safe sysctls are enabled by default. All unsafe sysctls are disabled by default and must be allowed manually by the cluster administrator on a per-node basis. Pods with disabled unsafe sysctls will be scheduled, but will fail to launch.

警告

安全でないという性質上、安全でない sysctl は各自の責任で使用されます。場合によっては、コンテナーの正しくない動作やリソース不足、またはノードの完全な破損などの深刻な問題が生じる可能性があります。

31.5. Enabling Unsafe Sysctls

With the warning above in mind, the cluster administrator can allow certain unsafe sysctls for very special situations, e.g., high-performance or real-time application tuning.

If you want to use unsafe sysctls, cluster administrators must enable them individually on nodes. Only namespaced sysctls can be enabled this way.

  1. Use the kubeletArguments field in the /etc/origin/node/node-config.yaml file, as described in Configuring Node Resources, to set the desired unsafe sysctls:

    kubeletArguments:
      experimental-allowed-unsafe-sysctls:
        - "kernel.msg*,net.ipv4.route.min_pmtu"
  2. 変更を適用するためにノードサービスを再起動します。

    # systemctl restart atomic-openshift-node

31.6. Setting Sysctls for a Pod

Sysctls are set on pods using annotations. They apply to all containers in the same pod.

Here is an example, with different annotations for safe and unsafe sysctls:

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
  annotations:
    security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1
    security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3
spec:
  ...
注記

A pod with the unsafe sysctls specified above will fail to launch on any node that has not enabled those two unsafe sysctls explicitly. As with node-level sysctls, use the taints and toleration feature or labels on nodes to schedule those pods onto the right nodes.