Chapter 8. OpenShift SDN

8.1. About OpenShift SDN default CNI network provider

OpenShift Container Platform uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between Pods across the OpenShift Container Platform cluster. This Pod network is established and maintained by the OpenShift SDN, which configures an overlay network using Open vSwitch (OVS).

OpenShift SDN provides three SDN modes for configuring the Pod network:

  • The network policy mode allows project administrators to configure their own isolation policies using NetworkPolicy objects. NetworkPolicy is the default mode in OpenShift Container Platform 4.2.
  • The multitenant mode provides project-level isolation for Pods and Services. Pods in one project cannot send packets to or receive packets from Pods and Services in a different project. You can disable isolation for a project, allowing it to send network traffic to all Pods and Services in the entire cluster and receive network traffic from those Pods and Services.
  • The subnet mode provides a flat Pod network where every Pod can communicate with every other Pod and Service. The network policy mode provides the same functionality as the subnet mode.

8.2. Configuring egress IPs for a project

As a cluster administrator, you can configure the OpenShift SDN default Container Network Interface (CNI) network provider to assign one or more egress IP addresses to a project.

8.2.1. Egress IP address assignment for project egress traffic

When you configure an egress IP address for a project, all outgoing external connections from that project share the same fixed source IP address. External resources can recognize traffic from a particular project based on the egress IP address. An egress IP address assigned to a project is different from the egress router, which is used to send traffic to specific destinations.

Egress IP addresses are implemented as additional IP addresses on the primary network interface of the node and must be in the same subnet as the node’s primary IP address.
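The same-subnet requirement can be checked before you assign an address. The following is a minimal sketch that uses the Python standard library; the function name and the example addresses are hypothetical and are not part of OpenShift Container Platform.

```python
import ipaddress


def egress_ip_fits_node(egress_ip: str, node_interface: str) -> bool:
    """Return True if egress_ip lies in the subnet of the node's primary address.

    node_interface is the node's primary IP address with its prefix length,
    for example "192.168.1.10/24" (a hypothetical value).
    """
    network = ipaddress.ip_interface(node_interface).network
    return ipaddress.ip_address(egress_ip) in network
```

For example, egress_ip_fits_node("192.168.1.100", "192.168.1.10/24") is true, while an address from 192.168.2.0/24 is rejected for the same node.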

Important

Egress IP addresses must not be configured in any Linux network configuration files, such as ifcfg-eth0.

Allowing additional IP addresses on the primary network interface might require extra configuration when using some cloud or VM solutions.

You can assign egress IP addresses to namespaces by setting the egressIPs parameter of the NetNamespace resource. After an egress IP is associated with a project, OpenShift SDN allows you to assign egress IPs to hosts in two ways:

  • In the automatically assigned approach, an egress IP address range is assigned to a node.
  • In the manually assigned approach, a list of one or more egress IP addresses is assigned to a node.

Namespaces that request an egress IP address are matched with nodes that can host those egress IP addresses, and then the egress IP addresses are assigned to those nodes. If the egressIPs parameter is set on a NetNamespace resource, but no node hosts that egress IP address, then egress traffic from the namespace will be dropped.

High availability of nodes is automatic. If a node that hosts an egress IP address is unreachable and there are nodes that are able to host that egress IP address, then the egress IP address will move to a new node. When the unreachable node comes back online, the egress IP address automatically moves to balance egress IP addresses across nodes.

Important

You cannot use manually assigned and automatically assigned egress IP addresses on the same nodes. If you manually assign egress IP addresses from an IP address range, you must not make that range available for automatic IP assignment.
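One way to catch a conflict between the two approaches is to compare the addresses you plan to assign manually against the ranges you plan to offer for automatic assignment. This is a sketch only, assuming you keep both lists outside the cluster; the function name is hypothetical.

```python
import ipaddress


def conflicting_manual_ips(manual_ips, automatic_cidrs):
    """Return the manually assigned IPs that fall inside any automatic range."""
    ranges = [ipaddress.ip_network(cidr) for cidr in automatic_cidrs]
    return [
        ip for ip in manual_ips
        if any(ipaddress.ip_address(ip) in r for r in ranges)
    ]
```

An empty result means that none of the manual addresses overlaps a range reserved for automatic assignment.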

8.2.1.1. Considerations when using automatically assigned egress IP addresses

When using the automatic assignment approach for egress IP addresses, the following considerations apply:

  • You set the egressCIDRs parameter of each node’s HostSubnet resource to indicate the range of egress IP addresses that can be hosted by a node. OpenShift Container Platform sets the egressIPs parameter of the HostSubnet resource based on the IP address range you specify.
  • Only a single egress IP address per namespace is supported when using the automatic assignment mode.

If the node hosting the namespace’s egress IP address is unreachable, OpenShift Container Platform will reassign the egress IP address to another node with a compatible egress IP address range. The automatic assignment approach works best for clusters installed in environments with flexibility in associating additional IP addresses with nodes.
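The matching and reassignment behavior can be pictured with a small model. The sketch below is not the OpenShift SDN implementation; it assumes a simple "least-loaded reachable node whose range contains the address" policy, and all names in it are hypothetical.

```python
import ipaddress


def assign_egress_ips(requested, node_cidrs, reachable):
    """Map each requested egress IP to a node, or to None if no node can host it.

    requested  -- list of egress IP strings requested by namespaces
    node_cidrs -- dict of node name -> list of egressCIDRs strings
    reachable  -- set of node names that are currently reachable
    """
    load = {node: 0 for node in node_cidrs}
    assignment = {}
    for ip in requested:
        addr = ipaddress.ip_address(ip)
        candidates = [
            node for node, cidrs in node_cidrs.items()
            if node in reachable
            and any(addr in ipaddress.ip_network(c) for c in cidrs)
        ]
        if not candidates:
            # No node hosts this address: egress traffic would be dropped.
            assignment[ip] = None
            continue
        # Pick the node hosting the fewest addresses; break ties by name.
        chosen = min(candidates, key=lambda n: (load[n], n))
        load[chosen] += 1
        assignment[ip] = chosen
    return assignment
```

Calling the function again with a smaller reachable set models what happens when a node goes offline: addresses move to the remaining nodes that have a compatible range.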

8.2.1.2. Considerations when using manually assigned egress IP addresses

When using the manual assignment approach for egress IP addresses, the following considerations apply:

  • You set the egressIPs parameter of each node’s HostSubnet resource to indicate the IP addresses that can be hosted by a node.
  • Multiple egress IP addresses per namespace are supported.

When a namespace has multiple egress IP addresses, if the node hosting the first egress IP address is unreachable, OpenShift Container Platform will automatically switch to using the next available egress IP address until the first egress IP address is reachable again.
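The failover order described above amounts to "use the first address in the list whose node is reachable". A minimal sketch of that selection follows, with hypothetical function and parameter names; it models the documented behavior and is not taken from the OpenShift SDN source.

```python
def active_egress_ip(egress_ips, ip_to_node, reachable_nodes):
    """Return the first egress IP whose hosting node is reachable, else None.

    egress_ips      -- ordered list from the namespace's egressIPs parameter
    ip_to_node      -- dict of egress IP -> node that hosts it
    reachable_nodes -- set of node names that are currently reachable
    """
    for ip in egress_ips:
        node = ip_to_node.get(ip)
        if node is not None and node in reachable_nodes:
            return ip
    return None
```

Because the list is scanned from the start each time, the first address is selected again as soon as its node is reachable.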

This approach is recommended for clusters installed in public cloud environments, where there can be limitations on associating additional IP addresses with nodes.

8.2.2. Configuring automatically assigned egress IP addresses for a namespace

In OpenShift Container Platform, you can enable automatic assignment of an egress IP address for a specific namespace across one or more nodes.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Update the NetNamespace resource with the egress IP address using the following JSON:

    $ oc patch netnamespace <project_name> --type=merge -p \ 1
      '{
        "egressIPs": [
          "<ip_address>" 2
        ]
      }'
    1
    Specify the name of the project.
    2
    Specify a single egress IP address. Using multiple IP addresses is not supported.

    For example, to assign project1 to an IP address of 192.168.1.100 and project2 to an IP address of 192.168.1.101:

    $ oc patch netnamespace project1 --type=merge -p \
      '{"egressIPs": ["192.168.1.100"]}'
    $ oc patch netnamespace project2 --type=merge -p \
      '{"egressIPs": ["192.168.1.101"]}'
  2. Indicate which nodes can host egress IP addresses by setting the egressCIDRs parameter for each host using the following JSON:

    $ oc patch hostsubnet <node_name> --type=merge -p \ 1
      '{
        "egressCIDRs": [
          "<ip_address_range_1>", "<ip_address_range_2>" 2
        ]
      }'
    1
    Specify a node name.
    2
    Specify one or more IP address ranges in CIDR format.

    For example, to set node1 and node2 to host egress IP addresses in the range 192.168.1.0 to 192.168.1.255:

    $ oc patch hostsubnet node1 --type=merge -p \
      '{"egressCIDRs": ["192.168.1.0/24"]}'
    $ oc patch hostsubnet node2 --type=merge -p \
      '{"egressCIDRs": ["192.168.1.0/24"]}'

    OpenShift Container Platform automatically assigns specific egress IP addresses to available nodes in a balanced way. In this case, it assigns the egress IP address 192.168.1.100 to node1 and the egress IP address 192.168.1.101 to node2 or vice versa.
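If you patch many projects or nodes, generating the JSON instead of typing it helps avoid quoting mistakes. The following sketch builds the argument lists for the two oc patch commands used in this procedure; the helper names are hypothetical, and the commands are only constructed here, not executed.

```python
import json


def netnamespace_patch_cmd(project, egress_ip):
    """Build the oc patch command that sets a single egress IP on a project."""
    patch = json.dumps({"egressIPs": [egress_ip]})
    return ["oc", "patch", "netnamespace", project, "--type=merge", "-p", patch]


def hostsubnet_cidr_patch_cmd(node, cidrs):
    """Build the oc patch command that sets egressCIDRs on a node's HostSubnet."""
    patch = json.dumps({"egressCIDRs": list(cidrs)})
    return ["oc", "patch", "hostsubnet", node, "--type=merge", "-p", patch]
```

You can pass each returned list to subprocess.run, or print it with shlex.join to review the command before running it.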

8.2.3. Configuring manually assigned egress IP addresses for a namespace

In OpenShift Container Platform, you can associate one or more egress IP addresses with a namespace.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Update the NetNamespace resource by specifying the following JSON object with the desired IP addresses:

    $ oc patch netnamespace <project> --type=merge -p \ 1
      '{
        "egressIPs": [ 2
          "<ip_address>"
          ]
      }'
    1
    Specify the name of the project.
    2
    Specify one or more egress IP addresses. The egressIPs parameter is an array.

    For example, to assign the project1 project to an IP address of 192.168.1.100:

    $ oc patch netnamespace project1 --type=merge \
      -p '{"egressIPs": ["192.168.1.100"]}'

    You can set egressIPs to two or more IP addresses on different nodes to provide high availability. If multiple egress IP addresses are set, Pods use the first IP in the list for egress, but if the node hosting that IP address fails, Pods switch to using the next IP in the list after a short delay.

  2. Manually assign the egress IP to the node hosts. Set the egressIPs parameter on the HostSubnet object on the node host. Using the following JSON, include as many IPs as you want to assign to that node host:

    $ oc patch hostsubnet <node_name> --type=merge -p \ 1
      '{
        "egressIPs": [ 2
          "<ip_address_1>",
          "<ip_address_N>"
          ]
      }'
    1
    Specify the name of the node.
    2
    Specify one or more egress IP addresses. The egressIPs field is an array.

    For example, to specify that node1 should have the egress IPs 192.168.1.100, 192.168.1.101, and 192.168.1.102:

    $ oc patch hostsubnet node1 --type=merge -p \
      '{"egressIPs": ["192.168.1.100", "192.168.1.101", "192.168.1.102"]}'

    In the previous example, all egress traffic for project1 will be routed to the node hosting the specified egress IP, and then connected (using NAT) to that IP address.

8.3. Configuring an egress firewall to control access to external IP addresses

As a cluster administrator, you can create an egress firewall for a project that will restrict egress traffic leaving your OpenShift Container Platform cluster.

8.3.1. How an egress firewall works in a project

As a cluster administrator, you can use an egress firewall to limit the external hosts that some or all Pods can access from within the cluster. An egress firewall supports the following scenarios:

  • A Pod can only connect to internal hosts and cannot initiate connections to the public Internet.
  • A Pod can only connect to the public Internet and cannot initiate connections to internal hosts that are outside the OpenShift Container Platform cluster.
  • A Pod cannot reach specified internal subnets or hosts outside the OpenShift Container Platform cluster.
  • A Pod can connect to only specific external hosts.

You configure an egress firewall policy by creating an EgressNetworkPolicy Custom Resource (CR) object and specifying an IP address range in CIDR format or by specifying a DNS name. For example, you can allow one project access to a specified IP range but deny the same access to a different project. Or you can restrict application developers from updating from Python pip mirrors, and force updates to come only from approved sources.

Important

You must have OpenShift SDN configured to use either the network policy or multitenant modes to configure egress firewall policy.

If you use network policy mode, egress policy is compatible with only one policy per namespace and will not work with projects that share a network, such as global projects.

Caution

Egress firewall rules do not apply to traffic that goes through routers. Any user with permission to create a Route CR object can bypass egress network policy rules by creating a route that points to a forbidden destination.

8.3.1.1. Limitations of an egress firewall

An egress firewall has the following limitations:

  • No project can have more than one EgressNetworkPolicy object.
  • The default project cannot use egress network policy.
  • When using the OpenShift SDN default Container Network Interface (CNI) network provider in multitenant mode, the following limitations apply:

    • Global projects cannot use an egress firewall. You can make a project global by using the oc adm pod-network make-projects-global command.
    • Projects merged by using the oc adm pod-network join-projects command cannot use an egress firewall in any of the joined projects.

Violating any of these restrictions results in broken egress network policy for the project, and may cause all external network traffic to be dropped.

8.3.1.2. Matching order for egress network policy rules

The egress network policy rules are evaluated in the order that they are defined, from first to last. The first rule that matches an egress connection from a Pod applies. Any subsequent rules are ignored for that connection.
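First-match evaluation can be illustrated with a short sketch. It handles only rules that use cidrSelector and skips dnsName rules, because resolving names is outside its scope; the function name is hypothetical, and the code models the ordering rule rather than reproducing the actual firewall implementation.

```python
import ipaddress


def first_matching_rule(rules, destination_ip):
    """Return the type ("Allow" or "Deny") of the first rule that matches.

    Returns None if no cidrSelector rule matches the destination.
    """
    addr = ipaddress.ip_address(destination_ip)
    for rule in rules:
        cidr = rule["to"].get("cidrSelector")
        if cidr is not None and addr in ipaddress.ip_network(cidr):
            return rule["type"]  # later rules are ignored for this connection
    return None
```

With an Allow rule for 1.2.3.0/24 followed by a Deny rule for 0.0.0.0/0, a connection to 1.2.3.4 is allowed. If the two rules are swapped, the same connection is denied, because the Deny rule matches first.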

8.3.1.3. How Domain Name Server (DNS) resolution works

If you use DNS names in any of your egress firewall policy rules, proper resolution of the domain names is subject to the following restrictions:

  • Domain name updates are polled based on the TTL (time to live) value of the domain returned by the local non-authoritative servers.
  • The Pod must resolve the domain from the same local name servers when necessary. Otherwise the IP addresses for the domain known by the egress firewall controller and the Pod can be different. If the IP addresses for a host name differ, the egress firewall might not be enforced consistently.
  • Because the egress firewall controller and Pods asynchronously poll the same local name server, the Pod might obtain the updated IP address before the egress controller does, which causes a race condition. Due to this current limitation, domain name usage in EgressNetworkPolicy objects is only recommended for domains with infrequent IP address changes.

Note

The egress firewall always allows Pods access to the external interface of the node that the Pod is on for DNS resolution.

If you use domain names in your egress firewall policy and your DNS resolution is not handled by a DNS server on the local node, then you must add egress firewall rules that allow access to your DNS server’s IP addresses if you are using domain names in your Pods.

8.3.2. EgressNetworkPolicy custom resource (CR) object

The following YAML describes an EgressNetworkPolicy CR object:

kind: EgressNetworkPolicy
apiVersion: network.openshift.io/v1
metadata:
  name: <name> 1
spec:
  egress: 2
    ...
1
Specify a name for your egress firewall policy.
2
Specify a collection of one or more egress network policy rules as described in the following section.

8.3.2.1. EgressNetworkPolicy rules

The following YAML describes an egress firewall rule object. The egress key expects an array of one or more objects.

egress:
- type: <type> 1
  to: 2
    cidrSelector: <cidr> 3
    dnsName: <dns-name> 4
1
Specify the type of rule. The value must be either Allow or Deny.
2
Specify a value for either the cidrSelector key or the dnsName key for the rule. You cannot use both keys in a rule.
3
Specify an IP address range in CIDR format.
4
Specify a domain name.
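The constraints on a rule object can be checked before you create the policy. The following sketch validates a rule that has already been loaded into a Python dictionary; the function name is hypothetical, and the checks cover only the constraints stated above, not everything the API server might enforce.

```python
import ipaddress


def validate_rule(rule):
    """Return a list of problems found in one egress rule (empty if none)."""
    problems = []
    if rule.get("type") not in ("Allow", "Deny"):
        problems.append("type must be Allow or Deny")
    to = rule.get("to") or {}
    has_cidr = "cidrSelector" in to
    has_dns = "dnsName" in to
    if has_cidr == has_dns:
        # Either both keys are present or neither is.
        problems.append("set exactly one of cidrSelector or dnsName")
    if has_cidr:
        try:
            # strict=False accepts ranges with host bits set, such as 1.2.3.4/24.
            ipaddress.ip_network(to["cidrSelector"], strict=False)
        except ValueError:
            problems.append("cidrSelector is not a valid CIDR range")
    return problems
```

An empty list means that the rule satisfies these checks.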

8.3.2.2. Example EgressNetworkPolicy CR object

The following example defines several egress firewall policy rules:

kind: EgressNetworkPolicy
apiVersion: network.openshift.io/v1
metadata:
  name: default-rules 1
spec:
  egress: 2
  - type: Allow
    to:
      cidrSelector: 1.2.3.0/24
  - type: Allow
    to:
      dnsName: www.example.com
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
1
The name for the policy object.
2
A collection of egress firewall policy rule objects.

8.3.3. Creating an egress firewall policy object

As a cluster administrator, you can create an egress firewall policy object for a project.

Important

If the project already has an EgressNetworkPolicy object defined, you must edit the existing policy to make changes to the egress firewall rules.

Prerequisites

  • A cluster that uses the OpenShift SDN default Container Network Interface (CNI) network provider plug-in.
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Create a policy rule:

    1. Create a <policy-name>.yaml file where <policy-name> describes the egress policy rules.
    2. In the file you created, define an egress policy object.
  2. Enter the following command to create the policy object:

    $ oc create -f <policy-name>.yaml -n <project>

    In the following example, a new EgressNetworkPolicy object is created in a project named project1:

    $ oc create -f default-rules.yaml -n project1
    egressnetworkpolicy.network.openshift.io/default-rules created
  3. Optional: Save the <policy-name>.yaml so that you can make changes later.

8.4. Editing an egress firewall for a project

As a cluster administrator, you can modify network traffic rules for an existing egress firewall.

8.4.1. Editing an EgressNetworkPolicy object

As a cluster administrator, you can update the egress firewall for a project.

Prerequisites

  • A cluster using the OpenShift SDN network plug-in.
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster as a cluster administrator.

Procedure

To edit an existing egress network policy object for a project, complete the following steps:

  1. Find the name of the EgressNetworkPolicy object for the project. Replace <project> with the name of the project.

    $ oc get -n <project> egressnetworkpolicy
  2. Optional: If you did not save a copy of the EgressNetworkPolicy object when you created the egress firewall, enter the following command to create a copy:

    $ oc get -n <project> \ 1
      egressnetworkpolicy <name> \ 2
      -o yaml > <filename>.yaml 3
    1
    Replace <project> with the name of the project.
    2
    Replace <name> with the name of the object.
    3
    Replace <filename> with the name of the file to save the YAML.
  3. Enter the following command to replace the EgressNetworkPolicy object. Replace <filename> with the name of the file containing the updated EgressNetworkPolicy object.

    $ oc replace -f <filename>.yaml

8.4.2. EgressNetworkPolicy custom resource (CR) object

The following YAML describes an EgressNetworkPolicy CR object:

kind: EgressNetworkPolicy
apiVersion: network.openshift.io/v1
metadata:
  name: <name> 1
spec:
  egress: 2
    ...
1
Specify a name for your egress firewall policy.
2
Specify a collection of one or more egress network policy rules as described in the following section.

8.4.2.1. EgressNetworkPolicy rules

The following YAML describes an egress firewall rule object. The egress key expects an array of one or more objects.

egress:
- type: <type> 1
  to: 2
    cidrSelector: <cidr> 3
    dnsName: <dns-name> 4
1
Specify the type of rule. The value must be either Allow or Deny.
2
Specify a value for either the cidrSelector key or the dnsName key for the rule. You cannot use both keys in a rule.
3
Specify an IP address range in CIDR format.
4
Specify a domain name.

8.4.2.2. Example EgressNetworkPolicy CR object

The following example defines several egress firewall policy rules:

kind: EgressNetworkPolicy
apiVersion: network.openshift.io/v1
metadata:
  name: default-rules 1
spec:
  egress: 2
  - type: Allow
    to:
      cidrSelector: 1.2.3.0/24
  - type: Allow
    to:
      dnsName: www.example.com
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
1
The name for the policy object.
2
A collection of egress firewall policy rule objects.

8.5. Removing an egress firewall from a project

As a cluster administrator, you can remove an egress firewall from a project to remove all restrictions on network traffic from the project that leaves the OpenShift Container Platform cluster.

8.5.1. Removing an EgressNetworkPolicy object

As a cluster administrator, you can remove an egress firewall from a project.

Prerequisites

  • A cluster using the OpenShift SDN network plug-in.
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster as a cluster administrator.

Procedure

To remove an egress network policy object for a project, complete the following steps:

  1. Find the name of the EgressNetworkPolicy object for the project. Replace <project> with the name of the project.

    $ oc get -n <project> egressnetworkpolicy
  2. Enter the following command to delete the EgressNetworkPolicy object. Replace <project> with the name of the project and <name> with the name of the object.

    $ oc delete -n <project> egressnetworkpolicy <name>

8.6. Using multicast

8.6.1. About multicast

With IP multicast, data is broadcast to many IP addresses simultaneously.

Important

At this time, multicast is best used for low-bandwidth coordination or service discovery, not as a high-bandwidth solution.

Multicast traffic between OpenShift Container Platform Pods is disabled by default. If you are using the OpenShift SDN default Container Network Interface (CNI) network provider plug-in, you can enable multicast on a per-project basis.

When using the OpenShift SDN network plug-in in networkpolicy isolation mode:

  • Multicast packets sent by a Pod will be delivered to all other Pods in the project, regardless of NetworkPolicy objects. Pods might be able to communicate over multicast even when they cannot communicate over unicast.
  • Multicast packets sent by a Pod in one project will never be delivered to Pods in any other project, even if there are NetworkPolicy objects that allow communication between the projects.

When using the OpenShift SDN network plug-in in multitenant isolation mode:

  • Multicast packets sent by a Pod will be delivered to all other Pods in the project.
  • Multicast packets sent by a Pod in one project will be delivered to Pods in other projects only if each project is joined together and multicast is enabled in each joined project.
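The delivery rules for both isolation modes can be summarized as a single predicate. This is a sketch of the rules as stated above, not of the plug-in's implementation; the function name, the mode strings, and the way joined projects are represented are all assumptions made for illustration.

```python
def multicast_delivered(mode, sender, receiver, enabled, joined_groups=()):
    """Return True if a multicast packet from sender's project reaches receiver's.

    mode          -- "networkpolicy" or "multitenant"
    sender        -- project of the sending Pod
    receiver      -- project of the receiving Pod
    enabled       -- set of projects that have multicast enabled
    joined_groups -- iterable of sets of projects that are joined together
    """
    if sender not in enabled:
        return False  # multicast is disabled by default
    if sender == receiver:
        return True  # delivered to all other Pods in the same project
    if mode == "multitenant":
        for group in joined_groups:
            if sender in group and receiver in group:
                # Every joined project must have multicast enabled.
                return all(project in enabled for project in group)
    return False  # networkpolicy mode never crosses project boundaries
```

NetworkPolicy objects do not appear in the predicate, because in networkpolicy mode multicast delivery within a project is independent of them.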

8.6.2. Enabling multicast between Pods

You can enable multicast between Pods for your project.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • Run the following command to enable multicast for a project:

    $ oc annotate netnamespace <namespace> \ 1
        netnamespace.network.openshift.io/multicast-enabled=true
    1
    The namespace for the project you want to enable multicast for.

8.6.3. Disabling multicast between Pods

You can disable multicast between Pods for your project.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • Disable multicast by running the following command:

    $ oc annotate netnamespace <namespace> \ 1
        netnamespace.network.openshift.io/multicast-enabled-
    1
    The namespace for the project you want to disable multicast for.

8.7. Configuring network isolation using OpenShift SDN

When your cluster is configured to use the multitenant isolation mode for the OpenShift SDN CNI plug-in, each project is isolated by default. Network traffic is not allowed between Pods or services in different projects in multitenant isolation mode.

You can change the behavior of multitenant isolation for a project in two ways:

  • You can join one or more projects, allowing network traffic between Pods and services in different projects.
  • You can disable network isolation for a project. It will be globally accessible, accepting network traffic from Pods and services in all other projects. A globally accessible project can access Pods and services in all other projects.

Prerequisites

  • You must have a cluster configured to use the OpenShift SDN Container Network Interface (CNI) plug-in in multitenant isolation mode.

8.7.1. Joining projects

You can join two or more projects to allow network traffic between Pods and services in different projects.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  1. Use the following command to join projects to an existing project network:

    $ oc adm pod-network join-projects --to=<project1> <project2> <project3>

    Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option to specify projects based upon an associated label.

  2. Optional: Run the following command to view the pod networks that you have joined together:

    $ oc get netnamespaces

    Projects in the same pod-network have the same network ID in the NETID column.
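To see the joined groups at a glance, you can group the projects by network ID. The sketch below assumes that you captured the output of oc get netnamespaces -o json and that each item exposes the network ID in a top-level netid field; verify that assumption against your cluster before relying on it. The function name is hypothetical.

```python
import json
from collections import defaultdict


def group_by_netid(netnamespaces_json):
    """Group NetNamespace names by their network ID.

    netnamespaces_json -- JSON text, assumed to be a List whose items carry
                          metadata.name and a top-level netid field.
    """
    data = json.loads(netnamespaces_json)
    groups = defaultdict(list)
    for item in data.get("items", []):
        groups[item["netid"]].append(item["metadata"]["name"])
    return dict(groups)
```

Any network ID that maps to more than one name identifies projects that share a pod network.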

8.7.2. Isolating a project

You can isolate a project so that Pods and services in other projects cannot access its Pods and services.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • To isolate the projects in the cluster, run the following command:

    $ oc adm pod-network isolate-projects <project1> <project2>

    Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option to specify projects based upon an associated label.

8.7.3. Disabling network isolation for a project

You can disable network isolation for a project.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • Run the following command for the project:

    $ oc adm pod-network make-projects-global <project1> <project2>

    Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option to specify projects based upon an associated label.

8.8. Configuring kube-proxy

The Kubernetes network proxy (kube-proxy) runs on each node and is managed by the Cluster Network Operator (CNO). kube-proxy maintains network rules for forwarding connections for endpoints associated with services.

8.8.1. About iptables rules synchronization

The synchronization period determines how frequently the Kubernetes network proxy (kube-proxy) syncs the iptables rules on a node.

A sync begins when either of the following events occurs:

  • An event occurs, such as a service or endpoint being added to or removed from the cluster.
  • The time since the last sync exceeds the sync period defined for kube-proxy.
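The two triggers can be expressed as a small predicate. This is a simplified model and not the kube-proxy implementation. It also assumes that the minimum sync period (the iptables-min-sync-period argument described in the parameters table) holds back any sync that would start too soon after the previous one. All names are hypothetical.

```python
def sync_due(elapsed, sync_period, min_sync_period=0.0, event_pending=False):
    """Return True if a sync of the iptables rules should begin.

    elapsed         -- seconds since the last sync
    sync_period     -- configured sync period, in seconds
    min_sync_period -- assumed lower bound between two syncs, in seconds
    event_pending   -- True if a service or endpoint changed since the last sync
    """
    if elapsed < min_sync_period:
        return False  # too soon after the previous sync
    return event_pending or elapsed > sync_period
```

With a 30-second sync period and no minimum, a pending event triggers a sync immediately, while an idle cluster syncs only after 30 seconds have passed.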

8.8.2. Modifying the kube-proxy configuration

You can modify the Kubernetes network proxy configuration for your cluster.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • Log in to a running cluster with the cluster-admin role.

Procedure

  1. Edit the Network.operator.openshift.io Custom Resource (CR) by running the following command:

    $ oc edit network.operator.openshift.io cluster
  2. Modify the kubeProxyConfig parameter in the CR with your changes to the kube-proxy configuration, such as in the following example CR:

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      kubeProxyConfig:
        iptablesSyncPeriod: 30s
        proxyArguments:
          iptables-min-sync-period: ["30s"]
  3. Save the file and exit the text editor.

    The syntax is validated by the oc command when you save the file and exit the editor. If your modifications contain a syntax error, the editor reopens the file and displays an error message.

  4. Run the following command to confirm the configuration update:

    $ oc get networks.operator.openshift.io -o yaml

    The command returns output similar to the following example:

    apiVersion: v1
    items:
    - apiVersion: operator.openshift.io/v1
      kind: Network
      metadata:
        name: cluster
      spec:
        clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
        defaultNetwork:
          type: OpenShiftSDN
        kubeProxyConfig:
          iptablesSyncPeriod: 30s
          proxyArguments:
            iptables-min-sync-period:
            - 30s
        serviceNetwork:
        - 172.30.0.0/16
      status: {}
    kind: List
  5. Optional: Run the following command to confirm that the Cluster Network Operator accepted the configuration change:

    $ oc get clusteroperator network
    NAME      VERSION     AVAILABLE   PROGRESSING   DEGRADED   SINCE
    network   4.1.0-0.9   True        False         False      1m

    The AVAILABLE field is True when the configuration update is applied successfully.

8.8.3. kube-proxy configuration parameters

You can modify the following kubeProxyConfig parameters:

Table 8.1. Parameters

| Parameter | Description | Values | Default |
|-----------|-------------|--------|---------|
| iptablesSyncPeriod | The refresh period for iptables rules. | A time interval, such as 30s or 2m. Valid suffixes include s, m, and h and are described in the Go time package documentation. | 30s |
| proxyArguments.iptables-min-sync-period | The minimum duration before refreshing iptables rules. This parameter ensures that the refresh does not happen too frequently. | A time interval, such as 30s or 2m. Valid suffixes include s, m, and h and are described in the Go time package documentation. | 30s |
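To sanity-check an interval before you put it in the CR, you can convert it to seconds. The sketch below accepts only the s, m, and h suffixes named in the table, alone or combined as in 1h30m; the Go time package accepts additional units that this helper deliberately rejects. The function name is hypothetical.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_PART = re.compile(r"(\d+(?:\.\d+)?)([smh])")


def parse_interval(text):
    """Convert an interval such as "30s", "2m", or "1h30m" to seconds."""
    position = 0
    total = 0.0
    for match in _PART.finditer(text):
        if match.start() != position:
            raise ValueError(f"unexpected characters in interval: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"not a valid interval: {text!r}")
    return total
```

A value without a suffix, such as 30, raises ValueError, which matches the requirement that every interval carries a unit.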