Chapter 2. Planning a Distributed Compute Node (DCN) deployment

When you plan your DCN architecture, check that the technologies that you need are available and supported.

2.1. Considerations for storage on DCN architecture

The following features are not currently supported for DCN architectures:

  • Fast forward updates (FFU) on a distributed compute node architecture from Red Hat OpenStack Platform 13 to 16.
  • Non-hyperconverged storage nodes at edge sites.
  • Copying a volume snapshot between edge sites. You can work around this by creating an image from the volume and using glance to copy the image. After the image is copied, you can create a volume from it.
  • Migrating or retyping a volume between sites.
  • Ceph Rados Gateway (RGW) at the edge.
  • CephFS at the edge.
  • Instance high availability (HA) at the edge sites.
  • Live migration between edge sites or from the central location to edge sites. You can still live migrate instances within a site boundary.
  • RBD mirroring between sites.
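The snapshot workaround described above can be sketched with the openstack and glance CLIs. The volume, image, store, and availability zone names (vol1, vol1-image, edge1, az-edge1) are illustrative placeholders, not values from your deployment:

```shell
# At the site that owns the volume, create an image from it.
openstack volume snapshot list   # identify the source volume first, if needed
openstack image create --volume vol1 vol1-image

# Copy the image to the glance store that backs the target edge site.
glance image-import <image-id> --import-method copy-image --stores edge1

# At the target edge site, create a new volume from the copied image.
openstack volume create --image vol1-image --size 10 \
  --availability-zone az-edge1 vol1-copy
```

The copy-image import method requires glance multi-store support, which is how DCN deployments with storage are configured.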

Additionally, you must consider the following:

  • You must upload images to the central location before copying them to edge sites; a copy of each image must exist in the Image service (glance) at the central location.
  • Before you create an instance at an edge site, you must have a local copy of the image at that edge site.
  • You must use the RBD storage driver for the Image, Compute, and Block Storage services.
  • For each site, assign a unique availability zone, and use the same value for the NovaComputeAvailabilityZone and CinderStorageAvailabilityZone parameters.
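The availability zone requirement above can be expressed in a per-site heat environment file. This is a minimal sketch; the zone name az-edge1 is an example value:

```yaml
# edge1-az.yaml - assign the same AZ name to Compute and Block Storage
parameter_defaults:
  NovaComputeAvailabilityZone: az-edge1
  CinderStorageAvailabilityZone: az-edge1
```

Include a file like this with the deployment command for each edge site, using a unique zone name per site.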

2.2. Considerations for networking on DCN architecture

The following features are not currently supported for DCN architectures:

  • Octavia
  • DHCP on DPDK nodes
  • Conntrack for TC Flower Hardware Offload

The ML2/OVN mechanism driver is available on DCN as a Technology Preview, and therefore using these solutions together is not fully supported by Red Hat. This feature should only be used with DCN for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.

Note

The ML2/OVN mechanism driver is fully supported outside of DCN environments.

The following networking technologies are supported with ML2/OVS:

  • DPDK without DHCP on the DPDK nodes
  • SR-IOV
  • TC Flower hardware offload, without conntrack
  • Neutron availability zones (AZs) with networker nodes at the edge, with one AZ per site
  • Routed provider networks
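With one Neutron AZ per site, you can pin routers and networks to the networker nodes at a specific edge site by using availability zone hints. A sketch, where az-edge1 and the resource names are illustrative:

```shell
# Schedule the router and network onto agents in the edge site's AZ.
openstack router create --availability-zone-hint az-edge1 edge1-router
openstack network create --availability-zone-hint az-edge1 edge1-net
```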

Additionally, you must consider the following:

  • Network latency: Balance the latency, as measured in round-trip time (RTT), against the expected number of concurrent API operations to maintain acceptable performance. Maximum TCP/IP throughput is inversely proportional to RTT. You can mitigate some issues with high-latency, high-bandwidth connections by tuning kernel TCP parameters. Contact Red Hat support if cross-site RTT exceeds 100 ms.
  • Network dropouts: If the edge site temporarily loses its connection to the central site, then no OpenStack control plane API or CLI operations can be executed at the impacted edge site for the duration of the outage. For example, Compute nodes at the edge site are consequently unable to create a snapshot of an instance, issue an auth token, or delete an image. General OpenStack control plane API and CLI operations remain functional during this outage, and can continue to serve any other edge sites that have a working connection.
  • Image type: You must use raw images when deploying a DCN architecture with Ceph storage.
  • Image sizing:

    • Overcloud node images: Overcloud node images are downloaded from the central undercloud node. These potentially large files are transferred across all necessary networks from the central site to the edge site during provisioning.
    • Instance images: If there is no block storage at the edge, then the Image service images traverse the WAN during first use. The images are copied or cached locally to the target edge nodes for all subsequent use. There is no size limit for glance images. Transfer times vary with available bandwidth and network latency.

      If there is block storage at the edge, then the image is copied over the WAN asynchronously for faster boot times at the edge.

  • Provider networks: This is the recommended networking approach for DCN deployments. If you use provider networks at remote sites, then you must consider that the Networking service (neutron) does not place any limits or checks on where you can attach available networks. For example, if you use a provider network only in edge site A, you must ensure that you do not try to attach to the provider network in edge site B. This is because there are no validation checks on the provider network when binding it to a Compute node.
  • Site-specific networks: A limitation in DCN networking arises if you use networks that are specific to a certain site: When you deploy centralized neutron controllers with Compute nodes, there are no triggers in neutron to identify a certain Compute node as a remote node. Consequently, the Compute nodes receive a list of other Compute nodes and automatically form tunnels between each other; the tunnels are formed from edge to edge through the central site. If you use VXLAN or Geneve, every Compute node at every site forms a tunnel with every other Compute node and Controller node, whether or not they are local or remote. This is not an issue if you are using the same neutron networks everywhere. When you use VLANs, neutron expects that all Compute nodes have the same bridge mappings, and that all VLANs are available at every site.
  • Additional sites: If you need to expand from a central site to additional remote sites, you can use the openstack CLI on Red Hat OpenStack Platform director to add new network segments and subnets.
  • If edge servers are not pre-provisioned, you must configure DHCP relay for introspection and provisioning on routed segments.
  • Routing must be configured either on the cloud or within the networking infrastructure that connects each edge site to the hub. You should implement a networking design that allocates an L3 subnet for each Red Hat OpenStack Platform cluster network (external, internal API, and so on), unique to each site.
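Adding a routed segment and subnet for a new remote site can be sketched with the openstack CLI. The network, physical network, and address values below are assumed examples, not values from your deployment:

```shell
# Add a new segment to an existing routed provider network for a new leaf.
openstack network segment create \
  --network ctlplane \
  --network-type flat \
  --physical-network leaf2 \
  leaf2-segment

# Add a subnet on the new segment for the remote site.
openstack subnet create \
  --network ctlplane \
  --network-segment leaf2-segment \
  --subnet-range 192.168.24.0/24 \
  --allocation-pool start=192.168.24.100,end=192.168.24.200 \
  leaf2-subnet
```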

2.3. Storage topologies and roles at the edge

When you deploy Red Hat OpenStack Platform with a distributed compute node architecture, you must decide whether you need storage at the edge. Based on storage and performance needs, you can deploy each site with one of three configurations. Not all edge sites must have an identical configuration.

If block storage is not going to be deployed at the edge, you must follow the section of the document, Deploying the edge without storage. If there is no block storage at the edge site:

  • The Object Storage service (swift) is used as the Image service (glance) back end.
  • Compute nodes at the edge can only cache images.
  • Volume services such as the Block Storage service (cinder) are not available at edge sites.

If you plan to deploy storage at the edge at any location, you must also deploy block storage at the central location. Follow the section of the document Section 5.2, “Deploying the central site with storage”. If there is block storage at the edge site:

  • Ceph RBD is used as the Image service (glance) back end.
  • Images can be stored at edge sites.
  • The Block Storage (cinder) volume service is available through the Ceph RBD driver.
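Because DCN architectures with Ceph storage require raw images, a qcow2 cloud image must be converted before upload to the central Image service. A sketch, where the file and image names are illustrative:

```shell
# Convert a qcow2 cloud image to raw format.
qemu-img convert -f qcow2 -O raw rhel8.qcow2 rhel8.raw

# Upload the raw image to glance at the central location.
openstack image create --disk-format raw --container-format bare \
  --file rhel8.raw rhel8-raw
```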

The roles required for your deployment differ based on whether you deploy block storage at the edge:

  • No Block storage is required at the edge:

    Compute
    When you deploy an edge location without block storage, you must use the traditional compute role.
  • Block Storage is required at the edge:

    DistributedComputeHCI

    This role includes the following:

    • Default compute services
    • Block Storage (cinder) volume service
    • Ceph Mon
    • Ceph Mgr
    • Ceph OSD
    • GlanceApiEdge
    • Etcd

      This role enables a hyperconverged deployment at the edge. You must use exactly three nodes when using the DistributedComputeHCI role.

    DistributedComputeHCIScaleOut
    This role includes the Ceph OSD service, which allows storage capacity to be scaled with compute resources when more nodes are added to the edge. This role also includes the HAproxyEdge service to redirect image download requests to the GlanceApiEdge nodes at the edge site.
    DistributedComputeScaleOut
    If you want to scale compute resources at the edge without storage, you can use the DistributedComputeScaleOut role.
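The roles described above can be combined into a site-specific roles file with the director CLI. The output filename and role selection below are examples; choose the roles that match your site's storage configuration:

```shell
# Generate a roles file for an edge site with hyperconverged storage
# plus storage-and-compute scale-out nodes.
openstack overcloud roles generate -o dcn_roles.yaml \
  DistributedComputeHCI DistributedComputeHCIScaleOut
```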