Chapter 2. Installing and deploying OpenShift AI

Red Hat OpenShift AI is a platform for data scientists and developers of artificial intelligence (AI) applications. It provides a fully supported environment that lets you rapidly develop, train, test, and deploy machine learning models on-premises and/or in the public cloud.

OpenShift AI is provided as a managed cloud service add-on for Red Hat OpenShift or as self-managed software that you can install on-premises or in the public cloud on OpenShift.

For information about installing OpenShift AI as self-managed software on your OpenShift cluster in a connected or a disconnected environment, see Product Documentation for Red Hat OpenShift AI Self-Managed.

Important

Data Science Pipelines (DSP) 2.0 contains an installation of Argo Workflows. OpenShift AI does not support direct customer usage of this installation of Argo Workflows. To install OpenShift AI with DSP 2.0, ensure that no separate installation of Argo Workflows exists on your cluster.

There are two deployment options for Red Hat OpenShift AI as a managed cloud service add-on:

  • OpenShift Dedicated with a Customer Cloud Subscription on Amazon Web Services or Google Cloud Platform

    OpenShift Dedicated is a complete OpenShift Container Platform cluster provided as a cloud service, configured for high availability, and dedicated to a single customer. OpenShift Dedicated is professionally managed by Red Hat and hosted on Amazon Web Services (AWS) or Google Cloud Platform (GCP). The Customer Cloud Subscription (CCS) model allows Red Hat to deploy and manage clusters into a customer’s AWS or GCP account. Contact your Red Hat account manager to get OpenShift Dedicated through a CCS.

  • Red Hat OpenShift Service on AWS (ROSA)

    ROSA is a fully managed, turnkey application platform that allows you to focus on delivering value to your customers by building and deploying applications. You subscribe to the service directly from your AWS account.

Installing OpenShift AI as a managed cloud service involves the following high-level tasks:

  1. Confirm that your OpenShift cluster meets all requirements.
  2. Configure an identity provider for your OpenShift cluster.
  3. Add administrative users for your OpenShift cluster.
  4. Subscribe to the Red Hat OpenShift AI Add-on.

    For OpenShift Dedicated with a CCS for AWS or GCP, get a subscription through Red Hat.

    For ROSA, get a subscription through the AWS Marketplace.

  5. Install the Red Hat OpenShift AI Add-on.
  6. Access the OpenShift AI dashboard.
  7. Optionally, enable graphics processing units (GPUs) in OpenShift AI to ensure that your data scientists can use compute-heavy workloads in their models.

2.1. Requirements for OpenShift AI

You must meet the following requirements before you can install OpenShift AI on your Red Hat OpenShift Dedicated or Red Hat OpenShift Service on Amazon Web Services (ROSA) cluster.

  • A subscription for Red Hat OpenShift Dedicated or a subscription for ROSA

    You can deploy Red Hat OpenShift Dedicated on your Amazon Web Services (AWS) or Google Cloud Platform (GCP) account by using the Customer Cloud Subscription on AWS or Customer Cloud Subscription on GCP model. Note that while Red Hat provides an option to install OpenShift Dedicated on a Red Hat cloud account, if you want to install OpenShift AI then you must install OpenShift Dedicated on your own cloud account.

    Contact your Red Hat account manager to purchase a new Red Hat OpenShift Dedicated subscription. If you do not yet have an account manager, complete the form at https://cloud.redhat.com/products/dedicated/contact/ to request one.

    You can subscribe to Red Hat OpenShift Service on AWS (ROSA) directly from your AWS account or by contacting your Red Hat account manager.

  • A Red Hat customer account

    Go to OpenShift Cluster Manager (https://console.redhat.com/openshift/) and log in or register for a new account.

  • Cluster administrator access to your OpenShift cluster

    Use an existing cluster or create a new cluster by following the steps in the OpenShift Dedicated or ROSA documentation.

  • An OpenShift Dedicated or ROSA cluster that meets the following configuration requirements

    At least 2 worker nodes with at least 8 CPUs and 32 GiB RAM available for OpenShift AI to use when you install the Add-on. If this requirement is not met, the installation process fails to start and an error is displayed.

    When you create a new cluster, select m6a.2xlarge for the compute node instance type to satisfy the requirements.

    For an existing ROSA cluster, you can get the compute node instance type by using this command:

    rosa list machinepools --cluster=cluster-name

    You cannot alter a cluster’s compute node instance type, but you can add a machine pool or modify the default pool to meet the minimum requirements. However, the minimum resource requirements must be met by a single machine pool in the cluster; resources from several smaller pools cannot be combined to satisfy them.

    For more information, see the machine pool documentation for OpenShift Dedicated or ROSA.

  • For a ROSA cluster, select an access management strategy

    For installing OpenShift AI on a ROSA cluster, decide whether you want to install on a ROSA cluster that uses AWS Security Token Service (STS) or one that uses AWS Identity and Access Management (IAM) credentials. See Install ROSA Classic clusters for advice on deploying a ROSA cluster with or without AWS STS.

  • Install KServe dependencies

    • To support the KServe component, which is used by the single-model serving platform to serve large models, you must also install Operators for Red Hat OpenShift Serverless and Red Hat OpenShift Service Mesh and perform additional configuration. For more information, see Serving large models.
    • If you want to add an authorization provider for the single-model serving platform, you must install the Red Hat - Authorino Operator. For information, see Adding an authorization provider for the single-model serving platform.
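
As a minimal sketch of the sizing rule in the cluster configuration requirement above, the following Python snippet checks whether any single machine pool satisfies the minimum on its own. The MachinePool class, its field names, and the sample pool sizes are hypothetical and are filled in by hand; the snippet does not parse rosa output. It also assumes that the 8 CPU and 32 GiB figures apply to each worker node, which matches the m6a.2xlarge instance type.

```python
from dataclasses import dataclass

# Minimums taken from the requirements above; the per-node reading is an assumption.
MIN_NODES = 2
MIN_CPUS_PER_NODE = 8
MIN_MEMORY_GIB_PER_NODE = 32


@dataclass
class MachinePool:
    """Hypothetical, hand-filled description of one machine pool."""
    name: str
    replicas: int
    cpus_per_node: int
    memory_gib_per_node: int


def pool_meets_minimum(pool):
    """Return True if this pool alone satisfies the minimum requirements."""
    return (
        pool.replicas >= MIN_NODES
        and pool.cpus_per_node >= MIN_CPUS_PER_NODE
        and pool.memory_gib_per_node >= MIN_MEMORY_GIB_PER_NODE
    )


def qualifying_pools(pools):
    """Return the names of pools that meet the minimum on their own."""
    return [pool.name for pool in pools if pool_meets_minimum(pool)]


if __name__ == "__main__":
    pools = [
        MachinePool("worker", replicas=2, cpus_per_node=8, memory_gib_per_node=32),
        MachinePool("small", replicas=4, cpus_per_node=4, memory_gib_per_node=16),
    ]
    print(qualifying_pools(pools))  # ['worker']
```

The second pool has more total capacity than the minimum but is rejected, because each pool is evaluated separately rather than summed with the others.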

2.2. Configuring an identity provider for your OpenShift cluster

Configure an identity provider for your OpenShift Dedicated or Red Hat OpenShift Service on Amazon Web Services (ROSA) cluster to manage users and groups.

Red Hat OpenShift AI supports the same authentication systems as Red Hat OpenShift Dedicated and ROSA. Check the appropriate documentation for your cluster for more information.

Important

Adding more than one OpenShift Identity Provider can create problems when the same user name exists in multiple providers.

When mappingMethod is set to claim (the default mapping method for identity providers) and multiple providers have credentials associated with the same user name, the first provider used to log in to OpenShift is the one that works for that user, regardless of the order in which identity providers are configured.

For more information about mapping methods, see Identity provider parameters in OpenShift Dedicated or Identity provider parameters in ROSA.
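
The following toy model is an illustration only, not OpenShift code. It sketches why, under the claim mapping method, the provider that a user logs in with first is the one that keeps working for that user name. The ClaimMapper class and the provider names are hypothetical.

```python
class ClaimMapper:
    """Toy model of the claim mapping method: a user name belongs to the
    first identity provider that logs in with it."""

    def __init__(self):
        # Maps each user name to the provider that first claimed it.
        self._owner = {}

    def login(self, provider, username):
        """Return True if the login succeeds under the claim rule."""
        owner = self._owner.setdefault(username, provider)
        return owner == provider


if __name__ == "__main__":
    mapper = ClaimMapper()
    print(mapper.login("ldap", "alice"))    # True: first login claims the name
    print(mapper.login("github", "alice"))  # False: name already claimed by ldap
    print(mapper.login("ldap", "alice"))    # True: original provider still works
```

The outcome depends only on login order, not on the order in which the providers were configured, which is the behavior that the note above warns about.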

Prerequisites

Procedure

  1. Log in to OpenShift Cluster Manager (https://console.redhat.com/openshift/).
  2. Click Clusters. The Clusters page opens.
  3. Click the name of the cluster to configure.
  4. Click the Access control tab.
  5. Click Identity providers.
  6. Click Add identity provider.

    1. Select your provider from the Identity Provider list.
    2. Complete the remaining fields relevant to the identity provider that you selected. For more information, see Configuring identity providers in OpenShift Dedicated or Configuring identity providers in ROSA.
  7. Click Confirm.

Verification

  • The configured identity providers are visible on the Access control tab of the Cluster details page.

2.3. Adding administrative users

Before you can install and configure OpenShift AI for your data scientist users, you must define administrative users. Only administrative users can install and configure OpenShift AI.

Prerequisites

  • Credentials for OpenShift Cluster Manager (https://console.redhat.com/openshift/).
  • An existing OpenShift Dedicated or Red Hat OpenShift Service on AWS (ROSA) cluster with an identity provider configured.

Procedure

  1. Log in to OpenShift Cluster Manager (https://console.redhat.com/openshift/).
  2. Click Clusters. The Clusters page opens.
  3. Click the name of the cluster to configure.
  4. Click the Access control tab.
  5. Click Cluster Roles and Access.
  6. Under Cluster administrative users, click the Add user button.

    The Add cluster user popover appears.

  7. Enter the user name in the User ID field.
  8. Select an appropriate Group for the user.

    Important

    If this user needs to use existing groups in an identity provider to control OpenShift AI access, select cluster-admins.

    For more information about these user types, see Managing administration roles and users in the OpenShift Dedicated documentation or Default cluster roles in the ROSA documentation.

  9. Click Add user.

Verification

  • The user name and selected group are visible in the list of Cluster administrative users.

2.4. Subscribing to the Red Hat OpenShift AI Cloud Service

You can subscribe to the Red Hat OpenShift AI managed cloud service in the following ways:

  • Subscribe through Red Hat if you have a Red Hat OpenShift Dedicated cluster deployed with a Customer Cloud Subscription (CCS) on Amazon Web Services (AWS) or Google Cloud Platform (GCP).
  • Subscribe through the AWS Marketplace if you have a Red Hat OpenShift Service on AWS (ROSA) cluster.

Note

You can also purchase Red Hat OpenShift AI as self-managed software. To purchase a new subscription, contact your Red Hat account manager. If you do not yet have an account manager, complete the form at https://www.redhat.com/en/contact to request one.

2.4.1. Subscribing to the OpenShift AI managed cloud service on AWS or GCP

For a Red Hat OpenShift Dedicated cluster that is deployed on AWS or GCP, contact your Red Hat account manager to purchase a new subscription. If you do not yet have an account manager, complete the form at https://cloud.redhat.com/products/dedicated/contact/ to request one.

Prerequisite

  • You have worked with Red Hat Sales to enable a private offer of OpenShift AI. The following procedure describes how to accept the offer and deploy the solution.

Procedure

  1. Visit your Private Offer with the URL link provided by your Red Hat Sales representative.
  2. Click Accept Terms to subscribe to the AMI Private Offer named OpenShift AI from AWS Marketplace.
  3. After accepting the offer terms, click Continue to Configuration.

2.4.2. Subscribing to the OpenShift AI managed cloud service on Red Hat OpenShift Service on AWS (ROSA)

For a ROSA cluster, you can subscribe to the OpenShift AI managed cloud service through the Amazon Web Services (AWS) Marketplace.

Prerequisites

  • Access to a ROSA cluster, including permissions to view and install add-ons.
  • An AWS account with permission to view and subscribe to offerings in the AWS Marketplace.

Procedure

  1. In the AWS Console, navigate to the AWS Marketplace. For example:

    1. Click the help icon and then select Getting Started Resource Center.
    2. Select AWS Marketplace > Browse AWS Marketplace.
  2. In the top Search field, type: Red Hat OpenShift AI.
  3. Select one of the two options depending on the geographical location of the billing address for your AWS account (note that this location might differ from the geographical location of the cluster):

    • Europe, the Middle East, and Africa (EMEA region)
    • North America and regions outside EMEA
  4. Click Continue to Subscribe.
  5. Click Continue to Configuration and then select the appropriate fulfillment options. Note that some selectors might have only one option.
  6. Click Continue to Launch.
  7. Link your AWS account with your Red Hat account to complete your registration:

    1. In the AWS Marketplace console, navigate to the Manage Subscriptions page.
    2. On the Red Hat OpenShift AI tile, click Set up product.
    3. On the top banner, click Set up account.

      This link takes you to the Red Hat Hybrid console.

    4. If you are not already logged in, log in.
    5. Review and then accept the terms and agreements.
    6. Click Connect accounts.

Verification

The Data Science product page opens.

2.5. Installing OpenShift AI on your OpenShift cluster

You can use Red Hat OpenShift Cluster Manager to install Red Hat OpenShift AI as an Add-on to your Red Hat OpenShift cluster.

Prerequisites

Note

For information about the lifecycle associated with Red Hat OpenShift AI, see Red Hat OpenShift AI Life Cycle.

Important

Data Science Pipelines (DSP) 2.0 contains an installation of Argo Workflows. OpenShift AI does not support direct customer usage of this installation of Argo Workflows. To install OpenShift AI with DSP 2.0, ensure that no separate installation of Argo Workflows exists on your cluster.
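
One way to check for an existing Argo Workflows installation before you install is to look for its custom resource definitions (CRDs). The following sketch assumes that you pass it a listing with one CRD name per line, for example the text output of oc get crd -o name. The function name is hypothetical, and the set of CRD names is a small, non-exhaustive selection of Argo Workflows CRDs. Other Argo projects, such as Argo CD, share the argoproj.io API group and are deliberately not matched.

```python
# A non-exhaustive selection of CRD names that belong to Argo Workflows.
ARGO_WORKFLOWS_CRDS = {
    "workflows.argoproj.io",
    "workflowtemplates.argoproj.io",
    "clusterworkflowtemplates.argoproj.io",
    "cronworkflows.argoproj.io",
}


def find_argo_workflows_crds(crd_listing):
    """Return the Argo Workflows CRD names found in a one-name-per-line listing."""
    found = []
    for line in crd_listing.splitlines():
        # Drop an optional "customresourcedefinition.../" resource-type prefix.
        name = line.strip().rsplit("/", 1)[-1]
        if name in ARGO_WORKFLOWS_CRDS:
            found.append(name)
    return sorted(found)


if __name__ == "__main__":
    listing = """\
customresourcedefinition.apiextensions.k8s.io/applications.argoproj.io
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io
"""
    print(find_argo_workflows_crds(listing))  # ['workflows.argoproj.io']
```

This check is meaningful only before installation, because DSP 2.0 brings its own installation of Argo Workflows.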

Procedure

  1. Log in to OpenShift Cluster Manager (https://console.redhat.com/openshift/).
  2. Click Clusters.

    The Clusters page opens.

  3. Click the name of the cluster you want to install OpenShift AI on.

    The Details page for the cluster opens.

  4. Click the Add-ons tab and locate the Red Hat OpenShift AI tile.

    Note

    If there is a Prerequisites not met warning message, click the Prerequisites tab. Note down the error message. If the error message states that you require a new machine pool, or that more resources are required, take the appropriate action to resolve the problem. You might need to add more resources to your cluster, or increase the size of your default machine pool. To increase your cluster’s resources, contact your infrastructure administrator. For more information about increasing the size of your machine pool, see Allocating additional resources to OpenShift AI users.

  5. Select a Subscription type:

    If you obtained your OpenShift AI subscription through your Red Hat account manager, select Standard and then skip to Step 7.

    If you obtained your OpenShift AI subscription directly from the AWS Marketplace, select Marketplace and then continue to Step 6.

  6. For a Marketplace subscription, select your AWS account number from the list.

    Note

    If your AWS account number is not in the list, you might need to link your Red Hat and AWS accounts, as described in Subscribing to the OpenShift AI managed cloud service on Red Hat OpenShift Service on AWS (ROSA).

  7. Click Install. The Configure Red Hat OpenShift AI pane appears.
  8. In the Notification email field, enter any email addresses that you want to receive important alerts about the state of Red Hat OpenShift AI, such as outage alerts.
  9. Click Install.

Verification

  • In OpenShift Cluster Manager, on the Add-ons tab for the cluster, confirm that the OpenShift AI tile shows one of the following states:

    • Installing - installation is in progress; wait for this to change to Installed. This takes around 30 minutes.
    • Installed - installation is complete; verify that the View in console button is visible.
  • In OpenShift Dedicated, click Home > Projects and confirm that the following project namespaces are visible and listed as Active:

    • redhat-ods-applications
    • redhat-ods-monitoring
    • redhat-ods-operator
    • rhods-notebooks
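
The namespace check above can also be scripted. The following is a minimal sketch, assuming that you have saved the project list as JSON (for example, from oc get projects -o json) and that each item carries metadata.name and status.phase fields. The function name is hypothetical.

```python
import json

# Project namespaces listed in the verification step above.
REQUIRED_PROJECTS = (
    "redhat-ods-applications",
    "redhat-ods-monitoring",
    "redhat-ods-operator",
    "rhods-notebooks",
)


def missing_or_inactive(projects_json):
    """Return the required projects that are absent or not in the Active phase."""
    items = json.loads(projects_json).get("items", [])
    phases = {
        item["metadata"]["name"]: item.get("status", {}).get("phase")
        for item in items
    }
    return [name for name in REQUIRED_PROJECTS if phases.get(name) != "Active"]
```

An empty result means that all four project namespaces exist and are listed as Active.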

2.6. Installing Red Hat OpenShift AI components by using the web console

The following procedure shows how to use the OpenShift web console to install specific components of Red Hat OpenShift AI on your cluster.

Important

When you install Red Hat OpenShift AI as an add-on to your OpenShift cluster, the install process automatically creates a default DataScienceCluster object. The following procedure describes how to configure the DataScienceCluster object to install Red Hat OpenShift AI components as part of a new installation.

If you upgraded from version 1 of OpenShift AI (previously OpenShift Data Science), the upgrade process also automatically creates a default DataScienceCluster object. If you upgraded from a previous minor version, the upgrade process uses the settings from the previous version’s DataScienceCluster object. To inspect the DataScienceCluster object and change the installation status of Red Hat OpenShift AI components, see Updating the installation status of Red Hat OpenShift AI components by using the web console.

Prerequisites

  • Red Hat OpenShift AI is installed as an add-on to your Red Hat OpenShift cluster.
  • You have cluster administrator privileges for your OpenShift cluster.

Procedure

  1. Log in to the OpenShift web console as a cluster administrator.
  2. In the web console, click Operators > Installed Operators and then click the Red Hat OpenShift AI Operator.
  3. Configure the DataScienceCluster object to install OpenShift AI components by performing the following actions:

    1. Click the Data Science Cluster tab.
    2. Click the default-dsc object.
    3. Select the YAML tab.

      An embedded YAML editor opens showing a default custom resource (CR) for the DataScienceCluster object.

    4. In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:

      Managed
      The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
      Removed
      The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
      Important
      • To learn how to install the KServe component, which is used by the single-model serving platform to serve large models, see Serving large models.
      • To learn how to configure the distributed workloads feature that uses the CodeFlare and KubeRay components, see Configuring distributed workloads.
  4. Click Save.
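
To make the shape of the edit concrete, the following sketch applies managementState values to a DataScienceCluster object that is represented as a plain Python dictionary, and rejects any value other than Managed or Removed. The function name is hypothetical, and the component keys in the example (dashboard, datasciencepipelines) are assumptions; use the keys that appear in your own default CR. The sketch only manipulates data in memory and does not talk to a cluster.

```python
import copy

# The two values described in the procedure above.
VALID_STATES = {"Managed", "Removed"}


def set_management_states(dsc, desired):
    """Return a copy of the DataScienceCluster dict with spec.components updated.

    `desired` maps a component key to "Managed" or "Removed".
    """
    for component, state in desired.items():
        if state not in VALID_STATES:
            raise ValueError(f"{component}: unsupported managementState {state!r}")

    updated = copy.deepcopy(dsc)  # leave the caller's object untouched
    components = updated.setdefault("spec", {}).setdefault("components", {})
    for component, state in desired.items():
        components.setdefault(component, {})["managementState"] = state
    return updated


if __name__ == "__main__":
    dsc = {"spec": {"components": {"dashboard": {"managementState": "Managed"},
                                   "datasciencepipelines": {}}}}
    new_dsc = set_management_states(dsc, {"datasciencepipelines": "Managed"})
    print(new_dsc["spec"]["components"]["datasciencepipelines"])
    # {'managementState': 'Managed'}
```

Validating before copying means that a typo such as Enabled fails early and leaves nothing half-edited.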

Verification

  • Confirm that there is a running pod for each component:

    1. In the OpenShift web console, click Workloads > Pods.
    2. In the Project list at the top of the page, select redhat-ods-applications.
    3. In the applications namespace, confirm that there are running pods for each of the OpenShift AI components that you installed.
  • Confirm the status of all installed components:

    1. In the OpenShift web console, click Operators > Installed Operators.
    2. Click the Red Hat OpenShift AI Operator.
    3. Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
    4. Select the YAML tab.
    5. In the installedComponents section, confirm that the components you installed have a status value of true.

      Note

      If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
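
The status check above can be expressed as a short function. This is a sketch only: it assumes that the DataScienceCluster object has been loaded into a Python dictionary and that the installedComponents section sits under a top-level status key, which you should confirm against the YAML shown in your console. The function name and component keys are hypothetical.

```python
def not_installed(dsc, expected):
    """Return the expected components whose installedComponents value is not true."""
    installed = dsc.get("status", {}).get("installedComponents", {})
    return [name for name in expected if installed.get(name) is not True]


if __name__ == "__main__":
    dsc = {"status": {"installedComponents": {"dashboard": True,
                                              "datasciencepipelines": False}}}
    print(not_installed(dsc, ["dashboard", "datasciencepipelines"]))
    # ['datasciencepipelines']
```

A component that is missing from the section is reported in the same way as one whose value is false.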

2.7. Troubleshooting common installation problems

If you are experiencing difficulties installing the Red Hat OpenShift AI Add-on, read this section to understand what might be causing the problem and how to resolve it.

If your problem is not described here or in the release notes, contact Red Hat Support.

2.7.1. The Red Hat OpenShift AI Operator cannot be retrieved from the image registry

Problem

When attempting to retrieve the Red Hat OpenShift AI Operator from the image registry, a Failure to pull from quay error message appears. The Red Hat OpenShift AI Operator might be unavailable for retrieval in the following circumstances:

  • The image registry is unavailable.
  • There is a problem with your network connection.
  • Your cluster is not operational and is therefore unable to retrieve the Operator image from the registry.

Diagnosis

Check the logs in the Events section in OpenShift Dedicated for further information about the Failure to pull from quay error message.

Resolution

  • To resolve this issue, contact Red Hat support.

2.7.2. OpenShift AI cannot be installed due to insufficient cluster resources

Problem

When attempting to install OpenShift AI, an error message appears stating that installation prerequisites have not been met.

Diagnosis

  1. Log in to OpenShift Cluster Manager (https://console.redhat.com/openshift/).
  2. Click Clusters.

    The Clusters page opens.

  3. Click the name of the cluster you want to install OpenShift AI on.

    The Details page for the cluster opens.

  4. Click the Add-ons tab and locate the Red Hat OpenShift AI tile.
  5. Click Install. The Configure Red Hat OpenShift AI pane appears.
  6. If the installation fails, click the Prerequisites tab.
  7. Note down the error message. If the error message states that you require a new machine pool, or that more resources are required, take the appropriate action to resolve the problem.

Resolution

  • You might need to add more resources to your cluster, or increase the size of your machine pool. To increase your cluster’s resources, contact your infrastructure administrator. For more information about increasing the size of your machine pool, see Nodes and Allocating additional resources to OpenShift AI users.

2.7.3. The dedicated-admins Role-based access control (RBAC) policy cannot be created

Problem

The Role-based access control (RBAC) policy for the dedicated-admins group in the target project cannot be created. This issue occurs in unknown circumstances.

Diagnosis

  1. In the OpenShift Dedicated web console, switch to the Administrator perspective.
  2. Click Workloads > Pods.
  3. Set the Project to All Projects or redhat-ods-operator.
  4. Click the rhods-operator-<random string> pod.

    The Pod details page appears.

  5. Click Logs.
  6. Select rhods-deployer from the drop-down list.
  7. Check the log for the ERROR: Attempt to create the RBAC policy for dedicated admins group in $target_project failed. error message.

Resolution

  • Contact Red Hat support.

2.7.4. OpenShift AI does not install on unsupported infrastructure

Problem

OpenShift AI is being deployed on an environment that is not documented as supported by the Red Hat OpenShift AI Operator.

Diagnosis

  1. In the OpenShift Dedicated web console, switch to the Administrator perspective.
  2. Click Workloads > Pods.
  3. Set the Project to All Projects or redhat-ods-operator.
  4. Click the rhods-operator-<random string> pod.

    The Pod details page appears.

  5. Click Logs.
  6. Select rhods-deployer from the drop-down list.
  7. Check the log for the ERROR: Deploying on $infrastructure, which is not supported. Failing Installation error message.

Resolution

Before proceeding with a new installation, ensure that you have a fully supported environment on which to install OpenShift AI. For more information, see Requirements for OpenShift AI.

2.7.5. The creation of the OpenShift AI Custom Resource (CR) fails

Problem

During the installation process, the OpenShift AI Custom Resource (CR) is not created. This issue occurs in unknown circumstances.

Diagnosis

  1. In the OpenShift Dedicated web console, switch to the Administrator perspective.
  2. Click Workloads > Pods.
  3. Set the Project to All Projects or redhat-ods-operator.
  4. Click the rhods-operator-<random string> pod.

    The Pod details page appears.

  5. Click Logs.
  6. Select rhods-deployer from the drop-down list.
  7. Check the log for the ERROR: Attempt to create the ODH CR failed. error message.

Resolution

Contact Red Hat support.

2.7.6. The creation of the OpenShift AI Notebooks Custom Resource (CR) fails

Problem

During the installation process, the OpenShift AI Notebooks Custom Resource (CR) is not created. This issue occurs in unknown circumstances.

Diagnosis

  1. In the OpenShift Dedicated web console, switch to the Administrator perspective.
  2. Click Workloads > Pods.
  3. Set the Project to All Projects or redhat-ods-operator.
  4. Click the rhods-operator-<random string> pod.

    The Pod details page appears.

  5. Click Logs.
  6. Select rhods-deployer from the drop-down list.
  7. Check the log for the ERROR: Attempt to create the RHODS Notebooks CR failed. error message.

Resolution

Contact Red Hat support.

2.7.7. The Dead Man’s Snitch operator’s secret does not get created

Problem

An issue with the Managed Tenants SRE automation process prevents the Dead Man’s Snitch operator’s secret from being created.

Diagnosis

  1. In the OpenShift Dedicated web console, switch to the Administrator perspective.
  2. Click Workloads > Pods.
  3. Set the Project to All Projects or redhat-ods-operator.
  4. Click the rhods-operator-<random string> pod.

    The Pod details page appears.

  5. Click Logs.
  6. Select rhods-deployer from the drop-down list.
  7. Check the log for the ERROR: Dead Man Snitch secret does not exist. error message.

Resolution

Contact Red Hat support.

2.7.8. The PagerDuty secret does not get created

Problem

An issue with the Managed Tenants SRE automation process prevents the PagerDuty secret from being created.

Diagnosis

  1. In the OpenShift Dedicated web console, switch to the Administrator perspective.
  2. Click Workloads > Pods.
  3. Set the Project to All Projects or redhat-ods-operator.
  4. Click the rhods-operator-<random string> pod.

    The Pod details page appears.

  5. Click Logs.
  6. Select rhods-deployer from the drop-down list.
  7. Check the log for the ERROR: Pagerduty secret does not exist error message.

Resolution

Contact Red Hat support.

2.7.9. The SMTP secret does not exist

Problem

An issue with the Managed Tenants SRE automation process prevents the SMTP secret from being created.

Diagnosis

  1. In the OpenShift Dedicated web console, switch to the Administrator perspective.
  2. Click Workloads > Pods.
  3. Set the Project to All Projects or redhat-ods-operator.
  4. Click the rhods-operator-<random string> pod.

    The Pod details page appears.

  5. Click Logs.
  6. Select rhods-deployer from the drop-down list.
  7. Check the log for the ERROR: SMTP secret does not exist error message.

Resolution

Contact Red Hat support.

2.7.10. The ODH parameter secret does not get created

Problem

An issue with the OpenShift AI Add-on’s flow can prevent the ODH parameter secret from being created.

Diagnosis

  1. In the OpenShift Dedicated web console, switch to the Administrator perspective.
  2. Click Workloads > Pods.
  3. Set the Project to All Projects or redhat-ods-operator.
  4. Click the rhods-operator-<random string> pod.

    The Pod details page appears.

  5. Click Logs.
  6. Select rhods-deployer from the drop-down list.
  7. Check the log for the ERROR: Addon managed odh parameter secret does not exist. error message.

Resolution

Contact Red Hat support.
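
The diagnosis steps in sections 2.7.3 to 2.7.10 all come down to searching the rhods-deployer log for a known error message. As a rough sketch, assuming that you have copied the log text out of the web console, the following function maps each message fragment quoted in those sections to its section number. The function name and the sample log lines are hypothetical; the fragments are shortened so that variable parts such as the project or infrastructure name do not affect matching.

```python
# Message fragments quoted in the troubleshooting sections, paired with section numbers.
KNOWN_ERRORS = [
    ("Attempt to create the RBAC policy for dedicated admins group", "2.7.3"),
    ("which is not supported", "2.7.4"),
    ("Attempt to create the ODH CR failed", "2.7.5"),
    ("Attempt to create the RHODS Notebooks CR failed", "2.7.6"),
    ("Dead Man Snitch secret does not exist", "2.7.7"),
    ("Pagerduty secret does not exist", "2.7.8"),
    ("SMTP secret does not exist", "2.7.9"),
    ("Addon managed odh parameter secret does not exist", "2.7.10"),
]


def diagnose(log_text):
    """Return matching section numbers, in the order the errors appear in the log."""
    sections = []
    for line in log_text.splitlines():
        if "ERROR:" not in line:
            continue
        for fragment, section in KNOWN_ERRORS:
            if fragment in line and section not in sections:
                sections.append(section)
    return sections


if __name__ == "__main__":
    sample = """\
INFO: starting deployment
ERROR: SMTP secret does not exist
ERROR: Deploying on example-infra, which is not supported. Failing Installation
"""
    print(diagnose(sample))  # ['2.7.9', '2.7.4']
```

An empty result means that none of the documented messages were found, in which case the problem is not one of those covered in these sections.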

2.7.11. Data Science Pipelines are not enabled after installing OpenShift AI 2.9 or later due to existing Argo Workflows resources

Problem

After you install OpenShift AI 2.9 or later on a cluster that contains an Argo Workflows installation that was not installed by OpenShift AI, Data Science Pipelines are not enabled despite the datasciencepipelines component being enabled in the DataScienceCluster object.

Diagnosis

After you install OpenShift AI 2.9 or later, the Data Science Pipelines tab is not visible on the OpenShift AI dashboard navigation menu.

Resolution

Delete the separate installation of Argo Workflows on your cluster. After you have removed any Argo Workflows resources that are not created by OpenShift AI from your cluster, Data Science Pipelines will be enabled automatically.