Chapter 5. Creating a project workbench

To examine and work with models in an isolated area, you can create a workbench. You can use this workbench to create a Jupyter notebook from an existing notebook container image to access its resources and properties. For data science projects that require data retention, you can add container storage to the workbench you are creating. If you require extra power for use with large datasets, you can assign accelerators to your workbench to optimize performance.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you use specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project that you can add a workbench to.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to add the workbench to.

    A project details page opens.

  3. Click the Workbenches tab.
  4. Click Create workbench.

    The Create workbench page opens.

  5. Configure the properties of the workbench you are creating.

    1. In the Name field, enter a name for your workbench.
    2. Optional: In the Description field, enter a description to define your workbench.
    3. In the Notebook image section, complete the fields to specify the notebook image to use with your workbench.

      1. From the Image selection list, select a notebook image.
    4. In the Deployment size section, specify the size of your deployment instance.

      1. From the Container size list, select a container size for your server.
      2. Optional: From the Accelerator list, select an accelerator.
      3. If you selected an accelerator in the preceding step, specify the number of accelerators to use.
    5. Optional: Select and specify values for any new environment variables.
  6. Configure the storage for your OpenShift AI cluster.

    1. Select Create new persistent storage to create storage that is retained after you log out of OpenShift AI. Complete the relevant fields to define the storage.
    2. Select Use existing persistent storage to reuse existing storage and select the storage from the Persistent storage list.
  7. To use a data connection, in the Data connections section, select the Use a data connection checkbox.

    • Create a new data connection as follows:

      1. Select Create new data connection.
      2. In the Name field, enter a unique name for the data connection.
      3. In the Access key field, enter the access key ID for the S3-compatible object storage provider.
      4. In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
      5. In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
      6. In the Region field, enter the default region of your S3-compatible object storage account.
      7. In the Bucket field, enter the name of your S3-compatible object storage bucket.
    • Use an existing data connection as follows:

      1. Select Use existing data connection.
      2. From the Data connection list, select a data connection that you previously defined.
  8. Click Create workbench.

Verification

  • The workbench that you created appears on the Workbenches tab for the project.
  • Any cluster storage that you associated with the workbench during the creation process appears on the Cluster storage tab for the project.
  • The Status column on the Workbenches tab displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.
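
Inside a running workbench, the values entered for a data connection are typically exposed to notebooks as environment variables. The following Python sketch shows one way a notebook might turn those values into arguments for an S3 client. It assumes the variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_S3_ENDPOINT, AWS_DEFAULT_REGION, and AWS_S3_BUCKET for the Access key, Secret key, Endpoint, Region, and Bucket fields. These names are not stated in this procedure, so confirm them in your own workbench. The helper s3_client_kwargs is hypothetical.

```python
import os


def s3_client_kwargs(env=None):
    """Build keyword arguments for an S3 client from the environment.

    Returns a (kwargs, bucket) tuple. Raises KeyError naming the first
    required variable that is missing. The variable names are assumptions.
    """
    env = os.environ if env is None else env
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_S3_ENDPOINT"):
        if not env.get(name):
            raise KeyError(f"{name} is not set; check the data connection")
    kwargs = {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "endpoint_url": env["AWS_S3_ENDPOINT"],
    }
    # The region is treated as optional in this sketch.
    if env.get("AWS_DEFAULT_REGION"):
        kwargs["region_name"] = env["AWS_DEFAULT_REGION"]
    return kwargs, env.get("AWS_S3_BUCKET")
```

In a notebook image that includes Boto3, the result could be passed on as `kwargs, bucket = s3_client_kwargs()` followed by `boto3.client("s3", **kwargs)`. Keeping the credentials in the environment avoids writing them into the notebook itself.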

5.1. Launching Jupyter and starting a notebook server

Launch Jupyter and start a notebook server to start working with your notebooks. If you require extra power for use with large datasets, you can assign accelerators to your notebook server to optimize performance.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • You know the names and values you want to use for any environment variables in your notebook server environment, for example, AWS_SECRET_ACCESS_KEY.
  • If you want to work with a large dataset, work with your administrator to proactively increase the storage capacity of your notebook server. If applicable, also consider assigning accelerators to your notebook server.

Procedure

  1. Locate the Jupyter tile on the Enabled applications page.
  2. Click Launch application.

    If you see an Access permission needed message, you are not in the default user group or the default administrator group for OpenShift AI. Ask your administrator to add you to the correct group. For more information, see Adding users.

    If you have not previously authorized the jupyter-nb-<username> service account to access your account, the Authorize Access page appears prompting you to provide authorization. Inspect the permissions selected by default, and click the Allow selected permissions button.

    If your credentials are accepted, the Notebook server control panel opens displaying the Start a notebook server page.

  3. Start a notebook server.

    This is not required if you have previously opened Jupyter.

    1. In the Notebook image section, select the notebook image to use for your server.
    2. If the notebook image contains multiple versions, select the version of the notebook image from the Versions section.

      Note

      When a new version of a notebook image is released, the previous version remains available and supported on the cluster. This gives you time to migrate your work to the latest version of the notebook image.

    3. From the Container size list, select a suitable container size for your server.
    4. Optional: From the Accelerator list, select an accelerator.
    5. If you selected an accelerator in the preceding step, specify the number of accelerators to use.

      Important

      Using accelerators is only supported with specific notebook images. For GPUs, only the PyTorch, TensorFlow, and CUDA notebook images are supported. For Habana Gaudi devices, only the HabanaAI notebook image is supported. In addition, you can only specify the number of accelerators required for your notebook server if accelerators are enabled on your cluster. To learn how to enable GPU support, see Enabling GPU support in OpenShift AI.

    6. Optional: Select and specify values for any new Environment variables.

      The interface stores these variables so that you only need to enter them once. Example variable names for common environment variables are automatically provided for frequently integrated environments and frameworks, such as Amazon Web Services (AWS).

      Important

      Select the Secret checkbox for variables with sensitive values that must remain private, such as passwords.

    7. Optional: Select the Start server in current tab checkbox if necessary.
    8. Click Start server.

      The Starting server progress indicator appears. Click Expand event log to view additional information about the server creation process. Depending on the deployment size and resources you requested, starting the server can take up to several minutes. Click Cancel to cancel the server creation.

      After the server starts, you see one of the following behaviors:

      • If you previously selected the Start server in current tab checkbox, the JupyterLab interface opens in the current tab of your web browser.
      • If you did not previously select the Start server in current tab checkbox, the Starting server dialog box prompts you to open the server in a new browser tab or in the current browser tab.

        The JupyterLab interface opens according to your selection.

Verification

  • The JupyterLab interface opens.

Troubleshooting

  • If you see the "Unable to load notebook server configuration options" error message, contact your administrator so that they can review the logs associated with your Jupyter pod and determine further details about the problem.

5.2. Options for notebook server environments

When you start Jupyter for the first time, or after stopping your notebook server, you must select server options in the Start a notebook server wizard so that the software and variables that you expect are available on your server. This section explains the options available in the Start a notebook server wizard in detail.

The Start a notebook server page consists of the following sections:

Notebook image

Specifies the container image that your notebook server is based on. Different notebook images have different packages installed by default. If the notebook image has multiple versions available, you can select the notebook image version to use from the Versions section.

Note

Notebook images are supported for a minimum of one year. Major updates to preconfigured notebook images occur about every six months. Therefore, two supported notebook image versions are typically available at any given time. Legacy notebook image versions, that is, not the two most recent versions, might still be available for selection. Legacy image versions include a label that indicates the image is out-of-date.

Versions 1.2 and 2023.1 of notebook images are no longer supported. Notebooks that are already running on versions 1.2 or 2023.1 of an image continue to work normally, but these versions are not available to select for new users or notebooks.

If you want to use OpenShift AI with Data Science Pipelines (DSP) 2.0, you must use notebook image versions 2024.1 or later.

To use the latest package versions, Red Hat recommends that you use the most recently added notebook image.

After you start a notebook image, you can check which Python packages are installed on your notebook server, and which version of each package you have, by running the pip tool in a notebook cell.
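
As a minimal sketch of this check, the following Python code can be pasted into a notebook cell. It looks up individual packages with importlib.metadata from the standard library and wraps `pip list` for the interpreter that runs the kernel. The helper names installed_version and pip_list, and the package names in the loop, are illustrative only.

```python
import subprocess
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pip_list():
    """Return the output of `pip list` for the interpreter running this kernel."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--disable-pip-version-check"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Report a few packages; adjust the names to the ones you care about.
for name in ("boto3", "numpy", "pandas"):
    print(name, installed_version(name) or "not installed")
```

Calling `print(pip_list())` lists every installed package. Running pip through `sys.executable` helps ensure that the report describes the same Python environment that the notebook kernel uses.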

The following table shows the package versions used in the available notebook images.

Important

Notebook images denoted with (Technology Preview) in this table are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using Technology Preview features in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Table 5.1. Notebook image options

Image name | Image version | Preinstalled packages

CUDA

2024.1 (Recommended)

  • CUDA 12.1
  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5

2023.2

  • CUDA 11.8
  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5

2023.1 (Deprecated)

  • CUDA 11.8
  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5

Minimal Python (default)

2024.1 (Recommended)

  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5

2023.2

  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5

2023.1 (Deprecated)

  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5

PyTorch

2024.1 (Recommended)

  • CUDA 12.1
  • Python 3.9
  • PyTorch 2.2
  • JupyterLab 3.6
  • Notebook 6.5
  • TensorBoard 2.16
  • Boto3 1.34
  • Kafka-Python 2.0
  • Kfp 2.7
  • Matplotlib 3.8
  • Numpy 1.26
  • Pandas 2.2
  • Scikit-learn 1.4
  • SciPy 1.12
  • ODH-Elyra 3.16
  • PyMongo 4.6
  • Pyodbc 5.1
  • Codeflare-SDK 0.16
  • Sklearn-onnx 1.16
  • Psycopg 3.1
  • MySQL Connector/Python 8.3

2023.2

  • CUDA 11.8
  • Python 3.9
  • PyTorch 2.0
  • JupyterLab 3.6
  • Notebook 6.5
  • TensorBoard 2.13
  • Boto3 1.28
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.3
  • SciPy 1.11
  • Elyra 3.15
  • PyMongo 4.5
  • Pyodbc 4.0
  • Codeflare-SDK 0.12
  • Sklearn-onnx 1.15
  • Psycopg 3.1
  • MySQL Connector/Python 8.0

2023.1 (Deprecated)

  • CUDA 11.8
  • Python 3.9
  • PyTorch 1.13
  • JupyterLab 3.5
  • Notebook 6.5
  • TensorBoard 2.11
  • Boto3 1.26
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.2
  • SciPy 1.10
  • Elyra 3.15

Standard Data Science

2024.1 (Recommended)

  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5
  • Boto3 1.34
  • Kafka-Python 2.0
  • Kfp 2.7
  • Matplotlib 3.8
  • Pandas 2.2
  • Numpy 1.26
  • Scikit-learn 1.4
  • SciPy 1.12
  • ODH-Elyra 3.16
  • PyMongo 4.6
  • Pyodbc 5.1
  • Codeflare-SDK 0.16
  • Sklearn-onnx 1.16
  • Psycopg 3.1
  • MySQL Connector/Python 8.3

2023.2

  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5
  • Boto3 1.28
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Pandas 1.5
  • Numpy 1.24
  • Scikit-learn 1.3
  • SciPy 1.11
  • Elyra 3.15
  • PyMongo 4.5
  • Pyodbc 4.0
  • Codeflare-SDK 0.12
  • Sklearn-onnx 1.15
  • Psycopg 3.1
  • MySQL Connector/Python 8.0

2023.1 (Deprecated)

  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5
  • Boto3 1.26
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.2
  • SciPy 1.10
  • Elyra 3.15

TensorFlow

2024.1 (Recommended)

  • CUDA 12.1
  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5
  • TensorFlow 2.15
  • TensorBoard 2.15
  • Boto3 1.34
  • Kafka-Python 2.0
  • Kfp 2.5
  • Matplotlib 3.8
  • Numpy 1.26
  • Pandas 2.2
  • Scikit-learn 1.4
  • SciPy 1.12
  • ODH-Elyra 3.16
  • PyMongo 4.6
  • Pyodbc 5.1
  • Codeflare-SDK 0.16
  • Sklearn-onnx 1.16
  • Psycopg 3.1
  • MySQL Connector/Python 8.3

2023.2

  • CUDA 11.8
  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5
  • TensorFlow 2.13
  • TensorBoard 2.13
  • Boto3 1.28
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.3
  • SciPy 1.11
  • Elyra 3.15
  • PyMongo 4.5
  • Pyodbc 4.0
  • Codeflare-SDK 0.12
  • Sklearn-onnx 1.15
  • Psycopg 3.1
  • MySQL Connector/Python 8.0

2023.1 (Deprecated)

  • CUDA 11.8
  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5
  • TensorFlow 2.11
  • TensorBoard 2.11
  • Boto3 1.26
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.2
  • SciPy 1.10
  • Elyra 3.15

TrustyAI

2024.1 (Recommended)

  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5
  • TrustyAI 0.5
  • Boto3 1.34
  • Kafka-Python 2.0
  • Kfp 2.7
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.4
  • SciPy 1.12
  • ODH-Elyra 3.16
  • PyMongo 4.6
  • Pyodbc 5.1
  • Codeflare-SDK 0.16
  • Sklearn-onnx 1.16
  • Psycopg 3.1
  • MySQL Connector/Python 8.3

2023.2

  • Python 3.9
  • JupyterLab 3.6
  • Notebook 6.5
  • TrustyAI 0.3
  • Boto3 1.28
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.3
  • SciPy 1.11
  • Elyra 3.15
  • PyMongo 4.5
  • Pyodbc 4.0
  • Codeflare-SDK 0.12
  • Sklearn-onnx 1.15
  • Psycopg 3.1
  • MySQL Connector/Python 8.0

2023.1 (Deprecated)

  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5
  • TrustyAI 0.3
  • Boto3 1.26
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.2
  • SciPy 1.10
  • Elyra 3.15

HabanaAI

2024.1 (Recommended)

  • Python 3.8
  • Habana 1.13
  • JupyterLab 3.6
  • Boto3 1.34
  • Kafka-Python 2.0
  • Kfp 2.7
  • Matplotlib 3.7
  • Numpy 1.23
  • Pandas 2.0
  • Scikit-learn 1.3
  • SciPy 1.10
  • TensorFlow 2.13
  • PyTorch 2.1
  • ODH-Elyra 3.16

2023.2

  • Python 3.8
  • Habana 1.10
  • JupyterLab 3.5
  • TensorFlow 2.12
  • Boto3 1.26
  • Kafka-Python 2.0
  • Kfp-tekton 1.5
  • Matplotlib 3.6
  • Numpy 1.23
  • Pandas 1.5
  • Scikit-learn 1.2
  • SciPy 1.10
  • PyTorch 2.0
  • Elyra 3.15

code-server (Technology Preview)

2024.1 (Recommended)

  • Python 3.9
  • Boto3 1.29
  • Kafka-Python 2.0
  • Matplotlib 3.8
  • Numpy 1.26
  • Pandas 2.1
  • Plotly 5.18
  • Scikit-learn 1.3
  • SciPy 1.11
  • Sklearn-onnx 1.15
  • Ipykernel 6.26
  • (code-server plugin) Python 2024.2.1
  • (code-server plugin) Jupyter 2023.9.100

2023.2

  • Python 3.9
  • Boto3 1.29
  • Kafka-Python 2.0
  • Matplotlib 3.8
  • Numpy 1.26
  • Pandas 2.1
  • Plotly 5.18
  • Scikit-learn 1.3
  • SciPy 1.11
  • Sklearn-onnx 1.15
  • Ipykernel 6.26
  • (code-server plugin) Python 2023.14.0
  • (code-server plugin) Jupyter 2023.3.100

RStudio Server (Technology Preview)

2024.1 (Recommended)

  • Python 3.9
  • R 4.3

Disclaimer:
Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through https://rstudio.org/ and is subject to their licensing terms. Review their licensing terms before you use this sample workbench.

CUDA - RStudio Server (Technology Preview)

2024.1 (Recommended)

  • Python 3.9
  • CUDA 12.1
  • R 4.3

Important

Disclaimer:
Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through https://rstudio.org/ and is subject to their licensing terms. Review their licensing terms before you use this sample workbench.

The CUDA - RStudio Server notebook image contains NVIDIA CUDA technology. CUDA licensing information is available at https://docs.nvidia.com/cuda/. Review their licensing terms before you use this sample workbench.

Deployment size

Specifies the compute resources available on your notebook server.

Container size controls the number of CPUs, the amount of memory, and the minimum and maximum request capacity of the container.

Accelerators specifies the accelerators available on your notebook server.

Number of accelerators specifies the number of accelerators to use.
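
To see what the selected container size means inside a running server, you can read the limits that the container runtime applies. The following Python sketch assumes a cgroup v2 layout with the files cpu.max and memory.max under /sys/fs/cgroup. That layout is an assumption about the cluster nodes, not something this guide states, and the sketch prints None when the files are missing or no limit is set. The helper names are hypothetical.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount point


def parse_cpu_max(text):
    """Convert cpu.max content ("<quota> <period>" or "max <period>") to CPUs."""
    quota, period = text.split()
    if quota == "max":
        return None  # no CPU limit is set
    return int(quota) / int(period)


def parse_memory_max(text):
    """Convert memory.max content (a byte count or "max") to GiB."""
    value = text.strip()
    if value == "max":
        return None  # no memory limit is set
    return int(value) / 2**30


def read_limit(filename, parser):
    """Read one cgroup file; return None if it is missing or malformed."""
    try:
        return parser((CGROUP_ROOT / filename).read_text())
    except (OSError, ValueError):
        return None


print("CPU limit:", read_limit("cpu.max", parse_cpu_max))
print("Memory limit (GiB):", read_limit("memory.max", parse_memory_max))
```

Reading the cgroup files is used here because `os.cpu_count()` generally reports the CPUs of the underlying node rather than the limit applied to the container.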

Important

Using accelerators is only supported with specific notebook images. For GPUs, only the PyTorch, TensorFlow, and CUDA notebook images are supported. For Habana Gaudi devices, only the HabanaAI notebook image is supported. In addition, you can only specify the number of accelerators required for your notebook server if accelerators are enabled on your cluster. To learn how to enable GPU support, see Enabling GPU support in OpenShift AI.
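
After the server starts, you can confirm from a notebook whether an NVIDIA GPU is visible. The following Python sketch assumes a GPU-enabled image in which PyTorch is installed, and it degrades gracefully when PyTorch is absent. The helper gpu_summary is hypothetical, and the check for the nvidia-smi command is only a heuristic.

```python
import importlib.util
import shutil


def gpu_summary():
    """Return a short description of the NVIDIA GPUs visible to this server."""
    # nvidia-smi is typically on the PATH only when a GPU is attached.
    has_nvidia_smi = shutil.which("nvidia-smi") is not None

    if importlib.util.find_spec("torch") is None:
        # Without PyTorch the GPU count is unknown, so report None.
        return {"torch_installed": False, "nvidia_smi": has_nvidia_smi, "gpu_count": None}

    import torch  # imported lazily so the sketch also runs without PyTorch

    count = torch.cuda.device_count() if torch.cuda.is_available() else 0
    return {"torch_installed": True, "nvidia_smi": has_nvidia_smi, "gpu_count": count}


print(gpu_summary())
```

A gpu_count of 0 on a PyTorch image suggests that no accelerator was assigned to the server or that accelerators are not enabled on the cluster.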

Environment variables

Specifies the name and value of variables to be set on the notebook server. Setting environment variables during server startup means that you do not need to define them in the body of your notebooks, or with the Jupyter command line interface. Some recommended environment variables are shown in the table.

Table 5.2. Recommended environment variables

Environment variable option | Recommended variable names

AWS

  • AWS_ACCESS_KEY_ID specifies your Access Key ID for Amazon Web Services.
  • AWS_SECRET_ACCESS_KEY specifies your Secret access key for the account specified in AWS_ACCESS_KEY_ID.
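
Because these variables are set when the server starts, a notebook can read them from the process environment instead of defining them in a cell. The following Python sketch reports whether each recommended variable is present without printing its value. The helper describe_variables is hypothetical.

```python
import os

RECOMMENDED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def describe_variables(names=RECOMMENDED, env=None):
    """Report whether each variable is set, without revealing its value."""
    env = os.environ if env is None else env
    return {name: ("set" if env.get(name) else "missing") for name in names}


for name, state in describe_variables().items():
    print(f"{name}: {state}")
```

Reporting only "set" or "missing" keeps secret values out of the notebook output, which matters if the notebook is later shared or committed to version control.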