Chapter 6. Creating a project workbench

To examine and work with data models in an isolated area, you can create a workbench. A workbench enables you to create a new Jupyter notebook from an existing notebook container image and to access that image's resources and properties. For data science projects that require data to be retained, you can add persistent cluster storage to the workbench that you are creating.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
  • You have created a data science project that you can add a workbench to.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to add the workbench to.

    The Details page for the project opens.

  3. Click Create workbench in the Workbenches section.

    The Create workbench page opens.

  4. Configure the properties of the workbench you are creating.

    1. Enter a name for your workbench.
    2. Enter a description for your workbench.
    3. Select the notebook image to use for your workbench server.
    4. Select the container size for your server.
    5. Optional: Select and specify values for any new environment variables.
    6. Configure the storage for your OpenShift Data Science cluster.

      1. Select Create new persistent storage to create storage that is retained after you log out of OpenShift Data Science. Fill in the relevant fields to define the storage. A sketch for verifying the mount from a notebook follows this procedure.
      2. Select Use existing persistent storage to reuse existing storage, and then select the storage from the Persistent storage list.
  5. Click Create workbench.
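
If you attached persistent storage to the workbench, files written under its mount folder survive workbench restarts. The following sketch, run from a notebook inside the workbench, writes a small file and reads it back; the mount path shown is an assumption and must match the mount folder you specified when you defined the storage.

  # Minimal sketch: confirm that the persistent storage attached to the
  # workbench is mounted and writable, and that files survive a restart.
  from pathlib import Path

  # Hypothetical mount folder; use the value you entered when defining the storage.
  mount_path = Path("/opt/app-root/src/data")

  if not mount_path.exists():
      raise SystemExit(f"{mount_path} is not mounted; check the workbench storage settings")

  marker = mount_path / "persistence-check.txt"
  marker.write_text("written before a workbench restart\n")

  # After stopping and restarting the workbench, re-run this cell:
  # the file is still present if the storage is persistent.
  print(marker.read_text())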

Verification

  • The workbench that you created appears on the Details page for the project.
  • Any cluster storage that you associated with the workbench during the creation process appears on the Details page for the project.
  • The Status column, located in the Workbenches section of the Details page, displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.

6.1. Launching Jupyter and starting a notebook server

Launch Jupyter and start a notebook server to start working with your notebooks.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • You know the names and values you want to use for any environment variables in your notebook server environment, for example, AWS_SECRET_ACCESS_KEY.
  • If you want to work with a very large data set, work with your administrator to proactively increase the storage capacity of your notebook server.
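
For example, if you define AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION as environment variables, the Boto3 library that is preinstalled in several notebook images reads them from the environment automatically, so your notebooks do not need to contain credentials. A minimal sketch (the bucket name is hypothetical):

  # Minimal sketch: Boto3 picks up AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
  # and AWS_DEFAULT_REGION from the notebook server environment, so no
  # credentials appear in the notebook itself.
  import boto3

  s3 = boto3.client("s3")  # credentials and region come from the environment

  # List the objects in a bucket (the bucket name is hypothetical).
  response = s3.list_objects_v2(Bucket="example-bucket")
  for obj in response.get("Contents", []):
      print(obj["Key"])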

Procedure

  1. Locate the Jupyter card on the Enabled applications page.
  2. Click Launch application.

    1. If prompted, select your identity provider.
    2. Enter your credentials and click Log in (or equivalent for your identity provider).

      If you see Error 403: Forbidden, you are not in the default user group or the default administrator group for OpenShift Data Science. Contact your administrator so that they can add you to the correct group using Adding users for OpenShift Data Science.

      If you have not previously authorized the jupyter-nb-<username> service account to access your account, the Authorize Access page appears, prompting you to provide authorization. Inspect the permissions selected by default, and click the Allow selected permissions button.

      If your credentials are accepted, the Notebook server control panel opens, displaying the Start a notebook server page.

  3. Start a notebook server.

    This step is not required if a notebook server that you previously started is still running.

    1. Select the Notebook image to use for your server.
    2. If the notebook image contains multiple versions, select the version of the notebook image from the Versions section.

      Note

      When a new version of a notebook image is released, the previous version remains available and supported on the cluster. This gives you time to migrate your work to the latest version of the notebook image.

      Notebook images can take up to 40 minutes to install. Notebook images that have not finished installing are not available for you to select. If the installation of a notebook image has not completed, an alert is displayed.

    3. Select the Container size for your server.
    4. Optional: Select the Number of GPUs (graphics processing units) for your server.

      Important

      Using GPUs to accelerate workloads is only supported with the PyTorch, TensorFlow, and CUDA notebook server images.

    5. Optional: Select and specify values for any new Environment variables.

      For example, if you plan to integrate with Red Hat OpenShift Streams for Apache Kafka, create environment variables to store your Kafka bootstrap server and the service account username and password here. A sketch showing how a notebook reads these variables follows this procedure.

      The interface stores these variables so that you only need to enter them once. Example variable names are provided automatically for frequently integrated environments and frameworks, such as Amazon Web Services (AWS).

      Important

      Ensure that you select the Secret checkbox for any variables with sensitive values that must be kept private, such as passwords.

    6. Optional: Select the Start server in current tab checkbox if necessary.
    7. Click Start server.

      The Starting server progress indicator appears. If you encounter a problem during this process, an error message appears with more information. Click Expand event log to view additional information about the server creation process. Depending on the deployment size and resources you requested, starting the server can take up to several minutes. Click Cancel to cancel the server creation. After the server starts, the JupyterLab interface opens.

      Warning

      You can be logged in to Jupyter for a maximum of 24 hours. After 24 hours, your user credentials expire, you are logged out of Jupyter, and your notebook server pod is stopped and deleted regardless of any work running in the notebook server. To help mitigate this, your administrator can configure OAuth tokens to expire after a set period of inactivity. See Configuring the internal OAuth server for more information.
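
If you stored your Kafka bootstrap server and service account credentials as environment variables in the Environment variables step, a notebook can read them at run time instead of embedding them. The following sketch uses the Kafka-Python client that is preinstalled in several notebook images; the variable names and the SASL settings are assumptions and must match your Kafka configuration.

  # Minimal sketch: build a Kafka producer from environment variables set on
  # the notebook server. The variable names and the SASL/PLAIN settings are
  # assumptions; adjust them to match your Kafka instance.
  import os

  from kafka import KafkaProducer

  producer = KafkaProducer(
      bootstrap_servers=os.environ["KAFKA_BOOTSTRAP_SERVER"],  # hypothetical name
      security_protocol="SASL_SSL",
      sasl_mechanism="PLAIN",
      sasl_plain_username=os.environ["KAFKA_USERNAME"],        # hypothetical name
      sasl_plain_password=os.environ["KAFKA_PASSWORD"],        # hypothetical name
  )

  producer.send("example-topic", b"hello from the notebook server")
  producer.flush()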

Verification

  • The JupyterLab interface opens in a new tab.

Troubleshooting

  • If you see the "Unable to load notebook server configuration options" error message, contact your administrator so that they can review the logs associated with your Jupyter pod and determine further details about the problem.

6.2. Options for notebook server environments

When you start Jupyter for the first time, or after stopping your notebook server, you must select server options in the Start a notebook server wizard so that the software and variables that you expect are available on your server. This section explains the options available in the Start a notebook server wizard in detail.

The Start a notebook server page is divided into several sections:

Notebook image
Specifies the container image that your notebook server is based on. Different notebook images have different packages installed by default. See Notebook image options for details.
Deployment size
Specifies the compute resources available on your notebook server. Container size controls the number of CPUs, the amount of memory, and the minimum and maximum request capacity of the container.

Environment variables
Specifies the name and value of variables to be set on the notebook server. Setting environment variables during server startup means that you do not need to define them in the body of your notebooks, or with the Jupyter command line interface. See Recommended environment variables for a list of reserved variable names for each item in the Environment variables list.
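
As a minimal illustration, a variable set on the Start a notebook server page is already present in the notebook's process environment and can be read without defining it in the notebook itself (the variable name below is hypothetical):

  # Minimal sketch: a variable set on the Start a notebook server page is
  # already present in the notebook's process environment.
  import os

  # The variable name is hypothetical; use whatever name you entered in the
  # Environment variables section when starting the server.
  endpoint = os.environ.get("MY_DATA_ENDPOINT", "not set")
  print(f"MY_DATA_ENDPOINT = {endpoint}")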

Table 6.1. Notebook image options

Each image name below is followed by its preinstalled packages.

CUDA

  • Python 3.8
  • CUDA 11
  • JupyterLab 3.2
  • Notebook 6.4

Minimal Python (default)

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4

PyTorch

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • PyTorch 1.8
  • CUDA 11
  • TensorBoard 1.15
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Numpy 1.19
  • Pandas 1.2
  • Scikit-learn 0.24
  • SciPy 1.6

Standard Data Science

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Pandas 1.2
  • Numpy 1.19
  • Scikit-learn 0.24
  • SciPy 1.6

TensorFlow

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • TensorFlow 2.7
  • TensorBoard 2.6
  • CUDA 11
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Numpy 1.19
  • Pandas 1.2
  • Scikit-learn 0.24
  • SciPy 1.6
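
To confirm which package versions are available in your running notebook image, you can query them from a notebook cell. The following sketch reports a few of the packages listed above and assumes that you are using an image that preinstalls them, such as Standard Data Science; on the GPU-enabled images you can additionally check for a visible CUDA device.

  # Minimal sketch: report the Python and package versions available in the
  # current notebook image. Assumes an image that preinstalls these packages
  # (for example, Standard Data Science); adjust the list as needed.
  import sys
  from importlib.metadata import version, PackageNotFoundError

  print("Python:", sys.version.split()[0])
  for package in ("jupyterlab", "notebook", "pandas", "numpy", "scikit-learn"):
      try:
          print(f"{package}: {version(package)}")
      except PackageNotFoundError:
          print(f"{package}: not installed in this image")

  # On the PyTorch or TensorFlow images, you can also check whether a GPU is visible:
  # import torch
  # print("CUDA available:", torch.cuda.is_available())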