
Getting started with Red Hat OpenShift Data Science

Red Hat OpenShift Data Science 1

Learn how to work in an OpenShift Data Science environment

Abstract

Log in and start up your notebook server to get started working with your notebooks in Jupyter.

Chapter 1. Providing feedback on Red Hat documentation

Let Red Hat know how we can make our documentation better. You can provide feedback directly from a documentation page by following the steps below.

  1. Make sure that you are logged in to the Customer Portal.
  2. Make sure that you are looking at the Multi-page HTML format of this document.
  3. Highlight the text that you want to provide feedback on. The Add Feedback prompt appears.
  4. Click Add Feedback.
  5. Enter your comments in the Feedback text box and click Submit.

Red Hat automatically creates a tracking issue each time you submit feedback. Open the link that is displayed after you click Submit and start watching the issue, or add more comments to give us more information about the problem.

Thank you for taking the time to provide your feedback.

Chapter 2. Logging in to OpenShift Data Science

Log in to OpenShift Data Science from a browser for easy access to Jupyter and your data science projects.

Procedure

  1. Browse to the OpenShift Data Science instance URL and click Log in with OpenShift.

    • If you are a data scientist user, your administrator must provide you with the OpenShift Data Science instance URL, for example, https://rhods-dashboard-redhat-ods-applications.apps.example.abc1.p1.openshiftapps.com/.
    • If you have access to OpenShift Dedicated, you can browse to the OpenShift Dedicated web console and click the Application Launcher → Red Hat OpenShift Data Science.
  2. Click the name of your identity provider, for example, GitHub.
  3. Enter your credentials and click Log in (or equivalent for your identity provider).

    If you have not previously authorized the rhods-dashboard service account to access your account, the Authorize Access page appears prompting you to provide authorization. Inspect the permissions selected by default, and click the Allow selected permissions button.

Verification

  • OpenShift Data Science opens on the Enabled applications page.

Troubleshooting

  • If you see An authentication error occurred or Could not create user when you try to log in:

    • You might have entered your credentials incorrectly. Confirm that your credentials are correct.
    • You might have an account in more than one configured identity provider. If you have logged in with a different identity provider previously, try again with that identity provider.

Chapter 3. The OpenShift Data Science user interface

The Red Hat OpenShift Data Science interface is based on the OpenShift web console user interface.

The OpenShift Data Science user interface is divided into several areas:

  • The global navigation bar, which provides access to useful controls, such as Help and Notifications.

    Figure 3.1. The global navigation bar

  • The side navigation menu, which contains different categories of pages available in OpenShift Data Science.

    Figure 3.2. The side navigation menu

  • The main display area, which displays the current page and shares space with any drawers currently displaying information, such as notifications or quick start guides. The main display area also displays the Notebook server control panel where you can launch Jupyter by starting and configuring a notebook server. Administrators can also use the Notebook server control panel to manage other users' notebook servers.

    Figure 3.3. The main display area


3.2. Side navigation

There are three main sections in the side navigation:

Applications → Enabled

The Enabled page displays applications that are enabled and ready to use on OpenShift Data Science. This page is the default landing page for OpenShift Data Science.

Click the Launch application button on an application card to open the application interface in a new tab. If an application has an associated quick start tour, click the drop-down menu on the application’s card and select Open quick start to access it. This page also displays applications and components that have been disabled by your administrator. Disabled applications are denoted with Disabled on the application’s card. Click Disabled on the application’s card to access links that allow you to remove the card itself, and to re-validate its license if the license has expired.

Applications → Explore
The Explore page displays applications that are available for use with OpenShift Data Science. Click on a card for more information about the application or to access the Enable button. The Enable button is only visible if your administrator has purchased and enabled an application at the OpenShift Dedicated level.
Resources
The Resources page displays learning resources such as documentation, how-to material, and quick start tours. You can filter visible resources using the options displayed on the left, or enter terms into the search bar.
Settings → Notebook images
The Notebook image settings page allows you to configure custom notebook images that cater to your project’s specific requirements. After you have added custom notebook images to your deployment of OpenShift Data Science, they are available for selection when creating a notebook server.
Settings → Cluster settings

The Cluster settings page allows you to perform the following administrative tasks on your cluster:

  • Enable or disable Red Hat’s ability to collect data about OpenShift Data Science usage on your cluster.
  • Configure how resources are claimed within your cluster by changing the default size of the cluster’s persistent volume claim (PVC).
  • Reduce resource usage in your OpenShift Data Science deployment by stopping notebook servers that have been idle.
  • Schedule notebook pods on tainted nodes by adding tolerations.
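These settings map onto standard Kubernetes objects behind the scenes. As an illustrative sketch only (the field values below are examples, not product defaults), the PVC size and a notebook toleration correspond to fragments like these:

```yaml
# Persistent volume claim storage request for a notebook server
spec:
  resources:
    requests:
      storage: 20Gi
---
# Toleration that allows notebook pods to be scheduled on matching
# tainted nodes; the key is whatever taint key your administrator set.
tolerations:
- key: "notebooks-only"
  operator: "Exists"
  effect: "NoSchedule"
```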
Settings → User management
The User and group settings page allows you to define OpenShift Data Science user group and admin group membership.

Chapter 4. Notifications in OpenShift Data Science

Red Hat OpenShift Data Science displays notifications when important events happen in the cluster.

Notification messages are displayed in the lower left corner of the Red Hat OpenShift Data Science interface when they are triggered.

If you miss a notification message, click the Notifications button to open the Notifications drawer and view unread messages.

Figure 4.1. The Notifications drawer


Chapter 5. Creating a data science project

To start your data science work, create a data science project. Creating a project helps you organize your work in one place. You can also enhance the capabilities of your data science project by adding workbenches, adding cluster storage, or adding data connections.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click Create data science project.

    The Create a data science project dialog opens.

  3. Enter a name for your data science project.
  4. Optional: Edit the resource name for your data science project. The resource name must consist of lowercase alphanumeric characters and hyphens (-), and must start and end with an alphanumeric character.
  5. Enter a description for your data science project.
  6. Click Create.

    The Project details page opens. From here, you can create workbenches, add cluster storage, and add data connections to your project.
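The resource name rule in step 4 follows the standard Kubernetes DNS label convention. As an illustration only (the regular expression below restates the rule from this procedure; it is not taken from the product code), you can check a candidate name before entering it:

```python
import re

# Lowercase alphanumeric characters and hyphens (-), starting and
# ending with an alphanumeric character.
RESOURCE_NAME = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")

def is_valid_resource_name(name: str) -> bool:
    """Return True if name satisfies the resource name rule above."""
    return bool(RESOURCE_NAME.match(name))

print(is_valid_resource_name("my-data-project"))  # True
print(is_valid_resource_name("My_Project"))       # False: uppercase and underscore
print(is_valid_resource_name("-project"))         # False: starts with a hyphen
```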

Verification

  • The data science project that you created is displayed on the Data science projects page.

Chapter 6. Creating a project workbench

To examine and work with data models in an isolated area, you can create a workbench. This workbench enables you to create a new Jupyter notebook from an existing notebook container image to access its resources and properties. For data science projects that require data to be retained, you can add cluster storage to the workbench you are creating.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group (for example, rhods-users) in OpenShift.
  • You have created a data science project that you can add a workbench to.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to add the workbench to.

    The Details page for the project opens.

  3. Click Create workbench in the Workbenches section.

    The Create workbench page opens.

  4. Configure the properties of the workbench you are creating.

    1. Enter a name for your workbench.
    2. Enter a description for your workbench.
    3. Select the notebook image to use for your workbench server.
    4. Select the container size for your server.
    5. Optional: Select and specify values for any new environment variables.
    6. Configure the storage for your OpenShift Data Science cluster.

      1. Select Create new persistent storage to create storage that is retained after you log out of OpenShift Data Science. Fill in the relevant fields to define the storage.
      2. Select Use existing persistent storage to reuse existing storage, and then select the storage from the Persistent storage list.
  5. Click Create workbench.

Verification

  • The workbench that you created appears on the Details page for the project.
  • Any cluster storage that you associated with the workbench during the creation process appears on the Details page for the project.
  • The Status column, located in the Workbenches section of the Details page, displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.

6.1. Launching Jupyter and starting a notebook server

Launch Jupyter and start a notebook server to start working with your notebooks.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • You know the names and values you want to use for any environment variables in your notebook server environment, for example, AWS_SECRET_ACCESS_KEY.
  • If you want to work with a very large data set, work with your administrator to proactively increase the storage capacity of your notebook server.
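Environment variables that you set on the notebook server are exposed to your notebooks as ordinary process environment variables. A minimal sketch of reading one from a notebook cell (AWS_SECRET_ACCESS_KEY is the example name from the prerequisites above; substitute the names you configured):

```python
import os

# Read a credential that was set on the Start a notebook server page.
# Using .get() with a default avoids a KeyError if the variable is unset.
secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY", "")

if not secret_key:
    print("AWS_SECRET_ACCESS_KEY is not set; add it as an environment "
          "variable when starting the notebook server.")
```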

Procedure

  1. Locate the Jupyter card on the Enabled applications page.
  2. Click Launch application.

    1. If prompted, select your identity provider.
    2. Enter your credentials and click Log in (or equivalent for your identity provider).

      If you see Error 403: Forbidden, you are not in the default user group or the default administrator group for OpenShift Data Science. Contact your administrator so that they can add you to the correct group using Adding users for OpenShift Data Science.

      If you have not previously authorized the odh-dashboard service account to access your account, the Authorize Access page appears prompting you to provide authorization. Inspect the permissions selected by default, and click the Allow selected permissions button.

      If your credentials are accepted, the Notebook server control panel opens displaying the Start a notebook server page.

  3. Start a notebook server.

    This is not required if you have previously opened Jupyter.

    1. Select the Notebook image to use for your server.
    2. If the notebook image contains multiple versions, select the version of the notebook image from the Versions section.

      Note

      When a new version of a notebook image is released, the previous version remains available and supported on the cluster. This gives you time to migrate your work to the latest version of the notebook image.

      Notebook images can take up to 40 minutes to install. Notebook images that have not finished installing are not available for you to select. If an installation of a notebook image has not completed, an alert is displayed.

    3. Select the Container size for your server.
    4. Optional: Select the Number of GPUs (graphics processing units) for your server.

      Important

      Using GPUs to accelerate workloads is only supported with the PyTorch, TensorFlow, and CUDA notebook server images.

    5. Optional: Select and specify values for any new Environment variables.

      For example, if you plan to integrate with Red Hat OpenShift Streams for Apache Kafka, create environment variables to store your Kafka bootstrap server and the service account username and password here.

      The interface stores these variables so that you only need to enter them once. Example variable names for common environment variables are automatically provided for frequently integrated environments and frameworks, such as Amazon Web Services (AWS).

      Important

      Ensure that you select the Secret checkbox for any variables with sensitive values that must be kept private, such as passwords.

    6. Optional: Select the Start server in current tab checkbox if necessary.
    7. Click Start server.

      The Starting server progress indicator appears. If you encounter a problem during this process, an error message appears with more information. Click Expand event log to view additional information about the server creation process. Depending on the deployment size and resources you requested, starting the server can take up to several minutes. Click Cancel to cancel the server creation. After the server starts, the JupyterLab interface opens.

      Warning

      You can be logged in to Jupyter for a maximum of 24 hours. After 24 hours, your user credentials expire, you are logged out of Jupyter, and your notebook server pod is stopped and deleted regardless of any work running in the notebook server. To help mitigate this, your administrator can configure OAuth tokens to expire after a set period of inactivity. See Configuring the internal OAuth server for more information.
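As an example of the environment variables described in step 3.5, a notebook can assemble Kafka connection settings without hard-coding credentials. The variable names below are hypothetical, and the kafka-python settings assume a SASL/PLAIN-secured broker; adjust both to your environment:

```python
import os

# Hypothetical variable names -- use the names you entered on the
# Start a notebook server page.
kafka_config = {
    "bootstrap_servers": os.environ.get("KAFKA_BOOTSTRAP_SERVER", ""),
    "security_protocol": "SASL_SSL",
    "sasl_mechanism": "PLAIN",
    "sasl_plain_username": os.environ.get("KAFKA_SASL_USERNAME", ""),
    "sasl_plain_password": os.environ.get("KAFKA_SASL_PASSWORD", ""),
}

# With the kafka-python package available and a reachable broker:
# from kafka import KafkaProducer
# producer = KafkaProducer(**kafka_config)
```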

Verification

  • The JupyterLab interface opens in a new tab.

Troubleshooting

  • If you see the "Unable to load notebook server configuration options" error message, contact your administrator so that they can review the logs associated with your Jupyter pod and determine further details about the problem.

6.2. Options for notebook server environments

When you start Jupyter for the first time, or after stopping your notebook server, you must select server options in the Start a notebook server wizard so that the software and variables that you expect are available on your server. This section explains the options available in the Start a notebook server wizard in detail.

The Start a notebook server page is divided into several sections:

Notebook image
Specifies the container image that your notebook server is based on. Different notebook images have different packages installed by default. See Notebook image options for details.
Deployment size

Specifies the compute resources available on your notebook server.

Container size controls the number of CPUs, the amount of memory, and the minimum and maximum request capacity of the container.
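As an illustration of what a container size controls, the selection maps to standard Kubernetes resource requests and limits on the notebook container. The values below are examples only, not the product's actual size definitions:

```yaml
resources:
  requests:   # minimum guaranteed capacity
    cpu: "1"
    memory: 8Gi
  limits:     # maximum the container may consume
    cpu: "2"
    memory: 8Gi
```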

Environment variables
Specifies the name and value of variables to be set on the notebook server. Setting environment variables during server startup means that you do not need to define them in the body of your notebooks, or with the Jupyter command line interface. See Recommended environment variables for a list of reserved variable names for each item in the Environment variables list.

Table 6.1. Notebook image options

Image name | Preinstalled packages

CUDA

  • Python 3.8
  • CUDA 11
  • JupyterLab 3.2
  • Notebook 6.4

Minimal Python (default)

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4

PyTorch

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • PyTorch 1.8
  • CUDA 11
  • TensorBoard 1.15
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Numpy 1.19
  • Pandas 1.2
  • Scikit-learn 0.24
  • SciPy 1.6

Standard Data Science

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Pandas 1.2
  • Numpy 1.19
  • Scikit-learn 0.24
  • SciPy 1.6

TensorFlow

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • TensorFlow 2.7
  • TensorBoard 2.6
  • CUDA 11
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Numpy 1.19
  • Pandas 1.2
  • Scikit-learn 0.24
  • SciPy 1.6

Chapter 7. Tutorials for data scientists

To help you get started quickly, you can access learning resources for Red Hat OpenShift Data Science and its supported applications. These resources are available on the Resources tab of the Red Hat OpenShift Data Science user interface.

Table 7.1. Tutorials

Resource Name | Description

Accelerating scientific workloads in Python with Numba

Watch a video about how to make your Python code run faster.

Building interactive visualizations and dashboards in Python

Explore a variety of data across multiple notebooks and learn how to deploy full dashboards and applications.

Building machine learning models with scikit-learn

Learn how to build machine learning models with scikit-learn for supervised learning, unsupervised learning, and classification problems.

Building a binary classification model

Train a model to predict if a customer is likely to subscribe to a bank promotion.

Choosing Python tools for data visualization

Use the PyViz.org website to help you decide on the best open source Python data visualization tools for you.

Exploring Anaconda for data science

Learn about Anaconda, a freemium open source distribution of the Python and R programming languages.

Getting started with Pachyderm concepts

Learn Pachyderm’s main concepts by creating pipelines that perform edge detection on a few images.

GPU Computing in Python with Numba

Learn how to create GPU accelerated functions using Numba.

Run a Python notebook to generate results in IBM Watson OpenScale

Run a Python notebook to create, train, and deploy a machine learning model.

Running an AutoAI experiment to build a model

Watch a video about building a binary classification model for a marketing campaign.

Training a regression model in Pachyderm

Learn how to create a sample housing data repository using a Pachyderm cluster to run experiments, analyze data, and set up regression.

Using Dask for parallel data analysis

Analyze medium-sized datasets in parallel locally using Dask, a parallel computing library that scales the existing Python ecosystem.

Using Jupyter notebooks in Watson Studio

Watch a video about working with Jupyter notebooks in Watson Studio.

Using Pandas for data analysis in Python

Learn how to use pandas, a data analysis library for the Python programming language.

Table 7.2. Quick start guides

Resource Name | Description

Connecting to Red Hat OpenShift Streams for Apache Kafka

Connect to Red Hat Streams for Apache Kafka from a Jupyter notebook.

Creating a Jupyter notebook

Create a Jupyter notebook in JupyterLab.

Creating a Machine Learning Model using the NVIDIA GPU Add-on

Create a machine learning model in Jupyter that uses the GPUs that you have made available.

Creating an Anaconda-enabled Jupyter notebook

Create an Anaconda-enabled Jupyter notebook and access Anaconda packages that are curated for security and compatibility.

Deploying a model with Watson Studio

Import a notebook in Watson Studio and use AutoAI to build and deploy a model.

Deploying a sample Python application using Flask and OpenShift

Deploy your data science model out of a Jupyter notebook and into a Flask application to use as a development sandbox.

Importing Pachyderm Beginner Tutorial Notebook

Load Pachyderm’s beginner tutorial notebook and learn about Pachyderm’s main concepts such as data repositories, pipelines, and using the pachctl CLI from your cells.

Installing and verifying the NVIDIA GPU add-on

Learn how to install and verify that Jupyter detects the GPUs available for use.

Opening and updating a SKLearn model with canary deployment

Open a SKLearn model and update it using canary deployment practices.

Querying data with Starburst Galaxy

Learn to query data using Starburst Galaxy from a Jupyter notebook.

Securing a deployed model using Red Hat OpenShift API Management

Protect a model service API using Red Hat OpenShift API Management.

Using the Intel® oneAPI AI Analytics Toolkit (AI Kit) Notebook

Run a data science notebook sample with the Intel® oneAPI AI Analytics Toolkit.

Using the OpenVINO toolkit

Quantize an ONNX computer vision model using the OpenVINO model optimizer and use the result for inference from a notebook.

Table 7.3. How to guides

Resource Name | Description

How to choose between notebook runtime environment options

Explore available options for configuring your notebook runtime environment.

How to clean, shape, and visualize data

Learn how to clean and shape tabular data using IBM Watson Studio data refinery.

How to create a connection to access data

Learn how to create connections to various data sources across the platform.

How to create a deployment space

Learn how to create a deployment space for machine learning.

How to create a notebook in Watson Studio

Learn how to create a basic Jupyter notebook in Watson Studio.

How to create a project in Watson Studio

Learn how to create an analytics project in Watson Studio.

How to create a project that integrates with Git

Learn how to add assets from a Git repository into a project.

How to install Python packages on your notebook server

Learn how to install additional Python packages on your notebook server.

How to load data into a Jupyter notebook

Learn how to integrate data sources into a Jupyter notebook by loading data.

How to serve a model using OpenVINO Model Server

Learn how to deploy optimized models with the OpenVINO Model Server using OpenVINO custom resources.

How to set up Watson OpenScale

Learn how to track and measure outcomes from models with OpenScale.

How to update notebook server settings

Learn how to update the settings or the notebook image on your notebook server.

How to use data from Amazon S3 buckets

Learn how to connect to data in S3 Storage using environment variables.

How to view installed packages on your notebook server

Learn how to see which packages are installed on your running notebook server.

7.1. Accessing tutorials

You can access learning resources for Red Hat OpenShift Data Science and supported applications.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • You have logged in to the OpenShift Dedicated web console.

Procedure

  1. On the Red Hat OpenShift Data Science home page, click Resources.

    The Resources page opens.

  2. Click Access tutorial on the relevant card.

Verification

  • You can view and access the learning resources for Red Hat OpenShift Data Science and supported applications.

Chapter 8. Enabling services connected to OpenShift Data Science

You must enable SaaS-based services, such as Red Hat OpenShift Streams for Apache Kafka and Anaconda, before using them with Red Hat OpenShift Data Science. On-cluster services are enabled automatically.

Typically, you can install or enable services connected to OpenShift Data Science using one of the following methods:

  • Enabling the service from the Explore page on the OpenShift Data Science dashboard, as documented in this procedure.
  • Installing the service’s operator from OperatorHub. OperatorHub is a web console for cluster administrators to discover and select Operators to install on their cluster. It is deployed by default in OpenShift Container Platform (Installing from OperatorHub using the web console).

    Note

    Deployments containing operators installed from OperatorHub may not be fully supported by Red Hat.

  • Installing the service’s operator from Red Hat Marketplace (Install operators).
  • Installing the service as an Add-on to your Red Hat OpenShift Dedicated cluster using Red Hat OpenShift Cluster Manager (Installing OpenShift Data Science on OpenShift Dedicated).

For most services, the service endpoint is available on the service’s tile on the Enabled page of OpenShift Data Science. Certain services cannot be accessed directly from their tiles, for example, OpenVINO and Anaconda provide notebook images for use in Jupyter and do not provide an endpoint link from their tile. Additionally, for services such as OpenShift Streams for Apache Kafka, it may be useful to store these endpoint URLs as environment variables for easy reference in a notebook environment.

Some independent software vendor (ISV) applications must be installed in specific OpenShift Data Science Add-on namespaces. However, do not install ISV applications in namespaces associated with OpenShift Data Science Add-ons unless you are specifically directed to do so on the application’s card on the dashboard.

To help you get started quickly, you can access the service’s learning resources and documentation on the Resources page, or by clicking the relevant link on the service’s tile on the Enabled page.

Prerequisites

  • You have logged in to OpenShift Data Science.
  • Your administrator has installed or configured the service on your OpenShift Dedicated cluster.

Procedure

  1. On the OpenShift Data Science home page, click Explore.

    The Explore page opens.

  2. Click the card of the service that you want to enable.
  3. Click Enable on the drawer for the service.
  4. If prompted, enter the service’s key and click Connect.
  5. Click Enable to confirm that you are enabling the service.

Verification

  • The service that you enabled appears on the Enabled page.
  • The service endpoint is displayed on the service’s tile on the Enabled page.

Chapter 9. Disabling applications connected to OpenShift Data Science

You can disable applications and components so that they do not appear on the OpenShift Data Science dashboard when you no longer want to use them, for example, when data scientists no longer use an application or when the application’s license expires.

Disabling unused applications allows your data scientists to manually remove these application cards from their OpenShift Data Science dashboard so that they can focus on the applications that they are most likely to use. See Removing disabled applications from OpenShift Data Science for more information about manually removing application cards.

Important

Do not follow this procedure when disabling the following applications:

  • Anaconda Professional Edition. You cannot manually disable Anaconda Professional Edition. It is automatically disabled only when its license expires.
  • Red Hat OpenShift API Management. You can only uninstall Red Hat OpenShift API Management from OpenShift Cluster Manager.
  • Red Hat OpenShift Streams for Apache Kafka.

Prerequisites

  • You have logged in to the OpenShift Dedicated web console.
  • You are part of the cluster-admins user group in OpenShift Dedicated.
  • You have installed or configured the service on your OpenShift Dedicated cluster.
  • The application or component that you want to disable is enabled and appears on the Enabled page.

Procedure

  1. In the OpenShift Dedicated web console, change into the Administrator perspective.
  2. Change into the redhat-ods-applications project.
  3. Click Operators → Installed Operators.
  4. Click on the operator that you want to uninstall. You can enter a keyword into the Filter by name field to help you find the operator faster.
  5. Delete any operator resources or instances by using the tabs in the operator interface.

    During installation, some operators require the administrator to create resources or start process instances using tabs in the operator interface. These must be deleted before the operator can uninstall correctly.

  6. On the Operator Details page, click the Actions drop-down menu and select Uninstall Operator.

    An Uninstall Operator? dialog box is displayed.

  7. Select Uninstall to uninstall the operator, operator deployments, and pods. After this is complete, the operator stops running and no longer receives updates.

Important

Removing an operator does not remove any of that operator’s custom resource definitions or managed resources. Custom resource definitions and managed resources still exist and must be cleaned up manually. Any applications deployed by your operator and any configured off-cluster resources continue to run and must be cleaned up manually.

Verification

  • The operator is uninstalled from its target clusters.
  • The operator no longer appears on the Installed Operators page.
  • The disabled application is no longer available for your data scientists to use, and is marked as Disabled on the Enabled page of the OpenShift Data Science dashboard. This action may take a few minutes to occur following the removal of the operator.

9.1. Removing disabled applications from OpenShift Data Science

After your administrator has disabled your unused applications, you can manually remove them from the OpenShift Data Science dashboard. Disabling and removing unused applications allows you to focus on the applications that you are most likely to use.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • You have logged in to the OpenShift Dedicated web console.
  • Your administrator has previously disabled the application that you want to remove.

Procedure

  1. In the OpenShift Data Science interface, click Enabled.

    The Enabled page opens. Disabled applications are denoted with Disabled on the application’s card.

  2. Click Disabled on the card of the application that you want to remove.
  3. Click the link to remove the application card.

Verification

  • The disabled application’s card no longer appears on the Enabled page.

Chapter 10. Support requirements and limitations

Review this section to understand the requirements for Red Hat support and any limitations to Red Hat support of Red Hat OpenShift Data Science.

10.1. Supported browsers

Red Hat OpenShift Data Science supports the latest version of the following browsers:

  • Google Chrome
  • Mozilla Firefox
  • Safari

10.2. Supported services

Red Hat OpenShift Data Science supports the following services:

Table 10.1. Supported services

Service Name | Description

Anaconda Professional Edition

Anaconda Professional Edition is a popular open source package distribution and management experience that is optimized for commercial use.

IBM Watson Studio

IBM Watson Studio is a platform for embedding AI and machine learning into your business and creating custom models with your own data.

Intel® oneAPI AI Analytics Toolkits

The AI Kit is a set of AI software tools to accelerate end-to-end data science and analytics pipelines on Intel® architectures.

Jupyter

Jupyter is a multi-user version of the notebook designed for companies, classrooms, and research labs.

Important

While every effort is made to make Red Hat OpenShift Data Science resilient to OpenShift node failure, upgrades, and similarly disruptive operations, individual users' notebook environments can be interrupted during these events. If an OpenShift node restarts or becomes unavailable, any user notebook environment on that node is restarted on a different node. When this occurs, any ongoing process executing in the user’s notebook environment is interrupted, and the user needs to re-execute it when their environment becomes available again.

Due to this limitation, Red Hat recommends that processes for which interruption is unacceptable are not executed in the Jupyter notebook server environment on OpenShift Data Science.

Pachyderm

Use Pachyderm’s data versioning, pipeline, and lineage capabilities to automate the machine learning life cycle and optimize machine learning operations.

Red Hat OpenShift API Management

OpenShift API Management is a service that accelerates time-to-value and reduces the cost of delivering API-first, microservices-based applications.

Red Hat OpenShift Streams for Apache Kafka

OpenShift Streams for Apache Kafka is a service for streaming data that reduces the cost and complexity of delivering real-time applications.

OpenVINO

OpenVINO is an open-source toolkit to help optimize deep learning performance and deploy using an inference engine onto Intel hardware.

Starburst Galaxy

Starburst Galaxy is a fully managed service to run high-performance queries across your various data sources using SQL.

10.3. Supported packages

Notebook server images in Red Hat OpenShift Data Science are installed with Python 3.8 by default. See the table in Options for notebook server environments for a complete list of packages and versions included in these images.

You can install packages that are compatible with Python 3.8 on any notebook server that has the binaries required by that package. If the required binaries are not included on the notebook server image you want to use, contact Red Hat Support to request that the binary be considered for inclusion.

You can install packages on a temporary basis by using the pip install command. You can also provide a list of packages to the pip install command using a requirements.txt file. See Installing Python packages on your notebook server for more information.

You must re-install these packages each time you start your notebook server.

You can remove packages by using the pip uninstall command.
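The commands above can be sketched together as follows, run from a terminal or notebook cell on the notebook server (the package names and versions are illustrative, not a recommendation):

```shell
# Install a package for the current session only (package names are
# illustrative, not a recommendation).
pip install matplotlib

# Or pin several packages in a requirements.txt file and install them
# all at once.
printf '%s\n' 'matplotlib==3.5.1' 'pandas==1.4.2' > requirements.txt
pip install -r requirements.txt

# Packages installed this way are gone after a notebook server restart,
# so keep requirements.txt with your project and re-run the install.
pip uninstall --yes matplotlib   # removes a package you no longer need
```

Keeping `requirements.txt` in your project directory makes the re-installation after each restart a single command.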

Chapter 11. Common questions

In addition to documentation, Red Hat provides several "how-to" documents that answer common questions a data scientist might have as they work.

The currently available how-to documents are linked here:

Chapter 12. Troubleshooting common problems in Jupyter for administrators

If your users are experiencing errors in Red Hat OpenShift Data Science relating to Jupyter, their notebooks, or their notebook server, read this section to understand what could be causing the problem, and how to resolve the problem.

If you cannot see the problem here or in the release notes, contact Red Hat Support.

12.1. A user receives a 404: Page not found error when logging in to Jupyter

Problem

If you have configured specialized OpenShift Data Science user groups, the user name might not be added to the default user group for OpenShift Data Science.

Diagnosis

Check whether the user is part of the default user group.

  1. Find the names of groups allowed access to Jupyter.

    1. Log in to the OpenShift Dedicated web console.
    2. Click Workloads → ConfigMaps and click the rhods-groups-config ConfigMap to open it.
    3. Click the YAML tab and check the values for allowed_groups. These are the names of the groups that have access to Jupyter.

        data:
          admin_groups: <admin-group>
          allowed_groups: <user-group>

      Where <admin-group> is the name of your administrator group and <user-group> is the name of your user group.

  2. Click User management → Groups and click the name of each group to see its members.
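If you prefer the command line, the same checks can be sketched with the oc CLI. This assumes you are logged in with sufficient permissions and that OpenShift Data Science runs in the default redhat-ods-applications project; the group name is an illustrative placeholder:

```shell
# Read the groups allowed to access Jupyter from the ConfigMap
# (assumes the default redhat-ods-applications project).
CM_NAME="rhods-groups-config"
oc get configmap "$CM_NAME" -n redhat-ods-applications \
    -o jsonpath='{.data.allowed_groups}{"\n"}' \
  || echo "could not read ${CM_NAME}; check your login and permissions"

# List the members of a group returned above (example-user-group is a
# placeholder; replace it with a name from allowed_groups).
USER_GROUP="example-user-group"
oc get group "$USER_GROUP" -o jsonpath='{.users}{"\n"}' \
  || echo "could not read group ${USER_GROUP}; check the group name"
```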

Resolution

  • If the user is not added to any of the groups allowed access to Jupyter, follow Adding users for OpenShift Data Science to add them.
  • If the user is already added to a group that is allowed to access Jupyter, contact Red Hat Support.

12.2. A user’s notebook server does not start

Problem

The OpenShift Dedicated cluster that hosts the user’s notebook server might not have access to enough resources, or the Jupyter pod may have failed.

Diagnosis

  1. Log in to the OpenShift Dedicated web console.
  2. Delete and restart the notebook server pod for this user.

    1. Click Workloads → Pods and set the Project to rhods-notebooks.
    2. Search for the notebook server pod that belongs to this user, for example, jupyter-nb-<username>-*.

      If the notebook server pod exists, an intermittent failure may have occurred in the notebook server pod.

      If the notebook server pod for the user does not exist, continue with diagnosis.

  3. Check the resources currently available in the OpenShift Dedicated cluster against the resources required by the selected notebook server image.

    If worker nodes with sufficient CPU and RAM are available for scheduling in the cluster, continue with diagnosis.

  4. Check the state of the Jupyter pod.
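The diagnosis steps above can also be sketched with the oc CLI. The user name is illustrative, notebook pods follow the jupyter-nb-&lt;username&gt;-* naming scheme described above, and the redhat-ods-applications project name assumes a default installation:

```shell
# Look for the user's notebook server pod (jdoe is an illustrative name).
USERNAME="jdoe"
oc get pods -n rhods-notebooks | grep "jupyter-nb-${USERNAME}-" \
  || echo "no notebook pod found for ${USERNAME}"

# Check schedulable CPU and memory on the worker nodes.
oc describe nodes -l node-role.kubernetes.io/worker= \
    | grep -A 6 'Allocated resources' \
  || echo "could not describe nodes; check your permissions"

# Check the state of the Jupyter pods themselves.
oc get pods -n redhat-ods-applications \
  || echo "could not list pods in redhat-ods-applications"
```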

Resolution

  • If there was an intermittent failure of the notebook server pod:

    1. Delete the notebook server pod that belongs to the user.
    2. Ask the user to start their notebook server again.
  • If the notebook server does not have sufficient resources to run the selected notebook server image, either add more resources to the OpenShift Dedicated cluster, or choose a smaller image size.
  • If the Jupyter pod is in a FAILED state:

    1. Retrieve the logs for the jupyter-nb-* pod and send them to Red Hat Support for further evaluation.
    2. Delete the jupyter-nb-* pod.
  • If none of the previous resolutions apply, contact Red Hat Support.

12.3. The user receives a database or disk is full error or a no space left on device error when they run notebook cells

Problem

The user might have run out of storage space on their notebook server.

Diagnosis

  1. Log in to Jupyter and start the notebook server that belongs to the user having problems. If the notebook server does not start, follow these steps to check whether the user has run out of storage space:

    1. Log in to the OpenShift Dedicated web console.
    2. Click Workloads → Pods and set the Project to rhods-notebooks.
    3. Click the notebook server pod that belongs to this user, for example, jupyter-nb-<idp>-<username>-*.
    4. Click Logs. The user has exceeded their available capacity if you see lines similar to the following:

      Unexpected error while saving file: XXXX database or disk is full
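With CLI access, the same log check can be scripted. The identity provider and user name below are illustrative placeholders for the jupyter-nb-&lt;idp&gt;-&lt;username&gt;-* pod name described above:

```shell
# Find the user's notebook pod and scan its logs for out-of-space errors.
IDP="ldap"        # illustrative identity provider name
USERNAME="jdoe"   # illustrative user name
POD="$(oc get pods -n rhods-notebooks -o name 2>/dev/null \
        | grep "jupyter-nb-${IDP}-${USERNAME}-" || true)"
if [ -n "$POD" ]; then
  oc logs -n rhods-notebooks "$POD" \
    | grep -iE 'database or disk is full|no space left on device' \
    || echo "no storage errors found in the logs"
else
  echo "no notebook pod found for ${IDP}-${USERNAME}"
fi
```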

Resolution

  • Increase the user’s available storage by expanding their persistent volume. For more information, see Expanding persistent volumes.
  • Work with the user to identify files that can be deleted from the /opt/app-root/src directory on their notebook server to free up their existing storage space.
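To see how full the volume is and which files are the best candidates for deletion, the user can run the following from a terminal inside their notebook server (the /opt/app-root/src path is the notebook home directory mentioned above):

```shell
# Show how much of the persistent volume is in use.
df -h /opt/app-root/src 2>/dev/null || df -h .

# List the ten largest files and directories as candidates for deletion.
du -ah /opt/app-root/src 2>/dev/null | sort -rh | head -n 10
```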

Chapter 13. Troubleshooting common problems in Jupyter for users

If you are seeing errors in Red Hat OpenShift Data Science related to Jupyter, your notebooks, or your notebook server, read this section to understand what could be causing the problem.

If you cannot see your problem here or in the release notes, contact Red Hat Support.

13.1. I see a 403: Forbidden error when I log in to Jupyter

Problem

If your administrator has configured specialized OpenShift Data Science user groups, your user name might not be added to the default user group or the default administrator group for OpenShift Data Science.

Resolution

  • Contact your administrator so that they can add you to the correct group or groups.

13.2. My notebook server does not start

Problem

The OpenShift Dedicated cluster that hosts your notebook server might not have access to enough resources, or the Jupyter pod may have failed.

Resolution

Check the logs in the Events section in OpenShift Dedicated for error messages associated with the problem. For example:

Server requested
2021-10-28T13:31:29.830991Z [Warning] 0/7 nodes are available: 2 Insufficient memory,
2 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: },
that the pod didn't tolerate.
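If you have oc CLI access with view permission on the rhods-notebooks project, you can list the same warning events from a terminal; this is a sketch, and your administrator may need to run it on your behalf:

```shell
# List recent warning events for the notebook project.
PROJECT="rhods-notebooks"
oc get events -n "$PROJECT" --field-selector type=Warning \
  || echo "could not list events in ${PROJECT}; ask your administrator to check instead"
```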

Contact your administrator with details of any relevant error messages so that they can perform further checks.

13.3. I see a database or disk is full error or a no space left on device error when I run my notebook cells

Problem

You might have run out of storage space on your notebook server.

Resolution

Contact your administrator so that they can perform further checks.

Legal Notice

Copyright © 2022 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.