Chapter 3. Working with notebooks on OpenShift Data Science

As a data scientist, you control what goes into your notebook server environment. You can install software as needed to ensure your server has everything your notebooks and your models require.

Important

OpenShift Data Science sends notifications to nominated email addresses (usually your administrator) when the storage for your notebook server is 90% full, and again when it is completely full.

If you download a very large data set, your storage can fill up before your administrator receives a notification, so you might run out of space before your administrator can allocate more storage.

To avoid this issue, Red Hat recommends streaming larger data sets from external services where possible. Alternatively, if you plan to work with a very large data set, ask your administrator for more storage in advance so that you have sufficient space.
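
The following Python sketch illustrates both ideas. It checks how full the file system holding a path is by using `shutil.disk_usage`, and it processes a remote text data set line by line with `urllib.request.urlopen` instead of saving the whole file to disk first. The function names and the 90% threshold constant are illustrative assumptions, not part of OpenShift Data Science, and the default path (the current working directory) is assumed to sit on your notebook server's storage.

```python
import shutil
import urllib.request
from typing import Iterable, Iterator

# Illustrative threshold that mirrors the 90% notification level described above.
WARN_FRACTION = 0.9


def storage_used_fraction(path: str = ".") -> float:
    """Return the fraction (0.0 to 1.0) of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def iter_lines(raw_lines: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Decode and yield one line at a time, so only the current line is held in memory."""
    for raw in raw_lines:
        yield raw.decode(encoding).rstrip("\r\n")


def count_rows(raw_lines: Iterable[bytes]) -> int:
    """Example of per-row processing: count rows without storing them."""
    return sum(1 for _ in iter_lines(raw_lines))


def count_remote_rows(url: str) -> int:
    """Stream a remote text data set; iterating the HTTP response yields one line at a time."""
    with urllib.request.urlopen(url) as response:
        return count_rows(response)
```

For example, `storage_used_fraction() >= WARN_FRACTION` tells you whether you are already at the notification threshold before you start a download, and `count_remote_rows("https://example.com/data.csv")` (a placeholder URL) reads a file without writing it to your notebook server's storage.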

3.1. Viewing Python packages installed on your notebook server

You can check which Python packages are installed on your notebook server, and which version of each package you have, by running the pip tool in a notebook cell.

Prerequisites

  • You are logged in to JupyterHub and have a notebook open.

Procedure

  1. Enter the following in a new cell in your notebook:

    !pip list
  2. Run the cell.

Verification

  • The output shows an alphabetical list of all installed Python packages and their versions. For example, if you run this command immediately after creating a notebook server with the Minimal image, the first packages listed are similar to the following:

    Package                           Version
    --------------------------------- ----------
    aiohttp                           3.7.3
    alembic                           1.5.2
    appdirs                           1.4.4
    argo-workflows                    3.6.1
    argon2-cffi                       20.1.0
    async-generator                   1.10
    async-timeout                     3.0.1
    attrdict                          2.0.1
    attrs                             20.3.0
    backcall                          0.2.0
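
If you prefer to inspect packages from Python code rather than with the `!pip list` shell escape, the standard library module `importlib.metadata` (Python 3.8 and later) exposes the same information. The following is a minimal sketch; the helper names are illustrative, and it sorts case-insensitively by distribution name, so its order may not match pip's exactly.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, distributions, version


def installed_packages() -> list[tuple[str, str]]:
    """Return unique (name, version) pairs for installed distributions, sorted by name."""
    pairs = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and dist.version:  # skip distributions with broken metadata
            pairs.add((name, dist.version))
    return sorted(pairs, key=lambda pair: (pair[0].lower(), pair[1]))


def installed_version(name: str) -> str | None:
    """Return the installed version of a single package, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# Print the first ten packages in a layout similar to `pip list`.
for pkg_name, pkg_version in installed_packages()[:10]:
    print(f"{pkg_name:<34}{pkg_version}")
```

`installed_version("altair")` is a quick way to check one package without scanning the whole list.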

3.2. Installing Python packages on your notebook server

You can install Python packages that are not part of the default notebook server image by adding each package, and optionally its version, to a requirements.txt file and then running the pip install command in a notebook cell.

Note

You can also install packages directly, but Red Hat recommends using a requirements.txt file so that it is easier to deploy your model later.

Prerequisites

  • You are logged in to JupyterHub and have a notebook open.

Procedure

  1. Create a new text file using one of the following methods:

    • Click + to open a new launcher and click Text file.
    • Click File → New → Text File.
  2. Rename the text file to requirements.txt.

    1. Right-click the name of the file and click Rename Text. The Rename File dialog opens.
    2. Enter requirements.txt in the New Name field and click Rename.
  3. Add the packages that you want to install to the requirements.txt file. For example:

    altair

    You can specify the exact version to install by using the == (equal to) operator, for example:

    altair==4.1.0

    To install multiple packages at the same time, place each package on a separate line.

  4. Install the packages listed in requirements.txt on your server by using a notebook cell.

    1. Create a new cell in your notebook and enter the following command:

      !pip install -r requirements.txt
    2. Run the cell by pressing Shift+Enter.
    Important

    This installs the packages on your notebook server, but you must still import each package in a code cell before you can use it in your code. For example:

    import altair
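
The file-creation and installation steps in this procedure can also be scripted from a notebook cell. The following Python sketch is an illustration, not a Red Hat-provided tool: the helper names are hypothetical, and it runs pip through `sys.executable -m pip` so that packages install into the same Python environment as the running kernel (the same idea behind IPython's `%pip` magic).

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def format_requirements(pins: dict[str, str | None]) -> str:
    """Render one requirement per line, pinning with == when a version is given."""
    lines = [f"{name}=={ver}" if ver else name for name, ver in pins.items()]
    return "\n".join(lines) + "\n"


def write_requirements(pins: dict[str, str | None], path: Path = Path("requirements.txt")) -> Path:
    """Write the requirements file (by default in the current directory) and return its path."""
    path.write_text(format_requirements(pins), encoding="utf-8")
    return path


def pip_install_command(requirements: Path) -> list[str]:
    """Equivalent of `!pip install -r requirements.txt`, bound to this kernel's interpreter."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(requirements: Path) -> None:
    """Run pip and raise subprocess.CalledProcessError if the installation fails."""
    subprocess.run(pip_install_command(requirements), check=True)
```

For example, `install_requirements(write_requirements({"altair": "4.1.0"}))` writes the file and installs the pinned package; you still run `import altair` afterwards.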

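Verification

  • Confirm that the packages in requirements.txt appear in the output of !pip list, as described in Section 3.1, "Viewing Python packages installed on your notebook server".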
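
A minimal Python sketch for checking that the packages listed in requirements.txt are installed, and at the pinned versions, follows. It works under simplifying assumptions: the parser understands only bare names and `==` pins (not `>=`, extras, environment markers, or pip options), versions are compared as exact strings, and the function names are hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from typing import Callable


def parse_requirements(text: str) -> dict[str, str | None]:
    """Parse `name` or `name==version` lines, skipping blank lines and # comments."""
    pins: dict[str, str | None] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, ver = line.partition("==")
        pins[name.strip()] = ver.strip() if sep else None
    return pins


def _installed_version(name: str) -> str | None:
    """Look up the installed version, returning None when the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_requirements(
    pins: dict[str, str | None],
    lookup: Callable[[str], str | None] = _installed_version,
) -> list[str]:
    """Return human-readable problems; an empty list means every requirement is satisfied."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and found != wanted:
            problems.append(f"{name}: expected {wanted}, found {found}")
    return problems
```

For example, `check_requirements(parse_requirements(open("requirements.txt").read()))` returns an empty list when every listed package is installed as pinned. The `lookup` parameter exists so the check can be exercised without touching the real environment.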

3.3. Updating notebook server settings by restarting your server

You can update the settings on your notebook server by stopping and relaunching the notebook server. For example, if your server runs out of memory, you can restart it and select a larger container size.
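
Before restarting, it can help to confirm that memory really is the constraint. The following Python sketch reads the container's memory usage and limit from the Linux cgroup file system. It assumes the notebook container exposes either cgroup v2 files (`memory.current`, `memory.max`) or cgroup v1 files (`memory/memory.usage_in_bytes`, `memory/memory.limit_in_bytes`) under `/sys/fs/cgroup`, and returns `None` if neither pair is present. The helper name is illustrative.

```python
from __future__ import annotations

from pathlib import Path

# Assumed standard cgroup file locations, as (usage file, limit file) pairs.
CGROUP_V2 = ("memory.current", "memory.max")
CGROUP_V1 = ("memory/memory.usage_in_bytes", "memory/memory.limit_in_bytes")


def memory_usage(root: Path = Path("/sys/fs/cgroup")) -> tuple[int, int | None] | None:
    """Return (used_bytes, limit_bytes) for this container, or None if the files are absent.

    limit_bytes is None when cgroup v2 reports "max" (no limit). cgroup v1 reports
    "no limit" as a very large number, which is returned unchanged.
    """
    for used_name, limit_name in (CGROUP_V2, CGROUP_V1):
        used_file, limit_file = root / used_name, root / limit_name
        if used_file.exists() and limit_file.exists():
            used = int(used_file.read_text().strip())
            raw_limit = limit_file.read_text().strip()
            limit = None if raw_limit == "max" else int(raw_limit)
            return used, limit
    return None
```

If `used / limit` stays close to 1.0 while your notebook runs, a larger container size is likely to help.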

Prerequisites

  • You have a running notebook server.
  • You are logged in to JupyterHub.

Procedure

  1. Click File → Hub Control Panel.

    The control panel opens in a new tab.

  2. Click the Stop my server button.

    This button disappears when the server stops.

  3. Click My Server to restart the server and select new settings.

Verification

  • The notebook server launcher opens when the server restarts.
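
JupyterHub also exposes a REST API for stopping and starting a user's server, which can be useful if you automate this procedure. The following Python sketch builds those requests. It relies on the `JUPYTERHUB_API_URL`, `JUPYTERHUB_API_TOKEN`, and `JUPYTERHUB_USER` environment variables that JupyterHub normally sets inside a notebook server, and it assumes that your OpenShift Data Science deployment allows that token to manage your own server. Starting the server this way uses default settings, so choosing a different container size still happens in the notebook server launcher.

```python
from __future__ import annotations

import os
import urllib.parse
import urllib.request


def server_request(
    method: str,
    user: str | None = None,
    api_url: str | None = None,
    token: str | None = None,
) -> urllib.request.Request:
    """Build a request for JupyterHub's /users/{name}/server endpoint.

    DELETE stops the user's server; POST starts it with default options.
    Values not passed explicitly are read from the environment that
    JupyterHub sets inside a running notebook server.
    """
    user = user if user is not None else os.environ["JUPYTERHUB_USER"]
    api_url = api_url if api_url is not None else os.environ["JUPYTERHUB_API_URL"]
    token = token if token is not None else os.environ["JUPYTERHUB_API_TOKEN"]
    url = f"{api_url.rstrip('/')}/users/{urllib.parse.quote(user, safe='')}/server"
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"token {token}"}
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code (errors raise HTTPError)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Running `send(server_request("DELETE"))` from a kernel on the server you are stopping ends that kernel, so run it from another environment if you need the result.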