Developing and deploying data models

Red Hat OpenShift Data Science 1

Understand the development and deployment workflow and deploy your data models in intelligent applications

Abstract

Understand the development and deployment workflow and deploy your data models in intelligent applications.

Preface

This documentation is provided for the Field Trial release of Red Hat OpenShift Data Science.

See the following documents for service and life cycle information related to this Field Trial release:

Chapter 1. Overview of developing and deploying data models

Read this section to understand the work required to develop and deploy an application that uses a predictive model created using Red Hat OpenShift Data Science.

Your organization might split responsibility for this process between several roles, such as a data scientist and an application developer, or this work might be done by a single role. An appropriate role is noted for each step.

Table 1.1. Development tasks by role

Application developerData scientistTask description

 

Create a Python S2I project in Git using an OpenShift Data Science application template.

 

Configure user access to the Git project so that data scientists can push to and pull from the repository.

From this point, you can develop the model and the application that uses it simultaneously.

 

Create an OpenShift application using the project repository.

 

Build the OpenShift application to verify your code.

 

Automate the build process using webhooks.

 

Launch JupyterHub and either create or import a notebook.

 

Import the application Git project into JupyterHub.

 

Develop and test your model using notebooks in JupyterHub.

 

Save your model as an independent Python function in a separate Python file.

 

Update the requirements.txt file with dependencies your function requires.

 

Test the function on your notebook server.

 

Push your updates back to the remote Git project.

Test the deployed application endpoint.

Chapter 2. Creating a Python S2I application for OpenShift Data Science

2.1. Creating a Python S2I application for OpenShift Data Science from a GitHub template

You can create an application suitable for Red Hat OpenShift Data Science quickly by using Red Hat’s Python S2I application repository as a template in GitHub. Use the template to generate a new repository with the same format, directory structure and files as an existing Red Hat OpenShift Data Science repository.

Prerequisites

  • You have a GitHub account.
  • You have credentials to access the GitHub repository containing the relevant template that you want to use.

Procedure

  1. On GitHub, navigate to the main page of the template repository.
  2. Click Use this template.
  3. Optional: From the Owner list, select the account that you want to own the repository.
  4. In the Repository name field, enter a name for the new repository.
  5. Optional: In the Description field, enter a description for the new repository.
  6. Set the repository’s visibility level.

    1. To ensure that the repository is visible to anyone, leave Public selected. By default, the repository’s visibility is set to Public.
    2. Click Private to restrict who can see and commit to the repository.
  7. Optional: Select the Include all branches check box to copy the template repository’s branches to your new repository.
  8. Click Create repository from template.

Verification

  • The repository that you created from the template is visible and accessible from your GitHub account.

2.2. Creating a Python S2I application for OpenShift Data Science using Cookiecutter

You can create an application suitable for Red Hat OpenShift Data Science quickly by using Cookiecutter. Cookiecutter is a Python library that creates a flexible, standardized project structure for your data science work. You can use Cookiecutter to further customize your project’s repository. For example, you can modify the repository’s directory structure to suit your project’s requirements.

Prerequisites

  • A launched and running JupyterHub server.
  • You have a GitHub account.
  • You have credentials to access the GitHub repository containing the template that you want to use.

Procedure

  1. In the JupyterHub interface, click FileNewTerminal.
  2. In the terminal, run the pip install command to install Cookiecutter.

    pip install cookiecutter
  3. Run the cookiecutter command to create a project from a Cookiecutter repository template.

    cookiecutter template-repository-url

    Replace template-repository-url with the template repository’s URL. For example, https://github.com/cookiecutter-template/template.

  4. When prompted, provide the following information:

    1. A name for your project.
    2. A name for your repository.
    3. A name for the project’s author.
    4. A description for your project.
    5. Your open source license file type.

      The contents of the Cookiecutter template repository appear in the File Browser in the left sidebar.

  5. Create a repository in GitHub.

    1. In the upper-right corner of the GitHub home page, click +New repository.

      The Create a new repository page opens.

    2. In the Repository template field, select the template that you want to use.
    3. Optional: Select the Include all branches check box to copy the template repository’s branches to your new repository.
    4. In the Owner field, select the repository owner’s user name.
    5. In the Repository name field, enter a name for the repository.
    6. Optional: In the Description field, enter a description of the repository.
  6. Set the repository’s visibility level.

    1. To ensure that the repository is visible to anyone, leave Public selected. By default, the repository’s visibility is set to Public.
    2. Click Private to choose who can see and commit to the repository.
    3. Click Create repository.
  7. Clone the repository on your JupyterHub server.

    1. In the JupyterHub interface, click GitClone a Repository.

      The Clone a repo dialog appears.

    2. Enter the URL of the repository that you want to clone.
    3. Click Clone.

      The cloned repository appears in the File Browser in the left sidebar.

    4. In the File Browser, move the files and directories created by Cookiecutter to the repository that you cloned.
  8. Push your changes to the remote repository.

    1. In the left sidebar, click Git ( The Git icon ).
    2. If you have untracked changes, in the Changes tab, hover the cursor over the Untracked section bar and click jupyter git add or commit .
    3. If you have files that contain changes, in the Changes tab, hover the cursor over the Changed section bar and click jupyter git add or commit .
    4. In the Required field, enter a summary of your changes.
    5. In the Description field, enter a description of your changes.
    6. Click Commit.
    7. In the JupyterHub interface, click GitPush to Remote to push your changes to the remote repository.

      The Git credentials required dialog opens.

    8. Enter your credentials to access the remote repository.
    9. Click OK.

Verification

  • You can access the remote repository that you created from the template.
  • You can see the changes that you pushed in the remote repository.

Chapter 3. Configuring user access to the remote Git repository

Your data scientists and application developers need developer access to the remote Git repository in order to push and pull data from the repository. The repository owner can add these users to the repository as developers to enable this access.

For GitHub repositories, refer to the GitHub documentation:

Chapter 4. Creating an OpenShift application from a Git repository

You can import code from a Git repository and use it to create, build, and deploy a Red Hat OpenShift Data Science application on OpenShift Dedicated.

Prerequisites

  • You have logged in to the OpenShift Dedicated web console.
  • You are in the Developer perspective.
  • You have the appropriate roles and permissions in a project to create applications and other workloads in OpenShift Dedicated.
  • You have a configured Git repository.
  • You have permissions for importing the Git repository.

Procedure

  1. In OpenShift Dedicated, select the project to create the application in, or create a new project for the application.
  2. In the +Add view, click From Git to see the Import from Git form.
  3. In the Git section, enter the Git repository URL for the codebase you want to use to create an application.
  4. Optional: Click Show Advanced Git Options to add details such as:

    • Git Reference to point to code in a specific branch, tag, or commit to be used to build the application.
    • Context Dir to specify the subdirectory for the application source code you want to use to build the application.
    • Source Secret to create a Secret Name with credentials for pulling your source code from a private repository.
  5. In the Builder section the appropriate builder image is detected and selected by default.
  6. In the General section:

    1. In the Application field, enter a unique name for the application grouping. This must be unique in the project.
    2. The Name field is populated automatically based on the git repository URL. This is used to identify the resources created for this application.
  7. In the Resources section, select Deployment Config, to create an OpenShift style application.
  8. In the Advanced Options section:

    1. The Create a route to the application checkbox is selected by default so that you can access your application using a publicly available URL.

      If you do not want to expose your application on a public route, clear the checkbox.

    2. Optional: Click Routing to display advanced routing options.

      1. Customize the hostname for the route.
      2. Specify the path the router watches.
      3. Select the target port for traffic on the route.
      4. Configure transport security for the route.
    3. Optional: Click Build configuration to display advanced build configuration options, including any environment variables that your model requires to build.
    4. Optional: Click Deployment configuration to display advanced deployment configuration options, including any environment variables that your model requires in its deployment environment.

      For example, if your application integrates OpenShift Streams for Apache Kafka, add your Kafka environment variables here.

    5. Optional: Click Scaling to define the number of pods or application instances to deploy initially.
    6. Optional: Click Resource Limit to set the amount of CPU and Memory resources a container is guaranteed or allowed to use when running.
    7. Optional: Click Labels to add custom labels to your application.
  9. Click Create to create the application and see its build status in the Topology view.

Verification

  • You can view your application in the Topology view.
  • Click the application and check the Resources tab of the application details pane. Look for a success message under Builds, for example, Build #1 is complete.

Chapter 5. Building an OpenShift application using the web console

You can manually tell OpenShift Dedicated to build an existing OpenShift application with the Start Build button in the OpenShift Dedicated web console.

Prerequisites

  • You have Developer access to OpenShift Dedicated.
  • You have created an OpenShift Dedicated application.

Procedure

  1. In OpenShift Dedicated, set the Project dropdown to your application project.
  2. Click Topology.
  3. Click on the application to see the application details pane.
  4. Click the Start build button.

Chapter 6. Automatically rebuilding updated applications using webhooks

You can configure an OpenShift application to automatically rebuild and redeploy whenever updates are made to the Git repository that contains the application code. This ensures that the latest working version of your application is always available.

Prerequisites

  • An OpenShift application created using a GitHub repository as a source.
  • Permissions to change webhook settings in the GitHub repository.

Procedure

  1. In OpenShift Dedicated, change into the Developer perspective and set the Project dropdown to the appropriate project.
  2. Click Topology and click on your application to view the application details pane.
  3. Under Builds, click the name of the build configuration, marked with BC, to view the build configuration page.
  4. Under Webhooks, locate the entry for GitHub and click Copy URL with Secret.
  5. Navigate to your project page in GitHub and click Settings.
  6. Click WebhooksAdd webhook.
  7. Enter the following details on the Add webhook page:

    1. Paste the copied URL with secret into the Payload URL field.
    2. Set Content type to application/json.
    3. Leave all other options as the default.
    4. Click Add webhook.

Verification

  • Make an update to your application code and verify that the application rebuilds and deploys correctly.

Chapter 7. Creating or importing a notebook in JupyterHub

7.1. Creating a new notebook

You can create a new Jupyter notebook from an existing notebook container image to access its resources and properties. The JupyterHub Spawner contains a list of available container images that you can run as a single-user notebook server.

Prerequisites

  • Ensure that you have logged in to Red Hat OpenShift Data Science.
  • Ensure that you have logged in to JupyterHub and launched your notebook server.
  • The notebook image exists in a registry, image stream, and is accessible.

Procedure

  1. Click FileNewNotebook.
  2. If prompted, select a kernel for your notebook from the list.

    If you want to use a kernel, click Select. If you do not want to use a kernel, click No Kernel.

Verification

  • Check that the notebook file is visible in JupyterHub.

7.2. Importing an existing notebook file from a Git repository using JupyterLab

You can use the JupyterLab user interface to clone a Git repository into your workspace to continue your work or integrate files from an external project.

Prerequisites

  • A launched and running JupyterHub server.
  • Read access for the Git repository you want to clone.

Procedure

  1. Copy the HTTPS URL for the Git repository.

    • On GitHub, click ⤓ CodeHTTPS and click the Clipboard button.
    • On GitLab, click Clone and click the Clipboard button under Clone with HTTPS.
  2. In the JupyterLab interface click the Git Clone button ( Git Clone button ).

    You can also click GitClone a repository in the menu, or click the Git icon ( Git button ) and click the Clone a repository button.

    The Clone a repo dialog appears.

  3. Enter the HTTPS URL of the repository that contains your notebook.
  4. Click CLONE.
  5. If prompted, enter your username and password for the Git repository.

Verification

  • Check that the contents of the repository are visible in the file browser in JupyterLab, or run the ls command in the Terminal to verify that the repository is shown as a directory.

Chapter 8. Saving your model as an independent Python function

Turn your data model into an independent Python function so that you can run it outside your notebook server environment and use it in intelligent applications.

Prerequisites

  • You have access to the JupyterHub interface.
  • You have developed a prediction model in a Jupyter notebook.
  • Your Jupyter notebook is saved in a Git repository that was created from the Red Hat OpenShift Data Science sample S2I application repositories.

Procedure

  1. In JupyterHub, create a new prediction.py file.
  2. Edit the prediction.py file to define a predict function based on the prediction model in your Jupyter notebook.

    • Include only the code required to make the prediction. For example, you do not need to import libraries that only related to rendering plots in your Jupyter notebook.
    • If any new packages are required to run your prediction, update the contents of the requirements.txt file and run pip install -r requirements.txt to install the new packages.
  3. Test that you can run the independent Python function from your notebook by calling the function in a new notebook cell, for example:

    from prediction import predict
      predict(data)

Verification

  • The predict function runs correctly and returns the expected output when called from the notebook cell.

8.1. Installing Python packages on your notebook server

You can install Python packages that are not part of the default notebook server image by adding the package and the version to a requirements.txt file and then running the pip install command in a notebook cell.

Note

You can also install packages directly, but Red Hat recommends using a requirements.txt file so that it is easier to deploy your model later.

Prerequisites

  • Log in to JupyterHub and open a notebook.

Procedure

  1. Create a new text file using one of the following methods:

    • Click + to open a new launcher and click Text file.
    • Click FileNewText File.
  2. Rename the text file to requirements.txt.

    1. Right-click on the name of the file and click Rename Text. The Rename File dialog opens.
    2. Enter requirements.txt in the New Name field and click Rename.
  3. Add the packages to install to the requirements.txt file.

    altair

    You can specify the exact version to install by using the == (equal to) operator, for example:

    altair==4.1.0

    To install multiple packages at the same time, place each package on a separate line.

  4. Install the packages in requirements.txt to your server using a notebook cell.

    1. Create a new cell in your notebook and enter the following command.

      !pip install -r requirements.txt
    2. Run the cell by pressing Shift and Enter.
    Important

    This installs the package on your notebook server, but you must still run the import directive in a code cell to use the package in your code.

    import altair

Verification

Chapter 9. Testing your Python function using the sample Flask application

You should test that your independent Python function works as expected before it goes into your application.

Prerequisites

Procedure

  1. In JupyterHub, open the run_flask.ipynb notebook file.
  2. Run all cells in the notebook by clicking CellRun All.

    This starts the Flask application.

Verification

  • Open a terminal in JupyterHub by clicking FileNewTerminal and run the following command:

    curl -X POST -H "Content-Type: application/json" --data '{"data" : "hello world"}' http://localhost:5000/prediction

    Alternatively, enter the following in a new notebook cell and run the cell.

    !curl -X POST -H "Content-Type: application/json" --data '{"data" : "hello world"}' http://localhost:5000/prediction

    If no changes have been made to the sample app, you see a response in the browser similar to {"prediction" : "not implemented"}.

Chapter 10. Pushing project changes to a Git repository

To build and deploy your application in a production environment, upload your work to a remote Git repository.

Prerequisites

  • You have opened a notebook in the JupterHub interface.
  • You have already added the relevant Git repository to your notebook server.
  • You have permission to push changes to the relevant Git repository.
  • You have installed the Git version control extension.

Procedure

  1. Click FileSave All to save any unsaved changes.
  2. Click the Git icon ( Git button ) to open the Git pane in the JupyterHub interface.
  3. Confirm that your changed files appear under Changed.

    If your changed files appear under Untracked, click GitSimple Staging to enable a simplified Git process.

  4. Commit your changes.

    1. Ensure that all files under Changed have a blue checkmark beside them.
    2. In the Summary field, enter a brief description of the changes you made.
    3. Click Commit.
  5. Click GitPush to Remote to push your changes to the remote repository.
  6. When prompted, enter your Git credentials and click OK.

Verification

  • Your most recently pushed changes are visible in the remote Git repository.

Chapter 11. Testing the deployed application endpoint for your prediction function

After you deploy your application, you can test that your prediction function works properly at the deployed endpoint.

Prerequisites

  • Your application is built and deployed with your prediction function included.
  • You know the web address for the application containing your prediction function.

Procedure

  1. Open a terminal in JupyterHub by clicking FileNewTerminal.
  2. Run the following command, replacing <application-url> with the web address for the application, for example, http://myapp-myproject.apps.mycluster.abc1.s1.devshift.org.

    curl -X POST -H "Content-Type: application/json" --data {"data" : "hello world"} <application-url>/prediction

    For example:

    curl -X POST -H "Content-Type: application/json" --data {"data" : "hello world"} http://myapp-myproject.apps.mycluster.abc1.s1.devshift.org/prediction

    Alternatively, enter ! followed by the same command in a new notebook cell and run the cell.

    !curl -X POST -H "Content-Type: application/json" --data {"data" : "hello world"} <application-url>/prediction

Verification

  • The endpoint is working if you receive a response from the application, such as {"prediction" : "not implemented"}.

Legal Notice

Copyright © 2021 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.