Using RHAI trainers for progress tracking and checkpointing

This guide explains how to use Red Hat OpenShift AI (RHAI) trainers to monitor training progress in real time and to protect training jobs from interruptions by using just-in-time (JIT) checkpointing. RHAI trainers extend the upstream Kubeflow CustomTrainer with automatic instrumentation for Hugging Face Transformers and Training Hub workloads.
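To make the idea of just-in-time checkpointing concrete, the following is a minimal, generic sketch of the pattern in plain Python. It is not the RHAI trainer API: the names `JitCheckpointer`, `request_stop`, `train`, and `on_step` are hypothetical and exist only for illustration, and the sketch assumes that an interruption reaches the process as `SIGTERM`, which is the signal Kubernetes sends when it terminates a pod. The training state is saved only when a stop is requested, not on a fixed schedule, and the loop reports progress at every step.

```python
import json
import os
import signal
import tempfile


class JitCheckpointer:
    """Hypothetical helper: saves training state only when a stop is requested."""

    def __init__(self, path):
        self.path = path
        self.stop_requested = False

    def install(self, signum=signal.SIGTERM):
        # Register the handler; signal.signal must be called from the main thread.
        signal.signal(signum, self.request_stop)

    def request_stop(self, signum=None, frame=None):
        # Only set a flag here; the save happens at a safe point in the loop.
        self.stop_requested = True

    def save(self, state):
        # Write to a temporary file, then rename it, so that a partially
        # written checkpoint is never visible under the final path.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, self.path)

    def load(self):
        # Return the saved state, or None if no checkpoint exists yet.
        if not os.path.exists(self.path):
            return None
        with open(self.path) as f:
            return json.load(f)


def train(total_steps, checkpointer, on_step=None):
    """Toy training loop: resumes from a checkpoint and saves on interruption."""
    state = checkpointer.load() or {"step": 0}
    for step in range(state["step"], total_steps):
        state = {"step": step + 1}  # stand-in for one real training step
        print(f"progress: {state['step']}/{total_steps}")
        if on_step is not None:
            on_step(state)  # hypothetical hook, used here to simulate a stop
        if checkpointer.stop_requested:
            checkpointer.save(state)
            break
    return state
```

In a real job, `install()` would be called once before `train()` so that a `SIGTERM` sets the flag. A second run that points at the same checkpoint path then resumes from the saved step instead of starting over. Deferring the save to the end of the current step, rather than saving inside the signal handler, keeps the checkpoint consistent with a completed step.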

Prerequisites

Before u...
