RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic model crashes with `Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED`

Solution Verified - Updated 2025-07-15T13:47:11+00:00

Issue

Unable to run the RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic model on a system with an NVIDIA H100 GPU. The following error is observed:

[gpu_model_runner.py:1560] Encoder cache will be initialized with a budget of 8192 tokens, and profiled with 3 image items of the maximum feature size.
Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED
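
To confirm that a given server log shows this same failure, the log can be scanned for the error string. The following is a minimal sketch, not a Red Hat tool: the function name `find_wgmma_errors` and the sample lines are illustrative, and only the signature string comes from the log above.

```python
from typing import Iterable, List, Tuple

# Error text copied verbatim from the log excerpt in this article.
SIGNATURE = "Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED"


def find_wgmma_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped line) for every line containing the signature."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if SIGNATURE in line
    ]


# Illustrative input: two lines shaped like the excerpt above.
sample = [
    "[gpu_model_runner.py:1560] Encoder cache will be initialized with a budget of 8192 tokens\n",
    "Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED\n",
]

for number, text in find_wgmma_errors(sample):
    print(f"line {number}: {text}")
```

The function accepts any iterable of strings, so an open file handle for a real log can be passed in place of `sample`.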

Environment

Red Hat AI Inference Server 3.0
NVIDIA H100
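
To check whether a host matches this environment, the GPU model and compute capability can be read from `nvidia-smi`. The sketch below assumes a driver recent enough to support the `compute_cap` query field. The helper names and the sample output string are illustrative and do not come from the affected system. It only identifies Hopper-class GPUs (compute capability 9.x, which includes the H100) and does not diagnose the cause of the crash.

```python
import subprocess
from typing import List, Tuple

# Assumption: the installed driver supports the compute_cap query field.
QUERY = ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"]


def parse_gpus(csv_text: str) -> List[Tuple[str, str]]:
    """Parse 'name, compute_cap' lines into (name, compute_cap) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a GPU name containing commas stays intact.
        name, _, cap = line.rpartition(",")
        gpus.append((name.strip(), cap.strip()))
    return gpus


def is_hopper(compute_cap: str) -> bool:
    """True when the major compute capability is 9 (Hopper, e.g. H100)."""
    return compute_cap.split(".")[0] == "9"


def query_gpus() -> List[Tuple[str, str]]:
    """Run nvidia-smi on the local host and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpus(result.stdout)


# Illustrative output only; on a real host call query_gpus() instead.
sample_output = "NVIDIA H100 80GB HBM3, 9.0\n"
for name, cap in parse_gpus(sample_output):
    print(name, cap, "Hopper" if is_hopper(cap) else "not Hopper")
```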
