RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic model crashes with `Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED`
Issue
- Unable to run the RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic model on a system with an NVIDIA H100 GPU. The following error is observed:
[gpu_model_runner.py:1560] Encoder cache will be initialized with a budget of 8192 tokens, and profiled with 3 image items of the maximum feature size.
Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED
Environment
- Red Hat AI Inference Server 3.0
- NVIDIA H100
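
The error message names `CUTE_ARCH_MMA_SM90A_ENABLED`, a compile-time macro from NVIDIA's CUTLASS/CuTe library that is tied to the `sm_90a` (Hopper) target, so it can help to confirm which compute capability the driver reports for the GPU. The following is a minimal sketch for that check only, not a fix and not part of the product. It assumes `nvidia-smi` is on the `PATH` and that the installed driver supports the `compute_cap` query field. The helper names `parse_compute_caps`, `is_hopper`, and `query_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_compute_caps(csv_text):
    """Parse lines such as 'NVIDIA H100 80GB HBM3, 9.0' into (name, (major, minor))."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a GPU name containing commas stays intact.
        name, cap = (part.strip() for part in line.rsplit(",", 1))
        major, minor = (int(x) for x in cap.split("."))
        gpus.append((name, (major, minor)))
    return gpus


def is_hopper(cap):
    # Compute capability 9.0 corresponds to the Hopper generation (for example, H100).
    return cap == (9, 0)


def query_gpus():
    """Ask nvidia-smi for GPU names and compute capabilities; return [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Assumption: a non-zero exit means the query field is unsupported or no GPU is visible.
        return []
    return parse_compute_caps(result.stdout)


if __name__ == "__main__":
    for name, cap in query_gpus():
        print(f"{name}: compute capability {cap[0]}.{cap[1]}, Hopper={is_hopper(cap)}")
```

If the script reports compute capability 9.0, the hardware matches the environment listed above. The check says nothing about how the serving software's GPU kernels were built.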