PyTorch Optimizations from Intel

Hi everyone,

I'm trying to deploy a model server on OpenShift AI using a TorchServe image that supports Intel Extension for PyTorch (IPEX). My goal is to leverage Intel AMX (Advanced Matrix Extensions) for optimized model inference on CPU.
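For context, on the model side my understanding of the IPEX path is roughly the sketch below. The model id and dtype choice are my assumptions rather than a tested deployment; I'd welcome corrections:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model id is an assumption on my part (the gated Meta checkpoint);
# substitute whatever artifact your model store actually holds.
model_id = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# ipex.optimize() applies operator fusion and weight prepacking; on
# 4th-gen Xeon (Sapphire Rapids) and newer, bfloat16 GEMMs should then
# be dispatched to AMX tile instructions via oneDNN.
model = ipex.optimize(model, dtype=torch.bfloat16)

inputs = tokenizer("Hello from OpenShift AI", return_tensors="pt")
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

What I'm less sure about is how to wire this into a TorchServe handler inside the serving runtime image.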

Has anyone successfully configured this setup? If so, could you provide guidance or examples of the necessary YAML configuration and steps to get this running?
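Here is the rough draft I've been working from, modeled on the KServe ServingRuntime / InferenceService pattern that OpenShift AI's single-model serving uses. The image, resource sizes, storage URI, and the NFD node label are all placeholders or assumptions on my part, so I don't know if this is the right overall shape:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: torchserve-ipex                # placeholder name
spec:
  supportedModelFormats:
    - name: pytorch
      version: "1"
      autoSelect: true
  containers:
    - name: kserve-container
      # Placeholder: my custom TorchServe build with IPEX installed.
      image: quay.io/example/torchserve-ipex:latest
      args:
        - torchserve
        - --start
        - --model-store=/mnt/models/model-store
        - --ts-config=/mnt/models/config/config.properties
      resources:
        requests:
          cpu: "8"
          memory: 32Gi
        limits:
          cpu: "16"
          memory: 64Gi
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-8b
spec:
  predictor:
    # Assumes Node Feature Discovery is installed and labels
    # AMX-capable nodes; not sure this is the recommended approach.
    nodeSelector:
      feature.node.kubernetes.io/cpu-cpuid.AMXBF16: "true"
    model:
      modelFormat:
        name: pytorch
      runtime: torchserve-ipex
      storageUri: s3://my-bucket/llama-3-8b/   # placeholder bucket
```

In particular, I'm unsure whether pinning the predictor to AMX-capable nodes via a nodeSelector is the right way to do this on OpenShift AI, or whether there is a supported IPEX-enabled runtime image I should be using instead of building my own.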

Any help or pointers to relevant documentation would be greatly appreciated!

The model I am currently working with is Llama-3-8B, used for inference.

Thanks in advance!

Responses