Description:
Is your feature request related to a problem? Please describe.
Currently, the seamless_m4t_v2 architecture (specifically facebook/seamless-m4t-v2-large) is not natively supported for OpenVINO export via optimum-intel.
When attempting to export the model using optimum-cli, it fails with the following error:
ValueError: Trying to export a seamless_m4t_v2 model, that is a custom or unsupported architecture, but no custom export configuration was passed as `custom_export_configs`.
Describe the solution you'd like
I would like native support for the seamless_m4t_v2 architecture in optimum-intel. Specifically, the ability to:
- Export: Export the model to OpenVINO format (.xml, .bin).
- Quantization/Precision: Apply INT8 (and optionally INT4) weight compression via NNCF, as well as support for general FP16/bfloat16 precision options.
- Inference Pipeline: Provide parity with Transformers' SeamlessM4Tv2ForSpeechToSpeech (or appropriate OVModel* wrappers/pipelines) to handle the specific graph of S2ST (units, vocoder, multiple heads) rather than standard encoder-decoder seq2seq.
Describe alternatives you've considered
Currently, the only way to run this model is via standard PyTorch, which lacks OpenVINO hardware acceleration on Intel integrated graphics and CPUs for this specific multimodal architecture. Cascaded pipelines (Whisper + MT + SpeechT5) are an alternative, but they lack the unified "latent-based" S2ST performance and quality of SeamlessM4T.
Environment & Reproduction Details:
- OS: Ubuntu 24.04 Server
- Python version: Python 3.12.3
- optimum version: 2.1.0
- optimum-intel version: 1.27.0
- transformers version: 4.57.6
Command run:
optimum-cli export openvino --model facebook/seamless-m4t-v2-large --weight-format int8 seamless_s2st_int8/
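For scripted reproduction (for example in CI), the same command can be wrapped in a small Python helper. This is only a minimal sketch: `build_export_command` and `run_export` are hypothetical helper names introduced here for illustration, not part of optimum or optimum-intel, and the flags are exactly the ones from the command above.

```python
import shutil
import subprocess

# Model ID and output directory are taken from the command in this report.
MODEL_ID = "facebook/seamless-m4t-v2-large"


def build_export_command(model_id=MODEL_ID, weight_format="int8",
                         output_dir="seamless_s2st_int8/"):
    """Hypothetical helper: return the optimum-cli invocation as an argument list."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--weight-format", weight_format,
        output_dir,
    ]


def run_export(command):
    """Hypothetical helper: run the export and return the CompletedProcess.

    check=False keeps a failing export from raising, so the caller can
    inspect returncode and stderr directly.
    """
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} was not found on PATH")
    return subprocess.run(command, capture_output=True, text=True, check=False)
```

Calling `run_export(build_export_command())` and reading `.returncode` and `.stderr` from the result should surface the ValueError quoted above on the versions listed, assuming the failure is written to stderr.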
Additional context
SeamlessM4T-v2 is a state-of-the-art open-source model capable of direct Speech-to-Speech translation. Adding OpenVINO export support would be a massive performance boost for edge computing and local AI applications running on Intel hardware.