Feature Request: Native OpenVINO export and NNCF quantization support for SeamlessM4T-v2 (seamless_m4t_v2) #1667

@seferdemirci

Description:

Is your feature request related to a problem? Please describe.
Currently, the seamless_m4t_v2 architecture (specifically facebook/seamless-m4t-v2-large) is not natively supported for OpenVINO export via optimum-intel.

Exporting the model with optimum-cli fails with the following error:

ValueError: Trying to export a seamless_m4t_v2 model, that is a custom or unsupported architecture, but no custom export configuration was passed as `custom_export_configs`.

Describe the solution you'd like
I would like native support for the seamless_m4t_v2 architecture in optimum-intel. Specifically, the ability to:

  1. Export: Export the model to OpenVINO format (.xml, .bin).
  2. Quantization/Precision: Apply INT8 (and optionally INT4) weight compression via NNCF, and choose FP16 or bfloat16 precision at export time.
  3. Inference Pipeline: Run inference with parity to Transformers' SeamlessM4Tv2ForSpeechToSpeech, through appropriate OVModel* wrappers or pipelines that handle the specific graph of speech-to-speech translation (S2ST), with its unit generation, vocoder, and multiple heads, rather than assuming a standard encoder-decoder seq2seq model.
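
To make item 2 concrete, here is a minimal sketch of what 8-bit weight compression does to a single row of weights. It is a simplified per-row asymmetric scheme written in plain Python for illustration only; it is not NNCF's implementation, and the function names `quantize_row_uint8` and `dequantize_row` are hypothetical.

```python
from typing import List, Tuple


def quantize_row_uint8(row: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric 8-bit quantization of one weight row.

    Each weight w is approximated as scale * (q - zero_point), with q in 0..255.
    Illustrative only: real toolkits such as NNCF use their own schemes.
    """
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(row), 0.0)
    hi = max(max(row), 0.0)
    # Fall back to 1.0 when the row is all zeros to avoid dividing by zero.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-lo / scale)
    # Round to the nearest level, shift by the zero point, clamp to uint8.
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in row]
    return q, scale, zero_point


def dequantize_row(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float weights from their 8-bit codes."""
    return [scale * (v - zero_point) for v in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, zero_point = quantize_row_uint8(weights)
restored = dequantize_row(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four (FP32) or two (FP16), at the cost of a reconstruction error of at most about one quantization step (`scale`) per weight in this sketch.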

Describe alternatives you've considered
Currently, the only way to run this model is in standard PyTorch, which means this multimodal architecture gets no OpenVINO acceleration on Intel CPUs and integrated graphics. Cascaded pipelines (Whisper + MT + SpeechT5) are an alternative, but they lack the unified, "latent-based" S2ST design of SeamlessM4T and the performance and quality that come with it.

Environment & Reproduction Details:

  • OS: Ubuntu 24.04 Server
  • Python version: Python 3.12.3
  • optimum version: 2.1.0
  • optimum-intel version: 1.27.0
  • transformers version: 4.57.6

Command run:

optimum-cli export openvino --model facebook/seamless-m4t-v2-large --weight-format int8 seamless_s2st_int8/
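
For scripted reproduction, the same command can be wrapped in Python. This is a minimal sketch using only the standard library; the helper names (`build_export_command`, `run_export`, `is_unsupported_architecture`) are hypothetical, and it assumes `optimum-cli` from the environment above is on `PATH`.

```python
import subprocess
from typing import List


def build_export_command(model_id: str, weight_format: str, output_dir: str) -> List[str]:
    """Build the argument list for the export command shown in this issue."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--weight-format", weight_format,
        output_dir,
    ]


def is_unsupported_architecture(stderr: str) -> bool:
    """Check captured output for the error message reported in this issue."""
    return "custom or unsupported architecture" in stderr


def run_export(
    model_id: str = "facebook/seamless-m4t-v2-large",
    weight_format: str = "int8",
    output_dir: str = "seamless_s2st_int8/",
) -> subprocess.CompletedProcess:
    """Run the export and capture its output instead of raising on failure."""
    cmd = build_export_command(model_id, weight_format, output_dir)
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

Calling `run_export()` downloads the model and attempts the export, so it is left uncalled here; on the versions listed above, `is_unsupported_architecture(result.stderr)` would be expected to return `True` for the resulting output.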

Additional context
SeamlessM4T-v2 is a state-of-the-art open-source model capable of direct speech-to-speech translation. OpenVINO export support could substantially speed up edge computing and local AI applications running on Intel hardware.
