
Add OpenVINO export and inference support for MedASR (google/medasr)#1745

Open
padatta wants to merge 1 commit into huggingface:main from padatta:medasr-openvino-support

padatta commented May 21, 2026

Description:

What does this PR do?

Adds OpenVINO export, inference, and full INT8 quantization support for google/medasr (model_type=lasr_ctc).

Changes

Export (optimum/exporters/openvino/model_configs.py):

  • DummyLasrCtcAudioInputGenerator: generates input_features [batch, time, features] + attention_mask [batch, time] with random_mask_tensor
  • LasrCtcOpenVINOConfig: registered for the lasr_ctc model type and the automatic-speech-recognition task via AutoModelForCTC
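For reviewers who want a concrete picture of the dummy inputs used at export time, here is a minimal sketch that follows the shapes described above. It is not the PR's DummyLasrCtcAudioInputGenerator, which presumably builds on optimum's own dummy-generator classes and `random_mask_tensor`. The function `make_dummy_lasr_ctc_inputs` and the default `num_features=80` are hypothetical placeholders, and the right-padded mask only approximates what `random_mask_tensor` produces.

```python
import random


def make_dummy_lasr_ctc_inputs(batch_size=2, num_frames=16, num_features=80, seed=0):
    """Hypothetical stand-in for DummyLasrCtcAudioInputGenerator (not the PR's code).

    Returns input_features shaped [batch, time, features] and an attention_mask
    shaped [batch, time] whose trailing frames are randomly zeroed, i.e. a
    right-padded mask. num_features=80 is a placeholder, not MedASR's real value.
    """
    rng = random.Random(seed)
    input_features = [
        [[rng.uniform(-1.0, 1.0) for _ in range(num_features)] for _ in range(num_frames)]
        for _ in range(batch_size)
    ]
    attention_mask = []
    for _ in range(batch_size):
        # Keep at least one valid frame and pad the remainder with zeros on the right.
        valid = rng.randint(1, num_frames)
        attention_mask.append([1] * valid + [0] * (num_frames - valid))
    return {"input_features": input_features, "attention_mask": attention_mask}
```

A randomized mask helps the traced graph treat padding as data-dependent instead of baking in an "all frames valid" assumption.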

Inference (optimum/intel/openvino/modeling.py):

  • OVModelForCTC.forward(): handles both the input_features and input_values input names, and passes attention_mask only when the model accepts it
  • OVModelForCTC._preprocess_quantization_config(): auto-sets processor from model_name_or_path (mirrors Whisper/Seq2Seq pattern)
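The input-routing behavior described for `forward()` can be sketched as a small pure function. This is an illustration under assumptions, not the PR's implementation: `build_ctc_feed` is a hypothetical name, and `model_input_names` stands in for whatever input names the compiled OpenVINO model exposes.

```python
def build_ctc_feed(model_input_names, input_features=None, input_values=None, attention_mask=None):
    """Hypothetical sketch of the input routing described for OVModelForCTC.forward()."""
    # Accept either name from the caller: feature-based CTC models (e.g. MedASR)
    # take input_features, while waveform-based ones (e.g. Wav2Vec2) take input_values.
    audio = input_features if input_features is not None else input_values
    if audio is None:
        raise ValueError("Either input_features or input_values must be provided")
    # Feed the audio under whichever name the exported model actually exposes.
    name = "input_features" if "input_features" in model_input_names else "input_values"
    feed = {name: audio}
    # Only pass attention_mask when the exported graph has that input.
    if attention_mask is not None and "attention_mask" in model_input_names:
        feed["attention_mask"] = attention_mask
    return feed
```

Because the mask is passed conditionally, existing exported CTC models without an attention_mask input keep working.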

Quantization (optimum/intel/openvino/quantization.py):

  • OVModelForCTC branch in build_from_quantization_config() to route CTC models to speech-to-text calibration datasets (librispeech)
  • OVModelForCTC added to build_from_dataset() isinstance check
  • _prepare_ctc_calibration_data(): collects audio calibration inputs via InferRequestWrapper
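The calibration-collection idea behind `_prepare_ctc_calibration_data()` is to wrap the model's inference call, record every input it sees, and forward the call unchanged. Below is a minimal, framework-free sketch of that pattern. `RecordingInferRequest` and `prepare_ctc_calibration_data` are hypothetical names. The real code uses optimum-intel's InferRequestWrapper around an OpenVINO infer request, whose exact interface is not shown here.

```python
class RecordingInferRequest:
    """Hypothetical sketch of the InferRequestWrapper idea: record each input
    dict passed to the model, then forward the call so outputs are unchanged."""

    def __init__(self, infer_fn, collected):
        self._infer_fn = infer_fn
        self._collected = collected

    def __call__(self, inputs):
        # Store a shallow copy so later mutation by the caller does not alter the sample.
        self._collected.append(dict(inputs))
        return self._infer_fn(inputs)


def prepare_ctc_calibration_data(infer_fn, audio_samples, preprocess, num_samples):
    """Run up to num_samples clips through the wrapped model and return the
    recorded model inputs for use as calibration data."""
    collected = []
    wrapped = RecordingInferRequest(infer_fn, collected)
    for audio in audio_samples[:num_samples]:
        wrapped(preprocess(audio))
    return collected
```

Recording at the inference boundary captures inputs exactly as the compiled model receives them, after feature extraction and padding.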

CLI (optimum/exporters/openvino/__main__.py):

  • CTC model detection in _main_quantize() for weight compression
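The PR does not say how `_main_quantize()` recognizes a CTC checkpoint. One plausible heuristic, shown here only as a hypothetical sketch, inspects the `architectures` list of the parsed `config.json`. `is_ctc_model` is an illustrative name, and the real check may use `model_type` or the task instead.

```python
def is_ctc_model(config):
    """Hypothetical check: treat a checkpoint as CTC if any architecture listed
    in its config (a dict parsed from config.json) ends with 'ForCTC'."""
    architectures = config.get("architectures") or []
    return any(arch.endswith("ForCTC") for arch in architectures)
```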

Tests & Docs:

  • Gated tests behind RUN_SLOW_EXPORT_TESTS=1 + transformers>=5.0
  • MedASR entry in docs/source/openvino/models.mdx

Add OpenVINO export, inference, and quantization support for google/medasr
(model_type=lasr_ctc):

Export:
- Add LasrCtcOpenVINOConfig with custom DummyLasrCtcAudioInputGenerator
  (input_features [batch, time, features] + attention_mask)
- Register lasr_ctc in TasksManager custom classes (AutoModelForCTC)

Inference:
- Update OVModelForCTC.forward() to handle input_features naming
  and conditionally pass attention_mask

Quantization:
- Add OVModelForCTC._preprocess_quantization_config() for automatic
  processor resolution (mirrors Whisper/Seq2Seq pattern)
- Add OVModelForCTC branch in build_from_quantization_config() to
  route CTC models to speech-to-text calibration datasets
- Add OVModelForCTC to build_from_dataset() isinstance check
- Add _prepare_ctc_calibration_data() method for collecting audio
  calibration inputs via InferRequestWrapper
- Add CTC model detection in _main_quantize() for weight compression

Tests & Docs:
- Add gated tests (RUN_SLOW_EXPORT_TESTS=1, transformers>=5.0)
- Add MedASR entry to supported models documentation

Verified on Intel Arc iGPU + CPU:
- FP16 and INT8 weight-only: cosine similarity >= 0.9999, token match >= 99%
- INT8 full quantization (calibrated on 32 LibriSpeech samples): 2.9x speedup on CPU
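The PR does not describe how the accuracy numbers were computed. A common approach compares the logits of each OpenVINO variant against a reference run and compares the greedily decoded CTC tokens. The sketch below assumes that approach. It uses cosine similarity on flattened logits and positional token agreement. `blank_id=0` is an assumption, not MedASR's documented blank index, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened logit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ctc_greedy_decode(frame_logits, blank_id=0):
    """Standard CTC greedy decoding: argmax per frame, collapse repeats, drop blanks.
    blank_id=0 is an assumption; check the model's tokenizer for the real value."""
    ids = [max(range(len(row)), key=row.__getitem__) for row in frame_logits]
    tokens = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            tokens.append(i)
        prev = i
    return tokens


def token_match_rate(ref_tokens, test_tokens):
    """Fraction of positions where two token sequences agree, over the longer length."""
    length = max(len(ref_tokens), len(test_tokens))
    if length == 0:
        return 1.0
    matches = sum(r == t for r, t in zip(ref_tokens, test_tokens))
    return matches / length
```

Positional matching is stricter than an edit-distance metric such as WER, since a single insertion shifts every later token. An edit-distance metric would tolerate such shifts better.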