Add OpenVINO export and inference support for MedASR (google/medasr) #1745
Open
padatta wants to merge 1 commit into
Add OpenVINO export, inference, and quantization support for google/medasr (model_type=lasr_ctc):

Export:
- Add LasrCtcOpenVINOConfig with custom DummyLasrCtcAudioInputGenerator (input_features [batch, time, features] + attention_mask)
- Register lasr_ctc in TasksManager custom classes (AutoModelForCTC)

Inference:
- Update OVModelForCTC.forward() to handle input_features naming and conditionally pass attention_mask

Quantization:
- Add OVModelForCTC._preprocess_quantization_config() for automatic processor resolution (mirrors Whisper/Seq2Seq pattern)
- Add OVModelForCTC branch in build_from_quantization_config() to route CTC models to speech-to-text calibration datasets
- Add OVModelForCTC to build_from_dataset() isinstance check
- Add _prepare_ctc_calibration_data() method for collecting audio calibration inputs via InferRequestWrapper
- Add CTC model detection in _main_quantize() for weight compression

Tests & Docs:
- Add gated tests (RUN_SLOW_EXPORT_TESTS=1, transformers>=5.0)
- Add MedASR entry to supported models documentation

Verified on Intel Arc iGPU + CPU:
- FP16 and INT8 weight-only: cosine sim >= 0.9999, token match >= 99%
- INT8 full quantization (32 LibriSpeech samples): 2.9x CPU speedup
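The two accuracy figures quoted in the verification notes (cosine similarity and token match) can both be derived from a pair of CTC logit matrices, one from the reference model and one from the OpenVINO model. The following is a minimal sketch of how such metrics are commonly computed; it is not the script used to produce the numbers in this PR, and the helper names `cosine_similarity` and `token_match_rate` are hypothetical. It assumes logits shaped [time, vocab] and uses only the Python standard library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two [time, vocab] logit matrices (flattened)."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("logit matrices must have the same shape")
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    # Guard against all-zero inputs (hypothetical convention: report 0.0).
    return dot / norm if norm else 0.0


def token_match_rate(a, b):
    """Fraction of time steps where both matrices select the same argmax token id."""
    if len(a) != len(b) or not a:
        raise ValueError("logit matrices must be non-empty and have the same length")

    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    matches = sum(argmax(ra) == argmax(rb) for ra, rb in zip(a, b))
    return matches / len(a)


# Toy logits (illustrative values only, not taken from MedASR).
reference = [[2.0, 0.1, -1.0], [0.0, 3.0, 0.5], [1.0, 0.2, 4.0]]
candidate = [[1.9, 0.2, -1.1], [0.1, 2.8, 0.6], [1.1, 0.1, 3.9]]

print(cosine_similarity(reference, candidate))
print(token_match_rate(reference, candidate))
```

Token match is computed on per-frame argmax ids before any CTC collapsing, so it is stricter than comparing decoded transcripts; whether the PR measured it this way is not stated.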
Description:
What does this PR do?
Adds OpenVINO export, inference, and full INT8 quantization support for google/medasr (`model_type=lasr_ctc`).

Changes

Export (`optimum/exporters/openvino/model_configs.py`):
- `DummyLasrCtcAudioInputGenerator`: generates `input_features` [batch, time, features] + `attention_mask` [batch, time] with `random_mask_tensor`
- `LasrCtcOpenVINOConfig`: registered for `lasr_ctc` → `automatic-speech-recognition` task via `AutoModelForCTC`

Inference (`optimum/intel/openvino/modeling.py`):
- `OVModelForCTC.forward()`: handles `input_features` → `input_values` naming and conditionally passes `attention_mask`
- `OVModelForCTC._preprocess_quantization_config()`: auto-sets `processor` from `model_name_or_path` (mirrors Whisper/Seq2Seq pattern)

Quantization (`optimum/intel/openvino/quantization.py`):
- `OVModelForCTC` branch in `build_from_quantization_config()` to route CTC models to speech-to-text calibration datasets (librispeech)
- `OVModelForCTC` added to `build_from_dataset()` isinstance check
- `_prepare_ctc_calibration_data()`: collects audio calibration inputs via `InferRequestWrapper`

CLI (`optimum/exporters/openvino/__main__.py`):
- CTC model detection in `_main_quantize()` for weight compression

Tests & Docs:
- Gated tests: `RUN_SLOW_EXPORT_TESTS=1` + `transformers>=5.0`
- MedASR entry added to `docs/source/openvino/models.mdx`
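To illustrate the inference change, here is a rough sketch of the kind of input routing described for `OVModelForCTC.forward()`: feed the audio tensor under whichever name the exported model declares, and pass `attention_mask` only when the model actually has that input. This is an assumption about the general approach, not the code in this PR; `build_ctc_inputs` and its parameters are hypothetical names.

```python
def build_ctc_inputs(model_input_names, audio, attention_mask=None):
    """Map processor outputs onto the input names an exported CTC model declares.

    model_input_names: names of the compiled model's inputs (hypothetical parameter).
    audio: the audio tensor, e.g. features shaped [batch, time, features].
    attention_mask: optional mask shaped [batch, time].
    """
    if "input_features" in model_input_names:
        # Feature-based models such as lasr_ctc, per the PR description.
        inputs = {"input_features": audio}
    elif "input_values" in model_input_names:
        # Raw-waveform CTC models keep the existing input name.
        inputs = {"input_values": audio}
    else:
        raise ValueError(f"no known audio input among {sorted(model_input_names)}")

    # Pass the mask only if it was provided AND the model exposes that input.
    if attention_mask is not None and "attention_mask" in model_input_names:
        inputs["attention_mask"] = attention_mask
    return inputs


# Placeholder values stand in for real tensors.
lasr_style = build_ctc_inputs({"input_features", "attention_mask"}, "audio", "mask")
waveform_style = build_ctc_inputs({"input_values"}, "audio", "mask")
print(sorted(lasr_style))      # both inputs are forwarded
print(sorted(waveform_style))  # the mask is dropped
```

Keying the decision on the model's declared inputs, rather than on `model_type`, would let one forward path serve both feature-based and raw-waveform CTC exports.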