Add support for Ouro (ByteDance/Ouro-1.4B) #1783
Pull request overview
Adds OpenVINO export/inference support for the ByteDance Ouro Universal Transformer decoder family by introducing a normalized config that reports an effective layer count (to expose the correct number of past key/value pairs), plus wiring the architecture into the OpenVINO test matrix and documentation.
Changes:
- Register `ouro` in the OpenVINO TasksManager with a custom `NormalizedOuroConfig` that expands `num_layers = num_hidden_layers * total_ut_steps`.
- Add `ouro` to OpenVINO decoder/export/CLI/quantization test parametrizations (gated to `transformers>=4.53.0,<5`) and tiny model mappings.
- Document Ouro as a supported OpenVINO architecture.
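The `transformers>=4.53.0,<5` gate from the list above can be sketched as a small predicate. This is an illustration only: `parse_release` and `ouro_tests_enabled` are hypothetical names, not helpers from the test suite, and real code would more likely rely on `packaging.version` for proper pre-release handling.

```python
# Hypothetical sketch of the ">=4.53.0,<5" gate; not code from this PR.
MIN_VERSION = (4, 53, 0)   # matches MIN_TRANSFORMERS_VERSION = "4.53.0"
MAX_MAJOR_EXCLUSIVE = 5    # tests are skipped for transformers 5 and later


def parse_release(version: str) -> tuple:
    """Parse the leading numeric components, e.g. "4.57.1" -> (4, 57, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break  # stop at suffixes such as "dev0"
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad "4.53" to (4, 53, 0)
    return tuple(parts)


def ouro_tests_enabled(transformers_version: str) -> bool:
    """Return True when the version falls inside [4.53.0, 5)."""
    release = parse_release(transformers_version)
    return MIN_VERSION <= release and release[0] < MAX_MAJOR_EXCLUSIVE


print(ouro_tests_enabled("4.57.1"))
print(ouro_tests_enabled("5.0.0"))
```

The sketch ignores pre-release ordering (it treats `4.53.0.dev0` like `4.53.0`), which is one reason a dedicated version parser is preferable in practice.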
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/openvino/utils_tests.py | Adds tiny Ouro test model mapping, expected node counts, and remote-code gating for <5. |
| tests/openvino/test_quantization.py | Adds Ouro to auto-compression test coverage under the <5 transformers gate. |
| tests/openvino/test_exporters_cli.py | Adds Ouro to CLI export test matrix and expected tokenizer model count. |
| tests/openvino/test_export.py | Adds Ouro to exporter integration tests under the <5 transformers gate. |
| tests/openvino/test_decoder.py | Adds Ouro to decoder integration tests and aligns comparison behavior with other custom-cache remote-code models. |
| optimum/exporters/openvino/model_configs.py | Registers ouro OpenVINO config and introduces NormalizedOuroConfig to expand effective layer count. |
| docs/source/openvino/models.mdx | Lists Ouro among supported OpenVINO architectures. |
    class OuroOpenVINOConfig(TextDecoderWithPositionIdsOpenVINOConfig):
        DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
        DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
        NORMALIZED_CONFIG_CLASS = NormalizedOuroConfig
        MIN_TRANSFORMERS_VERSION = "4.53.0"
        _MODEL_PATCHER = OVDecoderModelPatcher
- OLMo 2
- OPT
- Orion
- Ouro
Document this model at the bottom and share links. Check other trust-remote-code models like Kokoro.
Do we need to set up any additional library for model loading?
Ouro is a Universal Transformer decoder: the same num_hidden_layers decoder layers are looped total_ut_steps times, and every iteration stores its own key/value entry. Register an OpenVINO export config with a NormalizedOuroConfig that reports num_layers = num_hidden_layers * total_ut_steps so the exported model exposes the right number of past-key-value pairs. No model patching is required beyond the standard OVDecoderModelPatcher.

Add Ouro to the decoder, export, exporters-cli and quantization tests (tiny-random-ouro, full vocab + real tokenizer, total_ut_steps=4). Because Ouro's custom UniversalTransformerCache makes cached generation diverge from uncached under left-padding + beam search (this happens in PyTorch too), Ouro joins the other remote-code custom-cache models that compare against transformers with use_cache=False.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
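The layer-count expansion described in the commit message can be illustrated with a standalone sketch. It does not reproduce `NormalizedOuroConfig` itself: `TinyOuroConfig` and `effective_num_layers` are hypothetical stand-ins, and `num_hidden_layers=2` is an assumed value chosen for the example (only `total_ut_steps=4` comes from the test model described above).

```python
from dataclasses import dataclass


@dataclass
class TinyOuroConfig:
    """Hypothetical stand-in for the Ouro model config (illustration only)."""

    num_hidden_layers: int  # number of distinct decoder layers
    total_ut_steps: int     # how many times the layer stack is looped


def effective_num_layers(config: TinyOuroConfig) -> int:
    """Number of key/value pairs the exported model has to expose.

    Every loop iteration stores its own key/value entry for each decoder
    layer, so the cache holds num_hidden_layers * total_ut_steps entries.
    """
    return config.num_hidden_layers * config.total_ut_steps


# num_hidden_layers=2 is an assumption; total_ut_steps=4 matches the tiny test model.
config = TinyOuroConfig(num_hidden_layers=2, total_ut_steps=4)
print(effective_num_layers(config))  # 2 * 4 = 8
```

Reporting the expanded count through the normalized config means the dummy past-key-values generator creates one key/value pair per (loop step, layer) combination without any change to the generator itself.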
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
    DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
    NORMALIZED_CONFIG_CLASS = NormalizedOuroConfig
    MIN_TRANSFORMERS_VERSION = "4.53.0"
Since this is a trust-remote-code model, what is the max version?
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
What does this PR do?
This PR adds OpenVINO export and inference support for the Ouro model family (`OuroForCausalLM`).

Ouro is a Universal Transformer decoder: the same `num_hidden_layers` decoder layers are looped `total_ut_steps` times, and every iteration stores its own key/value entry. The only thing required for a correct export is a normalized config that reports `num_layers = num_hidden_layers * total_ut_steps`, so the exported model exposes the right number of past-key-value pairs. No custom model patching or new operators are needed; the stock `OVDecoderModelPatcher` is reused.

Export the model:

    optimum-cli export openvino --model ByteDance/Ouro-1.4B --trust-remote-code ouro_ov

Run inference:
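A minimal sketch of the inference step, assuming the export command above wrote the model and tokenizer to `ouro_ov`. The wrapper function `generate_with_ouro`, the prompt, and `max_new_tokens=32` are illustrative choices rather than the PR's own snippet, and passing `trust_remote_code=True` when loading is an assumption based on Ouro being a remote-code model.

```python
def generate_with_ouro(model_dir: str = "ouro_ov",
                       prompt: str = "What is OpenVINO?",
                       max_new_tokens: int = 32) -> str:
    """Load the exported Ouro model and generate a short completion."""
    # Imported lazily so this sketch can be read and imported without
    # optimum-intel and transformers being installed.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    # Assumption: the export directory also contains the tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = OVModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_with_ouro())
```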
Notes:

- Ouro's remote code is not compatible with `transformers` v5, so tests are gated to `>=4.53.0, <5` (validated against `transformers==4.57.1`).
- Ouro's custom `UniversalTransformerCache` makes cached generation diverge from uncached under left-padding + beam search (this happens in `transformers` too), so in tests Ouro joins the other remote-code custom-cache models that compare against transformers with `use_cache=False`.
- Tests use `optimum-intel-internal-testing/tiny-random-ouro` (full vocab + real tokenizer, `total_ut_steps=4`).

Fixes # (issue)
Before submitting