Add support for Ouro (ByteDance/Ouro-1.4B)#1783

Open
openvino-dev-samples wants to merge 4 commits into
huggingface:mainfrom
openvino-dev-samples:add-ouro-support

Conversation

@openvino-dev-samples

Contributor

What does this PR do?

This PR adds OpenVINO export and inference support for the Ouro model family (OuroForCausalLM).

Ouro is a Universal Transformer decoder: the same num_hidden_layers decoder layers are looped total_ut_steps times, and every iteration stores its own key/value entry. The only thing required for a correct export is a normalized config that reports num_layers = num_hidden_layers * total_ut_steps, so the exported model exposes the right number of past-key-value pairs. No custom model patching or new operators are needed — the stock OVDecoderModelPatcher is reused.
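
The layer arithmetic above can be sketched as follows. This is a minimal illustration and not the PR's actual NormalizedOuroConfig: `EffectiveLayerConfig` is a hypothetical stand-in class, and the sample values (24 layers, 4 steps) are placeholders rather than the real Ouro-1.4B settings.

```python
from types import SimpleNamespace


class EffectiveLayerConfig:
    """Hypothetical stand-in for a normalized config (not the PR's class)."""

    def __init__(self, config):
        self.config = config

    @property
    def num_layers(self) -> int:
        # Each of the total_ut_steps loops over the shared layers keeps its
        # own key/value entry, so the cache needs one pair per (step, layer).
        return self.config.num_hidden_layers * self.config.total_ut_steps


# Placeholder values chosen for illustration only.
raw_config = SimpleNamespace(num_hidden_layers=24, total_ut_steps=4)
normalized = EffectiveLayerConfig(raw_config)
print(normalized.num_layers)  # 24 * 4 = 96 past-key-value pairs
```

Exposing the product as a property keeps the underlying config untouched, so code that reads num_hidden_layers directly still sees the number of shared layers.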

Export the model:

optimum-cli export openvino --model ByteDance/Ouro-1.4B --trust-remote-code ouro_ov

Run inference:

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "ByteDance/Ouro-1.4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Notes:

  • Ouro relies on remote modeling code that is incompatible with transformers v5, so tests are gated to >=4.53.0, <5 (validated against transformers==4.57.1).
  • Ouro's custom UniversalTransformerCache makes cached generation diverge from uncached under left-padding + beam search (this happens in transformers too), so in tests Ouro joins the other remote-code custom-cache models that compare against transformers with use_cache=False.
  • A tiny test model is added at optimum-intel-internal-testing/tiny-random-ouro (full vocab + real tokenizer, total_ut_steps=4).
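
The version gate in the first note can be sketched with a small stdlib-only check. The helper names `release_tuple` and `in_ouro_test_range` are hypothetical and are not part of the test suite; the sketch also ignores pre-release ordering, so it is a simplification of real version-specifier handling.

```python
from itertools import takewhile


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, padded to three parts."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # non-numeric segment such as "dev0"
        parts.append(int(digits))
        if digits != piece:
            break  # numeric segment with a suffix such as "0rc1"
    return tuple(parts + [0] * (3 - len(parts)))


def in_ouro_test_range(version: str) -> bool:
    """Check the gate >=4.53.0, <5 on the numeric release segment only."""
    return (4, 53, 0) <= release_tuple(version) < (5, 0, 0)


for candidate in ("4.52.4", "4.53.0", "4.57.1", "5.0.0"):
    print(candidate, in_ouro_test_range(candidate))
```

In real code, `packaging.specifiers.SpecifierSet(">=4.53.0,<5")` is the more thorough choice, since it follows the full version-specifier rules, including pre-releases.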

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@openvino-dev-samples openvino-dev-samples marked this pull request as draft June 11, 2026 02:21
@openvino-dev-samples openvino-dev-samples marked this pull request as ready for review June 11, 2026 02:41
@rkazants rkazants requested review from Copilot and echarlaix June 11, 2026 09:56
@rkazants rkazants self-requested a review June 11, 2026 09:57

Copilot AI left a comment

Pull request overview

Adds OpenVINO export/inference support for the ByteDance Ouro Universal Transformer decoder family by introducing a normalized config that reports an effective layer count (to expose the correct number of past key/value pairs), plus wiring the architecture into the OpenVINO test matrix and documentation.

Changes:

  • Register ouro in the OpenVINO TasksManager with a custom NormalizedOuroConfig that expands num_layers = num_hidden_layers * total_ut_steps.
  • Add ouro to OpenVINO decoder/export/CLI/quantization test parametrizations (gated to transformers>=4.53.0,<5) and tiny model mappings.
  • Document Ouro as a supported OpenVINO architecture.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/openvino/utils_tests.py | Adds tiny Ouro test model mapping, expected node counts, and remote-code gating for <5. |
| tests/openvino/test_quantization.py | Adds Ouro to auto-compression test coverage under the <5 transformers gate. |
| tests/openvino/test_exporters_cli.py | Adds Ouro to CLI export test matrix and expected tokenizer model count. |
| tests/openvino/test_export.py | Adds Ouro to exporter integration tests under the <5 transformers gate. |
| tests/openvino/test_decoder.py | Adds Ouro to decoder integration tests and aligns comparison behavior with other custom-cache remote-code models. |
| optimum/exporters/openvino/model_configs.py | Registers ouro OpenVINO config and introduces NormalizedOuroConfig to expand effective layer count. |
| docs/source/openvino/models.mdx | Lists Ouro among supported OpenVINO architectures. |

Comment on lines +497 to +502
class OuroOpenVINOConfig(TextDecoderWithPositionIdsOpenVINOConfig):
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
    DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
    NORMALIZED_CONFIG_CLASS = NormalizedOuroConfig
    MIN_TRANSFORMERS_VERSION = "4.53.0"
    _MODEL_PATCHER = OVDecoderModelPatcher
Comment thread docs/source/openvino/models.mdx Outdated
- OLMo 2
- OPT
- Orion
- Ouro

Collaborator

Document this model at the bottom and share links. Check other trust-remote-code models like Kokoro.
Do we need to set up any additional library for model loading?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Ouro is a Universal Transformer decoder: the same num_hidden_layers
decoder layers are looped total_ut_steps times, and every iteration
stores its own key/value entry. Register an OpenVINO export config with a
NormalizedOuroConfig that reports num_layers = num_hidden_layers *
total_ut_steps so the exported model exposes the right number of
past-key-value pairs. No model patching is required beyond the standard
OVDecoderModelPatcher.

Add Ouro to the decoder, export, exporters-cli and quantization tests
(tiny-random-ouro, full vocab + real tokenizer, total_ut_steps=4).
Because Ouro's custom UniversalTransformerCache makes cached generation
diverge from uncached under left-padding + beam search (this happens in
PyTorch too), Ouro joins the other remote-code custom-cache models that
compare against transformers with use_cache=False.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread docs/source/openvino/models.mdx Outdated
DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
NORMALIZED_CONFIG_CLASS = NormalizedOuroConfig
MIN_TRANSFORMERS_VERSION = "4.53.0"

Contributor

Since this is a trust-remote-code model, what is the max version?

Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
Comment thread optimum/exporters/openvino/model_configs.py

5 participants