Auto Transformers Version For Model Export #1827

Open
apaniukov wants to merge 2 commits into huggingface:main from apaniukov:auto-transformers-version

apaniukov commented Jun 25, 2026

What does this PR do?

Add --transformers-version to optimum-cli export openvino.

Different models need different transformers versions to export (some newer than optimum-intel's <5.1 pin). Today users must manually pip install transformers==X per model. This adds a flag that switches transformers on the fly in an ephemeral uv environment, leaving the user's env untouched:

optimum-cli export openvino -m <model> --transformers-version=auto <out>   # infer
optimum-cli export openvino -m <model> --transformers-version=5.2.0 <out>  # pin

When a switch is needed, the command re-execs under uv run --active --with transformers==<X>, overlaying only transformers on the existing venv (torch/openvino reused, ~3–5s). With no switch needed, uv is never invoked.
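
The re-exec step described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function names `build_uv_command` and `reexec_under_uv` are hypothetical, and passing the original `sys.argv` straight through to `uv run` is an assumption about how the command line is forwarded.

```python
import os
import sys
from typing import List, Optional, Sequence


def build_uv_command(requirement: str, argv: Optional[Sequence[str]] = None) -> List[str]:
    """Build a `uv run` command that overlays a transformers requirement.

    `requirement` is a version specifier such as "==5.2.0" or ">=5.2.0,<=5.2.99".
    `argv` defaults to the current process arguments (assumed forwarding scheme).
    """
    args = list(sys.argv if argv is None else argv)
    # --active reuses the currently activated venv; --with adds the overlay package.
    return ["uv", "run", "--active", "--with", f"transformers{requirement}", *args]


def reexec_under_uv(requirement: str) -> None:
    """Replace the current process with the same command running under uv."""
    command = build_uv_command(requirement)
    # execvp does not return on success: the export continues in the new process.
    os.execvp(command[0], command)
```

Keeping command construction separate from the `os.execvp` call makes the argument list easy to unit test without actually spawning `uv`.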

auto resolution:

  1. read the model's MIN/MAX_TRANSFORMERS_VERSION from the OpenVINO export config;
  2. fall back to the "transformers_version" in the model's config.json for unknown architectures;
  3. abort with a clear message if the model is newer than optimum-intel's MAX (use explicit --transformers-version=X to override).
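
The three resolution steps above can be sketched as a single function. Everything here is illustrative: `resolve_auto`, its parameters, and the `parse` helper are hypothetical names, the order in which the MAX check is applied is an assumption, and the version parser only handles plain dotted releases (a real implementation would more likely rely on the `packaging` library).

```python
from typing import Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as "5.2.0"; pads to three components."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def resolve_auto(
    current: str,
    export_min: Optional[str],
    export_max: Optional[str],
    config_version: Optional[str],
    supported_max: str,
) -> Optional[str]:
    """Return the specifier to switch to, or None when no switch is needed."""
    if export_min is not None or export_max is not None:
        # Step 1: the architecture is known, so use the export config's range.
        low = export_min or "0"
        high = export_max or supported_max
        if parse(low) <= parse(current) <= parse(high):
            return None
        return f">={low},<={high}"
    if config_version is None:
        # Nothing to infer from: keep the installed version.
        return None
    if parse(config_version) > parse(supported_max):
        # Step 3: the model is newer than what optimum-intel supports.
        raise RuntimeError(
            f"Model was saved with transformers {config_version}, newer than the "
            f"supported maximum {supported_max}; pass --transformers-version=X to override."
        )
    # Step 2: unknown architecture, fall back to the version in config.json.
    if parse(current) == parse(config_version):
        return None
    return f"=={config_version}"
```

Returning `None` for the "no switch needed" case mirrors the statement that `uv` is never invoked when the installed version already fits.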

Robustness: uv's overlay precedence is non-monotonic (0.8.0–0.8.3 silently ignore it), so the re-exec'd process re-verifies the active version actually satisfies the requirement and errors clearly otherwise. Adds a transformers-switch extra pinned to uv>=0.8.4.
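
The post-switch verification could be sketched as below. This is an assumption-laden illustration rather than the PR's code: `satisfies` and `verify_or_exit` are hypothetical names, the active version is passed in as a string (in practice it might come from `importlib.metadata.version("transformers")`), and the hand-rolled comparison only covers the operators that appear in this PR's log output.

```python
import operator
from typing import Tuple

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release; pads to three components."""
    parts = [int(part) for part in version.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(active: str, spec: str) -> bool:
    """Check `active` against a comma-separated specifier like ">=5.2.0,<=5.2.99"."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tried before their one-character prefixes.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                if not _OPS[symbol](_parse(active), _parse(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True


def verify_or_exit(active: str, spec: str) -> None:
    """Abort with a clear message if the overlay did not take effect."""
    if not satisfies(active, spec):
        raise SystemExit(
            f"transformers=={active} is active but transformers{spec} was requested; "
            "the uv overlay may have been ignored (upgrade to uv>=0.8.4)."
        )
```

Running this check inside the re-exec'd process, rather than trusting the `uv` invocation, is what guards against the versions that silently ignore the overlay.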

Changes:

  • new transformers_version.py (version resolution + re-exec);
  • --transformers-version flag in openvino.py;
  • transformers-switch extra in setup.py;
  • 19 unit tests.
Usage example
(venv) ~/python/optimum-intel/temp$ pip show transformers | grep Version
Version: 4.57.6

(venv) ~/python/optimum-intel/temp$ optimum-cli export openvino -m optimum-intel-internal-testing/tiny-random-qwen3.5-moe --task="image-text-to-text" qwen3.5-moe
Traceback (most recent call last):
  File "/home/apaniuko/python/optimum-intel/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1360, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/apaniuko/python/optimum-intel/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1048, in __getitem__
    raise KeyError(key)
KeyError: 'qwen3_5_moe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/apaniuko/python/optimum-intel/venv/bin/optimum-cli", line 6, in <module>
    sys.exit(main())
  File "/home/apaniuko/python/optimum-intel/venv/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 219, in main
    service.run()
  File "/home/apaniuko/python/optimum-intel/optimum/commands/export/openvino.py", line 482, in run
    main_export(
  File "/home/apaniuko/python/optimum-intel/optimum/exporters/openvino/__main__.py", line 321, in main_export
    task = infer_task(
  File "/home/apaniuko/python/optimum-intel/optimum/exporters/openvino/__main__.py", line 132, in infer_task
    config = AutoConfig.from_pretrained(
  File "/home/apaniuko/python/optimum-intel/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1362, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `qwen3_5_moe` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`

(venv) ~/python/optimum-intel/temp$ optimum-cli export openvino -m optimum-intel-internal-testing/tiny-random-qwen3.5-moe --task="image-text-to-text" --transformers-version=auto qwen3.5-moe
Re-running export in an isolated `uv` environment with `transformers>=5.2.0,<=5.2.99` (current: transformers==4.57.6).
`torch_dtype` is deprecated! Use `dtype` instead!
The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|████████████████████████████████████████████████████████████████████████| 105/105 [00:00<00:00, 1241.13it/s, Materializing param=model.visual.pos_embed.weight]
`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
/home/apaniuko/.cache/uv/archive-v0/f3cfEd7Wg4jYL6Su/lib/python3.10/site-packages/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py:1473: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if position_ids.ndim == 3 and position_ids.shape[0] == 4:
/home/apaniuko/.cache/uv/archive-v0/f3cfEd7Wg4jYL6Su/lib/python3.10/site-packages/transformers/masking_utils.py:192: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (padding_length := kv_length + kv_offset - attention_mask.shape[-1]) > 0:
/home/apaniuko/python/optimum-intel/optimum/exporters/openvino/patching_utils.py:247: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor(0.0, device=mask.device, dtype=dtype),
/home/apaniuko/python/optimum-intel/optimum/exporters/openvino/patching_utils.py:248: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor(torch.finfo(torch.float16).min, device=mask.device, dtype=dtype),
/home/apaniuko/.cache/uv/archive-v0/f3cfEd7Wg4jYL6Su/lib/python3.10/site-packages/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py:1521: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if cache_position[0] > 0 or (attention_mask is not None and torch.all(attention_mask == 1)):
/home/apaniuko/.cache/uv/archive-v0/f3cfEd7Wg4jYL6Su/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py:77: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  is_causal = query.shape[2] > 1 and attention_mask is None and is_causal

(venv) ~/python/optimum-intel/temp$ optimum-cli export openvino -m optimum-intel-internal-testing/tiny-random-gemma4-unified --task="image-text-to-text" --transformers-version=auto gemma4-unified
config.json: 2.33kB [00:00, 4.67MB/s]
Re-running export in an isolated `uv` environment with `transformers>=5.10,<=5.10.99` (current: transformers==4.57.6).
Installed 28 packages in 62ms
[transformers] `torch_dtype` is deprecated! Use `dtype` instead!
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34.8M/34.8M [00:04<00:00, 8.55MB/s]
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [00:00<00:00, 4828.83it/s]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 216/216 [00:00<00:00, 544kB/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 2.39MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32.2M/32.2M [00:01<00:00, 18.0MB/s]
processor_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.41k/1.41k [00:00<00:00, 2.13MB/s]
[transformers] `loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
/home/apaniuko/python/optimum-intel/optimum/exporters/openvino/model_patcher.py:9308: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  vision_group_ids = torch.where(is_vision, vision_group_ids, torch.tensor(-1, dtype=torch.int32, device=device))

(venv) ~/python/optimum-intel/temp$ ls gemma4-unified/
config.json               openvino_detokenizer.xml     openvino_text_embeddings_model.bin  openvino_tokenizer.xml                preprocessor_config.json  tokenizer.json
generation_config.json    openvino_language_model.bin  openvino_text_embeddings_model.xml  openvino_vision_embeddings_model.bin  processor_config.json
openvino_detokenizer.bin  openvino_language_model.xml  openvino_tokenizer.bin              openvino_vision_embeddings_model.xml  tokenizer_config.json

(venv) ~/python/optimum-intel/temp$ pip show transformers | grep Version
Version: 4.57.6

🤖 Generated with Claude Code

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

[--quantization-statistics-path QUANTIZATION_STATISTICS_PATH]
[--num-samples NUM_SAMPLES] [--disable-stateful] [--disable-convert-tokenizer]
[--smooth-quant-alpha SMOOTH_QUANT_ALPHA]
[--transformers-version TRANSFORMERS_VERSION]

Remove this new option. Let's make this work out of the box (OOB).