
OpenVINO GenAI tests NPU support and Windows fixes #1660

Open
helena-intel wants to merge 3 commits into huggingface:main from helena-intel:helena/test-ov-genai-npu

Conversation

@helena-intel (Collaborator)

Update OpenVINO GenAI tests

  • Fix issues and access violations caused by TemporaryDirectory on Windows
  • Add initial support for NPU
    • Speech2Text (Whisper), selected LLMs, and selected VLMs are supported for now. More models will be added later.
  • For LLMs, compare tokens instead of detokenized text. This fixes issues on GPU.
  • On GPU, there are a few known failures, mentioned at the top of the file. We are looking into them.
  • Use the chat template in VLM tests, matching what preprocess_inputs does in optimum-intel. If models added in the future do not support a chat template, we can make this an option, but for now all VLM tests pass with it; see the sketch below.
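
For illustration, preparing VLM inputs through a chat template might look like the sketch below. This assumes a recent transformers release where processors expose apply_chat_template; the model id and message content are placeholders, not the PR's actual test inputs.

# Sketch: build the VLM prompt through the processor's chat template, the
# same wrapping optimum-intel's preprocess_inputs applies. Placeholders only.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("some-org/some-vlm")  # placeholder id
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is in this image?"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)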

The solution for the temporary directory looks convoluted, but this was trickier than expected because we also want the directory to be deleted when a test fails.
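
As a rough illustration only, failure-safe cleanup can be done with a yield fixture (a minimal sketch, not the PR's exact mechanism; the fixture name is made up, and ignore_cleanup_errors requires Python 3.10+):

import gc
import tempfile

import pytest

@pytest.fixture
def temp_dir():
    # ignore_cleanup_errors tolerates files still locked by open handles,
    # which is the usual failure mode on Windows.
    with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmp:
        yield tmp
        # Teardown runs whether the test passed or failed. Collect garbage
        # first so objects holding file handles (pipelines, mmap'd weights)
        # are released before the directory is removed.
        gc.collect()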

I tested GPU and NPU on LNL 258V with Linux and Windows.
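
To make the NPU item above concrete, device gating in pytest can look roughly like this (a sketch; the class name and model list are hypothetical, and OPENVINO_DEVICE mirrors the environment-driven variable the tests use):

import os

import pytest

OPENVINO_DEVICE = os.environ.get("OPENVINO_DEVICE", "CPU")

# Restrict the tested model set on NPU; names are illustrative.
SUPPORTED_ARCHITECTURES = (
    ["llama"] if OPENVINO_DEVICE == "NPU" else ["llama", "qwen2", "chatglm4"]
)

# Skip whole test classes on NPU until the underlying models are supported.
@pytest.mark.skipif(OPENVINO_DEVICE == "NPU", reason="not yet supported on NPU")
class TestSomethingUnsupportedOnNPU:
    ...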

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copilot AI (Contributor) left a comment

Pull request overview

Updates the OpenVINO GenAI integration tests to improve cross-device stability (notably Windows/GPU) and introduce initial NPU coverage.

Changes:

  • Add a pytest-based temp directory/traceback cleanup mechanism to avoid Windows file-handle issues after failures.
  • Add initial NPU support by restricting the tested model sets and skipping unsupported test classes.
  • Make LLM comparisons more robust on GPU by comparing generated token IDs rather than detokenized text (see the sketch after this list); align VLM generation with chat templates.
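
A minimal sketch of the token-level comparison idea, assuming transformers-style generate() calls that return token-ID tensors (the helper and its arguments are placeholders, not the PR's code):

import torch

def assert_same_tokens(reference_model, tested_model, inputs):
    # Greedy decoding keeps both runs deterministic.
    ref_ids = reference_model.generate(**inputs, max_new_tokens=20, do_sample=False)
    new_ids = tested_model.generate(**inputs, max_new_tokens=20, do_sample=False)
    # Compare raw token IDs: decoded strings can differ in whitespace or
    # special-token handling even when the generated tokens match exactly.
    assert torch.equal(ref_ids, new_ids)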




# NPU does not support f32 inference
TEST_CONFIG = {"CACHE_DIR": ""} if OPENVINO_DEVICE == "NPU" else {**F32_CONFIG, "CACHE_DIR": ""}
Copilot AI Apr 21, 2026

TEST_CONFIG sets CACHE_DIR to an empty string. In Optimum's OpenVINO integration, CACHE_DIR is treated as an actual directory path when present, so passing "" can lead to an invalid cache path (or unexpected caching behavior) when compiling models. Prefer omitting CACHE_DIR entirely, or set it to a real directory under self.temp_dir if you need deterministic caching behavior in these tests.

Suggested change
- TEST_CONFIG = {"CACHE_DIR": ""} if OPENVINO_DEVICE == "NPU" else {**F32_CONFIG, "CACHE_DIR": ""}
+ TEST_CONFIG = {} if OPENVINO_DEVICE == "NPU" else {**F32_CONFIG}

@helena-intel (Collaborator, Author)

CACHE_DIR may be set by default to a particular directory; CACHE_DIR="" prevents that. We do not want model caching to be used during testing, even if the default for a particular device is to enable it.
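
For illustration, this is how the standard OpenVINO CACHE_DIR property behaves (a sketch with a placeholder model path, not the test code itself):

import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path

# Non-empty CACHE_DIR enables caching: compiled blobs are reused across runs.
# core.compile_model(model, "GPU", {"CACHE_DIR": "./ov_cache"})

# Empty CACHE_DIR keeps caching off, even where a device enables it by default.
compiled = core.compile_model(model, "GPU", {"CACHE_DIR": ""})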

Comment thread: tests/openvino/test_genai.py
@rkazants (Collaborator)

@helena-intel, please resolve the merge conflict. We'll re-run CI after that.

In the meantime, @popovaan, please take a look at this PR.

@helena-intel force-pushed the helena/test-ov-genai-npu branch from eef8826 to 0137597 on May 1, 2026 15:15
- Fix TemporaryDirectory issues on Windows
- Compare model output tokens instead of detokenized outputs for LLMs
- Initial NPU support
- Use chat template for VLM test
@helena-intel force-pushed the helena/test-ov-genai-npu branch from 0137597 to dee6428 on May 1, 2026 15:23
- Change supported versions for deepseek and qwen
- ChatGLM issue is caused by NaN in tiny model outputs, tracked by an
  internal ticket. For now, remove chatglm from genai tests. This only
  affects chatglm, not chatglm4.
@helena-intel force-pushed the helena/test-ov-genai-npu branch from e180fc1 to 57eef28 on May 4, 2026 15:57
@rkazants (Collaborator)

rkazants commented May 5, 2026

@anatyrova, @regisss, please take a look at this PR

@rkazants rkazants requested a review from regisss May 5, 2026 11:58
@regisss (Contributor) left a comment

LGTM
