MODEL_PROPERTIES support for all VLM model roles#3781

Open: dkalinowski wants to merge 10 commits into openvinotoolkit:master from dkalinowski:vlm-design-imp2

@dkalinowski
Collaborator

Description

This change enables per-model configuration for VLMPipeline, and for ContinuousBatchingPipeline with the VLM adapter, via the MODEL_PROPERTIES key.

Property resolution (lowest to highest priority)

  1. Global - top-level keys
  2. DEVICE_PROPERTIES - properties propagated to the given device
  3. MODEL_PROPERTIES - properties propagated to the model with the given role in VLMPipeline

Available roles: vision_embeddings, text_embeddings, resampler, vision_embeddings_merger, vision_embeddings_pos, vision_projection, multi_modal_projector, language_model

Example:

pipe = VLMPipeline(model_dir, "CPU",
    NUM_STREAMS="1",                      # global default
    MODEL_PROPERTIES={
        "vision_embeddings": {"NUM_STREAMS": "4"},   # override for vision encoder
        "language_model":    {"NUM_STREAMS": "2"},   # override for LM
    })
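
The precedence above can be sketched with plain Python dicts. This is only an illustration of the described behaviour, not the C++ implementation from this PR: `resolve_properties` is a hypothetical name, and the `DEVICE_PROPERTIES` values are made up for the example. The empty-device case follows the doc comment in utils.hpp, which says DEVICE_PROPERTIES is forwarded as-is when no device is given.

```python
def resolve_properties(properties, role, device=""):
    """Hypothetical stand-in for the three-layer merge (low -> high priority)."""
    # Layer 1: global keys. MODEL_PROPERTIES is a meta key and is always dropped.
    resolved = {k: v for k, v in properties.items() if k != "MODEL_PROPERTIES"}

    if device:
        # Layer 2: replace the DEVICE_PROPERTIES meta key with the entries
        # for this device. With an empty device the key is left untouched.
        per_device = resolved.pop("DEVICE_PROPERTIES", {})
        resolved.update(per_device.get(device, {}))

    # Layer 3: role-specific overrides win over everything else.
    resolved.update(properties.get("MODEL_PROPERTIES", {}).get(role, {}))
    return resolved


# Illustrative input: values are assumptions chosen for this sketch.
props = {
    "NUM_STREAMS": "1",
    "DEVICE_PROPERTIES": {"CPU": {"NUM_STREAMS": "3", "INFERENCE_NUM_THREADS": "8"}},
    "MODEL_PROPERTIES": {
        "vision_embeddings": {"NUM_STREAMS": "4"},
        "language_model": {"NUM_STREAMS": "2"},
    },
}

vision = resolve_properties(props, "vision_embeddings", "CPU")
resampler = resolve_properties(props, "resampler", "CPU")
lm_read = resolve_properties(props, "language_model")  # no device bound
```

Under these assumptions, `vision` takes NUM_STREAMS from MODEL_PROPERTIES, `resampler` falls back to the CPU entry of DEVICE_PROPERTIES, and `lm_read` still carries the DEVICE_PROPERTIES key because no device was given.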

CVS-162621

Follows the VLM API redesign document:
https://github.com/dkalinowski/openvino.genai/blob/vlm-design/design.md

Checklist:

  • This PR follows GenAI Contributing guidelines.
  • Tests have been updated or added to cover the new code.
  • This PR fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Copilot AI review requested due to automatic review settings April 29, 2026 15:07
@github-actions github-actions Bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: Python API Python API for GenAI no-match-files labels Apr 29, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds support for per-submodel OpenVINO property overrides for VLM pipelines via the MODEL_PROPERTIES key (with intended precedence over DEVICE_PROPERTIES and global properties), plus validation of supported VLM role names and Python binding support.

Changes:

  • Extend utils::get_model_properties to merge global + DEVICE_PROPERTIES[device] + MODEL_PROPERTIES[role] with defined precedence.
  • Apply per-role property resolution across VLM sub-model compilation sites (vision/text encoders, projectors, resamplers, LM, etc.) and Continuous Batching language model compilation.
  • Add validation helpers (get_known_vlm_model_roles, validate_vlm_model_properties) and expand C++ tests; update Python property conversion to treat MODEL_PROPERTIES/DEVICE_PROPERTIES as AnyMap.
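
The role validation in the last bullet can be sketched roughly as follows. This is a Python illustration only, not the C++ helper: the function and constant names are hypothetical, the role list is copied from the PR description, and `ValueError` stands in for whatever exception the C++ code actually throws. The no-op on a missing key follows the doc comment in utils.hpp.

```python
# Role names as listed in the PR description.
KNOWN_VLM_ROLES = (
    "vision_embeddings",
    "text_embeddings",
    "resampler",
    "vision_embeddings_merger",
    "vision_embeddings_pos",
    "vision_projection",
    "multi_modal_projector",
    "language_model",
)


def validate_model_properties(properties, known_roles=KNOWN_VLM_ROLES):
    """Hypothetical check: reject MODEL_PROPERTIES entries with unknown roles."""
    model_props = properties.get("MODEL_PROPERTIES")
    if model_props is None:
        return  # no-op when the key is absent

    unknown = sorted(set(model_props) - set(known_roles))
    if unknown:
        raise ValueError(f"Unknown VLM model role(s) in MODEL_PROPERTIES: {unknown}")


# A valid map passes silently; a misspelled role is reported.
validate_model_properties({"MODEL_PROPERTIES": {"language_model": {"NUM_STREAMS": "2"}}})
try:
    validate_model_properties({"MODEL_PROPERTIES": {"languge_model": {}}})
    rejected = False
except ValueError:
    rejected = True
```

Failing early on a misspelled role avoids the case where an override is silently ignored because no sub-model matches it.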

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/cpp/utils.cpp | Adds unit tests for three-layer property resolution and role validation. |
| src/python/py_utils.cpp | Ensures MODEL_PROPERTIES and DEVICE_PROPERTIES are converted to ov::AnyMap in Python bindings. |
| src/cpp/src/utils.hpp | Updates API docs/signature for property resolution and declares new VLM role helpers. |
| src/cpp/src/utils.cpp | Implements three-layer merge + role validation helpers and known-role list. |
| src/cpp/src/visual_language/pipeline.cpp | Validates role names and applies per-role properties when reading/compiling the language model. |
| src/cpp/src/continuous_batching/pipeline.cpp | Validates role names and reads LM with MODEL_PROPERTIES-filtered props; adjusts InputsEmbedder creation. |
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Resolves per-role props for LM before adapter extraction/compilation. |
| src/cpp/src/visual_language/vision_encoder.cpp | Uses per-role properties for vision embeddings compilation. |
| src/cpp/src/visual_language/videochat_flash/classes.cpp | Uses per-role properties for vision embeddings / vision projection compilation. |
| src/cpp/src/visual_language/qwen3_vl/classes.cpp | Uses per-role properties for vision embeddings position model compilation. |
| src/cpp/src/visual_language/qwen2vl/classes.cpp | Uses per-role properties for vision embeddings + merger compilation. |
| src/cpp/src/visual_language/phi4mm/classes.cpp | Uses per-role properties for vision projection compilation. |
| src/cpp/src/visual_language/phi3_vision/classes.cpp | Uses per-role properties for vision embeddings + vision projection compilation. |
| src/cpp/src/visual_language/minicpm/classes.cpp | Uses per-role properties for resampler compilation. |
| src/cpp/src/visual_language/llava_next_video/classes.cpp | Uses per-role properties for multi-modal projector + resampler + vision embeddings compilation. |
| src/cpp/src/visual_language/embedding_model.cpp | Uses per-role properties for text embeddings read/compile. |

Comment thread tests/cpp/utils.cpp
Comment thread src/cpp/src/utils.hpp
Comment thread src/cpp/src/utils.cpp Outdated
Comment thread src/cpp/src/visual_language/pipeline.cpp Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces per-submodel property overrides for VLM and Continuous Batching pipelines via the MODEL_PROPERTIES meta key, with defined precedence across global, DEVICE_PROPERTIES, and role-specific overrides.

Changes:

  • Extend utils::get_model_properties to resolve properties using a 3-layer precedence (globals → DEVICE_PROPERTIES[device] → MODEL_PROPERTIES[role]) and strip meta keys before they reach OpenVINO plugins.
  • Apply per-role property resolution across VLM sub-model compile sites and ContinuousBatching language-model initialization.
  • Add role validation (validate_vlm_model_properties / get_known_vlm_model_roles), Python binding support for dict→ov::AnyMap conversion, and comprehensive C++ unit tests.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/cpp/utils.cpp | Adds unit tests covering DEVICE_PROPERTIES + MODEL_PROPERTIES precedence, stripping behavior, and role validation. |
| src/python/py_utils.cpp | Treats MODEL_PROPERTIES/DEVICE_PROPERTIES as ov::AnyMap in Python bindings to support nested dicts. |
| src/cpp/src/utils.hpp | Updates API/docs for get_model_properties; declares known-role list and validation helper. |
| src/cpp/src/utils.cpp | Implements 3-layer property resolution plus known-role list and validation. |
| src/cpp/src/visual_language/pipeline.cpp | Validates roles and applies per-role properties to language-model compile paths (NPU + non-NPU). |
| src/cpp/src/continuous_batching/pipeline.cpp | Validates VLM role names and strips MODEL_PROPERTIES before model read/initialization. |
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Resolves/strips per-role props for language-model compilation before adapter extraction. |
| src/cpp/src/visual_language/vision_encoder.cpp | Routes vision-encoder compilation through get_model_properties for role overrides. |
| src/cpp/src/visual_language/videochat_flash/classes.cpp | Applies per-role properties to vision/projection model compilation. |
| src/cpp/src/visual_language/qwen2vl/classes.cpp | Applies per-role properties to vision encoder + merger compilation. |
| src/cpp/src/visual_language/qwen3_vl/classes.cpp | Applies per-role properties to positional-embeddings model compilation. |
| src/cpp/src/visual_language/phi4mm/classes.cpp | Applies per-role properties to vision-projection model compilation. |
| src/cpp/src/visual_language/phi3_vision/classes.cpp | Applies per-role properties to vision/projection model compilation. |
| src/cpp/src/visual_language/minicpm/classes.cpp | Applies per-role properties to resampler compilation. |
| src/cpp/src/visual_language/llava_next_video/classes.cpp | Applies per-role properties to multi-modal projector + resampler compilation. |
| src/cpp/src/visual_language/embedding_model.cpp | Applies per-role properties consistently for text-embeddings read/compile steps. |

Comment thread src/cpp/src/visual_language/pipeline.cpp
Comment thread src/cpp/src/visual_language/pipeline.cpp Outdated
Comment thread src/cpp/src/utils.hpp
Comment on lines 149 to +163
/// @brief Resolve properties for @p model_role by merging two layers (priority low to high):
/// 1. global (top-level keys, excluding meta keys PER_MODEL_PROPERTIES)
/// 2. PER_MODEL_PROPERTIES[model_role]
/// 1. global (top-level keys, excluding meta keys PER_MODEL_PROPERTIES
/// and DEVICE_PROPERTIES if device is specified)
/// 2. DEVICE_PROPERTIES[device] (only when @p device is non-empty)
/// 3. PER_MODEL_PROPERTIES[model_role]
/// MODEL_PROPERTIES wins over DEVICE_PROPERTIES wins
/// over globals.
/// @param properties The main properties map. Not modified.
/// @param model_role Sub-model role (e.g. "vision_embeddings").
/// @param device Target device for the compile site. When empty,
/// DEVICE_PROPERTIES is forwarded as-is (used at read_model sites
/// which are not bound to a specific device).
/// @return A new ov::AnyMap with the merged result. The input map is left
/// untouched so callers may continue using the meta keys.
ov::AnyMap get_model_properties(ov::AnyMap& properties, const std::string& model_role);
ov::AnyMap get_model_properties(const ov::AnyMap& properties, const std::string& model_role, const std::string& device = "");
Collaborator Author

Will do in next PR

const ov::AnyMap& properties) {
ov::Core core = utils::singleton_core();
std::shared_ptr<ov::Model> m_model = core.read_model(model_dir / "openvino_text_embeddings_model.xml", {}, properties);
const auto plugin_props = utils::get_model_properties(properties, "text_embeddings", device);
Contributor

Would const auto text_embeddings_props = ... be more relevant here?

Comment thread src/cpp/src/utils.hpp
Comment on lines +352 to +357
const std::vector<std::string>& get_known_vlm_model_roles();

/// @brief Throws if `properties[MODEL_PROPERTIES]` contains a role name
/// not in `known_roles`. No-op if the key is absent.
void validate_vlm_model_properties(const ov::AnyMap& properties,
const std::vector<std::string>& known_roles);
Contributor

Do we really need the known_roles parameter, and do we need to expose the get_known_vlm_model_roles() function?
As far as I can see, every usage of validate_vlm_model_properties() calls get_known_vlm_model_roles(). Only the tests pass a different vector of known roles, but I think a test case with an invalid properties map would be sufficient.

4 participants