ComfyUI-MultiModal-Prompt-Nodes

Version: 1.0.14 License: GPL-3.0

Multimodal prompt generator nodes for ComfyUI, designed to generate prompts for Qwen-Image-Edit and Wan2.2.
Supports local LLM / local GGUF models (Qwen2.5-VL, Qwen3-VL, Qwen3.5 and Qwen3.6) and Qwen API for image and video prompt generation and enhancement.

Upgrade Notes for Existing Users

The following notes are intended for existing users upgrading from older releases.

Qwen API model list updated for Qwen3.6 and Qwen3.7

Cloud API model selection now includes newer Qwen3.6 and Qwen3.7 models, including qwen3.7-plus, qwen3.7-max, qwen3.6-plus, and qwen3.6-flash. qwen3.7-plus is used as the default API model and is available for vision input workflows. qwen3.7-max is available for text-only API workflows. Older API models remain selectable for existing workflow compatibility, with deprecation notices such as:

deprecated: announced offline since 2026-05-13 for models already announced as offline.
deprecated: offline scheduled 2026-07-13 for models announced for the July 13, 2026 offline window.
deprecated: legacy notices for legacy models kept for compatibility.

Expanded search paths for local Qwen-family GGUF models

In addition to models/LLM, this release now searches models/text_encoders and its subdirectories for GGUF files. Because this changes how model paths are handled internally, you may need to reselect your models the first time you run the node after updating.

Improved prompt generation for Qwen2.5-VL in Qwen Image Edit Prompt Generator

There was previously a bug in how the system prompt was applied, which prevented Qwen2.5-VL from producing appropriate output in Qwen Image Edit Prompt Generator. In v1.0.9, this has been fixed and the system prompts have been strengthened, resulting in more reliable output.

Vision Input Compatibility

Starting from v1.0.8, image input for Qwen2.5-VL is now available in version 0.3.16 of llama-cpp-python(official). Vision input support varies by model and llama-cpp-python version. See Installation section for detailed compatibility information. Results may vary based on your specific environment.

Recommendation: Use Qwen API or Qwen3-VL with Qwen-Image-Edit. Qwen2.5-VL currently shows insufficient adherence to user prompts under the existing system prompt configuration.

Local GGUF Model Stability

Starting from v1.0.6, internal GGUF model handling has been improved to ensure stable behavior when switching between different Qwen3-VL models (e.g. 8B ↔ 4B), with mmproj files now being properly reloaded as part of the model switching process.

Important Notes

Language Recommendation for Optimal Results

Based on extensive testing, Wan2.2 and Qwen-Image-Edit respond significantly better to Chinese prompts than English prompts.

Recommendation: Set target_language to "zh" (Chinese) for best results with these models, even if your input is in English. The models will generate more coherent and instruction-following outputs.

Features

1. Vision LLM Node

Flexible prompting styles:
- raw: Direct LLM response without system prompt
- default: Balanced prompt enhancement
- detailed: Rich visual details (colors, textures, lighting, atmosphere)
- concise: Minimal keywords, focused on core elements
- creative: Artistic interpretation with unique perspectives
Multi-image input: Support batch image input via ComfyUI's batch nodes (e.g., Images Batch Multiple)
Local GGUF support: Run Qwen2.5-VL, Qwen3-VL, Qwen3.5 and Qwen3.6 models locally
Auto-detect mmproj: Automatic detection or manual selection

2. Qwen Image Edit Prompt Generator

Image editing prompts: Specialized for Qwen-Image-Edit tasks
Optimized for Chinese: Better performance with Chinese language prompts
Multi-image support: Up to 3 images via optional inputs (image2/image3)
Dynamic model selection: Auto-detect local GGUF models and cloud API models
Auto-detect mmproj: Automatic detection or manual selection
API key management: Centralized configuration via api_key.txt

3. Wan Video Prompt Generator

Video generation prompts: Optimized for Wan2.2 text-to-video and image-to-video
Task-specific optimization: Separate prompts for T2V and I2V workflows
Optimized for Chinese: Better performance with Chinese language prompts
Extended token limit: 2048 tokens to support longer Chinese prompts (600+ characters)
Dynamic model selection: Auto-detect local GGUF models and cloud API models
Auto-detect mmproj: Automatic detection or manual selection
API key management: Centralized configuration via api_key.txt

Installation

1. Clone Repository

Clone this repository into your ComfyUI custom_nodes folder:

cd ComfyUI/custom_nodes
git clone https://github.com/yourusername/ComfyUI-MultiModal-Prompt-Nodes.git

2. Install Non-LLM Dependencies

Install the lightweight dependencies used by the nodes for Dashscope API access, image handling, and array processing:

cd ComfyUI-MultiModal-Prompt-Nodes
pip install dashscope pillow numpy

These packages are required for both API-based workflows and local GGUF workflows.

Do not use pip install -r requirements.txt if you plan to install a custom llama-cpp-python build for local multimodal models. The requirements file includes the official PyPI package, which may overwrite or conflict with a custom build. Install llama-cpp-python separately in the next step according to the model family and backend you want to use.

3. Install llama-cpp-python (REQUIRED for local models)

Skip this step if you only use Qwen API / Dashscope models. Local GGUF models require llama-cpp-python in the same Python environment that runs ComfyUI.

Model support depends heavily on the llama-cpp-python build. Vision support, GPU acceleration, and newer Qwen multimodal handlers can vary by version, operating system, Python version, GPU driver, and backend.

Recommended choice:

Qwen3-VL or Qwen3.5 / Qwen3.6 local models: use a recent JamePeng llama-cpp-python fork build that matches your OS, Python version, and acceleration backend.
Qwen2.5-VL only: the official PyPI package can be used as a fallback.

Based on the author's test environment:

Version	Qwen2.5-VL	Qwen3-VL	Qwen3.5/3.6
0.3.16 (official)	✅	❌	❌
0.3.21+ (JamePeng fork)	✅	✅	❌
0.3.36+ (JamePeng fork)	✅	✅	✅

Note: This table is a compatibility reference, not a guarantee. Your results may differ depending on your hardware, drivers, Python environment, and the exact wheel or build options used.

Recommended installation for Qwen3-VL / Qwen3.5 / Qwen3.6:

Follow the build and installation instructions in the JamePeng fork: https://github.com/JamePeng/llama-cpp-python

This fork usually requires a custom build or a backend-specific wheel. A simple pip install llama-cpp-python may install the official package instead, which does not provide the same multimodal compatibility.

Fallback for Qwen2.5-VL only:

pip install llama-cpp-python

After installing or changing llama-cpp-python, restart ComfyUI so the nodes load the updated package.

4. Place Models

Place your GGUF models in ComfyUI/models/LLM/ or ComfyUI/models/text_encoders/:

ComfyUI/models/LLM/
├── Qwen2.5VL-7B-F16_0.gguf
├── Qwen3VL-8B-Instruct-Q8_0.gguf
├── mmproj-Qwen2.5-VL-7B-Instruct-F16.gguf
└── mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf

5. Configure API Key (Optional, for cloud models)

For cloud API usage, create api_key.txt in the node folder:

ComfyUI/custom_nodes/ComfyUI-MultiModal-Prompt-Nodes/api_key.txt

Add your Alibaba Cloud Dashscope API key to this file.

Usage

Vision LLM Node

Inputs:

prompt: Text prompt to rewrite/enhance
style: Prompt rewriting style
- raw: Direct LLM response without system prompt (useful for custom prompting)
- default: Balanced prompt enhancement
- detailed: Rich visual details
- concise: Minimal, focused keywords
- creative: Artistic interpretation
target_language: Output language (auto/en/zh)
model: Select from auto-detected local GGUF models
mmproj: mmproj file selection
- (Auto-detect): Automatically search for matching mmproj
- (Not required): For text-only mode
- Specific file: Manually select mmproj file
max_tokens: Maximum tokens to generate (default: 512)
temperature: Sampling temperature (0.0-2.0, default: 0.7)
device: CPU or GPU execution
image (optional): Input image for vision-language processing

Example workflow:

Load Vision LLM Node
Enter basic prompt: "a cat sitting on a windowsill"
Attach image via batch node (optional)
Select model
Choose (Auto-detect) for mmproj or select specific file
Select style: default
Set device: CPU or GPU
Run to get enhanced prompt

Qwen Image Edit Prompt Generator

Inputs:

image: Primary input image (optional)
prompt: Edit instruction or image description
prompt_style:
- Qwen-Image-Edit: For image editing tasks
- Qwen-Image: For general image understanding
target_language: Output language (auto/zh/en)
llm_model: Model selection
- Local: xxx: Local GGUF models (auto-detected)
- API models: qwen3.7-plus, qwen3.7-max, qwen3.6-plus, qwen3.6-flash, etc.
mmproj: mmproj file (required for local models)
- (Auto-detect): Automatic detection
- (Not required): For API models or text-only mode
- Specific file: Manual selection
max_retries: Retry attempts for API calls (default: 3)
device: CPU/GPU selection for local models
save_tokens: Compress images to save API tokens
image2/image3 (optional): Additional context images

Use cases:

Image editing prompt generation
Multi-image context prompts
Style transfer descriptions
Visual question answering

Recommended settings:

For best results: Set target_language to zh (Chinese)
Use local models for privacy, API models for quality
Enable save_tokens when using API models

Wan Video Prompt Generator

Inputs:

prompt: Video scene description
task_type:
- Text-to-Video: Generate video from text description
- Image-to-Video: Generate video from image + text
target_language: Output language (auto/zh/en)
llm_model: Model selection
- Local: xxx: Local GGUF models
- API models: qwen3.7-plus, qwen3.7-max, qwen3.6-plus, qwen3.6-flash, etc.
mmproj: mmproj selection (same as other nodes)
max_retries: API retry attempts
device: CPU/GPU for local models
save_tokens: Image compression for API
image (optional): Reference frame for I2V tasks

Optimized for:

Wan2.2 video generation
Temporal coherence descriptions
Camera movement instructions
Scene transitions

Important notes:

Use Chinese prompts (target_language: zh) for best results
Supports up to 600+ Chinese characters (2048 tokens)
For I2V tasks, use qwen3.7-plus, Qwen3.6, or qwen-vl-* models

Example T2V workflow:

Enter prompt: "A cat looking out from a windowsill"
Set task_type: Text-to-Video
Set target_language: zh
Select model (local or API)
Run to get optimized video prompt

Example I2V workflow:

Attach input image
Enter motion description: "The camera slowly pushes in"
Set task_type: Image-to-Video
Set target_language: zh
Ensure model supports vision (qwen3.7-plus, Qwen3.6, or qwen-vl-*)
Run to get I2V prompt

Model Compatibility

Qwen2.5-VL (Separate mmproj)

✅ Qwen2.5-VL(3B/7B/32B): Full vision support
✅ Requires matching mmproj file
❌ Insufficient adherence to user prompts under the existing system prompt configuration with Qwen-Image-Edit

Qwen3-VL (Separate mmproj)

✅ Qwen3-VL(4B/8B): Full vision support with JamePeng fork
✅ Requires matching mmproj file

Qwen3.5/3.6 (Separate mmproj)

✅ Qwen3.5/3.6(9B/27B/35B-A3B): Full vision support with JamePeng fork
✅ Requires matching mmproj file

Model Sources

Qwen models: https://huggingface.co/Qwen
GGUF conversions: https://huggingface.co/models?search=qwen+gguf
mmproj files: Usually bundled with GGUF conversions

Troubleshooting

Installation Issues

Q: "No module named 'llama_cpp'" error
A: Install llama-cpp-python: pip install llama-cpp-python

Q: pip install fails with "externally-managed-environment"
A: Use --break-system-packages flag or create a virtual environment

Q: "Failed to load model" with Qwen3-VL
A: Ensure you're using llama-cpp-python 0.3.21+ (JamePeng fork). Version 0.3.16 doesn't support Qwen3-VL.

Runtime Issues

Q: After updating to v1.0.9, I get an error like Value not in list: llm_model: 'Local:'.
A: Try reselecting your GGUF model and mmproj file. In v1.0.9, the GGUF search paths were expanded to include models/text_encoders and its subdirectories in addition to models/LLM. As a result, internal model paths may change, and the first run after updating may require you to select the GGUF model and mmproj again.

Q: Qwen Image Edit Prompt Generator does not produce appropriate output when I use Qwen2.5-VL.
A: There was a bug in how the system prompt was applied. This was fixed in v1.0.9, and the system prompts were also improved to produce more appropriate output.

Q: "mmproj not specified" error
A: Select an mmproj file (or choose (Auto-detect)) in the mmproj dropdown for local models

Q: "No models found" in model dropdown
A:

Place GGUF models in ComfyUI/models/LLM/ or ComfyUI/models/text_encoders/
Restart ComfyUI
Verify file extensions are .gguf

Q: Models stored via extra_model_paths.yaml (e.g. on a different drive) are not listed in the dropdown
A: Fixed in v1.0.11. Versions ≤ 1.0.10 only searched folder_paths.models_dir/LLM and folder_paths.models_dir/text_encoders directly and ignored paths registered by extra_model_paths.yaml. This meant models on a separate drive — such as W:\ai-models\text_encoders\ — were never discovered even though ComfyUI itself could load them.

Current versions call folder_paths.get_folder_paths("text_encoders") and folder_paths.get_folder_paths("llm") first, which returns every path registered by ComfyUI, including entries from extra_model_paths.yaml. Models on other drives or in non-default locations are resolved as absolute paths so the rest of the load pipeline still works correctly.

Q: mmproj auto-detect fails with "no mmproj matched the model family prefix" even though a mmproj-*.gguf file exists next to the model
A: Fixed in v1.0.11. Auto-detect previously required the mmproj filename to begin with the model's family prefix (e.g. mmproj-qwen3.5-BF16.gguf). Files with generic names like mmproj-BF16.gguf were rejected.

Current versions still prefer a family-prefixed mmproj. If no family-prefixed mmproj is found but there is exactly one mmproj-*.gguf in the model's directory, that file is used automatically with a fallback warning in the log. If multiple unmatched mmproj files are present, you still need to select one manually.

Q: Vision input not working with Qwen2.5-VL
A: Use v1.0.8 or later. Fixed bug.

Q: Out of memory errors
A:

Use smaller quantization (Q4_K_M instead of Q8_0)
Reduce max_tokens parameter
Close other applications
Use a smaller model (4B instead of 7B)

Q: Slow inference on CPU
A: Normal for large models. Consider:

Q4_K_M quantization (faster than Q8_0)
Smaller models (4B faster than 7B)
GPU acceleration if available

Q: "API_KEY is not set" error with local models
A: This error should only appear when using API models. If using local models (starting with "Local:"), this is a bug - please report it.

Output Quality Issues

Q: Wan2.2 output is incoherent or doesn't follow instructions
A: Set target_language to zh (Chinese). Wan2.2 performs significantly better with Chinese prompts, even if your input is in English.

Q: Qwen-Image-Edit not understanding my edits
A:

Use target_language: zh for better results
Be specific in edit instructions
Try using reference examples in your prompt

Q: Output is cut off or incomplete
A: Increase max_tokens parameter (Vision LLM Node) or note that other nodes have fixed limits (512 for Qwen, 2048 for Wan)

Device Selection Issues

Q: How to choose between CPU and GPU?
A:

GPU: Faster inference, requires compatible hardware (NVIDIA with CUDA)
CPU: Universal compatibility, slower but stable
Recommendation: Start with CPU, switch to GPU if available and working

Q: GPU selected but still using CPU
A: Your GPU may not be compatible with llama-cpp-python. Check:

NVIDIA GPU with CUDA support
llama-cpp-python built with CUDA support
Driver installation

API Key Management

For Cloud API Models

Create api_key.txt in the node directory:

ComfyUI/custom_nodes/ComfyUI-MultiModal-Prompt-Nodes/api_key.txt

Add your Alibaba Cloud Dashscope API key (single line, no quotes)
The key will be automatically loaded by Qwen and Wan nodes when using cloud API models

Security Notes

Never commit api_key.txt to version control
The file is listed in .gitignore by default
API keys are only loaded when using cloud API models
Local models don't require API keys

Examples

See the examples/ directory for:

Basic prompt enhancement workflows
Multi-image vision processing
Image editing prompt generation
Video prompt generation (T2V and I2V)
Style-specific optimizations

License

This project is licensed under the GNU General Public License v3.0.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Note: GPL-3.0 is required due to llama-cpp-python dependency.

For full details, see the LICENSE file and AUTHORS.md.

Internal Structure Notes (for Advanced Users)

This repository may introduce internal structural changes over time (e.g. extracting Local GGUF or Cloud API implementations into separate modules) to improve maintainability and stability.

Node interfaces (INPUT / RETURN types) are intended to remain stable
Internal refactors will be documented in the changelog
The backends/ directory added in v1.0.6 is a non-functional placeholder for future internal refactoring

No user action is required.

Credits

Derived From / Inspirations

This project is a restructured and extended ComfyUI custom node collection, derived from the following GPL-3.0 licensed projects:

ComfyUI-QwenPromptRewriter: lihaoyun6 (GPL-3.0)
ComfyUI-QwenVL: 1038lab (GPL-3.0)

For detailed attribution, file-level mapping, and contribution notes, see AUTHORS.md.

Key Dependencies / Providers

llama-cpp-python: Andrei Betlen
Qwen3-VL support: JamePeng's llama-cpp-python fork
Qwen models: Alibaba Cloud Qwen Team
Dashscope API: Alibaba Cloud

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Areas needing help:

Testing on different hardware configurations
Documenting vision input compatibility across environments
Additional workflow examples
Performance optimizations

Support

Issues: Report bugs or request features via GitHub Issues
Documentation: See CHANGELOG.md for version history
Examples: Check examples/ for workflow templates

Changelog

See CHANGELOG.md for detailed version history.

Current Version: 1.0.14

Added compatibility with JamePeng llama-cpp-python MTMD handlers that use either clip_model_path or mmproj_path
Fixed recent Qwen local GGUF vision builds falling back to text-only mode when the handler expects mmproj_path
Improved mmproj logging so the selected projector is reported consistently without printing None

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.github/workflows		.github/workflows
backends		backends
examples		examples
web		web
.gitattributes		.gitattributes
.gitignore		.gitignore
AUTHORS.md		AUTHORS.md
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
import_utils.py		import_utils.py
local_gguf_utils.py		local_gguf_utils.py
pyproject.toml		pyproject.toml
qwen_api_models.py		qwen_api_models.py
qwen_nodes.py		qwen_nodes.py
requirements.txt		requirements.txt
vision_llm_node.py		vision_llm_node.py
wan_nodes.py		wan_nodes.py

Folders and files

Latest commit

History

Repository files navigation

ComfyUI-MultiModal-Prompt-Nodes

Qwen API model list updated for Qwen3.6 and Qwen3.7

Expanded search paths for local Qwen-family GGUF models

Improved prompt generation for Qwen2.5-VL in Qwen Image Edit Prompt Generator

Vision Input Compatibility

Local GGUF Model Stability

Important Notes

Language Recommendation for Optimal Results

Features

1. Vision LLM Node

2. Qwen Image Edit Prompt Generator

3. Wan Video Prompt Generator

Installation

1. Clone Repository

2. Install Non-LLM Dependencies

3. Install llama-cpp-python (REQUIRED for local models)

4. Place Models

5. Configure API Key (Optional, for cloud models)

Usage

Vision LLM Node

Qwen Image Edit Prompt Generator

Wan Video Prompt Generator

Model Compatibility

Qwen2.5-VL (Separate mmproj)

Qwen3-VL (Separate mmproj)

Qwen3.5/3.6 (Separate mmproj)

Model Sources

Troubleshooting

Installation Issues

Runtime Issues

Output Quality Issues

Device Selection Issues

API Key Management

For Cloud API Models

Security Notes

Examples

License

Internal Structure Notes (for Advanced Users)

Credits

Derived From / Inspirations

Key Dependencies / Providers

Contributing

Support

Changelog

Current Version: 1.0.14

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 10

Packages 0

Uh oh!

Uh oh!

Contributors 1

Languages

Packages