Add Qwen3-VL multimodal support #132
Just tried the commit on an RTX 4090 and an L40; it reports an error like the following:
Please don't reformat other people's existing code.
Hi @tuanhe, thanks for the detailed report and stack trace. I don't have an RTX 4090 or L40 on hand, so I can't reproduce the OOM locally on the same hardware. If you're mainly constrained by VRAM, you could try a smaller vision-language checkpoint (e.g. Qwen3.5-0.8B), which tends to use noticeably less memory than larger VL models. I've also opened a newer PR that focuses on Qwen3.5 support: #232. Feedback there is welcome too.
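As a rough illustration of why a smaller checkpoint helps with VRAM, here is a back-of-the-envelope sketch of weight memory alone. The helper `weight_memory_gib` is hypothetical (it is not part of this PR), the parameter counts are taken nominally from the model names, and 2 bytes per parameter (bf16/fp16) is an assumption. Activations, the KV cache, and the vision encoder's intermediate buffers come on top of this and may well dominate in practice.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for model weights only, in GiB.

    Hypothetical helper for illustration; assumes every parameter is
    stored at the same width (2 bytes for bf16/fp16 by default).
    """
    return num_params * bytes_per_param / 2**30


# Nominal parameter counts inferred from the checkpoint names (assumption).
for name, params in [("0.8B", 0.8e9), ("2B", 2e9)]:
    print(f"{name}: ~{weight_memory_gib(params):.2f} GiB of weights in bf16")
```

This only bounds the weights from below; the actual peak usage depends on batch size, sequence length, image resolution, and how much memory the engine reserves for the KV cache.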
@linzm1007 |
Summary
bench_multimodal.py and example_multimodal.py for benchmarking and quick testing
Benchmark
CUDA_VISIBLE_DEVICES=0 python3 bench_multimodal.py --model ~/huggingface/Qwen3-VL-2B-Instruct
Testing
python3 example_multimodal.py
Notes