|`net.ladenthin.llama.vision.mmproj`|`MultimodalIntegrationTest`| matching mmproj for the vision model, e.g. `mmproj-SmolVLM-500M-Instruct-Q8_0.gguf`|
|`net.ladenthin.llama.vision.image`|`MultimodalIntegrationTest`| committed default `src/test/resources/images/test-image.jpg`; override to any png/jpeg/webp/gif on disk |
+|`net.ladenthin.llama.audio.model`|`AudioInputIntegrationTest` (llama.cpp discussion #13759) | audio-input model GGUF, e.g. `ultravox-v0_5-llama-3_2-1b.gguf`|
+|`net.ladenthin.llama.audio.mmproj`|`AudioInputIntegrationTest`| matching audio mmproj/encoder, e.g. `mmproj-ultravox-v0_5-llama-3_2-1b-f16.gguf`|
+|`net.ladenthin.llama.audio.input`|`AudioInputIntegrationTest`| a `.wav`/`.mp3` clip on disk (no committed default — audio is not committed) |
Run those tests by setting the property:
```bash
@@ -596,6 +599,12 @@ mvn test -Dtest=MultimodalIntegrationTest \
# The vision.image property defaults to src/test/resources/images/test-image.jpg
# (a CC-BY-4.0 / MIT-granted photo of flowers and bees by the project author);
# override only if you want to test a different image.
+
+# Audio input (Ultravox / Qwen2.5-Omni; the audio clip has no committed default):
README.md: 28 additions & 1 deletion
@@ -284,8 +284,11 @@ Every `net.ladenthin.llama.*` system property recognised by the library, deep-sc
|`net.ladenthin.llama.vision.model`| unset (test self-skips) | test |`MultimodalIntegrationTest` (upstream kherud/java-llama.cpp#103 / #34) | Path to a vision-capable model GGUF. Any vision-capable GGUF works; CI default is `SmolVLM-500M-Instruct-Q8_0.gguf`. |
|`net.ladenthin.llama.vision.mmproj`| unset (test self-skips) | test |`MultimodalIntegrationTest`| Matching mmproj GGUF for the vision model. |
|`net.ladenthin.llama.vision.image`|`src/test/resources/images/test-image.jpg` (a CC-BY-4.0 / MIT-granted photo committed to the repo) | test |`MultimodalIntegrationTest`| Visual prompt image. Any png/jpeg/webp/gif works; the extension drives MIME detection. |
+|`net.ladenthin.llama.audio.model`| unset (test self-skips) | test |`AudioInputIntegrationTest` (llama.cpp discussion #13759) | Path to an audio-input model GGUF (e.g. Ultravox, Qwen2.5-Omni). |
+|`net.ladenthin.llama.audio.input`| unset (test self-skips) | test |`AudioInputIntegrationTest`|`.wav`/`.mp3` audio prompt clip; the extension drives format detection. |
-`MultimodalIntegrationTest` self-skips when any of the three `vision.*` properties points at a missing path, so a partial setup (just the vision model + the committed image, no mmproj) lets the test class load without erroring.
+`MultimodalIntegrationTest` self-skips when any of the three `vision.*` properties points at a missing path, so a partial setup (just the vision model + the committed image, no mmproj) lets the test class load without erroring. `AudioInputIntegrationTest` self-skips the same way over the three `audio.*` properties.
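
The self-skip rule described above can be illustrated with a small guard. This is a minimal sketch, not the code used by `AudioInputIntegrationTest`, whose implementation is not shown in this diff: the class `SelfSkipSketch` and the method `allConfigured` are hypothetical names, and only the three `audio.*` property names come from the tables in this change. It assumes "skip" means "at least one property is unset or does not point at an existing file".

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only; not part of the library or its tests.
public class SelfSkipSketch {

    /** Returns true only if every named system property is set and points at an existing file. */
    static boolean allConfigured(String... propertyNames) {
        for (String name : propertyNames) {
            String value = System.getProperty(name);
            if (value == null || value.isBlank() || !Files.isRegularFile(Path.of(value))) {
                return false; // unset or missing path: the test should skip, not fail
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean runAudio = allConfigured(
                "net.ladenthin.llama.audio.model",
                "net.ladenthin.llama.audio.mmproj",
                "net.ladenthin.llama.audio.input");
        System.out.println(runAudio ? "audio test would run" : "audio test would self-skip");
    }
}
```

In a JUnit 5 test, a check like this would typically feed `Assumptions.assumeTrue(...)`, so that an incomplete setup is reported as skipped rather than failed. Whether the project does it that way is an assumption.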
## Documentation
@@ -415,6 +418,30 @@ OpenAI-compatible `/v1/chat/completions` server. For a strictly CPU-only run, us
`setDevices("none").setMmprojOffload(false)` in addition to `setGpuLayers(0)`; projector offload
has its own upstream default.
+**Audio input** works identically — load an audio-capable model (Ultravox, Qwen2.5-Omni, …) with its
+audio `--mmproj` and add a `ContentPart.audioFile(...)` (or `inputAudio(bytes, "wav"|"mp3")`) part. It
+serializes to the OpenAI `input_audio` content part and routes through the same `mtmd` pipeline:
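
For orientation, the OpenAI `input_audio` content part mentioned here is a JSON object carrying base64-encoded audio plus a format tag. The sketch below hand-builds that JSON to show the wire shape only. It does not use the library's `ContentPart` API, and the class `InputAudioPartSketch` and method `inputAudioPart` are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the OpenAI-style payload; not the library's serializer.
public class InputAudioPartSketch {

    /** Builds an OpenAI-style input_audio content part as a JSON string. */
    static String inputAudioPart(byte[] audio, String format) {
        if (!format.equals("wav") && !format.equals("mp3")) {
            throw new IllegalArgumentException("format must be \"wav\" or \"mp3\": " + format);
        }
        // Standard base64 output contains no characters that need JSON escaping.
        String data = Base64.getEncoder().encodeToString(audio);
        return "{\"type\":\"input_audio\",\"input_audio\":{\"data\":\"" + data
                + "\",\"format\":\"" + format + "\"}}";
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real caller would read a .wav or .mp3 clip from disk.
        byte[] fakeClip = "RIFF".getBytes(StandardCharsets.US_ASCII);
        System.out.println(inputAudioPart(fakeClip, "wav"));
    }
}
```

A part like this sits in the `content` array of a user message alongside any text parts. How the library assembles the surrounding request is not covered by this sketch.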