You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+7Lines changed: 7 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -15,6 +15,8 @@ from version 5.0.0 onward. Pre-fork releases (`1.x`–`4.2.0`) were authored by
15
15
- OpenSSF Best Practices badge (project 12862) on README.
16
16
- OpenAI-compatible `parallel_tool_calls` support: `ChatRequest.withParallelToolCalls(Boolean)` / `getParallelToolCalls()`, `InferenceParameters.withParallelToolCalls(boolean)`, and pass-through in the `/v1/chat/completions` server mapper.
17
17
- Real-model tool-calling integration tests for blocking and streaming required tool calls (`ToolCallingIntegrationTest`, Qwen2.5-1.5B-Instruct), wired into CI and `validate-models`.
18
+
- End-to-end vision input across blocking, typed `ChatRequest`, streaming, and OpenAI-compatible request mapping; real-model tests verify that distinct red and blue images produce the correct semantic answers.
19
+
- Explicit `setMmprojAuto(boolean)` and `setMmprojOffload(boolean)` controls, including the upstream `--no-mmproj-auto` and `--no-mmproj-offload` flags.
18
20
19
21
### Changed
20
22
- Unified `CONTRIBUTING.md` and `SECURITY.md` structure with sibling repositories in the project family.
@@ -24,6 +26,11 @@ from version 5.0.0 onward. Pre-fork releases (`1.x`–`4.2.0`) were authored by
24
26
- Upgraded llama.cpp from b9151 to b9172.
25
27
- Extracted the `chatWithTools` agent loop into `ToolCallingAgent`; tool-result errors (unknown tool / handler exception) are now JSON-serialized so tool names containing special characters remain valid JSON.
26
28
29
+
### Fixed
30
+
- Preserved decoded image buffers across the JNI chat boundary and submitted media requests through llama.cpp's upstream multimodal task path instead of silently tokenizing them as text-only prompts.
31
+
- Preserved multipart image content when using the typed `ChatRequest` serializer.
32
+
- The standalone OpenAI-compatible server now advertises vision only when the loaded model confirms usable vision support.
The same multipart `messages[].content` shape works through `ChatRequest` and the embedded
399
+
OpenAI-compatible `/v1/chat/completions` server. For a strictly CPU-only run, use
400
+
`setDevices("none").setMmprojOffload(false)` in addition to `setGpuLayers(0)`; projector offload
401
+
has its own upstream default.
402
+
376
403
### Tool Calling
377
404
378
405
Use a tool-aware instruct model and enable Jinja when loading it. A typed request can either return
@@ -732,7 +759,7 @@ Forward-looking ideas being tracked for this fork:
732
759
733
760
-**Adopt feature ideas from the Kotlin Llama Stack client.** Candidates (multimodal image input, typed chat messages, async API, batch inference, typed usage/timings) are inventoried with effort estimates in [`docs/feature-investigation-llama-stack-client-kotlin.md`](docs/feature-investigation-llama-stack-client-kotlin.md), derived from [`ogx-ai/llama-stack-client-kotlin`](https://github.com/ogx-ai/llama-stack-client-kotlin).
734
761
-**Ship a directly Android-capable artifact.** Building on the existing [Importing in Android](#importing-in-android) flow and the `opencl-android-aarch64` classifier (see [Choosing the right classifier](#choosing-the-right-classifier)), the goal is a first-class Android Maven artifact — including a typed image-input helper for VLMs such as Qwen2.5-VL — so downstream Android projects can drop their dependency on [`ogx-ai/llama-stack-client-kotlin`](https://github.com/ogx-ai/llama-stack-client-kotlin) entirely.
735
-
-**Resolve all upstream `kherud/java-llama.cpp` open issues.** All 37 open issues at fork time are catalogued with per-issue verdicts in [`docs/history/49be664_open_issues.md`](docs/history/49be664_open_issues.md); fixes land in this fork as they are completed. The remaining headline item is a typed Java image API for multimodal inputs (issues [#103](docs/history/49be664_open_issues.md#103--vlm-support--image-input-for-multimodal-models) and [#34](docs/history/49be664_open_issues.md#34--support-multimodal-inputs), both PARTIALLY FIXED) — the same work that closes §2.1 of the Kotlin feature inventory.
762
+
-**Resolve all upstream `kherud/java-llama.cpp` open issues.** All 37 open issues at fork time are catalogued with per-issue verdicts in [`docs/history/49be664_open_issues.md`](docs/history/49be664_open_issues.md); fixes land in this fork as they are completed. Vision inputs (issues [#103](docs/history/49be664_open_issues.md#103--vlm-support--image-input-for-multimodal-models) and [#34](docs/history/49be664_open_issues.md#34--support-multimodal-inputs)) are now wired end to end through blocking, typed, streaming, and OpenAI-compatible request surfaces.
emitting OAI multipart content; the upstream chat path already routes
26
26
`image_url` blocks through the compiled-in `mtmd` pipeline, so zero new
27
27
JNI was needed).
@@ -305,13 +305,11 @@ Java API.
305
305
Feature request: support visual-language models such as Qwen2.5-VL (image
306
306
inputs) on Android.
307
307
308
-
**Status in fork:** FIXED in the same PR that closes the typed Java surface gap. The build still links the upstream `mtmd` multimodal library into `jllama` (`CMakeLists.txt:125-145, 253-255`) and `ModelParameters`still exposes `setMmproj`, `setMmprojUrl`, `enableMmprojAuto`, `enableMmprojOffload` (`ModelParameters.java:1250-1281`). The previously-missing typed image API now exists: `ContentPart.text(...)`, `ContentPart.imageUrl(...)`, `ContentPart.imageBytes(byte[], mime)`, `ContentPart.imageFile(Path)` (auto-detects png/jpeg/webp/gif), and `ChatMessage(role, List<ContentPart>)` + `ChatMessage.userMultimodal(ContentPart...)`. `InferenceParameters.setMessages(List<ChatMessage>)` serializes parts-bearing messages to the OAI array-form `content` that the upstream `oaicompat_chat_params_parse` already routes through the compiled-in `mtmd` pipeline — zero new JNI required. See PR #189 (§2.1 of `docs/feature-investigation-llama-stack-client-kotlin.md`).
308
+
**Status in fork:** FIXED end to end. The build links the upstream `mtmd` multimodal library into `jllama`, `ModelParameters` exposes projector loading and explicit auto/offload controls, and `ContentPart` + `ChatMessage.userMultimodal(...)` provide the typed image API. The native parser now preserves decoded media buffers and submits them through the upstream multimodal task path; previously those buffers were discarded and requests silently became text-only. Blocking, typed `ChatRequest`, streaming, and OpenAI-compatible mapping are covered, including a real-model semantic red/blue regression.
309
309
310
-
**Deep-dive analysis:**Definitively confirmable from code inspection — no runtime test changes this verdict. Two distinct surfaces exist for VLM:
310
+
**Deep-dive analysis:** Two distinct surfaces exist for VLM:
311
311
1.**Model loading:** fully wired (mmproj path, auto-detect, GPU offload) — these flags reach the upstream server-context unchanged.
312
-
2.**Request payload:** the only path is `LlamaModel.handleChatCompletions(json, oaiCompat=true)` with manually-constructed `messages[].content = [{type:"text",...},{type:"image_url",image_url:{url:"data:image/png;base64,..."}}]` JSON. No typed helper.
313
-
314
-
This is genuinely PARTIALLY FIXED and only a Java-side enhancement closes the gap; no runtime investigation is required to confirm.
312
+
2.**Request payload:** typed and raw OpenAI multipart content is decoded by the OAI parser, retained across JNI dispatch, and processed by `mtmd` before generation.
315
313
316
314
---
317
315
@@ -696,9 +694,9 @@ prebuilt artifact for Android targets.
696
694
Feature request: add multimodal input support (referencing
**Status in fork:** FIXED. The upstream `mtmd` library is built and linked into `jllama` (`CMakeLists.txt:125-145, 253-255`), `ModelParameters` exposes `setMmproj`, `setMmprojUrl`, `enableMmprojAuto`, `enableMmprojOffload` (`ModelParameters.java:1250-1281`), and the typed image API now exists via `ContentPart` + `ChatMessage(role, List<ContentPart>)` + `InferenceParameters.setMessages(List<ChatMessage>)`. The serializer emits OAI array-form `content`; the upstream chat path already understands `image_url` blocks. See #103 for the parallel write-up and PR #189 for the implementation.
697
+
**Status in fork:** FIXED. The upstream `mtmd` library is built and linked into `jllama`; projector controls and the typed `ContentPart` API are exposed; and decoded media is retained by the native bridge and processed through the upstream multimodal task path for blocking and streaming requests. See #103 for the full write-up.
700
698
701
-
**Deep-dive analysis:** Same conclusion as #103 — confirmable from code, no runtime needed. The original 2023 feature request asked for "multimodal input support"; in 2025 terms this splits into model loading (DONE) and request payload (DONE: typed `ChatMessage(role, List<ContentPart>)` + `InferenceParameters.setMessages(List<ChatMessage>)` emit the OAI multipart `content` the upstream chat path already consumes). Verdict is FIXED as of PR #189.
699
+
**Deep-dive analysis:** Same conclusion as #103. Model loading and request execution are both verified; the real-model regression confirms image content affects the answer rather than merely producing a non-empty text-only response.
| 98 | FIXED |`enableEmbedding` + `setPoolingType`; covered by `LlamaEmbeddingsTest#testNomicEmbedLoads` (commit `cba693c`, PR #185; CI downloads `nomic-embed-text-v1.5.f16.gguf`) |`ModelParameters.java:1040,606`|
@@ -804,7 +802,7 @@ or change the verdict.
804
802
| 98 | Reporter's config was *literally*`new ModelParameters().setModel(...).setBatchSize(8192).setUbatchSize(8192)` — **no `enableEmbedding()` call**. The original "bug" was that the bindings did not forward `--embedding` at all; the upstream `result_output` assertion fired because the embedding pipeline was never initialised. |**DONE** — `LlamaEmbeddingsTest#testNomicEmbedLoads` (commit `713d426`) runs the reporter's exact config plus `enableEmbedding()`; gated on `net.ladenthin.llama.nomic.path`; CI downloads the model via `NOMIC_EMBED_MODEL_URL` in `publish.yml`. |
805
803
| 95 | Reporter pastes the `next()` method and argues the design is wrong: when `output.stop=true`, the method returns that output and ends. No model, prompt or reproduction provided. |**DONE** — `LlamaModelTest#testIteratorTerminatesOnRepetitivePrompt` (commit `713d426`) drives the iterator with a repetitive prompt at `nPredict=30`, `temperature=0.0f` and asserts termination within `nPredict+1` outputs. |
806
804
| 80 | Exact repro: Kotlin-style 3 lines (`val params...`, `val model = new LlamaModel(params)`, `model.close()`) with `qwen2-0_5b-instruct-q4_0.gguf`. JDK 17.0.12+7, java-llama.cpp 3.4.1. SIGSEGV in `std::_Rb_tree` during `delete`. Reporter said they intended to follow up with a `-DLLAMA_DEBUG` build but never did. |**DONE** — `MemoryManagementTest#testOpenCloseWithoutGeneration` (commit `713d426`) maps the 3-line repro to 20 iterations of try-with-resources open + immediate close; a JVM crash exits the runner non-zero. |
807
-
| 103 | Specifically asks about **Qwen2.5-VL on Android**. No code attempted. |**DONE (typed API)** — PR #189 ships `ContentPart` + `ChatMessage(role, List<ContentPart>)` + `InferenceParameters.setMessages(List<ChatMessage>)`. Android-sample tail tracked separately. |
805
+
| 103 | Specifically asks about **Qwen2.5-VL on Android**. No code attempted. |**DONE (typed API)** — PR #189 ships `ContentPart` + `ChatMessage(role, List<ContentPart>)` + `InferenceParameters.withMessages(List<ChatMessage>)`. Android-sample tail tracked separately. |
808
806
| 86 | Just a question: "does the CUDA jar handle CPU fallback?". No code. | Not unit-testable. Documentation task. |
| 121 | (Not refetched — Android `aarch64` vs `arm64-v8a` mismatch; already analysed in deep-dive.) | Verified by code; needs an Android boot test, not a unit test. |
@@ -930,7 +928,7 @@ the same pattern as the existing CodeLlama / Jina-Reranker model downloads.
930
928
931
929
| # | Why not unit-testable | Action |
932
930
|---|---|---|
933
-
| 103, 34 | (Historic — typed image API was missing at the time of this audit.) |**DONE in PR #189**: `ContentPart` + `ChatMessage(role, List<ContentPart>)`+ `InferenceParameters.setMessages(List<ChatMessage>)`emit the OAI multipart `content`the upstream chat path already routes through `mtmd`. `MultimodalIntegrationTest` is added under model-gated `Assume`. |
931
+
| 103, 34 | (Historic — typed image API and native media dispatch were incomplete at the time of this audit.) |**DONE**: `ContentPart` + `ChatMessage(role, List<ContentPart>)` emit OAI multipart content, and the JNI bridge retains decoded media for upstream `mtmd` processing. The model-gated integration test covers blocking, typed, streaming, and semantic image handling. |
934
932
| 86 | Question about jar packaging behaviour, not code defect. | Documentation: add a README section "Choosing the right classifier" stating that the CUDA jar requires the CUDA runtime libraries at load time and does not auto-fall-back. |
935
933
| 121, 50 | Android runtime / cross-host build path — needs an emulator boot or a macOS-M2 cross-compile, not a JVM test. | CI matrix expansion: add an Android emulator job that boots a stock `arm64-v8a` AVD and runs the existing `LlamaModelTest` against the dockcross-built `libjllama.so`. |
936
934
@@ -954,7 +952,7 @@ the same pattern as the existing CodeLlama / Jina-Reranker model downloads.
954
952
the residual gap on #121.
955
953
3.**Third PR (feature): SHIPPED as PR #189.** Adds the typed multimodal
0 commit comments