You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add verification plan with original-issue research and test sketches
Fetched verbatim text of the LIKELY FIXED / PARTIALLY FIXED issues from
github.com/kherud/java-llama.cpp and append a Verification plan section
with: (a) a table of new info extracted from each issue body, (b) four
concrete JUnit test sketches that would close out #80, #95, #98, #102,
(c) a non-unit-testable bucket for #34, #50, #86, #103, #121 with the
corresponding action (feature, docs, CI matrix), (d) a recommended PR
sequencing.
Notable finding: #98's original repro did not call enableEmbedding()
at all — the binding never forwarded --embedding to the upstream
server-context, so the result_output assertion fired because the
embedding pipeline was never initialised. enableEmbedding() now
exists in ModelParameters (line 1040), so the fix is essentially
code-confirmed; an integration test against nomic-embed-text is
optional confirmation.
| 34 | PARTIALLY FIXED | mtmd linked, no typed image API |`CMakeLists.txt:255`|
699
+
700
+
---
701
+
702
+
## Verification plan (from original-issue research)
703
+
704
+
After fetching the verbatim text of each `LIKELY FIXED` / `PARTIALLY FIXED` issue
705
+
on github.com/kherud/java-llama.cpp, the reproduction details are clearer than
706
+
the summary lines above suggest. None of the original issues carry attached
707
+
files; all relevant context is in the issue body itself, and several refine
708
+
or change the verdict.
709
+
710
+
### What the original issues actually contain
711
+
712
+
| # | New info from issue body | Test feasibility |
713
+
|---|---|---|
714
+
| 102 | Exact repro: 10-iteration `new LlamaModel(...).close()` loop with `setThreads(4).setKeep(-1).setCtxSize(1024).setGpuLayers(0)`. Failure mode: GPU OOM exception, CPU swap thrash. No stack trace attached. |**Unit-testable.** Direct port of the reporter's loop to `MemoryManagementTest` with an assertion on `Runtime.getRuntime().totalMemory()` and `/proc/self/status:VmRSS`. |
715
+
| 98 | Reporter's config was *literally*`new ModelParameters().setModel(...).setBatchSize(8192).setUbatchSize(8192)` — **no `enableEmbedding()` call**. The original "bug" was that the bindings did not forward `--embedding` at all; the upstream `result_output` assertion fired because the embedding pipeline was never initialised. |**Already proven fixed by code.**`ModelParameters.enableEmbedding()` (line 1040) now sets `--embedding`. Optional confirmation test: same config + `enableEmbedding()` against `nomic-embed-text-v1.5.f16.gguf`. |
716
+
| 95 | Reporter pastes the `next()` method and argues the design is wrong: when `output.stop=true`, the method returns that output and ends. No model, prompt or reproduction provided. |**Not a real bug** — the cited behaviour is correct iterator semantics. Optional defensive test asserting termination on a repetitive prompt remains useful. |
717
+
| 80 | Exact repro: Kotlin-style 3 lines (`val params...`, `val model = new LlamaModel(params)`, `model.close()`) with `qwen2-0_5b-instruct-q4_0.gguf`. JDK 17.0.12+7, java-llama.cpp 3.4.1. SIGSEGV in `std::_Rb_tree` during `delete`. Reporter said they intended to follow up with a `-DLLAMA_DEBUG` build but never did. |**Unit-testable.** Three-line repro maps directly to a `@Test` method. No generation step. |
718
+
| 103 | Specifically asks about **Qwen2.5-VL on Android**. No code attempted. | Not unit-testable until a typed image API + an Android sample exist. Tracked as feature work. |
719
+
| 86 | Just a question: "does the CUDA jar handle CPU fallback?". No code. | Not unit-testable. Documentation task. |
720
+
| 34 | One-line feature request linking upstream PR #3436 (LLaVA). No specifics. | Subsumed by #103. |
721
+
| 121 | (Not refetched — Android `aarch64` vs `arm64-v8a` mismatch; already analysed in deep-dive.) | Verified by code; needs an Android boot test, not a unit test. |
722
+
| 50 | (Not refetched — Android cross-build on macOS-M2 host; already analysed in deep-dive.) | Verified by CMake logic; needs a cross-compile smoke test, not a unit test. |
723
+
724
+
### Concrete test plan
725
+
726
+
Four small JUnit tests close out four `LIKELY FIXED` items. All four belong in
727
+
`src/test/java/net/ladenthin/llama/MemoryManagementTest.java` or
728
+
`src/test/java/net/ladenthin/llama/LlamaModelTest.java`, reusing the existing
729
+
`TestConstants` model path so no new model download is needed except for #98.
730
+
731
+
#### 1. `MemoryManagementTest.testOpenCloseLoopDoesNotLeak()` — for #102
float[] embedding = m.embed("search_query: What is TSNE?");
825
+
Assert.assertEquals(768, embedding.length);
826
+
}
827
+
}
828
+
```
829
+
830
+
Gated on a system property so CI without the 120 MB model file simply skips it.
831
+
Optional CI step: download `nomic-embed-text-v1.5.f16.gguf` from HuggingFace in
832
+
the same pattern as the existing CodeLlama / Jina-Reranker model downloads.
833
+
834
+
### Issues that cannot be closed by unit tests alone
835
+
836
+
| # | Why not unit-testable | Action |
837
+
|---|---|---|
838
+
| 103, 34 | No image API yet — would require building the feature first. | Roadmap item: add `InferenceParameters.addImage(byte[]|Path)` that constructs the OAI-style multipart `content` array; then add `MultimodalTest` against `Qwen2.5-VL-Q4_K_M.gguf` + matching mmproj. |
839
+
| 86 | Question about jar packaging behaviour, not code defect. | Documentation: add a README section "Choosing the right classifier" stating that the CUDA jar requires the CUDA runtime libraries at load time and does not auto-fall-back. |
840
+
| 121, 50 | Android runtime / cross-host build path — needs an emulator boot or a macOS-M2 cross-compile, not a JVM test. | CI matrix expansion: add an Android emulator job that boots a stock `arm64-v8a` AVD and runs the existing `LlamaModelTest` against the dockcross-built `libjllama.so`. |
841
+
842
+
### Recommended sequencing
843
+
844
+
1.**First PR (small, high-value):** add the four JUnit tests above. Run CI. If
845
+
green, update this document to upgrade #80, #95, #102 to **FIXED** and
846
+
#98 to **FIXED** (subject to nomic model download in CI).
847
+
2.**Second PR (docs):** add the README "Choosing the right classifier" section
848
+
to close out #86, and document the 32-bit Android limitation to close out
849
+
the residual gap on #121.
850
+
3.**Third PR (feature):** add typed `InferenceParameters.addImage(...)` to
851
+
close out #103 and #34.
852
+
4.**Fourth PR (CI):** add an Android emulator job to formally close #121 and
853
+
#50.
854
+
855
+
Steps 1 and 2 are mechanical and require no design decisions. Step 3 requires
856
+
choosing an image-input encoding (`data:` URL vs raw bytes) and is the natural
857
+
follow-up. Step 4 is the largest investment but closes the remaining `STILL
858
+
POSSIBLE` Android cluster (#79, #81, #82, #91, #117, #121) all at once.
0 commit comments