Skip to content

Commit d72ea18

Browse files
committed
Add verification plan with original-issue research and test sketches
Fetched verbatim text of the LIKELY FIXED / PARTIALLY FIXED issues from github.com/kherud/java-llama.cpp and append a Verification plan section with: (a) a table of new info extracted from each issue body, (b) four concrete JUnit test sketches that would close out #80, #95, #98, #102, (c) a non-unit-testable bucket for #34, #50, #86, #103, #121 with the corresponding action (feature, docs, CI matrix), (d) a recommended PR sequencing. Notable finding: #98's original repro did not call enableEmbedding() at all — the binding never forwarded --embedding to the upstream server-context, so the result_output assertion fired because the embedding pipeline was never initialised. enableEmbedding() now exists in ModelParameters (line 1040), so the fix is essentially code-confirmed; an integration test against nomic-embed-text is optional confirmation.
1 parent 1fa7feb commit d72ea18

1 file changed

Lines changed: 160 additions & 0 deletions

File tree

docs/history/49be664_open_issues.md

Lines changed: 160 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -696,3 +696,163 @@ Feature request: add multimodal input support (referencing
696696
| 70 | FIXED | `setVocabOnly()` + native branch | `ModelParameters.java:1336`, `jllama.cpp:710-718` |
697697
| 50 | PARTIALLY FIXED | CMake handles ANDROID_ABI; needs e2e test | `CMakeLists.txt:151-153,243` |
698698
| 34 | PARTIALLY FIXED | mtmd linked, no typed image API | `CMakeLists.txt:255` |
699+
700+
---
701+
702+
## Verification plan (from original-issue research)
703+
704+
After fetching the verbatim text of each `LIKELY FIXED` / `PARTIALLY FIXED` issue
705+
on github.com/kherud/java-llama.cpp, the reproduction details are clearer than
706+
the summary lines above suggest. None of the original issues carry attached
707+
files; all relevant context is in the issue body itself, and several refine
708+
or change the verdict.
709+
710+
### What the original issues actually contain
711+
712+
| # | New info from issue body | Test feasibility |
713+
|---|---|---|
714+
| 102 | Exact repro: 10-iteration `new LlamaModel(...).close()` loop with `setThreads(4).setKeep(-1).setCtxSize(1024).setGpuLayers(0)`. Failure mode: GPU OOM exception, CPU swap thrash. No stack trace attached. | **Unit-testable.** Direct port of the reporter's loop to `MemoryManagementTest` with an assertion on `Runtime.getRuntime().totalMemory()` and `/proc/self/status:VmRSS`. |
715+
| 98 | Reporter's config was *literally* `new ModelParameters().setModel(...).setBatchSize(8192).setUbatchSize(8192)`**no `enableEmbedding()` call**. The original "bug" was that the bindings did not forward `--embedding` at all; the upstream `result_output` assertion fired because the embedding pipeline was never initialised. | **Already proven fixed by code.** `ModelParameters.enableEmbedding()` (line 1040) now sets `--embedding`. Optional confirmation test: same config + `enableEmbedding()` against `nomic-embed-text-v1.5.f16.gguf`. |
716+
| 95 | Reporter pastes the `next()` method and argues the design is wrong: when `output.stop=true`, the method returns that output and ends. No model, prompt or reproduction provided. | **Not a real bug** — the cited behaviour is correct iterator semantics. Optional defensive test asserting termination on a repetitive prompt remains useful. |
717+
| 80 | Exact repro: Kotlin-style 3 lines (`val params...`, `val model = new LlamaModel(params)`, `model.close()`) with `qwen2-0_5b-instruct-q4_0.gguf`. JDK 17.0.12+7, java-llama.cpp 3.4.1. SIGSEGV in `std::_Rb_tree` during `delete`. Reporter said they intended to follow up with a `-DLLAMA_DEBUG` build but never did. | **Unit-testable.** Three-line repro maps directly to a `@Test` method. No generation step. |
718+
| 103 | Specifically asks about **Qwen2.5-VL on Android**. No code attempted. | Not unit-testable until a typed image API + an Android sample exist. Tracked as feature work. |
719+
| 86 | Just a question: "does the CUDA jar handle CPU fallback?". No code. | Not unit-testable. Documentation task. |
720+
| 34 | One-line feature request linking upstream PR #3436 (LLaVA). No specifics. | Subsumed by #103. |
721+
| 121 | (Not refetched — Android `aarch64` vs `arm64-v8a` mismatch; already analysed in deep-dive.) | Verified by code; needs an Android boot test, not a unit test. |
722+
| 50 | (Not refetched — Android cross-build on macOS-M2 host; already analysed in deep-dive.) | Verified by CMake logic; needs a cross-compile smoke test, not a unit test. |
723+
724+
### Concrete test plan
725+
726+
Four small JUnit tests close out four `LIKELY FIXED` items. All four belong in
727+
`src/test/java/net/ladenthin/llama/MemoryManagementTest.java` or
728+
`src/test/java/net/ladenthin/llama/LlamaModelTest.java`, reusing the existing
729+
`TestConstants` model path so no new model download is needed except for #98.
730+
731+
#### 1. `MemoryManagementTest.testOpenCloseLoopDoesNotLeak()` — for #102
732+
733+
Direct port of the reporter's repro:
734+
735+
```java
736+
@Test
737+
public void testOpenCloseLoopDoesNotLeak() {
738+
ModelParameters params = new ModelParameters()
739+
.setModel(TestConstants.MODEL_PATH)
740+
.setThreads(4).setKeep(-1).setCtxSize(1024).setGpuLayers(0);
741+
long baseline = currentVmRss();
742+
for (int i = 0; i < 20; i++) {
743+
try (LlamaModel m = new LlamaModel(params)) {
744+
// no-op: open + close
745+
}
746+
}
747+
System.gc();
748+
long after = currentVmRss();
749+
// Allow some slop for JIT/heap growth, but rule out monotonic leak
750+
Assert.assertTrue("VmRSS grew by " + (after - baseline) + " kB over 20 iters",
751+
after - baseline < 200_000); // 200 MB tolerance
752+
}
753+
754+
private static long currentVmRss() {
755+
try {
756+
for (String line : Files.readAllLines(Path.of("/proc/self/status"))) {
757+
if (line.startsWith("VmRSS:")) {
758+
return Long.parseLong(line.replaceAll("\\D+", ""));
759+
}
760+
}
761+
} catch (IOException e) { /* non-Linux */ }
762+
return 0;
763+
}
764+
```
765+
766+
`currentVmRss()` is a no-op on macOS/Windows; the test then degenerates to a
767+
"does not throw / does not crash" smoke test, which is still useful. For
768+
CUDA, add an `nvidia-smi` poll in a sibling `@Test` gated on
769+
`Assume.assumeTrue(hasCuda())`.
770+
771+
#### 2. `MemoryManagementTest.testOpenCloseWithoutGeneration()` — for #80
772+
773+
```java
774+
@Test
775+
public void testOpenCloseWithoutGeneration() {
776+
ModelParameters params = new ModelParameters()
777+
.setModel(TestConstants.MODEL_PATH)
778+
.setCtxSize(512).setGpuLayers(0);
779+
// The original 3.4.1 crash was a one-shot SIGSEGV. Repeat to harden.
780+
for (int i = 0; i < 20; i++) {
781+
try (LlamaModel m = new LlamaModel(params)) {
782+
// No generation between construction and close.
783+
}
784+
}
785+
}
786+
```
787+
788+
A JVM crash exits the JUnit runner with non-zero; if all 20 iterations complete,
789+
the verdict moves to FIXED.
790+
791+
#### 3. `LlamaModelTest.testIteratorTerminatesOnRepetitivePrompt()` — for #95
792+
793+
```java
794+
@Test
795+
public void testIteratorTerminatesOnRepetitivePrompt() {
796+
InferenceParameters infer = new InferenceParameters("Repeat AAA forever: AAA AAA")
797+
.setNPredict(30)
798+
.setTemperature(0.0f);
799+
int count = 0;
800+
try (LlamaIterable it = model.generate(infer)) {
801+
for (LlamaOutput out : it) {
802+
count++;
803+
Assert.assertTrue("iterator overran nPredict", count <= 31);
804+
}
805+
}
806+
Assert.assertTrue("iterator must produce at least 1 token", count >= 1);
807+
}
808+
```
809+
810+
The original "bug" is a design objection, not a defect. Test confirms iteration
811+
terminates deterministically.
812+
813+
#### 4. `LlamaEmbeddingsTest.testNomicEmbedLoads()` — for #98
814+
815+
```java
816+
@Test
817+
public void testNomicEmbedLoads() {
818+
String path = System.getProperty("net.ladenthin.llama.nomic.path");
819+
Assume.assumeNotNull("nomic model path not set", path);
820+
ModelParameters params = new ModelParameters()
821+
.setModel(path).setBatchSize(8192).setUbatchSize(8192)
822+
.enableEmbedding(); // <-- the fix
823+
try (LlamaModel m = new LlamaModel(params)) {
824+
float[] embedding = m.embed("search_query: What is TSNE?");
825+
Assert.assertEquals(768, embedding.length);
826+
}
827+
}
828+
```
829+
830+
Gated on a system property so CI without the 120 MB model file simply skips it.
831+
Optional CI step: download `nomic-embed-text-v1.5.f16.gguf` from HuggingFace in
832+
the same pattern as the existing CodeLlama / Jina-Reranker model downloads.
833+
834+
### Issues that cannot be closed by unit tests alone
835+
836+
| # | Why not unit-testable | Action |
837+
|---|---|---|
838+
| 103, 34 | No image API yet — would require building the feature first. | Roadmap item: add `InferenceParameters.addImage(byte[]|Path)` that constructs the OAI-style multipart `content` array; then add `MultimodalTest` against `Qwen2.5-VL-Q4_K_M.gguf` + matching mmproj. |
839+
| 86 | Question about jar packaging behaviour, not code defect. | Documentation: add a README section "Choosing the right classifier" stating that the CUDA jar requires the CUDA runtime libraries at load time and does not auto-fall-back. |
840+
| 121, 50 | Android runtime / cross-host build path — needs an emulator boot or a macOS-M2 cross-compile, not a JVM test. | CI matrix expansion: add an Android emulator job that boots a stock `arm64-v8a` AVD and runs the existing `LlamaModelTest` against the dockcross-built `libjllama.so`. |
841+
842+
### Recommended sequencing
843+
844+
1. **First PR (small, high-value):** add the four JUnit tests above. Run CI. If
845+
green, update this document to upgrade #80, #95, #102 to **FIXED** and
846+
#98 to **FIXED** (subject to nomic model download in CI).
847+
2. **Second PR (docs):** add the README "Choosing the right classifier" section
848+
to close out #86, and document the 32-bit Android limitation to close out
849+
the residual gap on #121.
850+
3. **Third PR (feature):** add typed `InferenceParameters.addImage(...)` to
851+
close out #103 and #34.
852+
4. **Fourth PR (CI):** add an Android emulator job to formally close #121 and
853+
#50.
854+
855+
Steps 1 and 2 are mechanical and require no design decisions. Step 3 requires
856+
choosing an image-input encoding (`data:` URL vs raw bytes) and is the natural
857+
follow-up. Step 4 is the largest investment but closes the remaining `STILL
858+
POSSIBLE` Android cluster (#79, #81, #82, #91, #117, #121) all at once.

0 commit comments

Comments
 (0)