Skip to content

Commit df2656d

Browse files
Merge pull request #264 from bernardladenthin/claude/todo-audit-cleanup
fix(loader): finish SpotBugs cleanup (LlamaLoader AFBR + PRMC) + docs(todo): mark audit findings resolved
2 parents 14c5f8e + 7c2c663 commit df2656d

2 files changed

Lines changed: 60 additions & 92 deletions

File tree

TODO.md

Lines changed: 45 additions & 86 deletions
Original file line numberDiff line numberDiff line change
@@ -13,18 +13,14 @@ cross-cutting initiative.
1313

1414
## Open — jllama-specific
1515

16-
### Code audit — pre-existing correctness / safety findings (open)
16+
### Code audit — pre-existing correctness / safety findings (RESOLVED — PRs #258 + #260)
1717

18-
A multi-area audit (2026-06-20) of the **existing** codebase (independent of the tools / vision /
19-
session PRs) surfaced the items below. None are regressions from recent work. They are intentionally
20-
**split into tiers so each can land as its own small, focused PR** rather than one large change.
21-
Items marked **✓ verified** were re-confirmed by reading the cited code directly; the rest were
22-
read-confirmed by the audit but warrant a quick re-check before the fix.
23-
24-
**Progress (`feature/todo_fixes`):** Tier 1 (N1, N2, J1, P1), Tier 2 (N3, N4, S1, J3, J5) and Tier 3
25-
(S2, S3, P3, NaN/Inf JSON, OSInfo `exec`, discovery-route auth doc, `completeBatch` leak, probabilities
26-
doc) **DONE** — verified (452/452 C++ tests, affected Java tests + new regressions, Javadoc + Spotless +
27-
clang-format 22.1.5 clean).
18+
A multi-area audit (2026-06-20) of the **existing** codebase surfaced 18 correctness/safety findings,
19+
intentionally **split into tiers so each could land as its own small, focused PR**. **All 18 are now
20+
fixed and merged** — Tiers 1–3 in **#258**, the deferred `LlamaLoader` extraction race in **#260**
21+
with regression tests added in **#261 / #262**. The full per-finding rationale lives in those PRs and
22+
their commits; the concise record below is kept for traceability. Nothing in this section is open
23+
except the optional follow-up noted at the end.
2824

2925
**`LlamaLoader` native-lib extraction temp-path race — DONE (atomic write + content-reuse).**
3026
`extractFile` now (1) reuses a byte-identical existing copy instead of rewriting it — so it never
@@ -35,81 +31,44 @@ extracted file is self-contained — no multi-DLL co-location to coordinate. Ver
3531
`NativeLibraryLoadSmokeTest` (real extract+load on macOS) + a `resourceMatchesFile` unit test; the
3632
Windows locked-replace path is exercised by CI's Windows jobs.
3733

38-
*Optional follow-up (lower priority):* full per-process extraction **directory** isolation + a
39-
`cleanup()` that recursively removes dead-process dirs. Now that writes are atomic and content-checked,
40-
this is a tidiness improvement (stops the shared-tmpdir `cleanup()` from racing a live peer's flat
41-
file), not a correctness fix — and it still needs the Windows locked-file co-design noted before.
42-
43-
**Tier 1 — high impact, fix first**
44-
45-
- **✓ Unhandled C++ exceptions cross the JNI boundary → JVM abort (UB).** Several `JNIEXPORT`
46-
entry points call throwing code with no `try/catch`, so a C++ exception propagates across JNI
47-
(no Java exception is raised — the process crashes). Most reachable: the **public** API
48-
`LlamaModel.jsonSchemaToGrammar(String)` (`LlamaModel.java:436`) → `jllama.cpp:1090`
49-
`ordered_json::parse(c_schema)` deterministically aborts the JVM on a malformed schema string.
50-
Same class on `encode` / `handleTokenize` / `handleEmbeddings` / `handleRerank` / `applyTemplate`
51-
with malformed JSON or an untokenizable prompt, and `applyTemplate`'s unchecked `.at("prompt")`.
52-
**Fix:** wrap each entry-point body and convert to `ThrowNew(c_llama_error, e.what())` (the pattern
53-
`handleCompletionsOai` / `configureParallelInference` already use). A shared macro covers them uniformly.
54-
- **`parse_string_array` — null deref + JNI local-reference leak** (`jllama.cpp:339`). The
55-
`GetStringUTFChars` result is not null-checked (OOM → `strdup(NULL)`), a `null` array element is UB,
56-
and `GetObjectArrayElement` leaks a local ref each iteration — a `handleRerank` with many documents
57-
overflows the ~16-slot local-ref table. **Fix:** null-check both, `DeleteLocalRef` per iteration,
58-
free already-allocated slots on the error path. Reachable from `loadModel` + `handleRerank`.
59-
- **`close()` / native `delete()` double-free under concurrent close** (`LlamaModel.java:387`,
60-
`jllama.cpp` delete path). `close()` is not `synchronized`, `ctx` is a plain non-volatile `long`,
61-
and native `delete()` does check-then-act on the handle with no atomicity; the library explicitly
62-
supports async/parallel use, so two closes (or a close racing a try-with-resources unwind) can both
63-
free the same handle. **Fix:** `synchronized` close (or atomic handle exchange).
64-
- **`ServerMetrics.getCumulativeTimings()` truncates cumulative token totals to `int` → negative**
65-
(`ServerMetrics.java:174`). `long` startup-cumulative counters cast `(int)` overflow past ~2.1B
66-
tokens (e.g. `(int)3_000_000_000L == -1294967296`). **Fix:** widen `Timings.promptN`/`predictedN`
67-
to `long` (the per-request `fromJson` path benefits too).
68-
69-
**Tier 2 — medium**
70-
71-
- **✓ Unbounded request-body read → OOM DoS** (`OpenAiCompatServer.java:820`,
72-
`OBJECT_MAPPER.readTree(getRequestBody())`). No size cap / `Content-Length` check; unauthenticated
73-
when bound to `0.0.0.0` (the bind is configurable and the default has no key). **Fix:** cap body size
74-
/ reject oversized `Content-Length` (config-driven, sane default).
75-
- **Streaming reader raw-pointer use-after-free** (`jllama.cpp` `receiveCompletionJson` /
76-
`receiveChatCompletionChunk`). The reader pointer is cached under `readers_mutex`, then `next()` is
77-
called outside the lock while `delete()`/`erase_reader` can free it. **Fix:** hold a `shared_ptr`
78-
across `next()`.
79-
- **`Session` permanently wedged on an abandoned stream** (`SessionState`). `streamingActive` is cleared
80-
only by `commitStreamedReply`; a consumer that throws / `break`s / closes the iterable without
81-
committing leaves the flag stuck `true` forever (every later `send`/`stream`/`save`/`restore` throws),
82-
and the user turn is committed with no assistant turn. **Fix:** clear the flag on stream close/cancel.
83-
- **`LlamaIterator.hasNext` not `volatile`** (`LlamaIterator.java:41`). `cancel()` (documented as a
84-
cross-thread call) writes it from another thread; the consumer may never observe `false`. The sibling
85-
`CancellationToken.cancelled` is `volatile` for exactly this reason. **Fix:** `volatile`.
86-
- **Log callback throws from a JVM-unattached native thread** (`jllama.cpp` log lambda + `get_jni_env`).
87-
ggml/llama internal threads can fire logging; `get_jni_env` throws, unwinding through C frames →
88-
`std::terminate`/UB. **Fix:** make the lambda `noexcept`, skip when the thread is unattached.
89-
90-
**Tier 3 — hardening / low**
91-
92-
- **✓ Timing-unsafe API-key comparison** (`OpenAiCompatServer.java:817`, `String.equals`) →
93-
`MessageDigest.isEqual`.
94-
- **`ChatMessage.toolCalls` not defensively copied** (`ChatMessage.java:102`) — `parts` is copied +
95-
wrapped, `toolCalls` is stored/returned by reference, breaking the documented-immutable contract.
96-
- **Single-thread heartbeat pool** (`OpenAiCompatServer.java:171`) — one stalled SSE client blocks the
97-
lone scheduler thread, starving every other stream's keep-alives. **Fix:** size the pool / non-blocking
98-
heartbeat write.
99-
- **Float `NaN`/`Infinity` → invalid JSON** in scalar withers (`JsonParameters.withScalar` via
100-
`String.valueOf`); the whole request then fails opaquely in the native parser. **Fix:** reject at the
101-
public wither.
102-
- **Native-lib extraction to a fixed temp path** (`LlamaLoader`) — no per-process uniqueness; concurrent
103-
JVMs (CI matrices) can interleave a partial `copy` with a peer's `System.load`, and `cleanup()` can
104-
delete a file a live peer needs. **Fix:** per-process temp dir / atomic move.
105-
- **`OSInfo` raw `exec()`** (32-bit ARM detection) leaks `Process` + undrained pipes (can hang init).
106-
**Fix:** route through `ProcessRunner` / drain+close streams.
107-
- **Discovery routes unauthenticated** (`/props`, `/api/show`) — minor metadata disclosure when a key is
108-
set; decide intentional-and-document vs. gate.
109-
- **`completeBatch` partial-failure leak** — on the first `join()` throw, sibling futures keep running
110-
unobserved (native slot pressure); await/cancel the rest before rethrowing.
111-
- **`probabilities` map is last-wins on duplicate token text** (`CompletionResponseParser`) — document,
112-
or key by id+position (lossless data already in `logprobs`).
34+
**Tier 1 — high impact (#258, `a4325ff`)**
35+
36+
- **N1** — unhandled C++ exceptions crossing the JNI boundary → JVM abort; every entry point (incl. the
37+
public `LlamaModel.jsonSchemaToGrammar`, plus encode/tokenize/embeddings/rerank/infill/applyTemplate)
38+
now converts the failure to a `LlamaException` instead of crashing the process.
39+
- **N2**`parse_string_array` null-deref (null element / OOM) + per-iteration JNI local-ref leak.
40+
- **J1**`close()` / native `delete()` double-free under concurrent close → `synchronized` close.
41+
- **P1**`ServerMetrics` cumulative token totals truncated `int` → negative → `Timings.promptN` /
42+
`predictedN` widened to `long`.
43+
44+
**Tier 2 — medium (#258, `a4325ff` + `3e500aa`)**
45+
46+
- **S1** — unbounded request body → OOM DoS → 16 MiB cap + `Content-Length` pre-check; oversized → HTTP 413.
47+
- **N3** — streaming-reader use-after-free → reader held as a `shared_ptr` and copied out under the lock
48+
before `next()`.
49+
- **J5**`Session` permanently wedged on an abandoned stream → `Session.cancelStream()` clears the guard
50+
and rolls back the pending user turn.
51+
- **J3**`LlamaIterator.hasNext` made `volatile` (observed across a cross-thread `cancel()`).
52+
- **N4** — log callback made `noexcept` + non-throwing env lookup (no exception unwinds through llama.cpp
53+
C frames from an unattached thread).
54+
55+
**Tier 3 — hardening (#258, `ac3ad6d`)**
56+
57+
- **S3** — constant-time bearer-key comparison (`MessageDigest.isEqual`).
58+
- **S2** — SSE heartbeat pool sized to the core count (one stalled client can't starve other streams).
59+
- **P3**`ChatMessage.toolCalls` defensively copied + wrapped unmodifiable.
60+
- **NaN/Inf** — non-finite `float`/`double` rejected at `JsonParameters.withScalar` (they would serialize
61+
to the invalid JSON tokens `NaN`/`Infinity`).
62+
- **OSInfo** — armhf-detection `exec()` routed through a drain-and-close helper (no fd leak / pipe-full hang).
63+
- **completeBatch**`completeBatch` / `completeBatchWithStats` / `chatBatch` join every future before
64+
propagating the first failure (no abandoned in-flight requests).
65+
- **Docs**`/props` + Ollama discovery routes documented as intentionally-unauthenticated metadata;
66+
`parseProbabilities` documented as last-wins on duplicate token text (use `parseLogprobs` for lossless data).
67+
68+
**Still open — optional follow-up (lower priority):** full per-process extraction **directory** isolation
69+
+ a `cleanup()` that recursively removes dead-process dirs. Now that writes are atomic and content-checked
70+
this is a tidiness improvement (stops the shared-tmpdir `cleanup()` racing a live peer's flat file), not a
71+
correctness fix — and it still needs the Windows locked-file co-design noted above.
11372

11473
### OpenAI-compatible HTTP endpoint (shipped; follow-ups open)
11574

src/main/java/net/ladenthin/llama/loader/LlamaLoader.java

Lines changed: 15 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -208,15 +208,17 @@ public static boolean loadNativeLibrary(Path path) {
208208
String nativeLibraryFilePath = sourceDirectory + "/" + fileName;
209209

210210
Path extractedFilePath = Paths.get(targetDirectory, fileName);
211+
// Resolve the File once and reuse it (avoids repeated Path.toFile() calls).
212+
File extractedFile = extractedFilePath.toFile();
211213

212214
try {
213215
// Fast path: a byte-identical copy already exists — extracted by a previous run or by a
214216
// concurrent JVM sharing this tmpdir. Reuse it rather than rewriting in place: replacing a
215217
// file another process has already loaded fails on Windows (the lib is locked), and an
216218
// in-place rewrite risks a partial file a concurrent loader could observe.
217219
if (Files.exists(extractedFilePath) && resourceMatchesFile(nativeLibraryFilePath, extractedFilePath)) {
218-
permissionSetter.apply(extractedFilePath.toFile());
219-
extractedFilePath.toFile().deleteOnExit();
220+
permissionSetter.apply(extractedFile);
221+
extractedFile.deleteOnExit();
220222
return extractedFilePath;
221223
}
222224

@@ -236,13 +238,20 @@ public static boolean loadNativeLibrary(Path path) {
236238
}
237239
moveIntoPlace(tempFile, extractedFilePath);
238240
} finally {
239-
// No-op once moveIntoPlace consumed it; cleans up if any step above bailed out.
240-
Files.deleteIfExists(tempFile);
241+
// Best-effort cleanup: a no-op once moveIntoPlace consumed it, otherwise it removes the
242+
// temp file left behind if any step above bailed out. The delete must never throw out of
243+
// the finally block — that would mask the primary result/exception from the try — so the
244+
// IOException is swallowed (a leftover .tmp in java.io.tmpdir is harmless).
245+
try {
246+
Files.deleteIfExists(tempFile);
247+
} catch (IOException ignored) {
248+
// ignore: best-effort temp-file cleanup
249+
}
241250
}
242251

243252
// Set executable (x) flag to enable Java to load the native library.
244-
permissionSetter.apply(extractedFilePath.toFile());
245-
extractedFilePath.toFile().deleteOnExit();
253+
permissionSetter.apply(extractedFile);
254+
extractedFile.deleteOnExit();
246255

247256
System.out.println("Extracted '" + fileName + "' to '" + extractedFilePath + "'");
248257
return extractedFilePath;

0 commit comments

Comments
 (0)