@@ -13,6 +13,97 @@ cross-cutting initiative.
1313
1414## Open — jllama-specific
1515
16+ ### Code audit — pre-existing correctness / safety findings (open)
17+
18+ A multi-area audit (2026-06-20) of the ** existing** codebase (independent of the tools / vision /
19+ session PRs) surfaced the items below. None are regressions from recent work. They are intentionally
20+ ** split into tiers so each can land as its own small, focused PR** rather than one large change.
21+ Items marked ** ✓ verified** were re-confirmed by reading the cited code directly; the rest were
22+ read-confirmed by the audit but warrant a quick re-check before the fix.
23+
24+ ** Progress (` feature/todo_fixes ` ):** Tier 1 (N1, N2, J1, P1), Tier 2 (N3, N4, S1, J3, J5) and Tier 3
25+ (S2, S3, P3, NaN/Inf JSON, OSInfo ` exec ` , discovery-route auth doc, ` completeBatch ` leak, probabilities
26+ doc) ** DONE** — verified (452/452 C++ tests, affected Java tests + new regressions, Javadoc + Spotless +
27+ clang-format 22.1.5 clean).
28+
29+ ** Deferred — ` LlamaLoader ` native-lib extraction temp-path race (open).** Fixing it (per-process temp
30+ dir / atomic move) is the one audit item left undone: it sits on * the* native-library load path, so the
31+ change has a high blast radius and needs careful Windows / ` deleteOnExit ` / ` cleanup() ` co-design plus
32+ cross-platform load testing (a locked-DLL replace on Windows, leftover unique-dir accumulation) that the
33+ fix-batch could not safely exercise. The race only bites concurrent JVMs sharing one ` java.io.tmpdir `
34+ (e.g. some CI matrices). Pick it up as its own change with a Windows test pass.
35+
36+ ** Tier 1 — high impact, fix first**
37+
38+ - ** ✓ Unhandled C++ exceptions cross the JNI boundary → JVM abort (UB).** Several ` JNIEXPORT `
39+ entry points call throwing code with no ` try/catch ` , so a C++ exception propagates across JNI
40+ (no Java exception is raised — the process crashes). Most reachable: the ** public** API
41+ ` LlamaModel.jsonSchemaToGrammar(String) ` (` LlamaModel.java:436 ` ) → ` jllama.cpp:1090 `
42+ ` ordered_json::parse(c_schema) ` deterministically aborts the JVM on a malformed schema string.
43+ Same class on ` encode ` / ` handleTokenize ` / ` handleEmbeddings ` / ` handleRerank ` / ` applyTemplate `
44+ with malformed JSON or an untokenizable prompt, and ` applyTemplate ` 's unchecked ` .at("prompt") ` .
45+ ** Fix:** wrap each entry-point body and convert to ` ThrowNew(c_llama_error, e.what()) ` (the pattern
46+ ` handleCompletionsOai ` / ` configureParallelInference ` already use). A shared macro covers them uniformly.
47+ - ** ✓ ` parse_string_array ` — null deref + JNI local-reference leak** (` jllama.cpp:339 ` ). The
48+ ` GetStringUTFChars ` result is not null-checked (OOM → ` strdup(NULL) ` ), a ` null ` array element is UB,
49+ and ` GetObjectArrayElement ` leaks a local ref each iteration — a ` handleRerank ` with many documents
50+ overflows the ~ 16-slot local-ref table. ** Fix:** null-check both, ` DeleteLocalRef ` per iteration,
51+ free already-allocated slots on the error path. Reachable from ` loadModel ` + ` handleRerank ` .
52+ - ** ✓ ` close() ` / native ` delete() ` double-free under concurrent close** (` LlamaModel.java:387 ` ,
53+ ` jllama.cpp ` delete path). ` close() ` is not ` synchronized ` , ` ctx ` is a plain non-volatile ` long ` ,
54+ and native ` delete() ` does check-then-act on the handle with no atomicity; the library explicitly
55+ supports async/parallel use, so two closes (or a close racing a try-with-resources unwind) can both
56+ free the same handle. ** Fix:** ` synchronized ` close (or atomic handle exchange).
57+ - ** ✓ ` ServerMetrics.getCumulativeTimings() ` truncates cumulative token totals to ` int ` → negative**
58+ (` ServerMetrics.java:174 ` ). ` long ` startup-cumulative counters cast ` (int) ` overflow past ~ 2.1B
59+ tokens (e.g. ` (int)3_000_000_000L == -1294967296 ` ). ** Fix:** widen ` Timings.promptN ` /` predictedN `
60+ to ` long ` (the per-request ` fromJson ` path benefits too).
61+
62+ ** Tier 2 — medium**
63+
64+ - ** ✓ Unbounded request-body read → OOM DoS** (` OpenAiCompatServer.java:820 ` ,
65+ ` OBJECT_MAPPER.readTree(getRequestBody()) ` ). No size cap / ` Content-Length ` check; unauthenticated
66+ when bound to ` 0.0.0.0 ` (the bind is configurable and the default has no key). ** Fix:** cap body size
67+ / reject oversized ` Content-Length ` (config-driven, sane default).
68+ - ** Streaming reader raw-pointer use-after-free** (` jllama.cpp ` ` receiveCompletionJson ` /
69+ ` receiveChatCompletionChunk ` ). The reader pointer is cached under ` readers_mutex ` , then ` next() ` is
70+ called outside the lock while ` delete() ` /` erase_reader ` can free it. ** Fix:** hold a ` shared_ptr `
71+ across ` next() ` .
72+ - ** ` Session ` permanently wedged on an abandoned stream** (` SessionState ` ). ` streamingActive ` is cleared
73+ only by ` commitStreamedReply ` ; a consumer that throws / ` break ` s / closes the iterable without
74+ committing leaves the flag stuck ` true ` forever (every later ` send ` /` stream ` /` save ` /` restore ` throws),
75+ and the user turn is committed with no assistant turn. ** Fix:** clear the flag on stream close/cancel.
76+ - ** ✓ ` LlamaIterator.hasNext ` not ` volatile ` ** (` LlamaIterator.java:41 ` ). ` cancel() ` (documented as a
77+ cross-thread call) writes it from another thread; the consumer may never observe ` false ` . The sibling
78+ ` CancellationToken.cancelled ` is ` volatile ` for exactly this reason. ** Fix:** ` volatile ` .
79+ - ** Log callback throws from a JVM-unattached native thread** (` jllama.cpp ` log lambda + ` get_jni_env ` ).
80+ ggml/llama internal threads can fire logging; ` get_jni_env ` throws, unwinding through C frames →
81+ ` std::terminate ` /UB. ** Fix:** make the lambda ` noexcept ` , skip when the thread is unattached.
82+
83+ ** Tier 3 — hardening / low**
84+
85+ - ** ✓ Timing-unsafe API-key comparison** (` OpenAiCompatServer.java:817 ` , ` String.equals ` ) →
86+ ` MessageDigest.isEqual ` .
87+ - ** ✓ ` ChatMessage.toolCalls ` not defensively copied** (` ChatMessage.java:102 ` ) — ` parts ` is copied +
88+ wrapped, ` toolCalls ` is stored/returned by reference, breaking the documented-immutable contract.
89+ - ** Single-thread heartbeat pool** (` OpenAiCompatServer.java:171 ` ) — one stalled SSE client blocks the
90+ lone scheduler thread, starving every other stream's keep-alives. ** Fix:** size the pool / non-blocking
91+ heartbeat write.
92+ - ** Float ` NaN ` /` Infinity ` → invalid JSON** in scalar withers (` JsonParameters.withScalar ` via
93+ ` String.valueOf ` ); the whole request then fails opaquely in the native parser. ** Fix:** reject at the
94+ public wither.
95+ - ** Native-lib extraction to a fixed temp path** (` LlamaLoader ` ) — no per-process uniqueness; concurrent
96+ JVMs (CI matrices) can interleave a partial ` copy ` with a peer's ` System.load ` , and ` cleanup() ` can
97+ delete a file a live peer needs. ** Fix:** per-process temp dir / atomic move.
98+ - ** ` OSInfo ` raw ` exec() ` ** (32-bit ARM detection) leaks ` Process ` + undrained pipes (can hang init).
99+ ** Fix:** route through ` ProcessRunner ` / drain+close streams.
100+ - ** Discovery routes unauthenticated** (` /props ` , ` /api/show ` ) — minor metadata disclosure when a key is
101+ set; decide intentional-and-document vs. gate.
102+ - ** ` completeBatch ` partial-failure leak** — on the first ` join() ` throw, sibling futures keep running
103+ unobserved (native slot pressure); await/cancel the rest before rethrowing.
104+ - ** ` probabilities ` map is last-wins on duplicate token text** (` CompletionResponseParser ` ) — document,
105+ or key by id+position (lossless data already in ` logprobs ` ).
106+
16107### OpenAI-compatible HTTP endpoint (shipped; follow-ups open)
17108
18109` net.ladenthin.llama.server.OpenAiCompatServer ` is the single OpenAI-compatible server (JDK
0 commit comments