@@ -13,18 +13,14 @@ cross-cutting initiative.
1313
1414## Open — jllama-specific
1515
16- ### Code audit — pre-existing correctness / safety findings (open )
16+ ### Code audit — pre-existing correctness / safety findings (RESOLVED — PRs # 258 + # 260 )
1717
18- A multi-area audit (2026-06-20) of the ** existing** codebase (independent of the tools / vision /
19- session PRs) surfaced the items below. None are regressions from recent work. They are intentionally
20- ** split into tiers so each can land as its own small, focused PR** rather than one large change.
21- Items marked ** ✓ verified** were re-confirmed by reading the cited code directly; the rest were
22- read-confirmed by the audit but warrant a quick re-check before the fix.
23-
24- ** Progress (` feature/todo_fixes ` ):** Tier 1 (N1, N2, J1, P1), Tier 2 (N3, N4, S1, J3, J5) and Tier 3
25- (S2, S3, P3, NaN/Inf JSON, OSInfo ` exec ` , discovery-route auth doc, ` completeBatch ` leak, probabilities
26- doc) ** DONE** — verified (452/452 C++ tests, affected Java tests + new regressions, Javadoc + Spotless +
27- clang-format 22.1.5 clean).
18+ A multi-area audit (2026-06-20) of the ** existing** codebase surfaced 18 correctness/safety findings,
19+ intentionally ** split into tiers so each could land as its own small, focused PR** . ** All 18 are now
20+ fixed and merged** — Tiers 1–3 in ** #258 ** , the deferred ` LlamaLoader ` extraction race in ** #260 ** —
21+ with regression tests added in ** #261 / #262 ** . The full per-finding rationale lives in those PRs and
22+ their commits; the concise record below is kept for traceability. Nothing in this section is open
23+ except the optional follow-up noted at the end.
2824
2925** ` LlamaLoader ` native-lib extraction temp-path race — DONE (atomic write + content-reuse).**
3026` extractFile ` now (1) reuses a byte-identical existing copy instead of rewriting it — so it never
@@ -35,81 +31,44 @@ extracted file is self-contained — no multi-DLL co-location to coordinate. Ver
3531` NativeLibraryLoadSmokeTest ` (real extract+load on macOS) + a ` resourceMatchesFile ` unit test; the
3632Windows locked-replace path is exercised by CI's Windows jobs.
3733
38- * Optional follow-up (lower priority):* full per-process extraction ** directory** isolation + a
39- ` cleanup() ` that recursively removes dead-process dirs. Now that writes are atomic and content-checked,
40- this is a tidiness improvement (stops the shared-tmpdir ` cleanup() ` from racing a live peer's flat
41- file), not a correctness fix — and it still needs the Windows locked-file co-design noted before.
42-
43- ** Tier 1 — high impact, fix first**
44-
45- - ** ✓ Unhandled C++ exceptions cross the JNI boundary → JVM abort (UB).** Several ` JNIEXPORT `
46- entry points call throwing code with no ` try/catch ` , so a C++ exception propagates across JNI
47- (no Java exception is raised — the process crashes). Most reachable: the ** public** API
48- ` LlamaModel.jsonSchemaToGrammar(String) ` (` LlamaModel.java:436 ` ) → ` jllama.cpp:1090 `
49- ` ordered_json::parse(c_schema) ` deterministically aborts the JVM on a malformed schema string.
50- Same class on ` encode ` / ` handleTokenize ` / ` handleEmbeddings ` / ` handleRerank ` / ` applyTemplate `
51- with malformed JSON or an untokenizable prompt, and ` applyTemplate ` 's unchecked ` .at("prompt") ` .
52- ** Fix:** wrap each entry-point body and convert to ` ThrowNew(c_llama_error, e.what()) ` (the pattern
53- ` handleCompletionsOai ` / ` configureParallelInference ` already use). A shared macro covers them uniformly.
54- - ** ✓ ` parse_string_array ` — null deref + JNI local-reference leak** (` jllama.cpp:339 ` ). The
55- ` GetStringUTFChars ` result is not null-checked (OOM → ` strdup(NULL) ` ), a ` null ` array element is UB,
56- and ` GetObjectArrayElement ` leaks a local ref each iteration — a ` handleRerank ` with many documents
57- overflows the ~ 16-slot local-ref table. ** Fix:** null-check both, ` DeleteLocalRef ` per iteration,
58- free already-allocated slots on the error path. Reachable from ` loadModel ` + ` handleRerank ` .
59- - ** ✓ ` close() ` / native ` delete() ` double-free under concurrent close** (` LlamaModel.java:387 ` ,
60- ` jllama.cpp ` delete path). ` close() ` is not ` synchronized ` , ` ctx ` is a plain non-volatile ` long ` ,
61- and native ` delete() ` does check-then-act on the handle with no atomicity; the library explicitly
62- supports async/parallel use, so two closes (or a close racing a try-with-resources unwind) can both
63- free the same handle. ** Fix:** ` synchronized ` close (or atomic handle exchange).
64- - ** ✓ ` ServerMetrics.getCumulativeTimings() ` truncates cumulative token totals to ` int ` → negative**
65- (` ServerMetrics.java:174 ` ). ` long ` startup-cumulative counters cast ` (int) ` overflow past ~ 2.1B
66- tokens (e.g. ` (int)3_000_000_000L == -1294967296 ` ). ** Fix:** widen ` Timings.promptN ` /` predictedN `
67- to ` long ` (the per-request ` fromJson ` path benefits too).
68-
69- ** Tier 2 — medium**
70-
71- - ** ✓ Unbounded request-body read → OOM DoS** (` OpenAiCompatServer.java:820 ` ,
72- ` OBJECT_MAPPER.readTree(getRequestBody()) ` ). No size cap / ` Content-Length ` check; unauthenticated
73- when bound to ` 0.0.0.0 ` (the bind is configurable and the default has no key). ** Fix:** cap body size
74- / reject oversized ` Content-Length ` (config-driven, sane default).
75- - ** Streaming reader raw-pointer use-after-free** (` jllama.cpp ` ` receiveCompletionJson ` /
76- ` receiveChatCompletionChunk ` ). The reader pointer is cached under ` readers_mutex ` , then ` next() ` is
77- called outside the lock while ` delete() ` /` erase_reader ` can free it. ** Fix:** hold a ` shared_ptr `
78- across ` next() ` .
79- - ** ` Session ` permanently wedged on an abandoned stream** (` SessionState ` ). ` streamingActive ` is cleared
80- only by ` commitStreamedReply ` ; a consumer that throws / ` break ` s / closes the iterable without
81- committing leaves the flag stuck ` true ` forever (every later ` send ` /` stream ` /` save ` /` restore ` throws),
82- and the user turn is committed with no assistant turn. ** Fix:** clear the flag on stream close/cancel.
83- - ** ✓ ` LlamaIterator.hasNext ` not ` volatile ` ** (` LlamaIterator.java:41 ` ). ` cancel() ` (documented as a
84- cross-thread call) writes it from another thread; the consumer may never observe ` false ` . The sibling
85- ` CancellationToken.cancelled ` is ` volatile ` for exactly this reason. ** Fix:** ` volatile ` .
86- - ** Log callback throws from a JVM-unattached native thread** (` jllama.cpp ` log lambda + ` get_jni_env ` ).
87- ggml/llama internal threads can fire logging; ` get_jni_env ` throws, unwinding through C frames →
88- ` std::terminate ` /UB. ** Fix:** make the lambda ` noexcept ` , skip when the thread is unattached.
89-
90- ** Tier 3 — hardening / low**
91-
92- - ** ✓ Timing-unsafe API-key comparison** (` OpenAiCompatServer.java:817 ` , ` String.equals ` ) →
93- ` MessageDigest.isEqual ` .
94- - ** ✓ ` ChatMessage.toolCalls ` not defensively copied** (` ChatMessage.java:102 ` ) — ` parts ` is copied +
95- wrapped, ` toolCalls ` is stored/returned by reference, breaking the documented-immutable contract.
96- - ** Single-thread heartbeat pool** (` OpenAiCompatServer.java:171 ` ) — one stalled SSE client blocks the
97- lone scheduler thread, starving every other stream's keep-alives. ** Fix:** size the pool / non-blocking
98- heartbeat write.
99- - ** Float ` NaN ` /` Infinity ` → invalid JSON** in scalar withers (` JsonParameters.withScalar ` via
100- ` String.valueOf ` ); the whole request then fails opaquely in the native parser. ** Fix:** reject at the
101- public wither.
102- - ** Native-lib extraction to a fixed temp path** (` LlamaLoader ` ) — no per-process uniqueness; concurrent
103- JVMs (CI matrices) can interleave a partial ` copy ` with a peer's ` System.load ` , and ` cleanup() ` can
104- delete a file a live peer needs. ** Fix:** per-process temp dir / atomic move.
105- - ** ` OSInfo ` raw ` exec() ` ** (32-bit ARM detection) leaks ` Process ` + undrained pipes (can hang init).
106- ** Fix:** route through ` ProcessRunner ` / drain+close streams.
107- - ** Discovery routes unauthenticated** (` /props ` , ` /api/show ` ) — minor metadata disclosure when a key is
108- set; decide intentional-and-document vs. gate.
109- - ** ` completeBatch ` partial-failure leak** — on the first ` join() ` throw, sibling futures keep running
110- unobserved (native slot pressure); await/cancel the rest before rethrowing.
111- - ** ` probabilities ` map is last-wins on duplicate token text** (` CompletionResponseParser ` ) — document,
112- or key by id+position (lossless data already in ` logprobs ` ).
34+ ** Tier 1 — high impact (#258 , ` a4325ff ` )**
35+
36+ - ** N1** — unhandled C++ exceptions crossing the JNI boundary → JVM abort; every entry point (incl. the
37+ public ` LlamaModel.jsonSchemaToGrammar ` , plus encode/tokenize/embeddings/rerank/infill/applyTemplate)
38+ now converts the failure to a ` LlamaException ` instead of crashing the process.
39+ - ** N2** — ` parse_string_array ` null-deref (null element / OOM) + per-iteration JNI local-ref leak.
40+ - ** J1** — ` close() ` / native ` delete() ` double-free under concurrent close → ` synchronized ` close.
41+ - ** P1** — ` ServerMetrics ` cumulative token totals truncated ` int ` → negative → ` Timings.promptN ` /
42+ ` predictedN ` widened to ` long ` .
43+
44+ ** Tier 2 — medium (#258 , ` a4325ff ` + ` 3e500aa ` )**
45+
46+ - ** S1** — unbounded request body → OOM DoS → 16 MiB cap + ` Content-Length ` pre-check; oversized → HTTP 413.
47+ - ** N3** — streaming-reader use-after-free → reader held as a ` shared_ptr ` and copied out under the lock
48+ before ` next() ` .
49+ - ** J5** — ` Session ` permanently wedged on an abandoned stream → ` Session.cancelStream() ` clears the guard
50+ and rolls back the pending user turn.
51+ - ** J3** — ` LlamaIterator.hasNext ` made ` volatile ` (observed across a cross-thread ` cancel() ` ).
52+ - ** N4** — log callback made ` noexcept ` + non-throwing env lookup (no exception unwinds through llama.cpp
53+ C frames from an unattached thread).
54+
55+ ** Tier 3 — hardening (#258 , ` ac3ad6d ` )**
56+
57+ - ** S3** — constant-time bearer-key comparison (` MessageDigest.isEqual ` ).
58+ - ** S2** — SSE heartbeat pool sized to the core count (one stalled client can't starve other streams).
59+ - ** P3** — ` ChatMessage.toolCalls ` defensively copied + wrapped unmodifiable.
60+ - ** NaN/Inf** — non-finite ` float ` /` double ` rejected at ` JsonParameters.withScalar ` (they would serialize
61+ to the invalid JSON tokens ` NaN ` /` Infinity ` ).
62+ - ** OSInfo** — armhf-detection ` exec() ` routed through a drain-and-close helper (no fd leak / pipe-full hang).
63+ - ** completeBatch** — ` completeBatch ` / ` completeBatchWithStats ` / ` chatBatch ` join every future before
64+ propagating the first failure (no abandoned in-flight requests).
65+ - ** Docs** — ` /props ` + Ollama discovery routes documented as intentionally-unauthenticated metadata;
66+ ` parseProbabilities ` documented as last-wins on duplicate token text (use ` parseLogprobs ` for lossless data).
67+
68+ ** Still open — optional follow-up (lower priority):** full per-process extraction ** directory** isolation
69+ + a ` cleanup() ` that recursively removes dead-process dirs. Now that writes are atomic and content-checked
70+ this is a tidiness improvement (stops the shared-tmpdir ` cleanup() ` racing a live peer's flat file), not a
71+ correctness fix — and it still needs the Windows locked-file co-design noted above.
11372
11473### OpenAI-compatible HTTP endpoint (shipped; follow-ups open)
11574
0 commit comments