Skip to content

Commit 90f95d0

Browse files
Merge pull request #258 from vaiju1981/feature/todo_fixes
Feature/todo fixes
2 parents 0d44a16 + ac3ad6d commit 90f95d0

22 files changed

Lines changed: 551 additions & 57 deletions

TODO.md

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,97 @@ cross-cutting initiative.
1313

1414
## Open — jllama-specific
1515

16+
### Code audit — pre-existing correctness / safety findings (open)
17+
18+
A multi-area audit (2026-06-20) of the **existing** codebase (independent of the tools / vision /
19+
session PRs) surfaced the items below. None are regressions from recent work. They are intentionally
20+
**split into tiers so each can land as its own small, focused PR** rather than one large change.
21+
Items marked **✓ verified** were re-confirmed by reading the cited code directly; the rest were
22+
read-confirmed by the audit but warrant a quick re-check before the fix.
23+
24+
**Progress (`feature/todo_fixes`):** Tier 1 (N1, N2, J1, P1), Tier 2 (N3, N4, S1, J3, J5) and Tier 3
25+
(S2, S3, P3, NaN/Inf JSON, OSInfo `exec`, discovery-route auth doc, `completeBatch` leak, probabilities
26+
doc) **DONE** — verified (452/452 C++ tests, affected Java tests + new regressions, Javadoc + Spotless +
27+
clang-format 22.1.5 clean).
28+
29+
**Deferred — `LlamaLoader` native-lib extraction temp-path race (open).** Fixing it (per-process temp
30+
dir / atomic move) is the one audit item left undone: it sits on *the* native-library load path, so the
31+
change has a high blast radius and needs careful Windows / `deleteOnExit` / `cleanup()` co-design plus
32+
cross-platform load testing (a locked-DLL replace on Windows, leftover unique-dir accumulation) that the
33+
fix-batch could not safely exercise. The race only bites concurrent JVMs sharing one `java.io.tmpdir`
34+
(e.g. some CI matrices). Pick it up as its own change with a Windows test pass.
35+
36+
**Tier 1 — high impact, fix first**
37+
38+
- **✓ Unhandled C++ exceptions cross the JNI boundary → JVM abort (UB).** Several `JNIEXPORT`
39+
entry points call throwing code with no `try/catch`, so a C++ exception propagates across JNI
40+
(no Java exception is raised — the process crashes). Most reachable: the **public** API
41+
`LlamaModel.jsonSchemaToGrammar(String)` (`LlamaModel.java:436`) → `jllama.cpp:1090`
42+
`ordered_json::parse(c_schema)` deterministically aborts the JVM on a malformed schema string.
43+
Same class on `encode` / `handleTokenize` / `handleEmbeddings` / `handleRerank` / `applyTemplate`
44+
with malformed JSON or an untokenizable prompt, and `applyTemplate`'s unchecked `.at("prompt")`.
45+
**Fix:** wrap each entry-point body and convert to `ThrowNew(c_llama_error, e.what())` (the pattern
46+
`handleCompletionsOai` / `configureParallelInference` already use). A shared macro covers them uniformly.
47+
- **`parse_string_array` — null deref + JNI local-reference leak** (`jllama.cpp:339`). The
48+
`GetStringUTFChars` result is not null-checked (OOM → `strdup(NULL)`), a `null` array element is UB,
49+
and `GetObjectArrayElement` leaks a local ref each iteration — a `handleRerank` with many documents
50+
overflows the ~16-slot local-ref table. **Fix:** null-check both, `DeleteLocalRef` per iteration,
51+
free already-allocated slots on the error path. Reachable from `loadModel` + `handleRerank`.
52+
- **`close()` / native `delete()` double-free under concurrent close** (`LlamaModel.java:387`,
53+
`jllama.cpp` delete path). `close()` is not `synchronized`, `ctx` is a plain non-volatile `long`,
54+
and native `delete()` does check-then-act on the handle with no atomicity; the library explicitly
55+
supports async/parallel use, so two closes (or a close racing a try-with-resources unwind) can both
56+
free the same handle. **Fix:** `synchronized` close (or atomic handle exchange).
57+
- **`ServerMetrics.getCumulativeTimings()` truncates cumulative token totals to `int` → negative**
58+
(`ServerMetrics.java:174`). `long` startup-cumulative counters cast `(int)` overflow past ~2.1B
59+
tokens (e.g. `(int)3_000_000_000L == -1294967296`). **Fix:** widen `Timings.promptN`/`predictedN`
60+
to `long` (the per-request `fromJson` path benefits too).
61+
62+
**Tier 2 — medium**
63+
64+
- **✓ Unbounded request-body read → OOM DoS** (`OpenAiCompatServer.java:820`,
65+
`OBJECT_MAPPER.readTree(getRequestBody())`). No size cap / `Content-Length` check; unauthenticated
66+
when bound to `0.0.0.0` (the bind is configurable and the default has no key). **Fix:** cap body size
67+
/ reject oversized `Content-Length` (config-driven, sane default).
68+
- **Streaming reader raw-pointer use-after-free** (`jllama.cpp` `receiveCompletionJson` /
69+
`receiveChatCompletionChunk`). The reader pointer is cached under `readers_mutex`, then `next()` is
70+
called outside the lock while `delete()`/`erase_reader` can free it. **Fix:** hold a `shared_ptr`
71+
across `next()`.
72+
- **`Session` permanently wedged on an abandoned stream** (`SessionState`). `streamingActive` is cleared
73+
only by `commitStreamedReply`; a consumer that throws / `break`s / closes the iterable without
74+
committing leaves the flag stuck `true` forever (every later `send`/`stream`/`save`/`restore` throws),
75+
and the user turn is committed with no assistant turn. **Fix:** clear the flag on stream close/cancel.
76+
- **`LlamaIterator.hasNext` not `volatile`** (`LlamaIterator.java:41`). `cancel()` (documented as a
77+
cross-thread call) writes it from another thread; the consumer may never observe `false`. The sibling
78+
`CancellationToken.cancelled` is `volatile` for exactly this reason. **Fix:** `volatile`.
79+
- **Log callback throws from a JVM-unattached native thread** (`jllama.cpp` log lambda + `get_jni_env`).
80+
ggml/llama internal threads can fire logging; `get_jni_env` throws, unwinding through C frames →
81+
`std::terminate`/UB. **Fix:** make the lambda `noexcept`, skip when the thread is unattached.
82+
83+
**Tier 3 — hardening / low**
84+
85+
- **✓ Timing-unsafe API-key comparison** (`OpenAiCompatServer.java:817`, `String.equals`) →
86+
`MessageDigest.isEqual`.
87+
- **`ChatMessage.toolCalls` not defensively copied** (`ChatMessage.java:102`) — `parts` is copied +
88+
wrapped, `toolCalls` is stored/returned by reference, breaking the documented-immutable contract.
89+
- **Single-thread heartbeat pool** (`OpenAiCompatServer.java:171`) — one stalled SSE client blocks the
90+
lone scheduler thread, starving every other stream's keep-alives. **Fix:** size the pool / non-blocking
91+
heartbeat write.
92+
- **Float `NaN`/`Infinity` → invalid JSON** in scalar withers (`JsonParameters.withScalar` via
93+
`String.valueOf`); the whole request then fails opaquely in the native parser. **Fix:** reject at the
94+
public wither.
95+
- **Native-lib extraction to a fixed temp path** (`LlamaLoader`) — no per-process uniqueness; concurrent
96+
JVMs (CI matrices) can interleave a partial `copy` with a peer's `System.load`, and `cleanup()` can
97+
delete a file a live peer needs. **Fix:** per-process temp dir / atomic move.
98+
- **`OSInfo` raw `exec()`** (32-bit ARM detection) leaks `Process` + undrained pipes (can hang init).
99+
**Fix:** route through `ProcessRunner` / drain+close streams.
100+
- **Discovery routes unauthenticated** (`/props`, `/api/show`) — minor metadata disclosure when a key is
101+
set; decide intentional-and-document vs. gate.
102+
- **`completeBatch` partial-failure leak** — on the first `join()` throw, sibling futures keep running
103+
unobserved (native slot pressure); await/cancel the rest before rethrowing.
104+
- **`probabilities` map is last-wins on duplicate token text** (`CompletionResponseParser`) — document,
105+
or key by id+position (lossless data already in `logprobs`).
106+
16107
### OpenAI-compatible HTTP endpoint (shipped; follow-ups open)
17108

18109
`net.ladenthin.llama.server.OpenAiCompatServer` is the single OpenAI-compatible server (JDK

0 commit comments

Comments
 (0)