Skip to content

Commit 42f2f2a

Browse files
Merge pull request #256 from bernardladenthin/claude/confident-knuth-4zgcg7
docs: add JNI safety and server hardening issues to TODO
2 parents b20c2d3 + 8b69218 commit 42f2f2a

1 file changed

Lines changed: 37 additions & 0 deletions

File tree

TODO.md

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -271,6 +271,43 @@ These are JNI plumbing items for upstream API additions. Policy: add only after
271271
272272
- **Expose `llama_vocab::get_suppress_tokens()` via `LlamaModel.getSuppressTokens()`.** Added in b9490–b9495 alongside the new `tokenizer.ggml.suppress_tokens` GGUF key and the `LLM_KV_TOKENIZER_SUPPRESS_TOKENS` constant. When a GGUF declares this array, upstream stores it on `llama_vocab::impl::suppress_tokens` and exposes it via the new `llama_vocab::get_suppress_tokens()` accessor. The bias is **applied automatically** inside the model forward graph — the Gemma4 Unified graph (`src/models/gemma4.cpp`) reads the list and adds a `-INFINITY` logit bias to those token IDs via a new `llm_graph_input_logits_bias` input so the model cannot emit them (used to block `<image|>` / `<audio|>` placeholders). A Java mirror would be `public int[] getSuppressTokens()` on `LlamaModel`: a read-only inspector returning the suppression list for debugging or for callers running their own sampling who want to replicate the same bias. Value is low (the bias is auto-applied, Java callers cannot change it; java-llama.cpp does not expose custom logit-bias hooks at this level); cost is trivial (one JNI passthrough + a `getSuppressTokens()` Java method).
273273
274+
### JNI safety and server hardening (from PR #251 contributor)
275+
276+
Raised by [@vaiju1981](https://github.com/vaiju1981) in
277+
[PR #251 comment](https://github.com/bernardladenthin/java-llama.cpp/pull/251#issuecomment-4761363838).
278+
Feel free to contribute fixes — PRs welcome.
279+
280+
- **Unhandled C++ exceptions cross the JNI boundary → JVM abort (UB).** Any `std::exception`
281+
(or worse, an exception of unknown type) that escapes a native method and crosses the JNI
282+
boundary causes undefined behaviour on most JVMs and typically aborts the process. Each native
283+
method in `jllama.cpp` should wrap its body in `try { … } catch (const std::exception& e) {
284+
env->ThrowNew(llamaExceptionClass, e.what()); return <zero>; } catch (...) { env->ThrowNew(…,
285+
"unknown C++ exception"); return <zero>; }` so that errors surface as `LlamaException` on the
286+
Java side instead of crashing the JVM.
287+
288+
- **`parse_string_array` — null deref + JNI local-reference leak.** The helper that reads a
289+
JSON string array from JNI can dereference a null pointer when an array element is absent,
290+
and leaks JNI local references when an early exit skips the matching `DeleteLocalRef`. Fix:
291+
guard every `GetObjectArrayElement` result and pair each reference acquisition with a
292+
`DeleteLocalRef` before the next iteration or return.
293+
294+
- **`close()` / native `delete()` double-free under concurrent close.** If two threads race to
295+
call `LlamaModel.close()`, both can reach the native `delete` path and free the same
296+
`jllama_context` pointer twice → heap corruption. Fix: use `AtomicBoolean closed` + a
297+
`synchronized` guard (or `compareAndSet`) on the Java side so `close()` is idempotent and
298+
the native pointer is nulled before the second caller can reach it.
299+
300+
- **`ServerMetrics.getCumulativeTimings()` truncates cumulative token totals to `int`.** The
301+
cumulative token counters are stored as `long` in the JSON but cast to `int` when
302+
constructing `ServerMetrics`, silently truncating values above `Integer.MAX_VALUE`
303+
(~2.1 billion tokens). Fix: widen the field and constructor parameter to `long`.
304+
305+
- **Unbounded request-body read → OOM DoS.** The HTTP handler reads the entire request body
306+
into a `String`/`byte[]` before parsing it, with no size cap. A client that streams a
307+
multi-gigabyte body can exhaust heap memory and crash the JVM. Fix: add a configurable
308+
`maxRequestBodyBytes` limit (e.g. default 4 MB) and reject oversized requests with
309+
`HTTP 413 Content Too Large` before buffering them.
310+
274311
### Feature backlog from similar projects
275312
276313
- **Feature backlog from similar projects.** See [`docs/feature-investigation-similar-projects.md`](docs/feature-investigation-similar-projects.md) for the consolidated investigation across the 5 pure-Java sibling runtimes ([llama3.java](https://github.com/mukel/llama3.java), [gemma4.java](https://github.com/mukel/gemma4.java), [gptoss.java](https://github.com/mukel/gptoss.java), [qwen35.java](https://github.com/mukel/qwen35.java), [nemotron3.java](https://github.com/mukel/nemotron3.java)) plus the dormant alternative JNI binding [llamacpp4j](https://github.com/sebicom/llamacpp4j). The doc captures 18 candidate items grouped into cross-cutting themes (UTF-8 streaming boundary safety, thinking-channel router, operator timing line, jbang single-file example, README system-properties table, etc.) and per-repo unique findings (Harmony channel decoder, Qwen empty-`<think>` injection, llama_state_* save/load, llama_adapter_lora_* hot-apply, etc.), each with effort sizing (XS / S / M / L) and a prioritised backlog.

0 commit comments

Comments
 (0)