Merge pull request #202 from bernardladenthin/claude/eloquent-rubin-qzL1D

bernardladenthin · web-flow · commit 42bb74f5ce85 · 2026-05-31T13:25:33.000+02:00
Fix JNI bridge to skip null sentinel results from llama.cpp b9437
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
@@ -271,8 +271,8 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | ~b9333–b9354 | upstream CI (`.github/workflows/`) | CANN and SYCL builds disabled to save Actions resources; macOS builds moved to `build-apple.yml`; cache keys prefixed with `cache-gha-`; `[no release]` commit message token skips release pipeline; no project changes required |
 | ~b9354–b9437 | `common/common.h` + `common/arg.h` + `common/arg.cpp` | `common_params_handle_models()` return type `void` &#x2192; `bool` (caller can detect skip-download misses); new `common_params::skip_download`; `common_params::timeout_read` default raised 600 &#x2192; 3600. Project does not call `common_params_handle_models()` directly &mdash; arg parsing happens upstream; the new defaults flow through transparently |
 | ~b9354–b9437 | `common/download.h` + `common/download.cpp` | `common_download_model()` parameter list trimmed: `download_mmproj`/`download_mtp` moved into `common_download_opts`; new `common_skip_download_exception`; new opt `skip_download` returns `-2` on missing/etag mismatch. Project does not include `download.h` directly, no source changes required |
-| ~b9354–b9437 | `tools/server/server-task.h` + `server-task.cpp` | `task_params::stream` default `true` &#x2192; `false`; new `server_task_result_cmpl_partial::is_begin` bool to let HTTP layer emit SSE headers before the first delta; `to_json()` may now return `nullptr` for the begin marker. Project always sets `stream` explicitly from Java (`LlamaIterator.java`, `LlamaModel.java`) so the default change is inert; the `is_begin` &amp; nullable-`to_json` behaviour is contained inside compiled-from-upstream `server-context.cpp` &amp; `server-task.cpp` |
-| ~b9354–b9437 | `tools/server/server-context.cpp` + `server-queue.cpp` | `send_partial_response()` gained `is_begin` parameter (defaulted); SSE stream now emits a no-content opening event when `stream &amp;&amp; !return_progress` so the client sees HTTP 200 + headers before first token. `server_response_reader::next()` 30s warn-on-cancel diagnostic message updated. Compiled-from-upstream only, no project source changes required |
+| ~b9354–b9437 | `tools/server/server-task.h` + `server-task.cpp` | `task_params::stream` default `true` &#x2192; `false`; new `server_task_result_cmpl_partial::is_begin` bool to let HTTP layer emit SSE headers before the first delta; `to_json()` returns `nullptr` for the begin marker (sentinel meaning "HTTP-headers-only, no body"). Project always sets `stream` explicitly from Java (`LlamaIterator.java`, `LlamaModel.java`) so the default change is inert. The `is_begin` / nullable-`to_json` contract DOES leak into the JNI bridge &mdash; see the row below for the required fix |
+| ~b9354–b9437 | `tools/server/server-context.cpp` + `server-queue.cpp` | `send_partial_response()` gained `is_begin` parameter (defaulted); SSE stream now emits a no-content opening event when `stream &amp;&amp; !return_progress` (`server-context.cpp:2835`) so the client sees HTTP 200 + headers before first token. `server_response_reader::next()` 30s warn-on-cancel diagnostic message updated. **Required project source change**: `Java_net_ladenthin_llama_LlamaModel_receiveCompletionJson` in `src/main/cpp/jllama.cpp` called `result->to_json()` once and assigned `response["stop"]`, which silently auto-promoted the `nullptr` to an object `{"stop": false}` and surfaced a phantom empty `LlamaOutput` to every Java streaming caller (`LlamaModelTest.testGenerateAnswer` and four sibling tests overran by +1 token). Fixed by wrapping the `rd->next()` call in a loop that skips `response.is_null()` results so only real events reach Java |
 | ~b9354–b9437 | `common/arg.cpp` (env-var renames) | `LLAMA_LOG_*` &#x2192; `LLAMA_ARG_LOG_*`, `LLAMA_OFFLINE` &#x2192; `LLAMA_ARG_OFFLINE`, `LLAMA_LOG_FILE` &#x2192; `LLAMA_ARG_LOG_FILE`, `LLAMA_CHAT_TEMPLATE_KWARGS` &#x2192; `LLAMA_ARG_CHAT_TEMPLATE_KWARGS`. CLI verbosity values relabeled (4=trace, 5=debug). The `--license` CLI flag was REMOVED and moved to the new `llama-app licenses` subcommand. Project does not expose these env vars or the `--license` flag through the Java API, no changes required |
 | ~b9354–b9437 | `src/llama.cpp` | `llama_backend_init()` device-discovery rule tightened: iGPUs are now added only when no discrete GPUs were found (was: when no devices at all). RPC servers no longer count as "found" for this purpose, so iGPU + RPC setups keep the local iGPU. Behavioural only, single-line caller in `jllama.cpp` unchanged |
 | ~b9354–b9437 | `src/llama-chat.cpp` | New `LLM_CHAT_TEMPLATE_GRANITE_4_1` enum value + "granite-4.1" template name; `granite-4.0` detection now requires the literal token `g4_default_system_message` in the template, otherwise it routes to 4.1. Project does not implement chat-template detection directly &mdash; routing happens inside compiled-from-upstream code, no source changes required |
diff --git a/src/main/cpp/jllama.cpp b/src/main/cpp/jllama.cpp
@@ -812,18 +812,28 @@ JNIEXPORT jstring JNICALL Java_net_ladenthin_llama_LlamaModel_receiveCompletionJ
         rd = it->second.get();
     }
 
-    server_task_result_ptr result = rd->next([] { return false; });
-
-    if (!result_ok_or_throw(env, result)) {
-        erase_reader(jctx, id_task);
-        return nullptr;
-    }
+    // Upstream b9437 added is_begin partial results whose to_json() returns
+    // a nullptr sentinel meaning "HTTP-headers-only, no body". Loop past
+    // those so the Java iterator only ever sees real events.
+    json response;
+    while (true) {
+        server_task_result_ptr result = rd->next([] { return false; });
+
+        if (!result_ok_or_throw(env, result)) {
+            erase_reader(jctx, id_task);
+            return nullptr;
+        }
 
-    json response      = result->to_json();
-    response["stop"]   = result->is_stop();
+        response = result->to_json();
+        if (response.is_null()) {
+            continue;
+        }
+        response["stop"] = result->is_stop();
 
-    if (result->is_stop()) {
-        erase_reader(jctx, id_task);
+        if (result->is_stop()) {
+            erase_reader(jctx, id_task);
+        }
+        break;
     }
 
     return json_to_jstring_impl(env, response);