
Commit 42bb74f

Merge pull request #202 from bernardladenthin/claude/eloquent-rubin-qzL1D
Fix JNI bridge to skip null sentinel results from llama.cpp b9437
2 parents 24a870c + 590ef9f commit 42bb74f

2 files changed

Lines changed: 22 additions & 12 deletions


docs/history/llama-cpp-breaking-changes.md

Lines changed: 2 additions & 2 deletions
@@ -271,8 +271,8 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | ~b9333–b9354 | upstream CI (`.github/workflows/`) | CANN and SYCL builds disabled to save Actions resources; macOS builds moved to `build-apple.yml`; cache keys prefixed with `cache-gha-`; `[no release]` commit message token skips release pipeline; no project changes required |
 | ~b9354–b9437 | `common/common.h` + `common/arg.h` + `common/arg.cpp` | `common_params_handle_models()` return type `void` → `bool` (caller can detect skip-download misses); new `common_params::skip_download`; `common_params::timeout_read` default raised 600 → 3600. Project does not call `common_params_handle_models()` directly — arg parsing happens upstream; the new defaults flow through transparently |
 | ~b9354–b9437 | `common/download.h` + `common/download.cpp` | `common_download_model()` parameter list trimmed: `download_mmproj`/`download_mtp` moved into `common_download_opts`; new `common_skip_download_exception`; new opt `skip_download` returns `-2` on missing/etag mismatch. Project does not include `download.h` directly, no source changes required |
-| ~b9354–b9437 | `tools/server/server-task.h` + `server-task.cpp` | `task_params::stream` default `true` → `false`; new `server_task_result_cmpl_partial::is_begin` bool to let HTTP layer emit SSE headers before the first delta; `to_json()` may now return `nullptr` for the begin marker. Project always sets `stream` explicitly from Java (`LlamaIterator.java`, `LlamaModel.java`) so the default change is inert; the `is_begin` & nullable-`to_json` behaviour is contained inside compiled-from-upstream `server-context.cpp` & `server-task.cpp` |
-| ~b9354–b9437 | `tools/server/server-context.cpp` + `server-queue.cpp` | `send_partial_response()` gained `is_begin` parameter (defaulted); SSE stream now emits a no-content opening event when `stream && !return_progress` so the client sees HTTP 200 + headers before first token. `server_response_reader::next()` 30s warn-on-cancel diagnostic message updated. Compiled-from-upstream only, no project source changes required |
+| ~b9354–b9437 | `tools/server/server-task.h` + `server-task.cpp` | `task_params::stream` default `true` → `false`; new `server_task_result_cmpl_partial::is_begin` bool to let HTTP layer emit SSE headers before the first delta; `to_json()` returns `nullptr` for the begin marker (sentinel meaning "HTTP-headers-only, no body"). Project always sets `stream` explicitly from Java (`LlamaIterator.java`, `LlamaModel.java`) so the default change is inert. The `is_begin` / nullable-`to_json` contract DOES leak into the JNI bridge — see the row below for the required fix |
+| ~b9354–b9437 | `tools/server/server-context.cpp` + `server-queue.cpp` | `send_partial_response()` gained `is_begin` parameter (defaulted); SSE stream now emits a no-content opening event when `stream && !return_progress` (`server-context.cpp:2835`) so the client sees HTTP 200 + headers before first token. `server_response_reader::next()` 30s warn-on-cancel diagnostic message updated. **Required project source change**: `Java_net_ladenthin_llama_LlamaModel_receiveCompletionJson` in `src/main/cpp/jllama.cpp` called `result->to_json()` once and assigned `response["stop"]`, which silently auto-promoted the `nullptr` to an object `{"stop": false}` and surfaced a phantom empty `LlamaOutput` to every Java streaming caller (`LlamaModelTest.testGenerateAnswer` and four sibling tests overran by +1 token). Fixed by wrapping the `rd->next()` call in a loop that skips `response.is_null()` results so only real events reach Java |
 | ~b9354–b9437 | `common/arg.cpp` (env-var renames) | `LLAMA_LOG_*` → `LLAMA_ARG_LOG_*`, `LLAMA_OFFLINE` → `LLAMA_ARG_OFFLINE`, `LLAMA_LOG_FILE` → `LLAMA_ARG_LOG_FILE`, `LLAMA_CHAT_TEMPLATE_KWARGS` → `LLAMA_ARG_CHAT_TEMPLATE_KWARGS`. CLI verbosity values relabeled (4=trace, 5=debug). The `--license` CLI flag was REMOVED and moved to the new `llama-app licenses` subcommand. Project does not expose these env vars or the `--license` flag through the Java API, no changes required |
 | ~b9354–b9437 | `src/llama.cpp` | `llama_backend_init()` device-discovery rule tightened: iGPUs are now added only when no discrete GPUs were found (was: when no devices at all). RPC servers no longer count as "found" for this purpose, so iGPU + RPC setups keep the local iGPU. Behavioural only, single-line caller in `jllama.cpp` unchanged |
 | ~b9354–b9437 | `src/llama-chat.cpp` | New `LLM_CHAT_TEMPLATE_GRANITE_4_1` enum value + "granite-4.1" template name; `granite-4.0` detection now requires the literal token `g4_default_system_message` in the template, otherwise it routes to 4.1. Project does not implement chat-template detection directly — routing happens inside compiled-from-upstream code, no source changes required |

src/main/cpp/jllama.cpp

Lines changed: 20 additions & 10 deletions
@@ -812,18 +812,28 @@ JNIEXPORT jstring JNICALL Java_net_ladenthin_llama_LlamaModel_receiveCompletionJ
         rd = it->second.get();
     }

-    server_task_result_ptr result = rd->next([] { return false; });
-
-    if (!result_ok_or_throw(env, result)) {
-        erase_reader(jctx, id_task);
-        return nullptr;
-    }
+    // Upstream b9437 added is_begin partial results whose to_json() returns
+    // a nullptr sentinel meaning "HTTP-headers-only, no body". Loop past
+    // those so the Java iterator only ever sees real events.
+    json response;
+    while (true) {
+        server_task_result_ptr result = rd->next([] { return false; });
+
+        if (!result_ok_or_throw(env, result)) {
+            erase_reader(jctx, id_task);
+            return nullptr;
+        }

-    json response = result->to_json();
-    response["stop"] = result->is_stop();
+        response = result->to_json();
+        if (response.is_null()) {
+            continue;
+        }
+        response["stop"] = result->is_stop();

-    if (result->is_stop()) {
-        erase_reader(jctx, id_task);
+        if (result->is_stop()) {
+            erase_reader(jctx, id_task);
+        }
+        break;
     }

     return json_to_jstring_impl(env, response);
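
Note on the skip-sentinel loop: the control flow of the fix can be modelled without llama.cpp or a JSON library. The sketch below is an illustration only, not the project's code. `FakeResult`, `FakeReader` and `next_real_event` are hypothetical names introduced here, and `std::optional<std::string>` stands in for the `json` value, with `std::nullopt` playing the role of the null `to_json()` result of an `is_begin` marker.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one upstream result. An empty body models the
// null to_json() value that marks a headers-only "begin" event.
struct FakeResult {
    std::optional<std::string> body;
    bool stop = false;
};

// Hypothetical stand-in for the response reader: hands out queued results
// in order, one per next() call.
class FakeReader {
public:
    explicit FakeReader(std::vector<FakeResult> results)
        : results_(std::move(results)) {}

    bool has_next() const { return pos_ < results_.size(); }

    FakeResult next() {
        assert(has_next());  // precondition: caller checked has_next()
        return results_[pos_++];
    }

private:
    std::vector<FakeResult> results_;
    std::size_t pos_ = 0;
};

// Returns the next result that carries a body, skipping sentinel results.
// Returns std::nullopt if the reader runs dry before a real event arrives.
std::optional<FakeResult> next_real_event(FakeReader& reader) {
    while (reader.has_next()) {
        FakeResult r = reader.next();
        if (!r.body.has_value()) {
            continue;  // begin marker: nothing to forward to the caller
        }
        return r;
    }
    return std::nullopt;
}
```

Feeding the reader a begin marker followed by two token events makes the first `next_real_event` call return the first token rather than an empty placeholder, which is the behaviour the Java streaming tests expect. The key design point, taken from the commit, is that the null check happens before anything is written into the response, since assigning `response["stop"]` first is what turned the sentinel into a non-null object.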
