docs/history/llama-cpp-breaking-changes.md
+2 -2 lines changed: 2 additions & 2 deletions
@@ -271,8 +271,8 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
|~b9333–b9354 | upstream CI (`.github/workflows/`) | CANN and SYCL builds disabled to save Actions resources; macOS builds moved to `build-apple.yml`; cache keys prefixed with `cache-gha-`; `[no release]` commit message token skips release pipeline; no project changes required |
|~b9354–b9437 |`common/common.h` + `common/arg.h` + `common/arg.cpp`|`common_params_handle_models()` return type `void`→`bool` (caller can detect skip-download misses); new `common_params::skip_download`; `common_params::timeout_read` default raised 600 → 3600. Project does not call `common_params_handle_models()` directly — arg parsing happens upstream; the new defaults flow through transparently |
|~b9354–b9437 |`common/download.h` + `common/download.cpp`|`common_download_model()` parameter list trimmed: `download_mmproj`/`download_mtp` moved into `common_download_opts`; new `common_skip_download_exception`; new opt `skip_download` returns `-2` on missing/etag mismatch. Project does not include `download.h` directly, no source changes required |
-|~b9354–b9437 |`tools/server/server-task.h` + `server-task.cpp`|`task_params::stream` default `true`→`false`; new `server_task_result_cmpl_partial::is_begin` bool to let HTTP layer emit SSE headers before the first delta; `to_json()` may now return `nullptr` for the begin marker. Project always sets `stream` explicitly from Java (`LlamaIterator.java`, `LlamaModel.java`) so the default change is inert; the `is_begin` & nullable-`to_json` behaviour is contained inside compiled-from-upstream `server-context.cpp` & `server-task.cpp`|
-|~b9354–b9437 |`tools/server/server-context.cpp` + `server-queue.cpp`|`send_partial_response()` gained `is_begin` parameter (defaulted); SSE stream now emits a no-content opening event when `stream && !return_progress` so the client sees HTTP 200 + headers before first token. `server_response_reader::next()` 30s warn-on-cancel diagnostic message updated. Compiled-from-upstream only, no project source changes required |
+|~b9354–b9437 |`tools/server/server-task.h` + `server-task.cpp`|`task_params::stream` default `true`→`false`; new `server_task_result_cmpl_partial::is_begin` bool to let HTTP layer emit SSE headers before the first delta; `to_json()` returns `nullptr` for the begin marker (sentinel meaning "HTTP-headers-only, no body"). Project always sets `stream` explicitly from Java (`LlamaIterator.java`, `LlamaModel.java`) so the default change is inert. The `is_begin` / nullable-`to_json` contract DOES leak into the JNI bridge; see the row below for the required fix |
+|~b9354–b9437 |`tools/server/server-context.cpp` + `server-queue.cpp`|`send_partial_response()` gained `is_begin` parameter (defaulted); SSE stream now emits a no-content opening event when `stream && !return_progress` (`server-context.cpp:2835`) so the client sees HTTP 200 + headers before first token. `server_response_reader::next()` 30s warn-on-cancel diagnostic message updated. **Required project source change**: `Java_net_ladenthin_llama_LlamaModel_receiveCompletionJson` in `src/main/cpp/jllama.cpp` called `result->to_json()` once and assigned `response["stop"]`, which silently auto-promoted the `nullptr` to an object `{"stop": false}` and surfaced a phantom empty `LlamaOutput` to every Java streaming caller (`LlamaModelTest.testGenerateAnswer` and four sibling tests overran by +1 token). Fixed by wrapping the `rd->next()` call in a loop that skips `response.is_null()` results so only real events reach Java |
|~b9354–b9437 |`common/arg.cpp` (env-var renames) |`LLAMA_LOG_*`→`LLAMA_ARG_LOG_*`, `LLAMA_OFFLINE`→`LLAMA_ARG_OFFLINE`, `LLAMA_LOG_FILE`→`LLAMA_ARG_LOG_FILE`, `LLAMA_CHAT_TEMPLATE_KWARGS`→`LLAMA_ARG_CHAT_TEMPLATE_KWARGS`. CLI verbosity values relabeled (4=trace, 5=debug). The `--license` CLI flag was REMOVED and moved to the new `llama-app licenses` subcommand. Project does not expose these env vars or the `--license` flag through the Java API, no changes required |
|~b9354–b9437 |`src/llama.cpp`|`llama_backend_init()` device-discovery rule tightened: iGPUs are now added only when no discrete GPUs were found (was: when no devices at all). RPC servers no longer count as "found" for this purpose, so iGPU + RPC setups keep the local iGPU. Behavioural only, single-line caller in `jllama.cpp` unchanged |
|~b9354–b9437 |`src/llama-chat.cpp`| New `LLM_CHAT_TEMPLATE_GRANITE_4_1` enum value + "granite-4.1" template name; `granite-4.0` detection now requires the literal token `g4_default_system_message` in the template, otherwise it routes to 4.1. Project does not implement chat-template detection directly — routing happens inside compiled-from-upstream code, no source changes required |
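The JNI fix described in the `server-context.cpp` row (loop around `rd->next()`, skip null results) can be sketched in isolation. This is a minimal, self-contained illustration under stated assumptions, not the actual `jllama.cpp` code: `FakeReader`, `ResultJson` and `next_real_event` are hypothetical names, and an empty `std::optional<std::string>` stands in for the `nullptr` JSON value that `to_json()` returns for the begin marker. The real `server_response_reader::next()` signature is not reproduced here.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>

// Hypothetical stand-in for one result's to_json() output.
// std::nullopt models the nullptr "begin" sentinel (headers only, no body).
using ResultJson = std::optional<std::string>;

// Hypothetical stand-in for the upstream response reader: hands out queued
// results in order and returns false once the stream is exhausted.
struct FakeReader {
    std::deque<ResultJson> queue;

    bool next(ResultJson& out) {
        if (queue.empty()) {
            return false;
        }
        out = queue.front();
        queue.pop_front();
        return true;
    }
};

// Keep pulling until a non-null result arrives, so the begin marker is never
// turned into a phantom {"stop": false} event for the Java caller.
// Returns std::nullopt only when the stream ends without another real event.
ResultJson next_real_event(FakeReader& reader) {
    ResultJson response;
    while (reader.next(response)) {
        if (response.has_value()) {
            return response;  // real delta or final result
        }
        // Null sentinel: skip it instead of indexing into it.
    }
    return std::nullopt;
}
```

Filtering in native code keeps the Java side (`LlamaIterator.java`, `LlamaModel.java`) unaware of the sentinel, which matches the row's statement that only real events reach Java.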