Commit e595af6

Trim cross-repo note from assembly comments; add SSE server TODO
Build comments: drop the 'deliberate non-parity (BAF + jllama only)' restatement and the crossrepostatus.md pointer from the package job + assembly profile comments (that lives only in the cross-repo doc). Also correct the now-stale 'no Main-Class' wording in both: the assembly fat jar is runnable via its LlamaServer Main-Class.

TODO: add an item to implement OpenAI-style SSE token streaming for the server (stream:true) and to find a Java-8-compatible HTTP layer with SSE support, or implement SSE on the existing NanoHTTPD via chunked responses. Javalin (the SSE-capable option) is unusable here: v5 needs Java 11, v6 needs Java 17, v4 is EOL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6
1 parent 18e3008 commit e595af6

3 files changed

Lines changed: 42 additions & 10 deletions

.github/workflows/publish.yml

Lines changed: 3 additions & 4 deletions
@@ -820,10 +820,9 @@ jobs:
 - name: Build JARs
 # `assembly` additionally produces the fat jar-with-dependencies uber JAR
 # (llama-<version>-jar-with-dependencies.jar: library classes + Java runtime deps +
-# default-platform native libs in one drop-on-classpath JAR; no Main-Class - it is a
-# library). It lands in target/ and is uploaded in the `llama-jars` artifact below - a
-# CI run artifact only, NOT a Maven Central / GitHub-Release asset. Documented as
-# deliberate non-parity (BAF + jllama only) in workspace/crossrepostatus.md.
+# default-platform native libs in one drop-on-classpath JAR, runnable via its
+# LlamaServer Main-Class). It lands in target/ and is uploaded in the `llama-jars`
+# artifact below - a CI run artifact only, not a Maven Central / GitHub-Release asset.
 run: mvn --batch-mode --no-transfer-progress -P release,cuda,opencl-android,assembly -Dmaven.test.skip=true -Dgpg.skip=true package
 - name: Upload JARs
 uses: actions/upload-artifact@v7

TODO.md

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,40 @@ These are JNI plumbing items for upstream API additions. Policy: add only after
5555

5656
**Out of scope until evidence supports it**: actually implementing any of the above. This entry exists so that when someone asks "can I ship java-llama.cpp as a single 30 MB binary?" the answer points to a concrete investigation plan rather than restarting from zero.
5757

58+
### OpenAI-compatible server: token streaming (SSE) + Java-8 HTTP layer
59+
60+
The `net.ladenthin.llama.server.LlamaServer` MVP is **non-streaming**: every request calls
61+
the blocking `LlamaModel.handle*` method and returns the full JSON response in one shot. A
62+
client that sends `"stream": true` still receives a single response, not the incremental
63+
`text/event-stream` (SSE) `data: {chunk}\n\n` events the OpenAI API emits for streaming
64+
chat/completions. This is the main functional gap of the server today.
65+
66+
The token source already exists — `LlamaModel.generateChat(InferenceParameters)` /
67+
`generate(...)` yield tokens incrementally through a Java `Iterator` (`LlamaIterable`). What
68+
is missing is an HTTP layer that emits SSE.
69+
70+
**Find a Java-8-compatible HTTP layer with good SSE support (alternative to Javalin), or
71+
implement SSE on NanoHTTPD.** Javalin has a first-class `ctx.sse(...)` API but is **not
72+
usable here**: Javalin 5 requires Java 11 and Javalin 6 requires Java 17, while this repo
73+
targets Java 8; Javalin 4 (the last Java-8 release) is EOL. Options, in rough order of
74+
preference:
75+
- **Implement SSE on the existing NanoHTTPD** via `NanoHTTPD.newChunkedResponse(status,
76+
"text/event-stream", InputStream)`, bridging a `LlamaIterable` to an `InputStream` that
77+
writes `data: {chunk}\n\n` frames. No new dependency, stays Java-8 clean; likely the right
78+
answer. Cost: the iterator→SSE bridge plus closing the `LlamaIterable` on client
79+
disconnect.
80+
- **Undertow** — Java-8 compatible, has a server-sent-events handler, but a heavier
81+
dependency tree.
82+
- **Spark Java** (Jetty 9) — Java-8 compatible; SSE support is limited/manual.
83+
- Avoid: Javalin 5/6 (Java 11/17), Javalin 4 (EOL), and the JDK `com.sun.net.httpserver`
84+
(ArchUnit-banned `com.sun..`).
85+
86+
Scope when implemented: honour `"stream": true` on `POST /v1/chat/completions` and
87+
`POST /v1/completions`, emit OpenAI-style SSE chunks terminated by `data: [DONE]`, close the
88+
underlying `LlamaIterable` on disconnect, and keep the non-streaming path as the default. Add
89+
a model-free routing test plus a real-socket SSE integration test (mirroring
90+
`OaiHttpServerIntegrationTest`).
91+
5892
## Open — cross-cutting (slice for this repo)
5993

6094
- **jqwik pin policy** — see [`../workspace/policies/jqwik-prompt-injection.md`](../workspace/policies/jqwik-prompt-injection.md). `jqwik.version ≤ 1.9.3` is mandatory.
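
Note on the NanoHTTPD option in the TODO.md hunk above: the iterator-to-SSE bridge it describes could look roughly like the sketch below. This is an illustration only, not code from this commit. The class name `SseInputStream` and the `onClose` hook are hypothetical, the iterator is assumed to yield one single-line JSON chunk per element (a payload containing a raw newline would need one `data:` line per line of text), and the real `LlamaIterable` API may differ. The sketch uses only the JDK and Java 8 language features.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical bridge: exposes an iterator of payloads as a stream of SSE frames. */
final class SseInputStream extends InputStream {
    private static final byte[] DONE =
            "data: [DONE]\n\n".getBytes(StandardCharsets.UTF_8);

    private final Iterator<String> payloads; // assumed: one single-line JSON chunk each
    private final Runnable onClose;          // assumed hook, e.g. to cancel generation
    private byte[] frame = new byte[0];      // frame currently being drained
    private int pos = 0;                     // read offset inside the current frame
    private boolean doneSent = false;
    private boolean closed = false;

    SseInputStream(Iterator<String> payloads, Runnable onClose) {
        this.payloads = payloads;
        this.onClose = onClose;
    }

    /** Ensures unread bytes are available; returns false once everything was sent. */
    private boolean fill() {
        while (pos >= frame.length) {
            if (payloads.hasNext()) {
                frame = ("data: " + payloads.next() + "\n\n")
                        .getBytes(StandardCharsets.UTF_8);
            } else if (!doneSent) {
                frame = DONE; // terminator frame, sent exactly once
                doneSent = true;
            } else {
                return false;
            }
            pos = 0;
        }
        return true;
    }

    @Override
    public int read() {
        return fill() ? (frame[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        // Never read across a frame boundary, so the next token is only
        // pulled from the iterator when the caller asks for more bytes.
        int n = Math.min(len, frame.length - pos);
        System.arraycopy(frame, pos, buf, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            onClose.run(); // runs once, whether the stream finished or was abandoned
        }
    }
}
```

Under these assumptions, an instance would be the `InputStream` argument to the `NanoHTTPD.newChunkedResponse(status, "text/event-stream", InputStream)` call named in the TODO. Whether NanoHTTPD closes that stream when the client disconnects, and therefore whether `onClose` is enough to stop generation, is something the planned real-socket integration test would have to confirm.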

pom.xml

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -992,12 +992,11 @@ SPDX-License-Identifier: MIT
992992
<!--
993993
Builds the fat jar-with-dependencies uber JAR: the library classes, the
994994
default-platform native libs from src/main/resources, and all runtime Java
995-
dependencies in one drop-on-classpath JAR. No Main-Class (this is a library,
996-
not a CLI). Off by default; the CI `package` job activates it so the uber JAR
997-
rides along in the `llama-jars` upload-artifact bundle (a CI run artifact only,
998-
not a Maven Central / GitHub-Release asset). Documented in CLAUDE.md
999-
"Build Commands" as `mvn -P assembly package` and as deliberate cross-repo
1000-
non-parity (BAF + jllama only) in workspace/crossrepostatus.md.
995+
dependencies in one drop-on-classpath JAR, runnable via the LlamaServer
996+
Main-Class (set below) to start the OpenAI-compatible HTTP server. Off by
997+
default; the CI `package` job activates it so the uber JAR rides along in the
998+
`llama-jars` upload-artifact bundle. Documented in CLAUDE.md "Build Commands"
999+
as `mvn -P assembly package`.
10011000
-->
10021001
<id>assembly</id>
10031002
<build>
