Add OpenAI-compatible HTTP server (LlamaServer) by bernardladenthin · Pull Request #242 · bernardladenthin/java-llama.cpp

bernardladenthin · 2026-06-18T23:19:31Z

Summary

Adds a runnable OpenAI-compatible HTTP server (LlamaServer) as the Main-Class of the fat JAR, enabling users to serve inference via standard /v1/chat/completions, /v1/completions, /v1/embeddings, and /v1/models endpoints.
Implements a clean separation of concerns: OaiRouter (testable routing logic independent of HTTP layer), OaiHttpServer (NanoHTTPD adapter), LlamaServerArgs (command-line parsing), and LlamaModelOaiBackend (inference delegation).
Adds a Maven assembly profile that builds a fat JAR with all dependencies bundled, runnable via java -jar llama-<version>-jar-with-dependencies.jar --model model.gguf --port 8080.
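
The routing/HTTP split described above can be illustrated with a router that is a plain function from method, path, and body to a status and a JSON body. A router like this can be unit-tested with a fake backend, without a socket or a model. This is a minimal sketch, not the PR's OaiRouter. The names `Backend`, `Reply`, and `RouterSketch` are hypothetical. The `/v1/models` JSON shape, the error JSON, and the chosen status codes (404, 405, 400, 500) are also assumptions for illustration.

```java
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Hypothetical inference seam: tests pass a fake, production code would wrap the model. */
interface Backend {
    String chatCompletions(String requestJson);
    String completions(String requestJson);
    String embeddings(String requestJson);
}

/** Hypothetical (HTTP status, JSON body) pair produced by the router. */
final class Reply {
    final int status;
    final String body;

    Reply(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

/** Model-free router sketch: maps method + path + body to a Reply, no sockets involved. */
final class RouterSketch {
    private final Backend backend;
    private final String modelAlias; // assumed not to need JSON escaping

    RouterSketch(Backend backend, String modelAlias) {
        this.backend = backend;
        this.modelAlias = modelAlias;
    }

    Reply route(String method, String rawPath, String body) {
        // Drop the query string so "/v1/models?x=1" is routed like "/v1/models".
        int q = rawPath.indexOf('?');
        String path = q >= 0 ? rawPath.substring(0, q) : rawPath;
        switch (path) {
            case "/health":
                return get(method, () -> "{\"status\":\"ok\"}");
            case "/v1/models":
                return get(method, () -> "{\"object\":\"list\",\"data\":[{\"id\":\""
                        + modelAlias + "\",\"object\":\"model\"}]}");
            case "/v1/chat/completions":
                return post(method, body, backend::chatCompletions);
            case "/v1/completions":
                return post(method, body, backend::completions);
            case "/v1/embeddings":
                return post(method, body, backend::embeddings);
            default:
                return error(404, "unknown path: " + path);
        }
    }

    private static Reply get(String method, Supplier<String> handler) {
        return "GET".equals(method) ? new Reply(200, handler.get()) : error(405, "use GET");
    }

    private static Reply post(String method, String body, UnaryOperator<String> handler) {
        if (!"POST".equals(method)) {
            return error(405, "use POST");
        }
        if (body == null || body.trim().isEmpty()) {
            return error(400, "request body is required");
        }
        try {
            return new Reply(200, handler.apply(body));
        } catch (RuntimeException e) {
            // A backend failure becomes a JSON error instead of a dropped connection.
            return error(500, String.valueOf(e.getMessage()));
        }
    }

    private static Reply error(int status, String message) {
        // Message assumed free of quotes/backslashes; real code would JSON-escape it.
        return new Reply(status, "{\"error\":{\"message\":\"" + message + "\"}}");
    }
}
```

Because `route` does no I/O, the HTTP adapter on top of it can stay thin. It only copies the method, path, and body in, and the status and body out.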

Test plan

Added comprehensive unit tests for LlamaServerArgs (flag parsing, defaults, validation, error messages)
Added unit tests for OaiRouter (endpoint dispatch, method/body preconditions, error handling, query string stripping)
Added integration test for OaiHttpServer (real loopback socket, NanoHTTPD adapter, request/response round-trip)
Existing tests pass; CI is green

Related issues / PRs

Closes the OpenAI-compatible server feature gap documented in TODO.md (non-streaming MVP).

Checklist

I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
My commits follow Conventional Commits
No security-sensitive changes (command-line argument handling is documented in spotbugs-exclude.xml as CLI-only threat model)

https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6

Add a managed maven-assembly-plugin (3.8.0) and an `assembly` profile that builds llama-<version>-jar-with-dependencies.jar: the library classes, all Java runtime dependencies, and the default-platform native libs from src/main/resources in one drop-on-classpath JAR (no Main-Class; it is a library). Activate it in the package job (-P release,cuda,opencl-android,assembly) so the uber JAR rides along in the existing `llama-jars` upload-artifact (a CI run artifact only, not a Maven Central / GitHub-Release asset). Document the command in CLAUDE.md. Recorded as deliberate cross-repo non-parity (BAF + jllama only) in workspace/crossrepostatus.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6

Introduce net.ladenthin.llama.server.LlamaServer, a runnable main class (and the fat-jar Main-Class) that loads a GGUF model in-process and serves the OpenAI-compatible endpoints over a tiny NanoHTTPD server:
POST /v1/chat/completions -> LlamaModel.handleChatCompletions
POST /v1/completions -> LlamaModel.handleCompletionsOai
POST /v1/embeddings -> LlamaModel.handleEmbeddings (needs --embedding)
GET /v1/models -> configured model alias
GET /health -> {"status":"ok"}
The handle* methods already return OAI-shaped JSON, so the server only forwards request bodies.
Design:
- OaiRouter (model-free, unit-tested) maps method+path+body to a response; OaiHttpServer is the thin NanoHTTPD adapter; LlamaModelOaiBackend bridges to LlamaModel; LlamaServerArgs parses --model/--host/--port/--ctx-size/--n-gpu-layers/--threads/--embedding/--model-alias/--help.
- handleChatCompletions widened to public to match the other raw OAI handlers.
- NanoHTTPD is an <optional> compile dependency: bundled in the fat jar, not inherited by library consumers (Java-8 clean, zero transitive deps).
- New `server` ArchUnit layer (the only layer allowed to access the Api root).
- spotbugs-exclude: PATH_TRAVERSAL_IN + CRLF_INJECTION_LOGS on the server package (operator-supplied CLI input; same threat model as LlamaLoader), CC on the flag switch (desugared String-switch artifact), EI_EXPOSE_REP2 on the backend (non-owning model wrapper, mirrors Session).
Tests (model-free): LlamaServerArgsTest (10), OaiRouterTest (10), OaiHttpServerIntegrationTest (real loopback socket + fake backend, 1).
Verified: spotless, compile (Error Prone/NullAway/Checker), spotbugs Max+Low, javadoc, and the assembly fat jar (Main-Class set, NanoHTTPD bundled) all clean.
Docs: README "OpenAI-compatible HTTP server" + Features bullet; CLAUDE.md note.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6
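
The flag handling attributed to LlamaServerArgs above can be sketched as a single pass over `args` with a String switch. That is the construct behind the note about a desugared String-switch complexity exclusion. `ServerArgsSketch` is hypothetical, not the PR's class. Its defaults (loopback host, port 8080, an alias derived from the model file name) and its error messages are assumptions. A `null` numeric option stands for "keep the library default".

```java
/** Hypothetical parsed CLI settings; null numeric fields mean "not given, keep the library default". */
final class ServerArgsSketch {
    String model;
    String host = "127.0.0.1"; // assumed default: loopback only
    int port = 8080;            // assumed default, matching the usage example above
    Integer ctxSize;
    Integer nGpuLayers;
    Integer threads;
    boolean embedding;
    String modelAlias;
    boolean help;

    static ServerArgsSketch parse(String[] args) {
        ServerArgsSketch a = new ServerArgsSketch();
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            switch (flag) {
                case "--help": a.help = true; break;
                case "--embedding": a.embedding = true; break;
                case "--model": a.model = value(args, ++i, flag); break;
                case "--host": a.host = value(args, ++i, flag); break;
                case "--port": a.port = intValue(args, ++i, flag); break;
                case "--ctx-size": a.ctxSize = intValue(args, ++i, flag); break;
                case "--n-gpu-layers": a.nGpuLayers = intValue(args, ++i, flag); break;
                case "--threads": a.threads = intValue(args, ++i, flag); break;
                case "--model-alias": a.modelAlias = value(args, ++i, flag); break;
                default: throw new IllegalArgumentException("unknown flag: " + flag);
            }
        }
        if (a.help) {
            return a; // --help short-circuits validation
        }
        if (a.model == null) {
            throw new IllegalArgumentException("--model is required");
        }
        if (a.port < 1 || a.port > 65535) {
            throw new IllegalArgumentException("--port out of range: " + a.port);
        }
        if (a.modelAlias == null) {
            // Assumed default alias: the model file name without its directories.
            int slash = Math.max(a.model.lastIndexOf('/'), a.model.lastIndexOf('\\'));
            a.modelAlias = a.model.substring(slash + 1);
        }
        return a;
    }

    /** Returns the value following a flag, or fails with a message naming the flag. */
    private static String value(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException(flag + " needs a value");
        }
        return args[i];
    }

    private static int intValue(String[] args, int i, String flag) {
        String v = value(args, i, flag);
        try {
            return Integer.parseInt(v);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(flag + " expects an integer, got: " + v);
        }
    }
}
```

Parsing into a plain object, separate from model loading, is what makes flag defaults and error messages testable without a GGUF file.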

Build comments: drop the 'deliberate non-parity (BAF + jllama only)' restatement and the crossrepostatus.md pointer from the package job + assembly profile comments (that lives only in the cross-repo doc). Also correct the now-stale 'no Main-Class' wording in both: the assembly fat jar is runnable via its LlamaServer Main-Class.
TODO: add an item to implement OpenAI-style SSE token streaming for the server (stream:true) and to find a Java-8-compatible HTTP layer with SSE support, or implement SSE on the existing NanoHTTPD via chunked responses. Javalin (the SSE-capable option) is unusable here: v5 needs Java 11, v6 needs Java 17, v4 is EOL.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6
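
"SSE via chunked responses" looks like this on the wire. Each SSE event (`data: ...` followed by a blank line) travels as one HTTP/1.1 chunk: the payload size in hex, CRLF, the payload, CRLF. A zero-size chunk ends the body. The hypothetical `ChunkedSseWriter` below does that framing by hand against any `OutputStream`. A server's own chunked-response support would normally write the size lines itself, which leaves only the SSE text to application code. The `data: [DONE]` terminator follows the OpenAI streaming convention.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: frames SSE events as HTTP/1.1 chunked transfer-coding chunks. */
final class ChunkedSseWriter {
    private static final byte[] CRLF = "\r\n".getBytes(StandardCharsets.US_ASCII);
    private final OutputStream out;

    ChunkedSseWriter(OutputStream out) {
        this.out = out;
    }

    /** Writes one SSE event; assumes data has no raw newline (serialized JSON normally has none). */
    void event(String data) throws IOException {
        writeChunk(("data: " + data + "\n\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Sends the OpenAI-style [DONE] event, then the zero-size last chunk that ends the body. */
    void finish() throws IOException {
        event("[DONE]");
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    private void writeChunk(byte[] payload) throws IOException {
        // Chunk size is the payload length in bytes (not chars), written in hexadecimal.
        out.write(Integer.toHexString(payload.length).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(payload);
        out.write(CRLF);
        out.flush(); // push each token to the client as soon as it is produced
    }
}
```

Flushing after every chunk matters. Without it, tokens sit in a buffer and the client receives the whole answer at once.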

sonarqubecloud · 2026-06-18T23:21:53Z

Quality Gate failed

Failed conditions
69.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

Make PR #240 mergeable on top of the NanoHTTPD OpenAI server that landed via #242. Both OpenAI-compatible server implementations now coexist in net.ladenthin.llama.server, pending a "best of both" consolidation (tracked in TODO.md):
- OpenAiCompatServer (PR #240): JDK com.sun.net.httpserver, streaming SSE with delta.tool_calls, no new runtime dependency.
- LlamaServer (#242): NanoHTTPD, non-streaming, fat-jar Main-Class, plus /v1/completions, /v1/embeddings and /health.
Conflicts resolved:
- server/package-info.java (add/add): documents both servers + pending merge.
- README.md, CLAUDE.md: keep both server sections under one heading.
- TODO.md: add a consolidation task; note SSE is already solved by OpenAiCompatServer and that com.sun.net.httpserver is the supported jdk.httpserver module, not an internal com.sun.* API.
Auto-merged and verified consistent: LlamaModel.java (distinct native methods on each side), publish.yml, pom.xml (NanoHTTPD + assembly), spotbugs-exclude.xml, and LlamaArchitectureTest.java. Main's Server layer already permits this session's server classes (they touch only the Api root + Parameters), and the noInternalJdkImports com.sun.net.httpserver exception merges alongside.
Verified: mvn compile + test-compile clean; 64 model-free tests pass (LlamaArchitectureTest + both servers' unit/HTTP tests, integration test self-skips without a model); javadoc jar builds clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014L2dLbAtwdq7C6a2gFRsQQ
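
The surviving server relies on the JDK's built-in `com.sun.net.httpserver` package (module jdk.httpserver), so a dependency-free endpoint needs nothing beyond the JDK. The hypothetical `HealthServerSketch` below serves only `GET /health` on the loopback interface. Binding port 0 lets the OS pick a free port, which suits loopback tests like the HTTP tests mentioned above. It is an illustration, not OpenAiCompatServer's code.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical dependency-free /health endpoint on the JDK's jdk.httpserver module. */
final class HealthServerSketch {
    /** Starts a server on the loopback interface; port 0 lets the OS choose a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server =
                HttpServer.create(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
        server.createContext("/health", exchange -> {
            try {
                if (!"GET".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // -1 = no response body
                    return;
                }
                byte[] body = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length); // fixed-length body
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            } finally {
                exchange.close(); // always release the exchange, even on the early return
            }
        });
        server.start(); // no executor set: requests run on the thread created by start()
        return server;
    }
}
```

In a test, the chosen port is read back with `server.getAddress().getPort()`, and `server.stop(0)` shuts the server down immediately. On the module path, the consuming module needs `requires jdk.httpserver`.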

…p NanoHTTPD
Two interim OpenAI-compatible servers coexisted in net.ladenthin.llama.server (PR #240's JDK com.sun.net.httpserver streaming server on top of #242's NanoHTTPD blocking server). Settle on one: keep the JDK + SSE-streaming core, absorb the NanoHTTPD server's extra routes / CLI / fat-jar entry point, then delete it.
Survivor: OpenAiCompatServer (dependency-free, embeddable, fat-jar Main-Class).
- Streaming chat via SSE with delta.tool_calls + prefill heartbeats (unchanged).
- Ported routes: POST /v1/completions, POST /v1/embeddings, GET /health.
- Broadened the model-free test seam ChatBackend -> OpenAiBackend (+ completions, embeddings); LlamaModelChatBackend -> LlamaModelBackend forwards the two new routes to handleCompletionsOai / handleEmbeddings.
- New testable CLI parser OpenAiServerCli (short/long/alias flags, --help, validation) replacing the inline arg map and the deleted LlamaServerArgs; produces ModelParameters + OpenAiServerConfig.
Deleted NanoHTTPD impl: LlamaServer, LlamaServerArgs, LlamaServerConfig, OaiHttpServer, OaiRouter, OaiBackend, OaiResponse, LlamaModelOaiBackend (+ OaiRouterTest, LlamaServerArgsTest, OaiHttpServerIntegrationTest).
Reconciliation:
- pom.xml: drop org.nanohttpd dependency + version; assembly Main-Class -> OpenAiCompatServer.
- spotbugs-exclude.xml: retarget CC_CYCLOMATIC_COMPLEXITY to OpenAiServerCli.parse; drop the LlamaModelOaiBackend EI_EXPOSE_REP2 entry (survivor is package-private, like the old LlamaModelChatBackend, which needed none).
- LlamaArchitectureTest Server layer + com.sun.net.httpserver exception and module-info `requires jdk.httpserver` unchanged (still correct for the survivor).
- LlamaModel javadoc link, README, CLAUDE.md, TODO.md, publish.yml comment updated; removed the consolidation block and the now-moot "implement SSE" TODO (its premise that com.sun.net.httpserver is ArchUnit-banned was wrong: it is the supported, exported jdk.httpserver module).
C++ (jllama.cpp / json_helpers.hpp / wrap_stream_chunk + its tests) unchanged: the streaming path survives.
Verification (model-free): mvn compile test-compile; targeted tests (LlamaArchitectureTest, OpenAiRequestMapperTest, OpenAiSseFormatterTest, ChatStreamChunkParserTest, OpenAiCompatServerHttpTest, OpenAiServerCliTest) all green; javadoc:jar clean; spotless:check clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
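
The "prefill heartbeats" mentioned above can be sketched using one SSE property: lines starting with `:` are comments that clients ignore. A server can therefore emit them while the prompt is still being processed, so proxies and clients do not time out. Whether OpenAiCompatServer's heartbeats are comment lines is an assumption here. The names are also hypothetical, not OpenAiSseFormatter's API. The chunk JSON is trimmed to `object`, `choices[].index`, `delta.content`, and `finish_reason`; real OpenAI chunks also carry `id`, `created`, and `model`.

```java
/** Hypothetical SSE frames for OpenAI-style chat streaming (abbreviated chunk JSON). */
final class SseFramesSketch {
    /** SSE comment event: clients ignore lines starting with ':', but the bytes keep the connection alive. */
    static String heartbeat() {
        return ": keep-alive\n\n";
    }

    /** One chat.completion.chunk carrying a content delta (id/created/model omitted for brevity). */
    static String contentDelta(String text) {
        return "data: {\"object\":\"chat.completion.chunk\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\""
                + escape(text) + "\"},\"finish_reason\":null}]}\n\n";
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

Escaping newlines inside the JSON string matters for SSE in particular. A raw newline would end the `data:` line early and corrupt the event.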

claude added 3 commits June 18, 2026 22:04

bernardladenthin had a problem deploying to startgate June 18, 2026 23:19 — with GitHub Actions Error

bernardladenthin merged commit 59abd79 into main Jun 18, 2026
9 of 11 checks passed

bernardladenthin mentioned this pull request Jun 19, 2026

Consolidate OpenAI server: unify implementations, add multi-protocol support #243

Merged

7 tasks

bernardladenthin deleted the claude/cool-curie-ym3acr branch June 20, 2026 16:33

bernardladenthin merged 3 commits into main from claude/cool-curie-ym3acr
