
Add OpenAI-compatible HTTP server (LlamaServer) #242

Merged
bernardladenthin merged 3 commits into main from claude/cool-curie-ym3acr on Jun 18, 2026

Conversation

@bernardladenthin

Owner

Summary

  • Adds a runnable OpenAI-compatible HTTP server (LlamaServer) as the Main-Class of the fat JAR, enabling users to serve inference via standard /v1/chat/completions, /v1/completions, /v1/embeddings, and /v1/models endpoints.
  • Implements a clean separation of concerns: OaiRouter (testable routing logic independent of HTTP layer), OaiHttpServer (NanoHTTPD adapter), LlamaServerArgs (command-line parsing), and LlamaModelOaiBackend (inference delegation).
  • Adds Maven assembly profile to build a fat JAR with all dependencies bundled, runnable via java -jar llama-<version>-jar-with-dependencies.jar --model model.gguf --port 8080.
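
The separation described in the second bullet can be illustrated with a minimal sketch of routing logic that knows nothing about the HTTP library. This is not the PR's actual OaiRouter: `RouterSketch`, its `Backend` interface, the `Response` pair, and the error JSON shape are all hypothetical names and assumptions made for illustration. Only the paths and methods come from this pull request.

```java
import java.util.Locale;

/** Hypothetical sketch of HTTP-independent routing; not the PR's actual OaiRouter. */
final class RouterSketch {

    /** Assumed backend seam: raw JSON request body in, raw JSON response out. */
    interface Backend {
        String chatCompletions(String body);
        String completions(String body);
        String embeddings(String body);
        String models();
    }

    /** Plain status/body pair so no HTTP library type leaks into the router. */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    static Response route(String method, String rawPath, String body, Backend backend) {
        // Strip the query string so "/v1/models?x=1" still matches "/v1/models".
        int q = rawPath.indexOf('?');
        String path = q >= 0 ? rawPath.substring(0, q) : rawPath;
        String verb = method.toUpperCase(Locale.ROOT);

        boolean isGetRoute = path.equals("/health") || path.equals("/v1/models");
        boolean isPostRoute = path.equals("/v1/chat/completions")
                || path.equals("/v1/completions")
                || path.equals("/v1/embeddings");
        if (!isGetRoute && !isPostRoute) {
            return error(404, "not found");
        }
        if (!verb.equals(isGetRoute ? "GET" : "POST")) {
            return error(405, "method not allowed");
        }
        if (isPostRoute && (body == null || body.trim().isEmpty())) {
            return error(400, "request body required");
        }
        try {
            switch (path) {
                case "/health":
                    return new Response(200, "{\"status\":\"ok\"}");
                case "/v1/models":
                    return new Response(200, backend.models());
                case "/v1/chat/completions":
                    return new Response(200, backend.chatCompletions(body));
                case "/v1/completions":
                    return new Response(200, backend.completions(body));
                default: // only "/v1/embeddings" can reach this branch
                    return new Response(200, backend.embeddings(body));
            }
        } catch (RuntimeException e) {
            // Backend failures become a 500 instead of escaping into the HTTP adapter.
            return error(500, "backend failure");
        }
    }

    private static Response error(int status, String message) {
        // Messages are fixed ASCII literals, so no JSON escaping is needed here.
        return new Response(status, "{\"error\":{\"message\":\"" + message + "\"}}");
    }
}
```

Because `route` takes and returns plain values, a unit test can pass a fake `Backend` and assert on status codes without opening a socket, which is presumably the property the model-free router tests rely on.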

Test plan

  • Added 10 unit tests for LlamaServerArgs (flag parsing, defaults, validation, error messages)
  • Added 10 unit tests for OaiRouter (endpoint dispatch, method/body preconditions, error handling, query string stripping)
  • Added 1 integration test for OaiHttpServer (real loopback socket, fake backend, NanoHTTPD adapter, request/response round-trip)
  • Existing tests pass; CI is green

Related issues / PRs

Closes the OpenAI-compatible server feature gap documented in TODO.md (non-streaming MVP).

Checklist

  • I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
  • My commits follow Conventional Commits
  • No security-sensitive changes (command-line argument handling is documented in spotbugs-exclude.xml as CLI-only threat model)

https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6

claude added 3 commits June 18, 2026 22:04
Add a managed maven-assembly-plugin (3.8.0) and an `assembly` profile
that builds llama-<version>-jar-with-dependencies.jar: the library
classes, all Java runtime dependencies, and the default-platform native
libs from src/main/resources in one drop-on-classpath JAR (no
Main-Class - it is a library). Activate it in the package job
(-P release,cuda,opencl-android,assembly) so the uber JAR rides along in
the existing `llama-jars` upload-artifact (a CI run artifact only, not a
Maven Central / GitHub-Release asset). Document the command in CLAUDE.md.

Recorded as deliberate cross-repo non-parity (BAF + jllama only) in
workspace/crossrepostatus.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6
Introduce net.ladenthin.llama.server.LlamaServer, a runnable main class (and the
fat-jar Main-Class) that loads a GGUF model in-process and serves the
OpenAI-compatible endpoints over a tiny NanoHTTPD server:

  POST /v1/chat/completions  -> LlamaModel.handleChatCompletions
  POST /v1/completions       -> LlamaModel.handleCompletionsOai
  POST /v1/embeddings        -> LlamaModel.handleEmbeddings (needs --embedding)
  GET  /v1/models            -> configured model alias
  GET  /health               -> {"status":"ok"}

The handle* methods already return OAI-shaped JSON, so the server only forwards
request bodies. Design:
- OaiRouter (model-free, unit-tested) maps method+path+body to a response;
  OaiHttpServer is the thin NanoHTTPD adapter; LlamaModelOaiBackend bridges to
  LlamaModel; LlamaServerArgs parses --model/--host/--port/--ctx-size/
  --n-gpu-layers/--threads/--embedding/--model-alias/--help.
- handleChatCompletions widened to public to match the other raw OAI handlers.
- NanoHTTPD is an <optional> compile dependency: bundled in the fat jar, not
  inherited by library consumers (Java-8 clean, zero transitive deps).
- New `server` ArchUnit layer (the only layer allowed to access the Api root).
- spotbugs-exclude: PATH_TRAVERSAL_IN + CRLF_INJECTION_LOGS on the server
  package (operator-supplied CLI input; same threat model as LlamaLoader), CC on
  the flag switch (desugared String-switch artifact), EI_EXPOSE_REP2 on the
  backend (non-owning model wrapper, mirrors Session).
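
As an illustration of the flag parsing mentioned in the design list above, here is a minimal sketch covering a subset of the flags (`--model`, `--host`, `--port`, `--embedding`, `--help`). It is not the PR's LlamaServerArgs: the class name `ArgsSketch`, the default host and port, the accepted port range, and the error messages are assumptions made for the example.

```java
/** Hypothetical sketch of CLI flag parsing; not the PR's actual LlamaServerArgs. */
final class ArgsSketch {
    String model;               // required unless --help is given
    String host = "127.0.0.1";  // assumed default, not taken from the PR
    int port = 8080;            // assumed default, not taken from the PR
    boolean embedding;
    boolean help;

    static ArgsSketch parse(String[] args) {
        ArgsSketch out = new ArgsSketch();
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            switch (flag) {
                case "--help":
                    out.help = true;
                    break;
                case "--embedding":
                    out.embedding = true;
                    break;
                case "--model":
                    out.model = value(args, ++i, flag);
                    break;
                case "--host":
                    out.host = value(args, ++i, flag);
                    break;
                case "--port":
                    out.port = parsePort(value(args, ++i, flag));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown flag: " + flag);
            }
        }
        if (!out.help && out.model == null) {
            throw new IllegalArgumentException("--model is required");
        }
        return out;
    }

    /** Returns the value following a flag, or fails if the flag is the last token. */
    private static String value(String[] args, int index, String flag) {
        if (index >= args.length) {
            throw new IllegalArgumentException(flag + " requires a value");
        }
        return args[index];
    }

    /** Parses a TCP port; this sketch accepts 1..65535 only. */
    private static int parsePort(String raw) {
        int port;
        try {
            port = Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("--port must be an integer: " + raw);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("--port out of range: " + port);
        }
        return port;
    }
}
```

Keeping the parser free of model loading and I/O means every branch, including the error paths, can be exercised from a plain unit test.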

Tests (model-free): LlamaServerArgsTest (10), OaiRouterTest (10),
OaiHttpServerIntegrationTest (real loopback socket + fake backend, 1). Verified:
spotless, compile (Error Prone/NullAway/Checker), spotbugs Max+Low, javadoc, and
the assembly fat jar (Main-Class set, NanoHTTPD bundled) all clean.

Docs: README "OpenAI-compatible HTTP server" + Features bullet; CLAUDE.md note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6
Build comments: drop the 'deliberate non-parity (BAF + jllama only)' restatement and the crossrepostatus.md pointer from the package job + assembly profile comments (that lives only in the cross-repo doc). Also correct the now-stale 'no Main-Class' wording in both: the assembly fat jar is runnable via its LlamaServer Main-Class.

TODO: add an item to implement OpenAI-style SSE token streaming for the server (stream:true) and to find a Java-8-compatible HTTP layer with SSE support, or implement SSE on the existing NanoHTTPD via chunked responses. Javalin (the SSE-capable option) is unusable here: v5 needs Java 11, v6 needs Java 17, v4 is EOL.
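
For context on what the streaming TODO involves at the wire level, the following sketch frames JSON chunks as server-sent events the way OpenAI-style streaming responses do: one `data:` line per chunk, a blank line after each event, and a final `data: [DONE]` sentinel. `SseSketch` and its method names are hypothetical, and the chunk contents are placeholders. The sketch covers framing only and makes no assumption about which HTTP layer writes the bytes.

```java
import java.util.List;

/** Hypothetical sketch of SSE framing for OpenAI-style streaming; framing only. */
final class SseSketch {

    /** Frames one JSON chunk as a single SSE event. */
    static String dataEvent(String json) {
        // A raw line break would split the payload across SSE fields, so reject it.
        if (json.indexOf('\n') >= 0 || json.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("JSON chunk must be on a single line");
        }
        // The blank line (second "\n") terminates the event.
        return "data: " + json + "\n\n";
    }

    /** Sentinel event that marks the end of the stream. */
    static String done() {
        return "data: [DONE]\n\n";
    }

    /** Concatenates all chunk events followed by the end-of-stream sentinel. */
    static String stream(List<String> jsonChunks) {
        StringBuilder out = new StringBuilder();
        for (String chunk : jsonChunks) {
            out.append(dataEvent(chunk));
        }
        return out.append(done()).toString();
    }
}
```

A real server would flush each event as it is produced rather than building one string. `stream` exists here only to make the framing easy to inspect.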

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UZbmBX5CjqVwPcaTS61im6
@bernardladenthin merged commit 59abd79 into main on Jun 18, 2026
9 of 11 checks passed
@sonarqubecloud


Quality Gate failed

Failed conditions
69.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

bernardladenthin pushed a commit that referenced this pull request Jun 18, 2026
Make PR #240 mergeable on top of the NanoHTTPD OpenAI server that landed
via #242. Both OpenAI-compatible server implementations now coexist in
net.ladenthin.llama.server, pending a "best of both" consolidation
(tracked in TODO.md):

- OpenAiCompatServer (PR #240): JDK com.sun.net.httpserver, streaming SSE
  with delta.tool_calls, no new runtime dependency.
- LlamaServer (#242): NanoHTTPD, non-streaming, fat-jar Main-Class, plus
  /v1/completions, /v1/embeddings and /health.
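
To make the "no new runtime dependency" point concrete, here is a minimal sketch of a `GET /health` endpoint served by the JDK's own `com.sun.net.httpserver.HttpServer` on a loopback socket with an ephemeral port. It is not code from either PR: `HealthServerSketch` and its methods are hypothetical, and unlike a real route this handler does not check the request method.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: /health on the JDK built-in HTTP server (jdk.httpserver). */
final class HealthServerSketch {

    /** Starts a server on 127.0.0.1 with an OS-assigned port (port 0). */
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/health", exchange -> {
            byte[] body = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Starts the server, performs one GET /health, stops the server, returns the body. */
    static String roundTrip() {
        HttpServer server = null;
        try {
            server = start();
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/health").toURL().openConnection();
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[256];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }
}
```

Binding to port 0 lets the operating system pick a free port, which keeps a loopback round-trip test of this kind from colliding with other processes.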

Conflicts resolved:
- server/package-info.java (add/add): documents both servers + pending merge.
- README.md, CLAUDE.md: keep both server sections under one heading.
- TODO.md: add a consolidation task; note SSE is already solved by
  OpenAiCompatServer and that com.sun.net.httpserver is the supported
  jdk.httpserver module, not an internal com.sun.. API.

Auto-merged and verified consistent: LlamaModel.java (distinct native
methods on each side), publish.yml, pom.xml (NanoHTTPD + assembly),
spotbugs-exclude.xml, and LlamaArchitectureTest.java — main's Server
layer already permits this session's server classes (they touch only the
Api root + Parameters), and the noInternalJdkImports com.sun.net.httpserver
exception merges alongside.

Verified: mvn compile + test-compile clean; 64 model-free tests pass
(LlamaArchitectureTest + both servers' unit/HTTP tests, integration test
self-skips without a model); javadoc jar builds clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014L2dLbAtwdq7C6a2gFRsQQ
bernardladenthin pushed a commit that referenced this pull request Jun 20, 2026
…p NanoHTTPD

Two interim OpenAI-compatible servers coexisted in net.ladenthin.llama.server
(PR #240's JDK com.sun.net.httpserver streaming server on top of #242's NanoHTTPD
blocking server). Settle on one: keep the JDK + SSE-streaming core, absorb the
NanoHTTPD server's extra routes / CLI / fat-jar entry point, then delete it.

Survivor: OpenAiCompatServer (dependency-free, embeddable, fat-jar Main-Class).
- Streaming chat via SSE with delta.tool_calls + prefill heartbeats (unchanged).
- Ported routes: POST /v1/completions, POST /v1/embeddings, GET /health.
- Broadened the model-free test seam ChatBackend -> OpenAiBackend (+ completions,
  embeddings); LlamaModelChatBackend -> LlamaModelBackend forwards the two new
  routes to handleCompletionsOai / handleEmbeddings.
- New testable CLI parser OpenAiServerCli (short/long/alias flags, --help,
  validation) replacing the inline arg map and the deleted LlamaServerArgs;
  produces ModelParameters + OpenAiServerConfig.

Deleted NanoHTTPD impl: LlamaServer, LlamaServerArgs, LlamaServerConfig,
OaiHttpServer, OaiRouter, OaiBackend, OaiResponse, LlamaModelOaiBackend
(+ OaiRouterTest, LlamaServerArgsTest, OaiHttpServerIntegrationTest).

Reconciliation:
- pom.xml: drop org.nanohttpd dependency + version; assembly Main-Class ->
  OpenAiCompatServer.
- spotbugs-exclude.xml: retarget CC_CYCLOMATIC_COMPLEXITY to OpenAiServerCli.parse;
  drop the LlamaModelOaiBackend EI_EXPOSE_REP2 entry (survivor is package-private,
  like the old LlamaModelChatBackend, which needed none).
- LlamaArchitectureTest Server layer + com.sun.net.httpserver exception and
  module-info `requires jdk.httpserver` unchanged (still correct for the survivor).
- LlamaModel javadoc link, README, CLAUDE.md, TODO.md, publish.yml comment updated;
  removed the consolidation block and the now-moot "implement SSE" TODO (its premise
  that com.sun.net.httpserver is ArchUnit-banned was wrong: it is the supported,
  exported jdk.httpserver module).

C++ (jllama.cpp / json_helpers.hpp / wrap_stream_chunk + its tests) unchanged: the
streaming path survives.

Verification (model-free): mvn compile test-compile; targeted tests
(LlamaArchitectureTest, OpenAiRequestMapperTest, OpenAiSseFormatterTest,
ChatStreamChunkParserTest, OpenAiCompatServerHttpTest, OpenAiServerCliTest) all
green; javadoc:jar clean; spotless:check clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
@bernardladenthin deleted the claude/cool-curie-ym3acr branch on June 20, 2026, 16:33