build(server): compile upstream server-http.cpp + cpp-httplib into libjllama

claude · claude · commit c10a6ec0d012 · 2026-06-19T19:11:05.000Z
First step toward driving the OpenAI-compatible server natively from JNI, shipped inside libjllama rather than as a standalone llama-server executable (a JNI .so/.dll/.dylib loads anywhere a JVM runs; a separate binary does not, which is the whole point of preferring the JNI path here). This commit only makes the HTTP layer build and link; there is no JNI route wiring yet.

What changed (CMakeLists.txt):
- Compile tools/server/server-http.cpp (the upstream server_http_context HTTP transport) and vendor/cpp-httplib/httplib.cpp directly into jllama, on all platforms (the getifaddrs API-24 gate cpp-httplib needs on Android is already satisfied by the existing __ANDROID_UNAVAILABLE_SYMBOLS_ARE_WEAK__ define).
- <cpp-httplib/httplib.h> already resolves via llama-common's vendor/ include dir, whose bundled nlohmann/json is the same 3.12.0 as our FetchContent copy, so nothing is shadowed and no extra include dir is required for it.
- Mirror upstream's cpp-httplib tuning defines (payload/URI/backlog limits, TCP_NODELAY) on jllama so httplib.cpp and the server-http.cpp that includes httplib.h agree on the inline behaviour those macros control.
- Silence httplib.cpp warnings (-w / /w), matching upstream's own target.
- Link ws2_32 on MinGW (MSVC auto-links it via a pragma in httplib.h).
- No SSL: CPPHTTPLIB_OPENSSL_SUPPORT is left undefined (plain HTTP for now; bind localhost or front with a TLS proxy).

WebUI stub (src/main/cpp/webui_stub/ui.h):
- server-http.cpp does #include "ui.h", the asset table that tools/ui (llama-ui) normally GENERATES via the llama-ui-embed host tool. We do not ship the Svelte WebUI (it needs npm or a prebuilt-asset download), so this header supplies the exact "empty asset table" interface embed.cpp emits for n_assets == 0: the llama_ui_asset struct plus llama_ui_find_asset / llama_ui_use_gzip / llama_ui_get_assets.
- LLAMA_UI_HAS_ASSETS is intentionally left undefined, so every static-asset-serving block in server-http.cpp compiles out; the single unguarded use iterates the (empty) asset list.
- Header-only (.h), so it is outside the clang-format glob, which only covers *.cpp/*.hpp.

server.cpp (standalone main() + route wiring) stays excluded; wiring those routes to a JNI entry point is the next step.

Verified locally (Linux x86_64):
- cmake --build --target jllama -> [100%] Built target jllama (clean).
- libjllama.so contains server_http_context::init/start/stop (T) and ~1.8k httplib symbols, with zero undefined server-http/httplib symbols.
- NativeLibraryLoadSmokeTest: Tests run: 1, Failures: 0, Skipped: 0 (the larger lib still loads and JNI_OnLoad resolves every referenced Java class).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
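
Note on the tuning defines: the requirement that httplib.cpp and every translation unit including httplib.h see the same macro values can be made checkable at compile time. The following is only a sketch under stated assumptions. The `#ifndef` fallbacks exist so the snippet compiles on its own (in the real build the values come from target_compile_definitions), and `httplib_tuning` / `compiled_httplib_tuning()` are hypothetical names that are not part of cpp-httplib or this commit.

```cpp
// Stand-in fallbacks so this sketch builds in isolation. In the jllama build
// these four macros are injected by target_compile_definitions instead.
#ifndef CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH
#define CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH 1048576
#endif
#ifndef CPPHTTPLIB_LISTEN_BACKLOG
#define CPPHTTPLIB_LISTEN_BACKLOG 512
#endif
#ifndef CPPHTTPLIB_REQUEST_URI_MAX_LENGTH
#define CPPHTTPLIB_REQUEST_URI_MAX_LENGTH 32768
#endif
#ifndef CPPHTTPLIB_TCP_NODELAY
#define CPPHTTPLIB_TCP_NODELAY 1
#endif

// Hypothetical snapshot of the values this translation unit was compiled with.
struct httplib_tuning {
    long payload_max;
    int  backlog;
    long uri_max;
    bool nodelay;
};

// Hypothetical helper: returns the macro values as seen by this TU.
inline httplib_tuning compiled_httplib_tuning() {
    return { CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH,
             CPPHTTPLIB_LISTEN_BACKLOG,
             CPPHTTPLIB_REQUEST_URI_MAX_LENGTH,
             CPPHTTPLIB_TCP_NODELAY != 0 };
}

// If a TU were compiled with different values, the build would stop here
// rather than silently mixing inline code built under different limits.
static_assert(CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH == 1048576,
              "payload limit differs from the value set on jllama");
static_assert(CPPHTTPLIB_LISTEN_BACKLOG == 512,
              "listen backlog differs from the value set on jllama");
static_assert(CPPHTTPLIB_REQUEST_URI_MAX_LENGTH == 32768,
              "URI limit differs from the value set on jllama");
static_assert(CPPHTTPLIB_TCP_NODELAY == 1,
              "TCP_NODELAY differs from the value set on jllama");
```

Setting the defines once on the jllama target, as this commit does, already gives every source file in the target the same values; a guard like this would only matter if a file later overrode one of them.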
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -483,7 +483,7 @@ If the local check passes (`BUILD SUCCESS`), the `mvn package` job in
 - `json_helpers.hpp` — Pure JSON transformation helpers (no JNI, no llama state). Independently unit-testable.
 - `jni_helpers.hpp` — JNI bridge helpers (handle management + server orchestration). Includes `json_helpers.hpp`.
 - Uses `nlohmann/json` for JSON deserialization of parameters.
-- The upstream server library (`server-context.cpp`, `server-queue.cpp`, `server-task.cpp`, `server-models.cpp`) is compiled directly into `jllama` via CMake — there is no hand-ported `server.hpp` fork.
+- The upstream server library (`server-context.cpp`, `server-queue.cpp`, `server-task.cpp`, `server-models.cpp`) is compiled directly into `jllama` via CMake — there is no hand-ported `server.hpp` fork. **Phase 2:** the upstream HTTP transport (`tools/server/server-http.cpp`) and its `cpp-httplib` backend (`vendor/cpp-httplib/httplib.cpp`) are now compiled into `jllama` too, so the OpenAI-compatible server can be driven natively from JNI *inside* `libjllama` — no separate `llama-server` executable (a JNI shared library loads anywhere a JVM runs, which a standalone binary does not). `server-http.cpp` does `#include "ui.h"` (the WebUI asset table that `tools/ui`/`llama-ui` normally generates); since the Svelte WebUI is not shipped, `src/main/cpp/webui_stub/ui.h` supplies the upstream **empty-asset** interface and leaves `LLAMA_UI_HAS_ASSETS` undefined (all static-asset-serving blocks compile out). `<cpp-httplib/httplib.h>` already resolves via `llama-common`'s `vendor/` include dir (same nlohmann/json 3.12.0 as the FetchContent copy). No SSL: `CPPHTTPLIB_OPENSSL_SUPPORT` is left undefined (plain-HTTP; bind localhost / front with a TLS proxy). Only `server.cpp` (the standalone `main()` + route wiring) remains excluded — wiring the routes to JNI is the next step.
 
 ### Native Helper Architecture
 
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -261,7 +261,6 @@ add_library(jllama SHARED
 
 # Phase 1 refactoring: compile upstream server library units directly into jllama
 # server.hpp has been replaced by direct upstream includes in jllama.cpp.
-# server-http.cpp and server.cpp (main) are intentionally excluded.
 # server-context.cpp, server-queue.cpp, server-task.cpp compile on all platforms
 # including Android.  server-models.cpp is excluded on Android because it pulls
 # in subprocess.h which calls posix_spawn_*, declared but not implemented by the
@@ -278,9 +277,49 @@ if(NOT ANDROID_ABI AND NOT OS_NAME MATCHES "Android")
     )
 endif()
 
+# Phase 2: also compile the upstream HTTP transport (server-http.cpp) and its
+# cpp-httplib backend directly into jllama, so the OpenAI-compatible server can be
+# driven natively from JNI — shipped inside libjllama, with no separate
+# llama-server executable (a JNI .so/.dll/.dylib loads everywhere a JVM runs,
+# unlike a standalone binary).  Only server.cpp (the standalone main() + route
+# wiring) stays excluded for now; this first step just makes the HTTP layer build
+# and link.
+#
+# server-http.cpp does `#include "ui.h"` — the WebUI asset table that tools/ui
+# normally GENERATES.  We do not ship the Svelte WebUI (it needs npm / a prebuilt
+# asset download), so src/main/cpp/webui_stub/ui.h supplies the upstream "empty
+# asset table" interface instead (see that file).  <cpp-httplib/httplib.h> already
+# resolves via llama-common's vendor/ include dir, whose bundled nlohmann/json is
+# the same 3.12.0 as our FetchContent copy, so nothing is shadowed and no extra include dir is needed.
+target_sources(jllama PRIVATE
+    ${llama.cpp_SOURCE_DIR}/tools/server/server-http.cpp
+    ${llama.cpp_SOURCE_DIR}/vendor/cpp-httplib/httplib.cpp
+)
+
+# cpp-httplib is third-party: silence its warnings (matching upstream's own
+# cpp-httplib target, which compiles it with -w / /w).  No SSL is enabled —
+# CPPHTTPLIB_OPENSSL_SUPPORT is left undefined — so the embedded server is
+# plain-HTTP for now (bind to localhost or front it with a TLS proxy).
+if(MSVC)
+    set_source_files_properties(
+        ${llama.cpp_SOURCE_DIR}/vendor/cpp-httplib/httplib.cpp
+        PROPERTIES COMPILE_FLAGS "/w")
+else()
+    set_source_files_properties(
+        ${llama.cpp_SOURCE_DIR}/vendor/cpp-httplib/httplib.cpp
+        PROPERTIES COMPILE_FLAGS "-w")
+endif()
+
+# MinGW needs ws2_32 explicitly; MSVC auto-links it via a #pragma in httplib.h.
+if(WIN32 AND NOT MSVC)
+    target_link_libraries(jllama PRIVATE ws2_32)
+endif()
+
 set_target_properties(jllama PROPERTIES POSITION_INDEPENDENT_CODE ON)
 target_include_directories(jllama PRIVATE
     src/main/cpp
+    # webui_stub/ui.h stands in for the generated llama-ui header (see Phase 2 above)
+    src/main/cpp/webui_stub
     ${JNI_INCLUDE_DIRS}
     ${llama.cpp_SOURCE_DIR}/tools/mtmd
     ${llama.cpp_SOURCE_DIR}/tools/server)
@@ -289,6 +328,13 @@ target_compile_features(jllama PRIVATE cxx_std_11)
 
 target_compile_definitions(jllama PRIVATE
     SERVER_VERBOSE=$<BOOL:${LLAMA_VERBOSE}>
+    # cpp-httplib tuning — mirror the defines upstream's cpp-httplib target sets so
+    # httplib.cpp and every TU that includes httplib.h (server-http.cpp) agree on
+    # the inline behaviour these macros control.
+    CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH=1048576
+    CPPHTTPLIB_LISTEN_BACKLOG=512
+    CPPHTTPLIB_REQUEST_URI_MAX_LENGTH=32768
+    CPPHTTPLIB_TCP_NODELAY=1
 )
 
 if(OS_NAME STREQUAL "Windows")
diff --git a/src/main/cpp/webui_stub/ui.h b/src/main/cpp/webui_stub/ui.h
@@ -0,0 +1,52 @@
+// SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>
+//
+// SPDX-License-Identifier: MIT
+
+#pragma once
+
+// ui.h — minimal stand-in for the WebUI asset interface that llama.cpp's
+// tools/ui (CMake target "llama-ui") normally GENERATES into ui.h / ui.cpp at
+// build time via the llama-ui-embed host tool.
+//
+// The upstream HTTP transport (tools/server/server-http.cpp) does
+//     #include "ui.h"
+// and references llama_ui_get_assets() / llama_ui_find_asset() /
+// llama_ui_use_gzip().  We compile server-http.cpp directly into libjllama but do
+// NOT ship the Svelte WebUI assets (building them needs npm, or a prebuilt-asset
+// download from Hugging Face) — so we provide the exact "empty asset table"
+// interface that embed.cpp emits for its n_assets == 0 branch: the struct plus
+// the three functions, returning nothing.
+//
+// LLAMA_UI_HAS_ASSETS is intentionally left UNDEFINED.  Every static-asset-serving
+// block in server-http.cpp is guarded by `#if defined(LLAMA_UI_HAS_ASSETS)`, so
+// all of them compile out; the single unguarded use — iterating the asset list to
+// collect public endpoint paths — simply iterates this empty array.
+//
+// To actually ship the WebUI later: remove this stub directory from jllama's
+// include path, build the real llama-ui target (assets on), and add its
+// generated-header directory instead.
+
+#include <array>
+#include <cstddef>
+#include <string>
+
+struct llama_ui_asset {
+    std::string name;
+    const unsigned char * data;
+    std::size_t size;
+    std::string etag;
+    std::string type;
+};
+
+inline const llama_ui_asset * llama_ui_find_asset(const std::string & /*name*/) {
+    return nullptr;
+}
+
+inline bool llama_ui_use_gzip() {
+    return false;
+}
+
+inline const std::array<llama_ui_asset, 0> & llama_ui_get_assets() {
+    static const std::array<llama_ui_asset, 0> empty{};
+    return empty;
+}
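
The stub's behaviour can be exercised in isolation. The sketch below repeats the interface from webui_stub/ui.h and adds a hypothetical consumer, `collect_public_paths()`, which is an assumption standing in for the single unguarded loop in server-http.cpp (that loop is not shown in this diff). The `"/" + name` path format is illustrative only.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Same interface as src/main/cpp/webui_stub/ui.h (the empty asset table).
struct llama_ui_asset {
    std::string name;
    const unsigned char * data;
    std::size_t size;
    std::string etag;
    std::string type;
};

// No assets are embedded, so every lookup misses.
inline const llama_ui_asset * llama_ui_find_asset(const std::string & /*name*/) {
    return nullptr;
}

// Nothing to serve, so gzip is never used.
inline bool llama_ui_use_gzip() {
    return false;
}

// Zero-length array: begin() == end(), so range-for loops run zero times.
inline const std::array<llama_ui_asset, 0> & llama_ui_get_assets() {
    static const std::array<llama_ui_asset, 0> empty{};
    return empty;
}

// Hypothetical consumer (not upstream code): gathers one path per asset.
// With the stub in place the loop body never executes.
inline std::vector<std::string> collect_public_paths() {
    std::vector<std::string> paths;
    for (const auto & asset : llama_ui_get_assets()) {
        paths.push_back("/" + asset.name);
    }
    return paths;
}
```

Because the array type has length zero, a caller that only iterates the list needs no `#if defined(LLAMA_UI_HAS_ASSETS)` guard of its own, which is consistent with the header comment's description of the unguarded use.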