bernardladenthin
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 35 additions & 3 deletions
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 32 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/cmake/generate-tts-upstream.cmake b/cmake/generate-tts-upstream.cmake
Lines changed: 105 additions & 0 deletions
@@ -384,6 +384,38 @@ Current patches:
 |-------|-------|
 | `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (b9739): `common_params_parse` unconditionally replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". The patch **drops the override for our build** (keeps the `make_utf8_argv()` call referenced so there's no `-Wunused-function`, but never adopts its result), so the caller's already-UTF-8 argv is always used. This is **deterministic** — an earlier count-guard variant (only override when the re-derived arg count equals `argc`) collided on the server-integration tests whose argv length happened to equal `java.exe`'s and kept them failing. The upstream PR can instead expose an opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix. |
 
+## OuteTTS build-time extraction (`cmake/generate-tts-upstream.cmake`)
+
+The `TextToSpeech` native pipeline reuses llama.cpp's OuteTTS helpers (`tools/tts/tts.cpp`)
+**without hand-copying them**. A verbatim copy would be a DRY/maintenance hazard that silently
+diverges on every upgrade, and `tts.cpp` cannot simply be added to `target_sources` — it defines its
+own `main()`, which would clash at link time (the same reason `tools/server/server.cpp` is excluded
+while `server-*.cpp` are compiled in), and all its helpers are `static` (internal linkage), so they
+are unreachable from another TU even if it were linked.
+
+Instead the helpers are **DERIVED mechanically at configure time** from the pinned upstream source:
+
+- **`cmake/generate-tts-upstream.cmake`** — reads `${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp`, keeps
+  the pre-`main()` span (the DSP `fill_hann_window`/`irfft`/`fold`/`embd_to_audio`, the prompt/text
+  helpers incl. `process_text`'s number-to-words, the `outetts_version` enum), strips `static` from
+  the handful the JNI engine calls (giving them external linkage), and extracts the two hard-coded
+  default-speaker literals out of `main()` into `extern const` strings. Writes
+  `build/tts_generated/tts_upstream_gen.cpp`.
+- **`CMakeLists.txt`** — runs the generator via `execute_process` right after
+  `FetchContent_MakeAvailable(llama.cpp)`, then compiles the generated TU into `jllama`. The file is
+  **never committed** (build artifact, like the native libs / WebUI assets); it is regenerated from
+  whatever `tts.cpp` the pinned `GIT_TAG` resolves to, so a version bump is picked up automatically.
+- **`src/main/cpp/tts_upstream.h`** — committed, hand-written declarations of the extracted symbols
+  (interface facts, not the implementation). `tts_engine.cpp` includes it and links against the
+  generated definitions. The in-memory WAV writer (`tts_wav.hpp`) is ours, not extracted.
+
+**Fail-loud on drift (same contract as `patches/`):** the generator asserts every anchor: the
+`int main(` split point, each `static <signature>` it de-statics, and both speaker literals. If an
+upgrade renames a helper or moves a literal, the **configure step aborts** with a pointer to the
+generator; if upstream changes a *parameter type*, `tts_upstream.h` stops matching and the
+**link fails** (unresolved symbol). Either way the drift surfaces at build time, never silently.
+On a llama.cpp bump, re-verify the generator the same way you re-verify `patches/`.
+
 ## Upgrading/Downgrading llama.cpp Version
 
 To change the llama.cpp version, update the following **three** files (and re-verify `patches/`):
@@ -744,7 +776,7 @@ If the local check passes (`BUILD SUCCESS`), the `mvn package` job in
 
 **Java layer** (`src/main/java/net/ladenthin/llama/`):
 - `LlamaModel` — Main API class (AutoCloseable). Wraps native context for inference, embeddings, re-ranking, and tokenization.
-- `TextToSpeech` — Separate AutoCloseable native type for speech synthesis over the two-model OuteTTS (text-to-codes) + WavTokenizer (codes-to-speech vocoder) pipeline; `synthesize(text)` returns a 24 kHz mono 16-bit WAV byte stream. Native engine in `tts_engine.{h,cpp}`, output DSP in `tts_dsp.hpp`.
+- `TextToSpeech` — Separate AutoCloseable native type for speech synthesis over the two-model OuteTTS (text-to-codes) + WavTokenizer (codes-to-speech vocoder) pipeline; `synthesize(text)` returns a 24 kHz mono 16-bit WAV byte stream. Native orchestration in `tts_engine.{h,cpp}`; the OuteTTS DSP / prompt / text helpers + default speaker are **derived at build time from upstream `tts.cpp`** (see "OuteTTS build-time extraction" above), not hand-copied; the in-memory WAV writer is `tts_wav.hpp`.
 - `ModelParameters` / `InferenceParameters` — Builder-pattern parameter classes that serialize to JSON (extend `JsonParameters`) for passing to native code.
 - `LlamaIterator` / `LlamaIterable` — Streaming generation via Java `Iterator`/`Iterable`.
 - `LlamaLoader` — Extracts the platform-specific native library from the JAR to a temp directory, or finds it on `java.library.path`.
@@ -915,9 +947,9 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 | `src/test/cpp/test_json_helpers.cpp` | 47 | All functions in `json_helpers.hpp`: `get_result_error_message`, `results_to_json`, `rerank_results_to_json`, `parse_encoding_format`, `extract_embedding_prompt`, `is_infill_request`, `parse_slot_prompt_similarity`, `parse_positive_int_config`, `wrap_stream_chunk` |
 | `src/test/cpp/test_log_helpers.cpp` | 13 | All functions in `log_helpers.hpp`: `log_level_name`, `format_log_as_json` |
 | `src/test/cpp/test_jni_helpers.cpp` | 47 | All functions in `jni_helpers.hpp` using a zero-filled `JNINativeInterface_` mock |
-| `src/test/cpp/test_tts_dsp.cpp` | 5 | All functions in `tts_dsp.hpp` (OuteTTS output DSP): `pcm_to_wav16_bytes` (WAV header/payload + little-endian clamping), `fill_hann_window`, `fold`, `embd_to_audio` |
+| `src/test/cpp/test_tts_wav.cpp` | 2 | The in-memory WAV writer `pcm_to_wav16_bytes` in `tts_wav.hpp` (WAV header/payload + little-endian clamping). The OuteTTS DSP it pairs with is derived from upstream `tts.cpp` and covered end-to-end by the Java `TtsIntegrationTest`, not unit-tested here. |
 
-**Current total: 457 tests (all passing).**
+**Current total: 454 tests (all passing).**
 
 #### Upstream source location (in CMake build tree)
 
 
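The linkage rules that the CLAUDE.md section above relies on can be sketched in a single C++ file. This is an illustration only: `demo_default_audio_text` and `demo_scale` are hypothetical stand-ins, not the real symbols or signatures from `tts.cpp` or `tts_upstream.h`, and the three roles (header, generated TU, caller) are collapsed into one translation unit so the sketch compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// --- role of a header such as tts_upstream.h: declarations only (hypothetical names) ---
extern const std::string demo_default_audio_text;
std::vector<float> demo_scale(const std::vector<float> & in, float gain);

// --- role of the generated TU: definitions with external linkage ---
// A namespace-scope `const` object has internal linkage by default in C++,
// so `extern` is what makes this definition visible to other TUs.
extern const std::string demo_default_audio_text = "<|text_start|>the<|text_sep|>";

// Written as `static std::vector<float> demo_scale(...)` this function would have
// internal linkage and be unreachable from another TU; without `static` it is external.
std::vector<float> demo_scale(const std::vector<float> & in, float gain) {
    std::vector<float> out;
    out.reserve(in.size());
    for (float v : in) {
        out.push_back(v * gain);
    }
    return out;
}

// --- role of a caller such as tts_engine.cpp: uses only the declarations ---
std::size_t demo_use() {
    return demo_default_audio_text.size() + demo_scale({1.0f, 2.0f}, 2.0f).size();
}
```

In a real multi-file build, changing a parameter type in the definition while leaving the header untouched would make the caller reference a symbol that no longer exists, so the link step would report it as unresolved.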
@@ -151,6 +151,29 @@ FetchContent_Declare(
 )
 FetchContent_MakeAvailable(llama.cpp)
 
+# OuteTTS native pipeline: DERIVE the upstream tts.cpp helpers (DSP + prompt + text + the default
+# speaker profile) into a compilable translation unit at configure time, rather than hand-copying
+# them — a hand copy is a DRY/maintenance hazard that silently diverges on every llama.cpp upgrade.
+# tts.cpp cannot simply be added to target_sources because it defines its own main(); the generator
+# drops main() and gives the helpers external linkage. See cmake/generate-tts-upstream.cmake. The
+# generated file is never committed; it is regenerated from whatever tts.cpp the pinned GIT_TAG
+# resolves to, so a version bump is picked up automatically. The tag below is cosmetic provenance in
+# the generated banner — keep it in sync with the llama.cpp GIT_TAG above.
+set(JLLAMA_TTS_GEN_DIR ${CMAKE_BINARY_DIR}/tts_generated)
+set(JLLAMA_TTS_GEN_CPP ${JLLAMA_TTS_GEN_DIR}/tts_upstream_gen.cpp)
+file(MAKE_DIRECTORY ${JLLAMA_TTS_GEN_DIR})
+execute_process(
+    COMMAND ${CMAKE_COMMAND}
+        -DTTS_SRC=${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp
+        -DOUT_CPP=${JLLAMA_TTS_GEN_CPP}
+        -DLLAMA_TAG=b9739
+        -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/generate-tts-upstream.cmake
+    RESULT_VARIABLE JLLAMA_TTS_GEN_RESULT
+)
+if(NOT JLLAMA_TTS_GEN_RESULT EQUAL 0)
+    message(FATAL_ERROR "OuteTTS extraction failed; see cmake/generate-tts-upstream.cmake")
+endif()
+
 # b8831 added ggml_graph_next_uid() which calls _InterlockedIncrement64 via
 # <intrin.h> on x86. The intrinsic only exists on x64; provide the
 # implementation in a compat TU so the linker resolves __InterlockedIncrement64.
@@ -264,10 +287,18 @@ endif()
 add_library(jllama SHARED
     src/main/cpp/jllama.cpp
     src/main/cpp/tts_engine.cpp
+    ${JLLAMA_TTS_GEN_CPP}
     src/main/cpp/utils.hpp
     ${llama.cpp_SOURCE_DIR}/tools/server/server-common.cpp
     ${llama.cpp_SOURCE_DIR}/tools/server/server-chat.cpp)
 
+# The generated TU keeps the whole pre-main() span of tts.cpp, so a few upstream CLI-only
+# helpers (print_usage, save_wav16, xterm colour) come along unused. Silence the resulting
+# unused-function warning on that one file (non-MSVC; MSVC's C4505 is off by default).
+if(NOT MSVC)
+    set_source_files_properties(${JLLAMA_TTS_GEN_CPP} PROPERTIES COMPILE_FLAGS "-Wno-unused-function")
+endif()
+
 # Phase 1 refactoring: compile upstream server library units directly into jllama
 # server.hpp has been replaced by direct upstream includes in jllama.cpp.
 # server-context.cpp, server-queue.cpp, server-task.cpp compile on all platforms
@@ -412,7 +443,7 @@ if(BUILD_TESTING)
         src/test/cpp/test_jni_helpers.cpp
         src/test/cpp/test_json_helpers.cpp
         src/test/cpp/test_log_helpers.cpp
-        src/test/cpp/test_tts_dsp.cpp
+        src/test/cpp/test_tts_wav.cpp
         ${llama.cpp_SOURCE_DIR}/tools/server/server-common.cpp
         ${llama.cpp_SOURCE_DIR}/tools/server/server-chat.cpp
         ${llama.cpp_SOURCE_DIR}/tools/server/server-context.cpp
 
@@ -515,9 +515,9 @@ try (TextToSpeech tts = new TextToSpeech(
 
 Add `(ttcPath, vocoderPath, gpuLayers, threads)` to offload to the GPU, or
 `synthesize(text, maxCodeTokens, topK, seed)` for explicit sampling. As with `LlamaModel`, native
-memory is not GC-managed — use try-with-resources or call `close()`. **Known limitation:** numeric
-digits in the input are dropped (number-to-words romanization is not yet ported), so spell numbers
-out for now; synthesis uses the built-in default speaker profile.
+memory is not GC-managed — use try-with-resources or call `close()`. Synthesis uses the built-in
+default speaker profile; digits are expanded into English number words (`3` → "three"), and
+non-English text is not romanized.
 
 Compatible GGUFs (the CI test defaults): OuteTTS
 [`OuteTTS-0.2-500M-GGUF`](https://huggingface.co/second-state/OuteTTS-0.2-500M-GGUF) +
 
@@ -0,0 +1,105 @@
+# SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>
+#
+# SPDX-License-Identifier: MIT
+#
+# Build-time extractor for the OuteTTS native pipeline.
+#
+# Rather than hand-copying functions out of llama.cpp's tools/tts/tts.cpp (a maintenance
+# burden that silently diverges on every upgrade), this script DERIVES a compilable
+# translation unit MECHANICALLY from the pinned upstream source at configure time. The
+# generated file is never committed (it lives in the build tree) and is regenerated from
+# whatever tts.cpp the pinned GIT_TAG resolves to, so divergence is impossible and an
+# upstream bump is picked up automatically.
+#
+# What it does:
+#   1. Keeps everything in tts.cpp BEFORE `int main(` (includes, the outetts_version enum,
+#      the number-to-words tables, and the DSP + prompt + text helpers). main() itself —
+#      the standalone CLI entry point — is dropped (that is why tts.cpp cannot simply be
+#      added to target_sources: its main() would clash at link time).
+#   2. Strips `static` from exactly the helpers the JNI engine calls, giving them external
+#      linkage so tts_engine.cpp can link against them (they are `static`/internal upstream).
+#   3. Extracts the two hard-coded default-speaker literals (audio_text / audio_data), which
+#      upstream embeds as locals inside main(), into two external constants.
+#
+# Every anchor is asserted: if upstream renames a function or moves the literals, the
+# configure step FAILS LOUDLY with a pointer here, the same fail-on-drift contract as
+# patches/. Inputs (via -D): TTS_SRC, OUT_CPP, LLAMA_TAG.
+
+if(NOT EXISTS "${TTS_SRC}")
+    message(FATAL_ERROR "generate-tts-upstream: upstream tts.cpp not found at '${TTS_SRC}'")
+endif()
+
+file(READ "${TTS_SRC}" SRC)
+
+# --- 1. keep the pre-main() portion (main() is the unbuildable-as-library CLI entry point) ---
+string(FIND "${SRC}" "\nint main(" MAIN_POS)
+if(MAIN_POS EQUAL -1)
+    message(FATAL_ERROR "generate-tts-upstream: 'int main(' anchor not found in tts.cpp — upstream layout changed; update cmake/generate-tts-upstream.cmake")
+endif()
+string(SUBSTRING "${SRC}" 0 ${MAIN_POS} PREMAIN)
+
+# --- 2. give external linkage to the helpers the JNI engine calls ---
+# Each entry is asserted present as `static <sig>` before stripping, so an upstream rename
+# fails the configure instead of silently dropping the symbol (caught later only at link).
+set(JLLAMA_TTS_DESTATIC
+    "std::vector<float> embd_to_audio("
+    "std::string process_text("
+    "void prompt_add("
+    "void prompt_init("
+    "std::vector<llama_token> prepare_guide_tokens("
+    "outetts_version get_tts_version(")
+foreach(sig IN LISTS JLLAMA_TTS_DESTATIC)
+    string(FIND "${PREMAIN}" "static ${sig}" _pos)
+    if(_pos EQUAL -1)
+        message(FATAL_ERROR "generate-tts-upstream: expected 'static ${sig}' in upstream tts.cpp but it is absent — upstream changed; update the de-static list in cmake/generate-tts-upstream.cmake")
+    endif()
+    string(REPLACE "static ${sig}" "${sig}" PREMAIN "${PREMAIN}")
+endforeach()
+
+# --- 3. extract the two default-speaker literals from inside main() ---
+# audio_text: a single-line  std::string audio_text = "<|text_start|>the<|text_sep|>...";
+# The leading "<|text_start|>the<|text_sep|>" disambiguates it from the empty-seed literal
+# in audio_text_from_speaker(). Content runs to the next double-quote (it embeds none).
+set(_AT_DECL "std::string audio_text = \"")
+string(FIND "${SRC}" "${_AT_DECL}<|text_start|>the<|text_sep|>" _at_at)
+if(_at_at EQUAL -1)
+    message(FATAL_ERROR "generate-tts-upstream: default audio_text literal not found in tts.cpp main() — upstream changed; update cmake/generate-tts-upstream.cmake")
+endif()
+string(LENGTH "${_AT_DECL}" _at_decl_len)
+math(EXPR _at_content "${_at_at} + ${_at_decl_len}")
+string(SUBSTRING "${SRC}" ${_at_content} -1 _at_rest)
+string(FIND "${_at_rest}" "\"" _at_len)
+string(SUBSTRING "${_at_rest}" 0 ${_at_len} AUDIO_TEXT)
+
+# audio_data: a multi-line raw string  std::string audio_data = R"(...)";
+# The R"( form disambiguates it from the empty-seed "..." literal in audio_data_from_speaker().
+# Content runs to the first )" (the body embeds none — only <|...|> tokens).
+set(_AD_DECL "std::string audio_data = R\"(")
+string(FIND "${SRC}" "${_AD_DECL}" _ad_at)
+if(_ad_at EQUAL -1)
+    message(FATAL_ERROR "generate-tts-upstream: default audio_data raw-string literal not found in tts.cpp main() — upstream changed; update cmake/generate-tts-upstream.cmake")
+endif()
+string(LENGTH "${_AD_DECL}" _ad_decl_len)
+math(EXPR _ad_content "${_ad_at} + ${_ad_decl_len}")
+string(SUBSTRING "${SRC}" ${_ad_content} -1 _ad_rest)
+string(FIND "${_ad_rest}" ")\"" _ad_len)
+string(SUBSTRING "${_ad_rest}" 0 ${_ad_len} AUDIO_DATA)
+
+# --- 4. emit the derived translation unit ---
+set(BANNER
+"// AUTO-GENERATED — DO NOT EDIT, DO NOT COMMIT.
+// Derived mechanically at configure time by cmake/generate-tts-upstream.cmake from
+// llama.cpp tools/tts/tts.cpp @ ${LLAMA_TAG} (MIT-licensed, the llama.cpp authors).
+// Regenerated from the pinned upstream source on every configure; see CLAUDE.md.
+
+")
+set(SPEAKER
+"
+// --- default OuteTTS speaker profile (en_male_1), extracted from upstream main() ---
+// `extern const` forces external linkage (a namespace-scope `const` is internal by default),
+// so tts_engine.cpp links against these via the `extern` declarations in tts_upstream.h.
+extern const std::string jllama_tts_default_audio_text = \"${AUDIO_TEXT}\";
+extern const std::string jllama_tts_default_audio_data = R\"(${AUDIO_DATA})\";
+")
+file(WRITE "${OUT_CPP}" "${BANNER}${PREMAIN}${SPEAKER}")
+message(STATUS "generate-tts-upstream: wrote ${OUT_CPP} (from tts.cpp @ ${LLAMA_TAG})")