Current patches:

|-------|-------|
|`0001-win32-arg-parse-embed-guard.patch`| Windows JNI regression from llama.cpp **#24779** (b9739): `common_params_parse` unconditionally replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args and parsing failed with "Failed to parse model parameters". The patch **drops the override for our build**: it keeps the `make_utf8_argv()` call referenced (so there is no `-Wunused-function` warning) but never adopts its result, so the caller's already-UTF-8 argv is always used. This is **deterministic**. An earlier count-guard variant (override only when the re-derived arg count equals `argc`) misfired on the server-integration tests whose argv length happened to equal `java.exe`'s, so those tests kept failing. The upstream PR could instead expose an opt-out or a `common_params_parse_argv` variant that preserves the standalone tools' UTF-8 fix. |
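Why the count guard was fragile can be modelled in a few lines of C++. This is a minimal sketch, not the patch itself: the helper names `pick_argv_count_guard` and `pick_argv_patched` are hypothetical, and plain `std::string` vectors stand in for the caller's UTF-8 argv and for the argv that `GetCommandLineW` would re-derive.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Argv = std::vector<std::string>;

// Hypothetical model of the rejected count-guard variant: adopt the
// re-derived process command line only when its length equals argc.
Argv pick_argv_count_guard(const Argv& caller_argv, const Argv& process_argv) {
    return process_argv.size() == caller_argv.size() ? process_argv : caller_argv;
}

// Hypothetical model of the patched behaviour: the re-derived command line
// may still be computed, but it is never adopted.
Argv pick_argv_patched(const Argv& caller_argv, const Argv& /*process_argv*/) {
    return caller_argv;
}
```

Consider a caller argv of `{"jllama", "--model", "m.gguf"}` and a JVM command line of `{"java.exe", "-cp", "app.jar"}`, both made up for illustration. The counts match, so the count guard adopts the JVM's arguments and `--model` is lost. The patched rule returns the caller's argv whatever the lengths.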

… the pre-`main()` span (the DSP `fill_hann_window`/`irfft`/`fold`/`embd_to_audio`, the prompt/text
helpers including `process_text`'s number-to-words, the `outetts_version` enum), strips `static` from
the handful the JNI engine calls (giving them external linkage), and extracts the two hard-coded
default-speaker literals out of `main()` into `extern const` strings. Writes
`build/tts_generated/tts_upstream_gen.cpp`.
- **`CMakeLists.txt`**: runs the generator via `execute_process` right after
  `FetchContent_MakeAvailable(llama.cpp)`, then compiles the generated TU into `jllama`. The file is
  **never committed** (build artifact, like the native libs / WebUI assets); it is regenerated from
  whatever `tts.cpp` the pinned `GIT_TAG` resolves to, so a version bump is picked up automatically.
- **`src/main/cpp/tts_upstream.h`**: committed, hand-written declarations of the extracted symbols
  (interface facts, not the implementation). `tts_engine.cpp` includes it and links against the
  generated definitions. The in-memory WAV writer (`tts_wav.hpp`) is ours, not extracted.

**Fail-loud on drift (same contract as `patches/`):** the generator asserts every anchor (the
`int main(` split point, each `static <signature>` it de-statics, and both speaker literals). If an
upgrade renames a helper or moves a literal, the **configure step aborts** with a pointer to the
generator; if upstream changes a *type*, `tts_upstream.h` stops matching and the **link fails**.
Either way, a silent divergence is impossible. On a llama.cpp bump, re-verify the generator the same
way you re-verify `patches/`.

## Upgrading/Downgrading llama.cpp Version

To change the llama.cpp version, update the following **three** files (and re-verify `patches/`):

… the README. The summary below covers only the optional-model bindings:

|`net.ladenthin.llama.audio.model`|`AudioInputIntegrationTest` (llama.cpp discussion #13759) | audio-input model GGUF, e.g. `ultravox-v0_5-llama-3_2-1b.gguf`|
|`net.ladenthin.llama.audio.mmproj`|`AudioInputIntegrationTest`| matching audio mmproj/encoder, e.g. `mmproj-ultravox-v0_5-llama-3_2-1b-f16.gguf`|
|`net.ladenthin.llama.audio.input`|`AudioInputIntegrationTest`| a `.wav`/`.mp3` clip on disk (no committed default; audio is not committed) |
|`net.ladenthin.llama.tts.ttc.model`|`TtsIntegrationTest`| OuteTTS text-to-codes model, e.g. `OuteTTS-0.2-500M-Q4_K_M.gguf`|
|`net.ladenthin.llama.tts.vocoder.model`|`TtsIntegrationTest`| matching codes-to-speech vocoder, e.g. `WavTokenizer-Large-75-F16.gguf`|

Run those tests by setting the property:

```bash
mvn test -Dtest=AudioInputIntegrationTest \
  …
```

…

- `LlamaModel`: Main API class (AutoCloseable). Wraps the native context for inference, embeddings, re-ranking, and tokenization.
- `TextToSpeech`: A separate AutoCloseable native type for speech synthesis over the two-model pipeline of OuteTTS (text-to-codes) and WavTokenizer (codes-to-speech vocoder); `synthesize(text)` returns a 24 kHz mono 16-bit WAV byte stream. Native orchestration lives in `tts_engine.{h,cpp}`. The OuteTTS DSP, prompt, and text helpers and the default speaker are **derived at build time from upstream `tts.cpp`** (see "OuteTTS build-time extraction" above) rather than hand-copied; the in-memory WAV writer is `tts_wav.hpp`.
- `ModelParameters` / `InferenceParameters`: Builder-pattern parameter classes that serialize to JSON (extend `JsonParameters`) for passing to native code.
- `LlamaIterator` / `LlamaIterable`: Streaming generation via Java `Iterator`/`Iterable`.
- `LlamaLoader`: Extracts the platform-specific native library from the JAR to a temp directory, or finds it on `java.library.path`.
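`synthesize(text)` returns 24 kHz mono 16-bit PCM, so the byte rate is fixed at 24000 × 1 × 2 = 48000 bytes per second. The clip length therefore follows from the payload size. The sketch below is a minimal C++ illustration that assumes a canonical 44-byte RIFF/WAVE header with no extra chunks. That header layout is an assumption about the stream, not something stated for `tts_wav.hpp`, and `wav_duration_seconds` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Output format described for TextToSpeech.synthesize: 24 kHz, mono, 16-bit PCM.
constexpr std::size_t kSampleRate = 24000;
constexpr std::size_t kChannels = 1;
constexpr std::size_t kBitsPerSample = 16;

constexpr std::size_t kBlockAlign = kChannels * kBitsPerSample / 8;  // bytes per frame
constexpr std::size_t kByteRate = kSampleRate * kBlockAlign;          // bytes per second

// Assumption: canonical 44-byte RIFF/WAVE header, no extra chunks.
constexpr std::size_t kCanonicalHeaderBytes = 44;

static_assert(kBlockAlign == 2, "mono 16-bit is 2 bytes per frame");
static_assert(kByteRate == 48000, "24 kHz * 2 bytes = 48000 bytes/s");

// Hypothetical helper: playback length of a WAV byte stream in seconds.
double wav_duration_seconds(std::size_t wav_size_bytes) {
    if (wav_size_bytes < kCanonicalHeaderBytes) {
        return 0.0;  // too short to hold even the header
    }
    return static_cast<double>(wav_size_bytes - kCanonicalHeaderBytes) /
           static_cast<double>(kByteRate);
}
```

Under that assumption, a 48044-byte stream holds exactly one second of audio.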

…

- The `server` package is a dedicated top layer in the ArchUnit `layeredArchitecture` rule (the only layer allowed to access the root `Api`); `noInternalJdkImports` carries an explicit exception for the supported `com.sun.net.httpserver` API (from the exported `jdk.httpserver` module, which `module-info.java` `requires`). See README "OpenAI-compatible HTTP server".

…

|`src/test/cpp/test_json_helpers.cpp`| 47 | All functions in `json_helpers.hpp`: `get_result_error_message`, `results_to_json`, `rerank_results_to_json`, `parse_encoding_format`, `extract_embedding_prompt`, `is_infill_request`, `parse_slot_prompt_similarity`, `parse_positive_int_config`, `wrap_stream_chunk`|
|`src/test/cpp/test_log_helpers.cpp`| 13 | All functions in `log_helpers.hpp`: `log_level_name`, `format_log_as_json`|
|`src/test/cpp/test_jni_helpers.cpp`| 47 | All functions in `jni_helpers.hpp` using a zero-filled `JNINativeInterface_` mock |
|`src/test/cpp/test_tts_wav.cpp`| 2 | The in-memory WAV writer `pcm_to_wav16_bytes` in `tts_wav.hpp` (WAV header and payload, little-endian encoding, and sample clamping). The OuteTTS DSP it pairs with is derived from upstream `tts.cpp` and covered end-to-end by the Java `TtsIntegrationTest`, not unit-tested here. |

**Current total: 454 tests (all passing).**
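The `test_tts_wav.cpp` row lists what the WAV writer test covers: the header, the payload, little-endian encoding, and clamping. Below is a hypothetical C++ sketch of such a writer. `pcm_to_wav16_sketch` does not reproduce the signature of `pcm_to_wav16_bytes`, and float samples in [-1, 1] are an assumed input format. The sketch emits bytes one at a time, so the output is little-endian regardless of host byte order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Append `value` as `n_bytes` little-endian bytes, whatever the host byte order.
static void put_le(std::vector<std::uint8_t>& out, std::uint32_t value, int n_bytes) {
    for (int i = 0; i < n_bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFFu));
    }
}

// Append a four-character chunk tag such as "RIFF".
static void put_tag(std::vector<std::uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical sketch: float samples in [-1, 1] -> mono 16-bit PCM WAV bytes.
std::vector<std::uint8_t> pcm_to_wav16_sketch(const std::vector<float>& samples,
                                              std::uint32_t sample_rate = 24000) {
    const std::uint32_t channels = 1;
    const std::uint32_t bits = 16;
    const std::uint32_t block_align = channels * bits / 8;
    const std::uint32_t data_bytes = static_cast<std::uint32_t>(samples.size()) * block_align;

    std::vector<std::uint8_t> out;
    out.reserve(44 + data_bytes);
    put_tag(out, "RIFF");
    put_le(out, 36 + data_bytes, 4);            // RIFF chunk size: everything after this field
    put_tag(out, "WAVE");
    put_tag(out, "fmt ");
    put_le(out, 16, 4);                         // fmt chunk size for plain PCM
    put_le(out, 1, 2);                          // audio format 1 = PCM
    put_le(out, channels, 2);
    put_le(out, sample_rate, 4);
    put_le(out, sample_rate * block_align, 4);  // byte rate
    put_le(out, block_align, 2);
    put_le(out, bits, 2);
    put_tag(out, "data");
    put_le(out, data_bytes, 4);

    for (float s : samples) {
        // Clamp first so out-of-range samples saturate instead of wrapping.
        const float clamped = std::min(std::max(s, -1.0f), 1.0f);
        const auto pcm = static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
        put_le(out, static_cast<std::uint16_t>(pcm), 2);
    }
    return out;
}
```

Scaling by 32767 rather than 32768 keeps +1.0 and -1.0 symmetric, at the cost of never emitting -32768. Because the writer clamps first, a stray 2.0 saturates to `0x7FFF` instead of wrapping.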
914
953
915
954
#### Upstream source location (in CMake build tree)