@@ -971,12 +971,18 @@ properties set, so `LlamaEmbeddingsTest`, `MultimodalIntegrationTest`, and `TtsI
these as **required** (a missing model hard-fails the job before tests run, so a download
regression can never silently downgrade to a skip). The only model still self-skipping is the
audio-input model (`AudioInputIntegrationTest`) — it has no committed clip and no CI download.
- The shared GGUF cache (`actions/cache`, key `gguf-models-v1`, path `models/`) holds the full set;
- since every test job downloads the full set before the cache can save, whichever job wins the
- save race caches everything. Because the cache key is immutable, changing the model set means the
- **existing cache entry must be deleted** (not bumped to `v2`) so the next run rebuilds it complete
- — locally the model tests still self-skip when a GGUF is absent (`Assume.assumeTrue`), so a
- partial local checkout is fine.
+ The shared GGUF cache (`actions/cache`, key `gguf-models-v1`, path `models/`) holds the full set
+ and is populated **once, upfront** by a dedicated **`download-models`** job (`needs: startgate`):
+ it is the single place the ~5 GB set is fetched from HuggingFace (the ten `curl` steps + the
+ `validate-models.sh` gate live only there). Every `test-java-*` job — and the langchain4j
+ integration job — `needs: download-models` and then only **restores** that cache (no per-job
+ download, no cold-start save race), keeping `validate-models.{sh,bat}` as a per-job integrity
+ guard. GGUF is platform-independent, so the one ubuntu `download-models` cache is reused by the
+ macOS and Windows jobs too. `validate-models.{sh,bat}` treats the models as **required** (a
+ missing model hard-fails the job before tests run). Because the cache key is immutable, changing
+ the model set means the **existing cache entry must be deleted** (not bumped to `v2`) so
+ `download-models` rebuilds it complete — locally the model tests still self-skip when a GGUF is
+ absent (`Assume.assumeTrue`), so a partial local checkout is fine.

Set the model path via system property or environment variable (see test files for exact property names).
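Reviewer sketch for the context line above: one way a test helper might resolve the model path, trying a system property first and an environment variable second. The property name `net.ladenthin.llama.model.path` appears in the LangChain4j notes in the second hunk of this diff; that the core tests read the same property is an assumption, and the environment variable name `LLAMA_MODEL_PATH` and the class `ModelPathResolver` are hypothetical. Check the test files for the real names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Illustrative helper only; not part of the repository. */
public final class ModelPathResolver {
    // Property name quoted in the LangChain4j notes; its use by the core tests is an assumption.
    public static final String PROPERTY = "net.ladenthin.llama.model.path";
    // Hypothetical environment variable name, chosen for this sketch.
    public static final String ENV_VAR = "LLAMA_MODEL_PATH";

    private ModelPathResolver() {}

    /** System property wins; the environment variable is the fallback. */
    public static Optional<Path> resolve(Properties props, Map<String, String> env) {
        String value = props.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            value = env.get(ENV_VAR);
        }
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Paths.get(value.trim()));
    }

    /** True only when a path was configured and points at an existing regular file. */
    public static boolean isUsable(Optional<Path> path) {
        return path.isPresent() && Files.isRegularFile(path.get());
    }

    public static void main(String[] args) {
        Optional<Path> model = resolve(System.getProperties(), System.getenv());
        if (isUsable(model)) {
            System.out.println("model found: " + model.get());
        } else {
            System.out.println("no usable model configured; model-backed tests would self-skip");
        }
    }
}
```

In a JUnit 4 test, the result of `isUsable` is the kind of condition that would be handed to `Assume.assumeTrue`, which is how a missing GGUF turns into a skip rather than a failure on a local checkout.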
@@ -1219,6 +1225,56 @@ keeping it clear of the JPMS module-mode javadoc trap that bit BAF. **Before rai
javadoc source level to ≥ 9, read**
[`../workspace/policies/jpms-module-descriptor.md`](../workspace/policies/jpms-module-descriptor.md).

+ ## LangChain4j integration (`llama-langchain4j` sibling module)
+
+ `llama-langchain4j/` adapts a `LlamaModel` to LangChain4j's `ChatModel`,
+ `StreamingChatModel`, `EmbeddingModel` and `ScoringModel` interfaces **in-process over
+ JNI** (no HTTP hop). It is a **standalone sibling module**, deliberately *not* in the root
+ reactor, so the native build/release pipeline is untouched.
+
+ Why it is a **separate artifact** and not a classifier of the core: langchain4j 1.x
+ requires **Java 17** (the core stays Java 8), and classifiers share the core's single POM —
+ adding `langchain4j-core` there would force it (and the Java 17 floor) on every plain
+ `net.ladenthin:llama` consumer. A separate `artifactId` with its own POM is the only way to
+ keep that dependency (and Java floor) off the core. It is pure Java with **no per-classifier
+ matrix**: it compiles against the core's Java API, which is identical across every native
+ classifier; the backend (CPU/CUDA/OpenCL/Vulkan) is a runtime classpath choice for the
+ consumer.
+
+ Wiring:
+
+ 1. **`llama-langchain4j/pom.xml`** — `net.ladenthin:llama-langchain4j`, `release 17`,
+ depends on `net.ladenthin:llama:${project.version}` (so the core dep always matches the
+ module's own version) and `dev.langchain4j:langchain4j-core`. Carries its own
+ sources/javadoc/gpg + `release` profile (Central requires per-artifact signing; the module
+ has no parent to inherit them from — plugin versions are pinned in lockstep with the root
+ `pom.xml`). Java package stays `net.ladenthin.llama.langchain4j` (package name need not track
+ the artifactId).
+ 2. **`.github/workflows/publish.yml`** — the `test-java-llama-langchain4j` job installs the
+ core Java jar, runs a **version-lockstep guard** (module version must equal core version,
+ else the build fails — the standalone module can't inherit `${project.version}` from a
+ reactor), then `mvn -f llama-langchain4j/pom.xml verify` (7 model-free mapping unit tests
+ run; the 4 model-backed integration tests self-skip without a GGUF; `verify` also builds the
+ javadoc jar so a release-time javadoc break is caught in PR CI). The
+ `publish-snapshot`/`publish-release` jobs `needs:` this job and, after the core `deploy`
+ (which installs the core jar locally), run a second `deploy` of the module at the same
+ version. A separate **`test-java-llama-langchain4j-integration`** job runs the model-backed
+ tests (chat/streaming/embedding/scoring adapters) by **reusing** the shared GGUF cache
+ (`gguf-models-v1`, restore-only — no extra download) and the `Linux-x86_64-libraries` native
+ artifact: it `needs: [crosscompile-linux-x86_64, download-models]` (so the cache is already
+ populated and it runs in parallel), installs the core jar with the downloaded native lib
+ bundled, and passes the already-cached chat
+ (`REASONING_MODEL_NAME`), nomic-embedding and jina-reranker model paths via the module's
+ `-Dnet.ladenthin.llama.langchain4j.{embedding,rerank}.model` / `net.ladenthin.llama.model.path`
+ properties. It is validation-only (not a release gate); a cold cache degrades to a self-skip.
+ 3. **Version bumps** — when the root `pom.xml` `<version>` changes, bump
+ `llama-langchain4j/pom.xml` `<version>` to match in the same commit, or the lockstep guard
+ reds CI.
+
+ **Open follow-ups** (documented in `llama-langchain4j/README.md`): tool calling
+ (`ToolSpecification` ↔ jllama `ToolDefinition`), `response_format`/JSON mode, and multimodal
+ user input (currently flattened to text).
+
## Open TODOs

Open TODOs for this repo live in [`TODO.md`](TODO.md). Cross-repo status
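
Reviewer sketch for the version-lockstep guard described in wiring step 2 of the LangChain4j section. This is an illustration in plain Java, not the check that `publish.yml` actually runs (the diff does not show it); the class name `VersionLockstepCheck` and its methods are hypothetical. It reads the top-level `<version>` of the root `pom.xml` and of `llama-langchain4j/pom.xml` and exits non-zero when they differ.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Illustrative lockstep check only; the real guard lives in the workflow. */
public final class VersionLockstepCheck {
    private VersionLockstepCheck() {}

    /**
     * Returns the text of the <version> element that is a direct child of
     * <project>, ignoring versions nested under <parent> or <dependency>.
     */
    public static String projectVersion(InputStream pomXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(pomXml);
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "version".equals(child.getNodeName())) {
                return child.getTextContent().trim();
            }
        }
        throw new IllegalStateException("no top-level <version> element found");
    }

    /** The module is in lockstep when both versions are exactly equal. */
    public static boolean inLockstep(String coreVersion, String moduleVersion) {
        return coreVersion.equals(moduleVersion);
    }

    public static void main(String[] args) throws Exception {
        String core;
        String module;
        try (InputStream in = Files.newInputStream(Paths.get("pom.xml"))) {
            core = projectVersion(in);
        }
        try (InputStream in = Files.newInputStream(Paths.get("llama-langchain4j/pom.xml"))) {
            module = projectVersion(in);
        }
        if (!inLockstep(core, module)) {
            System.err.println("version mismatch: core=" + core + " module=" + module);
            System.exit(1);
        }
        System.out.println("versions match: " + core);
    }
}
```

Only direct children of `<project>` are inspected, so a dependency or parent `<version>` cannot be mistaken for the artifact's own version. A POM that inherits its version from a parent would throw here, which is acceptable for this sketch because the module is described as having no parent.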