Merged
379 changes: 197 additions & 182 deletions .github/workflows/publish.yml

Large diffs are not rendered by default.

68 changes: 62 additions & 6 deletions CLAUDE.md
@@ -971,12 +971,18 @@ properties set, so `LlamaEmbeddingsTest`, `MultimodalIntegrationTest`, and `TtsI
these as **required** (a missing model hard-fails the job before tests run, so a download
regression can never silently downgrade to a skip). The only model still self-skipping is the
audio-input model (`AudioInputIntegrationTest`) — it has no committed clip and no CI download.
The shared GGUF cache (`actions/cache`, key `gguf-models-v1`, path `models/`) holds the full set;
since every test job downloads the full set before the cache can save, whichever job wins the
save race caches everything. Because the cache key is immutable, changing the model set means the
**existing cache entry must be deleted** (not bumped to `v2`) so the next run rebuilds it complete
— locally the model tests still self-skip when a GGUF is absent (`Assume.assumeTrue`), so a
partial local checkout is fine.
The shared GGUF cache (`actions/cache`, key `gguf-models-v1`, path `models/`) holds the full set
and is populated **once, upfront** by a dedicated **`download-models`** job (`needs: startgate`):
it is the single place the ~5 GB set is fetched from HuggingFace (the ten `curl` steps + the
`validate-models.sh` gate live only there). Every `test-java-*` job — and the langchain4j
integration job — `needs: download-models` and then only **restores** that cache (no per-job
download, no cold-start save race), keeping `validate-models.{sh,bat}` as a per-job integrity
guard. GGUF is platform-independent, so the one ubuntu `download-models` cache is reused by the


The PR description states that "CLAUDE.md documents the module, why it is a separate artifact (not a classifier), and the CI/publish wiring," but the diff shows the entire ## LangChain4j integration (llama-langchain4j sibling module) section (previously ~56 lines) has been removed.

This section documented:

  • Why llama-langchain4j is a separate artifact (Java 17 requirement, avoid forcing it on core users)
  • How the module is wired into CI (test job, version-lockstep guard, release profile)
  • The new download-models job pattern

Consider restoring this documentation section, or updating the PR description to reflect that this documentation was intentionally removed in favor of the module's own README.md.

Owner Author


This section was added in this PR, not removed — the diff is being misread.

`git diff origin/main -- CLAUDE.md` is +62 / −6: the whole `## LangChain4j integration (llama-langchain4j sibling module)` block is new (it exists on this branch, `grep -c` = 1, but not on main, `grep -c` = 0). The −6 lines are only the reworded "GGUF cache" paragraph (the old "save race" wording replaced by the `download-models` description). So there's no documentation regression and the PR description is accurate; no action needed.


Generated by Claude Code

macOS and Windows jobs too. `validate-models.{sh,bat}` treats the models as **required** (a
missing model hard-fails the job before tests run). Because the cache key is immutable, changing
the model set means the **existing cache entry must be deleted** (not bumped to `v2`) so
`download-models` rebuilds it complete — locally the model tests still self-skip when a GGUF is
absent (`Assume.assumeTrue`), so a partial local checkout is fine.
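
The "required" behaviour described above can be sketched roughly as follows. This is only an illustration: the real `validate-models.sh` is not shown in this diff, and the function name `validate_models` and its argument layout are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Sketch of a "models are required" gate. This is NOT the project's
# validate-models.sh; the function name and arguments are hypothetical.

# validate_models DIR FILE...
# Returns 0 when every FILE exists in DIR and is non-empty. Otherwise it
# reports each missing or empty file on stderr and returns 1, so a caller
# can fail the job before any test runs.
validate_models() {
  local dir="$1"
  shift
  local missing=0
  local name
  for name in "$@"; do
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty model: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A job step could then call something like `validate_models models chat.gguf embedding.gguf || exit 1`, where the file names are placeholders rather than the actual model set.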

Set the model path via system property or environment variable (see test files for exact property names).

@@ -1219,6 +1225,56 @@ keeping it clear of the JPMS module-mode javadoc trap that bit BAF. **Before rai
javadoc source level to ≥ 9, read**
[`../workspace/policies/jpms-module-descriptor.md`](../workspace/policies/jpms-module-descriptor.md).

## LangChain4j integration (`llama-langchain4j` sibling module)

`llama-langchain4j/` adapts a `LlamaModel` to LangChain4j's `ChatModel`,
`StreamingChatModel`, `EmbeddingModel` and `ScoringModel` interfaces **in-process over
JNI** (no HTTP hop). It is a **standalone sibling module**, deliberately *not* in the root
reactor, so the native build/release pipeline is untouched.

Why it is a **separate artifact** and not a classifier of the core: langchain4j 1.x
requires **Java 17** (the core stays Java 8), and classifiers share the core's single POM —
adding `langchain4j-core` there would force it (and the Java 17 floor) on every plain
`net.ladenthin:llama` consumer. A separate `artifactId` with its own POM is the only way to
keep that dependency (and Java floor) off the core. It is pure Java with **no per-classifier
matrix**: it compiles against the core's Java API, which is identical across every native
classifier; the backend (CPU/CUDA/OpenCL/Vulkan) is a runtime classpath choice for the
consumer.

Wiring:

1. **`llama-langchain4j/pom.xml`** — `net.ladenthin:llama-langchain4j`, `release 17`,
depends on `net.ladenthin:llama:${project.version}` (so the core dep always matches the
module's own version) and `dev.langchain4j:langchain4j-core`. Carries its own
sources/javadoc/gpg + `release` profile (Central requires per-artifact signing; the module
has no parent to inherit them from — plugin versions are pinned in lockstep with the root
`pom.xml`). Java package stays `net.ladenthin.llama.langchain4j` (package name need not track
the artifactId).
2. **`.github/workflows/publish.yml`** — the `test-java-llama-langchain4j` job installs the
core Java jar, runs a **version-lockstep guard** (module version must equal core version,
else the build fails — the standalone module can't inherit `${project.version}` from a
reactor), then `mvn -f llama-langchain4j/pom.xml verify` (7 model-free mapping unit tests
run; the 4 model-backed integration tests self-skip without a GGUF; `verify` also builds the
javadoc jar so a release-time javadoc break is caught in PR CI). The
`publish-snapshot`/`publish-release` jobs `needs:` this job and, after the core `deploy`
(which installs the core jar locally), run a second `deploy` of the module at the same
version. A separate **`test-java-llama-langchain4j-integration`** job runs the model-backed
tests (chat/streaming/embedding/scoring adapters) by **reusing** the shared GGUF cache
(`gguf-models-v1`, restore-only — no extra download) and the `Linux-x86_64-libraries` native
artifact: it `needs: [crosscompile-linux-x86_64, download-models]` (so the cache is already
populated and it runs in parallel), installs the core jar with the downloaded native lib
bundled, and passes the already-cached chat
(`REASONING_MODEL_NAME`), nomic-embedding and jina-reranker model paths via the module's
`-Dnet.ladenthin.llama.langchain4j.{embedding,rerank}.model` / `net.ladenthin.llama.model.path`
properties. It is validation-only (not a release gate); a cold cache degrades to a self-skip.
3. **Version bumps** — when the root `pom.xml` `<version>` changes, bump
`llama-langchain4j/pom.xml` `<version>` to match in the same commit, or the lockstep guard
fails CI.
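
As a rough illustration of the version-lockstep guard from item 2, the check could look something like the sketch below. It is not the guard from `publish.yml`: the function names are hypothetical, and it assumes the project's own `<version>` is the first `<version>` element in each POM and sits on a single line.

```bash
#!/usr/bin/env bash
# Sketch of a version-lockstep check; not the actual guard in publish.yml.
# Function names are hypothetical.

# pom_version FILE
# Prints the text of the first single-line <version> element in FILE.
# Assumption: the project's own <version> comes before any dependency or
# plugin <version> (true only for a POM without a <parent> block).
pom_version() {
  sed -n '/<version>.*<\/version>/{s:.*<version>\(.*\)</version>.*:\1:p;q;}' "$1"
}

# check_lockstep CORE_POM MODULE_POM
# Returns 0 when both versions are non-empty and equal, 1 otherwise.
check_lockstep() {
  local core module
  core="$(pom_version "$1")"
  module="$(pom_version "$2")"
  if [ -z "$core" ] || [ "$core" != "$module" ]; then
    echo "version mismatch: core='$core' module='$module'" >&2
    return 1
  fi
  echo "versions in lockstep: $core"
}
```

A CI step might run `check_lockstep pom.xml llama-langchain4j/pom.xml || exit 1`. Text scraping breaks on inherited or property-based versions; asking Maven itself (for example via `mvn help:evaluate -Dexpression=project.version -q -DforceStdout`) is likely more robust, at the cost of a slower step.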

**Open follow-ups** (documented in `llama-langchain4j/README.md`): tool calling
(`ToolSpecification` ↔ jllama `ToolDefinition`), `response_format`/JSON mode, and multimodal
user input (currently flattened to text).

## Open TODOs

Open TODOs for this repo live in [`TODO.md`](TODO.md). Cross-repo status
2 changes: 1 addition & 1 deletion REUSE.toml
@@ -24,7 +24,7 @@ path = [
".github/ISSUE_TEMPLATE/bug_report.md",
".github/ISSUE_TEMPLATE/feature_request.md",
".claude/commands/find-cpp-duplication.md",
"langchain4j-jllama/README.md",
"llama-langchain4j/README.md",
]
SPDX-FileCopyrightText = [
"2023-2025 Konstantin Herud",
94 changes: 0 additions & 94 deletions langchain4j-jllama/pom.xml

This file was deleted.

22 changes: 17 additions & 5 deletions langchain4j-jllama/README.md → llama-langchain4j/README.md
@@ -1,4 +1,4 @@
# langchain4j-jllama
# llama-langchain4j

[LangChain4j](https://github.com/langchain4j/langchain4j) adapters backed by an **in-process**
[java-llama.cpp](https://github.com/bernardladenthin/java-llama.cpp) model over JNI — no HTTP server,
@@ -61,7 +61,7 @@ ScoringModel reranker = new JllamaScoringModel(rerankLlama);
```xml
<dependency>
<groupId>net.ladenthin</groupId>
<artifactId>langchain4j-jllama</artifactId>
<artifactId>llama-langchain4j</artifactId>
<version>5.0.4-SNAPSHOT</version>
</dependency>
```
@@ -79,16 +79,28 @@ build here:
mvn -DskipTests install

# then build/test this module
cd langchain4j-jllama
cd llama-langchain4j
mvn test
```

The end-to-end test (`JllamaChatModelIntegrationTest`) self-skips unless you pass a model:
The model-backed integration tests self-skip unless you point them at a GGUF. Each adapter has
its own property so you can run them independently (a chat/instruct model, an embedding-mode model,
and a reranking-mode model respectively):

```bash
mvn test -Dnet.ladenthin.llama.model.path=/abs/path/to/model.gguf
# chat + streaming (JllamaChatModelIntegrationTest)
mvn test -Dnet.ladenthin.llama.model.path=/abs/path/to/chat.gguf
# embeddings (JllamaEmbeddingModelIntegrationTest)
mvn test -Dnet.ladenthin.llama.langchain4j.embedding.model=/abs/path/to/embedding.gguf
# re-ranking / scoring (JllamaScoringModelIntegrationTest)
mvn test -Dnet.ladenthin.llama.langchain4j.rerank.model=/abs/path/to/reranker.gguf
```
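
If several models are available locally, the three properties can presumably be combined in one invocation. The helper below is a small sketch of that idea. The property names are the ones listed above, while the function names and the argument order are illustrative only and not part of this module.

```bash
#!/usr/bin/env bash
# Sketch: build -D flags for whichever model files exist locally.
# emit_flag and model_test_flags are hypothetical helper names.

# emit_flag PROPERTY PATH
# Prints "-DPROPERTY=PATH" when PATH names an existing file, else nothing.
emit_flag() {
  if [ -n "$2" ] && [ -f "$2" ]; then
    printf '%s\n' "-D$1=$2"
  fi
}

# model_test_flags CHAT_GGUF EMBEDDING_GGUF RERANK_GGUF
# Prints one flag per line. An empty or missing path is skipped, so the
# corresponding integration test self-skips as usual.
model_test_flags() {
  emit_flag net.ladenthin.llama.model.path "${1:-}"
  emit_flag net.ladenthin.llama.langchain4j.embedding.model "${2:-}"
  emit_flag net.ladenthin.llama.langchain4j.rerank.model "${3:-}"
}
```

For example, `mvn test $(model_test_flags /abs/path/to/chat.gguf "" /abs/path/to/reranker.gguf)` would pass only the chat and reranker properties. The unquoted substitution relies on word splitting, so this form assumes paths without spaces.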

In CI these reuse the project's existing shared GGUF cache (the chat, nomic-embedding and
jina-reranker models the core test jobs already download) — the
`test-java-llama-langchain4j-integration` job restores that cache and the
`Linux-x86_64` native library artifact, so no extra model is downloaded.

## Not mapped yet

- **Tool calling.** `ChatRequest.toolSpecifications()` are not forwarded, so the chat adapters return