Commit a66b2b0

Add langchain4j-jllama module with in-process LangChain4j adapters
Introduce a separate Maven artifact that adapts a java-llama.cpp LlamaModel to LangChain4j's model interfaces over JNI, with no HTTP hop:

- JllamaChatModel -> ChatModel
- JllamaStreamingChatModel -> StreamingChatModel (token streaming)
- JllamaEmbeddingModel -> EmbeddingModel
- JllamaScoringModel -> ScoringModel (rerank; scores aligned by input index)

The adapters borrow a caller-owned LlamaModel and never close it. The module depends on langchain4j-core 1.17.1, but the core net.ladenthin:llama binding gains no langchain4j dependency, so plain users never pull it transitively. It is kept as a sibling module (not part of the root reactor) so the native build and release pipeline stay untouched, and it targets Java 17 to match the langchain4j 1.x baseline.

The pure message/parameter/response transforms are unit-tested model-free; an end-to-end chat and streaming test self-skips when no GGUF is provided. The module README documents usage and the currently unmapped surfaces (tool calling, multimodal user input).
1 parent 9766240 commit a66b2b0

11 files changed

Lines changed: 837 additions & 0 deletions

REUSE.toml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ path = [
     ".github/ISSUE_TEMPLATE/bug_report.md",
     ".github/ISSUE_TEMPLATE/feature_request.md",
     ".claude/commands/find-cpp-duplication.md",
+    "langchain4j-jllama/README.md",
 ]
 SPDX-FileCopyrightText = [
     "2023-2025 Konstantin Herud",

langchain4j-jllama/README.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# langchain4j-jllama

[LangChain4j](https://github.com/langchain4j/langchain4j) adapters backed by an **in-process**
[java-llama.cpp](https://github.com/bernardladenthin/java-llama.cpp) model over JNI: no HTTP server,
no separate process.

This is a **separate Maven artifact** on purpose: it depends on `langchain4j-core`, but the core
`net.ladenthin:llama` binding does **not** depend on langchain4j, so plain java-llama.cpp users never
pull langchain4j (or its Java 17 floor) transitively.

> **Already have an OpenAI-compatible setup?** java-llama.cpp also ships
> `net.ladenthin.llama.server.OpenAiCompatServer`, so you can point langchain4j's `langchain4j-open-ai`
> client at a running server with zero code from this module. Use *this* module when you want the
> in-process path (no HTTP hop, single process, e.g. desktop/Android/embedded).

## Adapters

| Class | langchain4j interface | java-llama.cpp call |
|-------|-----------------------|---------------------|
| `JllamaChatModel` | `ChatModel` | `LlamaModel.chat(...)` |
| `JllamaStreamingChatModel` | `StreamingChatModel` | `LlamaModel.generateChat(...)` (token streaming) |
| `JllamaEmbeddingModel` | `EmbeddingModel` | `LlamaModel.embed(...)` |
| `JllamaScoringModel` | `ScoringModel` (re-ranking) | `LlamaModel.handleRerank(...)` |

## Lifecycle: the model is *borrowed*

Every adapter takes a `LlamaModel` that you have already loaded and **keep owning**. The adapter never
loads or closes the native model; you manage it (try-with-resources or explicit `close()`). One
`LlamaModel` can back several adapters at once.

```java
try (LlamaModel llama = new LlamaModel(new ModelParameters().setModel("models/qwen3-0.6b.gguf"))) {
    ChatModel chat = new JllamaChatModel(llama);

    String reply = chat.chat("Write a haiku about lazy senior devs.");
    System.out.println(reply);
}
```
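The borrowing contract can be illustrated without the native library. The following is a minimal sketch, not code from this module: `FakeModel`, `BorrowingAdapter` and `BorrowedLifecycleDemo` are hypothetical stand-ins for `LlamaModel` and the adapters, written only to show that the owner closes the model while the adapters have no `close()` of their own.

```java
import java.util.Objects;

/** Hypothetical stand-in for a native model; it only tracks whether it was closed. */
final class FakeModel implements AutoCloseable {
    private boolean closed;

    String complete(String prompt) {
        if (closed) {
            throw new IllegalStateException("model already closed");
        }
        return "echo: " + prompt;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical adapter: borrows the model and deliberately offers no close(). */
final class BorrowingAdapter {
    private final FakeModel model;

    BorrowingAdapter(FakeModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    String chat(String prompt) {
        return model.complete(prompt);
    }
}

final class BorrowedLifecycleDemo {
    /** Runs two adapters over one model and returns the model after its owner closed it. */
    static FakeModel run() {
        FakeModel model = new FakeModel();
        try (model) { // the owner, not the adapters, closes the model
            BorrowingAdapter first = new BorrowingAdapter(model);
            BorrowingAdapter second = new BorrowingAdapter(model);
            first.chat("hello");
            second.chat("world");
        }
        return model;
    }
}
```

In this sketch, calling an adapter after the owner has closed the model throws `IllegalStateException`. What the real native binding does in that situation is not stated in this commit, so keep adapters inside the scope of the model they borrow.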

Streaming:

```java
StreamingChatModel chat = new JllamaStreamingChatModel(llama);
chat.chat("Tell me a story.", new StreamingChatResponseHandler() {
    @Override public void onPartialResponse(String token) { System.out.print(token); }
    @Override public void onCompleteResponse(ChatResponse response) { /* done */ }
    @Override public void onError(Throwable error) { error.printStackTrace(); }
});
```
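To make the callback order concrete, here is a small sketch of how a streaming bridge could forward tokens one by one and report the assembled text once at the end. It is an illustration under assumptions, not the code of `JllamaStreamingChatModel`: `TokenHandler` and `StreamingSketch` are hypothetical names, and the token source is a plain `Iterable<String>` rather than the native model.

```java
/** Hypothetical callback interface, loosely shaped like a streaming response handler. */
interface TokenHandler {
    void onPartial(String token);
    void onComplete(String fullText);
    void onError(Throwable error);
}

final class StreamingSketch {
    /** Forwards each token as it arrives, then reports the concatenated text once. */
    static void stream(Iterable<String> tokens, TokenHandler handler) {
        StringBuilder assembled = new StringBuilder();
        try {
            for (String token : tokens) {
                assembled.append(token);
                handler.onPartial(token);
            }
        } catch (RuntimeException e) {
            // A failure while producing or forwarding tokens ends the stream with onError.
            handler.onError(e);
            return;
        }
        // Reached only when the token loop finished without throwing.
        handler.onComplete(assembled.toString());
    }
}
```

The design choice shown here is that exactly one of `onComplete` or `onError` fires per call, which is what callers of a streaming handler usually rely on.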

Embeddings (model loaded with `enableEmbedding()`) and re-ranking
(`enableReranking()`) plug straight into langchain4j RAG:

```java
EmbeddingModel embeddings = new JllamaEmbeddingModel(embeddingLlama);
ScoringModel reranker = new JllamaScoringModel(rerankLlama);
```

## Dependency

```xml
<dependency>
    <groupId>net.ladenthin</groupId>
    <artifactId>langchain4j-jllama</artifactId>
    <version>5.0.4-SNAPSHOT</version>
</dependency>
```

`langchain4j-core` is pulled transitively. You still supply a java-llama.cpp native library for your
platform the usual way (bundled in the `net.ladenthin:llama` JAR or on `java.library.path`).

## Building

This is a **sibling module**, not part of the root reactor. Install the core artifact first, then
build here:

```bash
# from the repo root: publish the core net.ladenthin:llama jar to your local ~/.m2
mvn -DskipTests install

# then build/test this module
cd langchain4j-jllama
mvn test
```

The end-to-end test (`JllamaChatModelIntegrationTest`) self-skips unless you pass a model:

```bash
mvn test -Dnet.ladenthin.llama.model.path=/abs/path/to/model.gguf
```
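One plausible way for a test to decide whether to run is to probe the system property used in the command above. The sketch below assumes that approach; `ModelPathProbe` and `configuredModel()` are hypothetical names, and the actual skip logic of `JllamaChatModelIntegrationTest` is not shown in this commit.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

final class ModelPathProbe {
    /** Property name taken from the mvn command line above. */
    static final String PROPERTY = "net.ladenthin.llama.model.path";

    /** Returns the configured model file, or empty when the test should skip itself. */
    static Optional<Path> configuredModel() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null || raw.isBlank()) {
            return Optional.empty(); // property not passed: nothing to test against
        }
        Path path = Path.of(raw);
        // A path that does not point at a regular file is treated like a missing model.
        return Files.isRegularFile(path) ? Optional.of(path) : Optional.empty();
    }
}
```

In a JUnit test the empty case would typically feed `Assumptions.assumeTrue(...)`, so the run is reported as skipped rather than failed.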

## Not mapped yet

- **Tool calling.** `ChatRequest.toolSpecifications()` are not forwarded, so the chat adapters return
  assistant *text*, not `AiMessage.toolExecutionRequests()`. (java-llama.cpp itself supports tool
  calling via `LlamaModel.chatWithTools` / typed `ToolDefinition`; bridging that to langchain4j
  `ToolSpecification` is the planned next step.)
- **Multimodal user input.** A multi-content `UserMessage` is flattened to its text parts; image/audio
  content is dropped.
- **Per-token tool-call / thinking stream events.** Streaming forwards plain text via
  `onPartialResponse`.
- **`response_format` (JSON mode).** `ChatRequest.responseFormat()` (json_object / json_schema) is not
  forwarded; `modelName()` is ignored since one model is bound per adapter.
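
The multimodal flattening described in the list above can be sketched with stand-in types. This is an assumption-laden illustration: `Part`, `TextPart`, `ImagePart` and `FlattenSketch` are hypothetical and are not langchain4j classes, and joining the text parts with a newline is a choice made for this example only, since the separator the module uses is not shown in this commit.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical content model; these are stand-ins, not langchain4j types. */
interface Part {}
record TextPart(String text) implements Part {}
record ImagePart(String url) implements Part {}

final class FlattenSketch {
    /** Keeps text parts in their original order; every non-text part is dropped. */
    static String flatten(List<Part> parts) {
        return parts.stream()
                .filter(TextPart.class::isInstance)
                .map(part -> ((TextPart) part).text())
                .collect(Collectors.joining("\n")); // separator is an assumption
    }
}
```

A message that contains only images therefore flattens to an empty string in this sketch, which is worth checking for before sending a prompt to the model.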

Mapped request parameters: `temperature`, `topP`, `topK`, `maxOutputTokens`, `frequencyPenalty`,
`presencePenalty`, `stopSequences`. The non-streaming chat response carries the model's real finish
reason (`stop`/`length`/`tool_calls`) and token usage; the streaming completion carries assembled text
(no per-token usage).

Requires Java 17+ (langchain4j 1.x baseline). Targets `langchain4j-core` 1.17.1.

langchain4j-jllama/pom.xml

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
<!--
SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>

SPDX-License-Identifier: MIT
-->

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>net.ladenthin</groupId>
    <artifactId>langchain4j-jllama</artifactId>
    <version>5.0.4-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>${project.groupId}:${project.artifactId}</name>
    <description>LangChain4j integration for java-llama.cpp: in-process ChatModel,
        StreamingChatModel, EmbeddingModel and ScoringModel adapters backed by a
        llama.cpp model over JNI (no HTTP hop).</description>
    <url>https://github.com/bernardladenthin/java-llama.cpp</url>

    <licenses>
        <license>
            <name>MIT License</name>
            <url>https://www.opensource.org/licenses/mit-license.php</url>
            <distribution>repo</distribution>
        </license>
    </licenses>

    <developers>
        <developer>
            <name>Bernard Ladenthin</name>
            <organizationUrl>https://github.com/bernardladenthin</organizationUrl>
        </developer>
    </developers>

    <scm>
        <connection>scm:git:https://github.com/bernardladenthin/java-llama.cpp.git</connection>
        <developerConnection>scm:git:https://github.com/bernardladenthin/java-llama.cpp.git</developerConnection>
        <url>https://github.com/bernardladenthin/java-llama.cpp/tree/main</url>
    </scm>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.release>17</maven.compiler.release>
        <!-- Keep in lockstep with the core java-llama.cpp artifact version. -->
        <jllama.version>5.0.4-SNAPSHOT</jllama.version>
        <langchain4j.version>1.17.1</langchain4j.version>
        <junit.version>6.1.1</junit.version>
        <hamcrest.version>3.0</hamcrest.version>
        <surefire.version>3.5.5</surefire.version>
    </properties>

    <dependencies>
        <!-- The JNI binding we adapt. Provided-by-the-consumer in spirit, but compile
             scope so a consumer that only declares langchain4j-jllama still gets it. -->
        <dependency>
            <groupId>net.ladenthin</groupId>
            <artifactId>llama</artifactId>
            <version>${jllama.version}</version>
        </dependency>

        <!-- The interfaces we implement (ChatModel/StreamingChatModel/EmbeddingModel/ScoringModel). -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-core</artifactId>
            <version>${langchain4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>${junit.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.hamcrest</groupId>
            <artifactId>hamcrest</artifactId>
            <version>${hamcrest.version}</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>${surefire.version}</version>
            </plugin>
        </plugins>
    </build>
</project>
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
// SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>
//
// SPDX-License-Identifier: MIT

package net.ladenthin.llama.langchain4j;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import java.util.Objects;
import net.ladenthin.llama.LlamaModel;

/**
 * langchain4j {@link ChatModel} backed by an in-process java-llama.cpp model (over JNI, no HTTP).
 *
 * <p>The model is <em>borrowed</em>: this adapter never loads or closes it. Construct it from a
 * {@link LlamaModel} you already own and keep managing that model's lifecycle (try-with-resources or
 * an explicit {@code close()}). One {@code LlamaModel} can back several adapters at once.
 *
 * <p>Mapped today: messages (system/user/assistant/tool-result) and the sampling parameters
 * {@code temperature}/{@code topP}/{@code topK}/{@code maxOutputTokens}/{@code stopSequences}.
 * Tool <em>specifications</em> on the request are not yet forwarded, so this returns assistant text,
 * not tool calls; see the module README for the planned tool-calling bridge.
 */
public final class JllamaChatModel implements ChatModel {

    private final LlamaModel model;

    /**
     * Creates a chat model over a borrowed {@link LlamaModel}.
     *
     * @param model the loaded model to drive; not closed by this adapter
     */
    public JllamaChatModel(LlamaModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    @Override
    public ChatResponse doChat(ChatRequest chatRequest) {
        net.ladenthin.llama.value.ChatResponse response =
                model.chat(LangChain4jMapping.toJllamaRequest(chatRequest));
        return LangChain4jMapping.toLangChainResponse(response);
    }
}
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
// SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>
//
// SPDX-License-Identifier: MIT

package net.ladenthin.llama.langchain4j;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.output.Response;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import net.ladenthin.llama.LlamaModel;

/**
 * langchain4j {@link EmbeddingModel} backed by an in-process java-llama.cpp model.
 *
 * <p>The backing {@link LlamaModel} must be loaded in embedding mode
 * ({@code ModelParameters.enableEmbedding()}). The model is <em>borrowed</em> (never closed here);
 * see {@link JllamaChatModel}.
 */
public final class JllamaEmbeddingModel implements EmbeddingModel {

    private final LlamaModel model;

    /**
     * Creates an embedding model over a borrowed {@link LlamaModel}.
     *
     * @param model the loaded embedding-mode model to drive; not closed by this adapter
     */
    public JllamaEmbeddingModel(LlamaModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    @Override
    public Response<List<Embedding>> embedAll(List<TextSegment> textSegments) {
        List<Embedding> embeddings = new ArrayList<>(textSegments.size());
        for (TextSegment segment : textSegments) {
            embeddings.add(Embedding.from(model.embed(segment.text())));
        }
        return Response.from(embeddings);
    }
}
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
// SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>
//
// SPDX-License-Identifier: MIT

package net.ladenthin.llama.langchain4j;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.scoring.ScoringModel;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import net.ladenthin.llama.LlamaModel;

/**
 * langchain4j {@link ScoringModel} (re-ranker) backed by an in-process java-llama.cpp model.
 *
 * <p>Maps onto java-llama.cpp's native rerank endpoint, so the backing {@link LlamaModel} must be
 * loaded in reranking mode ({@code ModelParameters.enableReranking()}). Scores are returned in the
 * same order as the input segments. The model is <em>borrowed</em> (never closed here); see
 * {@link JllamaChatModel}.
 */
public final class JllamaScoringModel implements ScoringModel {

    private final LlamaModel model;

    /**
     * Creates a scoring model over a borrowed {@link LlamaModel}.
     *
     * @param model the loaded reranking-mode model to drive; not closed by this adapter
     */
    public JllamaScoringModel(LlamaModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    @Override
    public Response<List<Double>> scoreAll(List<TextSegment> segments, String query) {
        String[] documents = new String[segments.size()];
        for (int i = 0; i < segments.size(); i++) {
            documents[i] = segments.get(i).text();
        }
        double[] scores = LangChain4jMapping.parseRerankScores(model.handleRerank(query, documents), documents.length);
        List<Double> result = new ArrayList<>(scores.length);
        for (double score : scores) {
            result.add(score);
        }
        return Response.from(result);
    }
}
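
The body of `LangChain4jMapping.parseRerankScores` is not part of the files shown here, so the following is only a sketch of the "scores aligned by input index" step under an assumption: that the native rerank result reports each score together with the position of its document, possibly in relevance order rather than input order. `RankedItem` and `RerankAlignSketch` are hypothetical names, and parsing of the raw result is left out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical rerank result: the input position of a document and its relevance score. */
record RankedItem(int index, double score) {}

final class RerankAlignSketch {
    /** Places each score at its document's input index; positions never reported stay NaN. */
    static double[] alignByIndex(List<RankedItem> items, int documentCount) {
        double[] scores = new double[documentCount];
        Arrays.fill(scores, Double.NaN);
        for (RankedItem item : items) {
            if (item.index() < 0 || item.index() >= documentCount) {
                throw new IllegalArgumentException("index out of range: " + item.index());
            }
            scores[item.index()] = item.score();
        }
        return scores;
    }
}
```

Filling with `NaN` makes a missing score visible to the caller instead of silently reading as `0.0`; whether the module takes that approach is not known from this commit.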
