
Commit b5ee309

Merge pull request bernardladenthin#284 from vaiju1981/langchain4j-jllama
Add langchain4j-jllama module: in-process LangChain4j adapters
2 parents 9766240 + a66b2b0 commit b5ee309

11 files changed

Lines changed: 837 additions & 0 deletions


REUSE.toml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ path = [
 ".github/ISSUE_TEMPLATE/bug_report.md",
 ".github/ISSUE_TEMPLATE/feature_request.md",
 ".claude/commands/find-cpp-duplication.md",
+"langchain4j-jllama/README.md",
 ]
 SPDX-FileCopyrightText = [
 "2023-2025 Konstantin Herud",

langchain4j-jllama/README.md

Lines changed: 110 additions & 0 deletions
# langchain4j-jllama

[LangChain4j](https://github.com/langchain4j/langchain4j) adapters backed by an **in-process**
[java-llama.cpp](https://github.com/bernardladenthin/java-llama.cpp) model over JNI: no HTTP server,
no separate process.

This is a **separate Maven artifact** on purpose: it depends on `langchain4j-core`, but the core
`net.ladenthin:llama` binding does **not** depend on langchain4j, so plain java-llama.cpp users never
pull in langchain4j (or its Java 17 floor) transitively.

> **Already have an OpenAI-compatible setup?** java-llama.cpp also ships
> `net.ladenthin.llama.server.OpenAiCompatServer`, so you can point langchain4j's `langchain4j-open-ai`
> client at a running server without any code from this module. Use *this* module when you want the
> in-process path (no HTTP hop, single process, e.g. desktop/Android/embedded).

## Adapters

| Class | langchain4j interface | java-llama.cpp call |
|-------|-----------------------|---------------------|
| `JllamaChatModel` | `ChatModel` | `LlamaModel.chat(...)` |
| `JllamaStreamingChatModel` | `StreamingChatModel` | `LlamaModel.generateChat(...)` (token streaming) |
| `JllamaEmbeddingModel` | `EmbeddingModel` | `LlamaModel.embed(...)` |
| `JllamaScoringModel` | `ScoringModel` (re-ranking) | `LlamaModel.handleRerank(...)` |

## Lifecycle: the model is *borrowed*

Every adapter takes a `LlamaModel` that you have already loaded and that you **continue to own**. The
adapter never loads or closes the native model; you manage it (try-with-resources or an explicit
`close()`). One `LlamaModel` can back several adapters at once.

```java
try (LlamaModel llama = new LlamaModel(new ModelParameters().setModel("models/qwen3-0.6b.gguf"))) {
    ChatModel chat = new JllamaChatModel(llama);

    String reply = chat.chat("Write a haiku about lazy senior devs.");
    System.out.println(reply);
}
```
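The borrowing rule can be illustrated without a native library. The sketch below is hypothetical throughout: `FakeModel` stands in for `LlamaModel` and `BorrowingAdapter` for one of the `Jllama*` adapters. It only demonstrates the ownership pattern described in this section: two adapters share one model, neither closes it, and the try-with-resources owner closes it exactly once.

```java
import java.util.Objects;

// Hypothetical stand-ins: FakeModel plays the role of LlamaModel and
// BorrowingAdapter the role of a Jllama* adapter. No native library is involved.
public class BorrowedLifecycleDemo {

    /** Stand-in for a natively backed model; records how often it was closed. */
    static final class FakeModel implements AutoCloseable {
        private int closeCount = 0;

        String generate(String prompt) {
            if (closeCount > 0) {
                throw new IllegalStateException("model already closed");
            }
            return "echo: " + prompt;
        }

        @Override
        public void close() {
            closeCount++;
        }

        int closeCount() {
            return closeCount;
        }
    }

    /** Stand-in adapter: keeps a reference to the model but never closes it. */
    static final class BorrowingAdapter {
        private final FakeModel model;

        BorrowingAdapter(FakeModel model) {
            this.model = Objects.requireNonNull(model, "model");
        }

        String chat(String prompt) {
            return model.generate(prompt);
        }
    }

    /** Drives two adapters over one model and reports how often the model was closed. */
    static int runAndCountCloses() {
        FakeModel model = new FakeModel();
        try (FakeModel owned = model) {
            BorrowingAdapter first = new BorrowingAdapter(owned);
            BorrowingAdapter second = new BorrowingAdapter(owned);
            first.chat("hello");
            second.chat("world");
        } // the try-with-resources owner closes the model exactly once
        return model.closeCount();
    }

    public static void main(String[] args) {
        System.out.println("model closed " + runAndCountCloses() + " time(s)");
    }
}
```

If an adapter closed the model itself, every other adapter sharing it would fail on its next call (the `IllegalStateException` path above), which is one reason to keep ownership with the caller.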

Streaming (with `llama` loaded as above and still open):

```java
StreamingChatModel chat = new JllamaStreamingChatModel(llama);
chat.chat("Tell me a story.", new StreamingChatResponseHandler() {
    @Override public void onPartialResponse(String token) { System.out.print(token); }
    @Override public void onCompleteResponse(ChatResponse response) { /* done */ }
    @Override public void onError(Throwable error) { error.printStackTrace(); }
});
```
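The token-by-token callback flow can be sketched without langchain4j on the classpath. `TokenHandler` and `stream` are hypothetical stand-ins: the handler mirrors the three callbacks of `StreamingChatResponseHandler` but completes with a plain `String`. This is an illustration of the callback contract, not the implementation of `JllamaStreamingChatModel`, whose source is not shown in this diff.

```java
import java.util.List;

// Hypothetical sketch: TokenHandler mirrors the three callbacks of langchain4j's
// StreamingChatResponseHandler, but completes with a plain String instead of a ChatResponse.
public class StreamingAssemblyDemo {

    interface TokenHandler {
        void onPartialResponse(String token);

        void onCompleteResponse(String assembledText);

        void onError(Throwable error);
    }

    /** Forwards every token as it arrives, then hands over the concatenated text once. */
    static void stream(Iterable<String> tokens, TokenHandler handler) {
        StringBuilder assembled = new StringBuilder();
        try {
            for (String token : tokens) {
                assembled.append(token);
                handler.onPartialResponse(token);
            }
        } catch (RuntimeException e) {
            // a failure while producing or forwarding tokens goes to onError instead of propagating
            handler.onError(e);
            return;
        }
        handler.onCompleteResponse(assembled.toString());
    }

    public static void main(String[] args) {
        stream(List.of("Once", " upon", " a", " time."), new TokenHandler() {
            @Override public void onPartialResponse(String token) { System.out.print(token); }
            @Override public void onCompleteResponse(String text) { System.out.println(); }
            @Override public void onError(Throwable error) { error.printStackTrace(); }
        });
    }
}
```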

Embeddings (backing model loaded with `enableEmbedding()`) and re-ranking (backing model loaded with
`enableReranking()`) plug straight into langchain4j RAG:

```java
EmbeddingModel embeddings = new JllamaEmbeddingModel(embeddingLlama);
ScoringModel reranker = new JllamaScoringModel(rerankLlama);
```
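Once segments are embedded, a RAG retriever typically ranks them by vector similarity. The sketch below is a standalone cosine-similarity illustration in plain Java; it models embeddings as `float[]` and uses no langchain4j class. Which metric a particular embedding store applies is up to that store.

```java
// Standalone illustration: embedding vectors are modelled as float[] here.
public class CosineSimilarityDemo {

    /** Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal, -1 = opposite. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the candidate pointing most nearly in the query's direction (-1 if none). */
    static int mostSimilar(float[] query, float[][] candidates) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            double score = cosine(query, candidates[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[][] docs = {{0f, 1f}, {0.9f, 0.1f}, {-1f, 0f}};
        System.out.println("best match: document " + mostSimilar(query, docs));
    }
}
```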

## Dependency

```xml
<dependency>
    <groupId>net.ladenthin</groupId>
    <artifactId>langchain4j-jllama</artifactId>
    <version>5.0.4-SNAPSHOT</version>
</dependency>
```

`langchain4j-core` and the core `net.ladenthin:llama` binding are pulled in transitively. You still
need a java-llama.cpp native library for your platform, supplied the usual way (bundled in the
`net.ladenthin:llama` JAR or on `java.library.path`).

## Building

This is a **sibling module**, not part of the root reactor. Install the core artifact first, then
build here:

```bash
# from the repo root: publish the core net.ladenthin:llama jar to your local ~/.m2
mvn -DskipTests install

# then build/test this module
cd langchain4j-jllama
mvn test
```

The end-to-end test (`JllamaChatModelIntegrationTest`) skips itself unless you pass a model path:

```bash
mvn test -Dnet.ladenthin.llama.model.path=/abs/path/to/model.gguf
```
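A self-skipping test usually gates on the system property before touching the native model. The sketch below is an assumption about how such a gate can work, not the code of `JllamaChatModelIntegrationTest`: only the property name comes from this README, and the "must be an existing file" check is a choice made for the sketch. In JUnit Jupiter the result would typically feed `Assumptions.assumeTrue(...)`, which reports the test as skipped rather than failed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Sketch of a system-property gate for an optional integration test.
// Only the property name comes from the README; the file check is an assumption.
public class ModelPathGate {

    static final String PROPERTY = "net.ladenthin.llama.model.path";

    /** The configured model file, or empty when the property is unset, blank, or not a file. */
    static Optional<Path> configuredModel() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        Path path = Paths.get(value);
        return Files.isRegularFile(path) ? Optional.of(path) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> model = configuredModel();
        if (model.isEmpty()) {
            System.out.println("skipping: pass -D" + PROPERTY + "=/abs/path/to/model.gguf");
            return;
        }
        System.out.println("would run the end-to-end test against " + model.get());
    }
}
```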

## Not mapped yet

- **Tool calling.** The tool specifications in `ChatRequest.toolSpecifications()` are not forwarded, so
  the chat adapters return assistant *text*, not `AiMessage.toolExecutionRequests()`. (java-llama.cpp
  itself supports tool calling via `LlamaModel.chatWithTools` / typed `ToolDefinition`; bridging that
  to langchain4j `ToolSpecification` is the planned next step.)
- **Multimodal user input.** A multi-content `UserMessage` is flattened to its text parts; image/audio
  content is dropped.
- **Per-token tool-call / thinking stream events.** Streaming forwards only plain text via
  `onPartialResponse`.
- **`response_format` (JSON mode).** `ChatRequest.responseFormat()` (json_object / json_schema) is not
  forwarded.
- **Model selection.** `modelName()` is ignored, since each adapter is bound to a single model.
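The multimodal limitation amounts to a filter over the message's content parts. The records below are local stand-ins named after langchain4j's content types, not the real classes, and joining with a newline is an assumption; the actual mapping code is not part of this diff.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins named after langchain4j's content types; not the real classes.
public class FlattenUserContent {

    interface Content {}

    record TextContent(String text) implements Content {}

    record ImageContent(String url) implements Content {}

    record AudioContent(String url) implements Content {}

    /** Keeps only the text parts, joined with newlines (the separator is an assumption). */
    static String flatten(List<? extends Content> parts) {
        return parts.stream()
                .filter(part -> part instanceof TextContent)
                .map(part -> ((TextContent) part).text())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<Content> message = List.of(
                new TextContent("What is in this picture?"),
                new ImageContent("https://example.com/cat.png"),
                new TextContent("Answer in one sentence."));
        // the image part is silently dropped; only the two text parts reach the model
        System.out.println(flatten(message));
    }
}
```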

Mapped request parameters: `temperature`, `topP`, `topK`, `maxOutputTokens`, `frequencyPenalty`,
`presencePenalty`, `stopSequences`. The non-streaming chat response carries the model's actual finish
reason (`stop`/`length`/`tool_calls`) and token usage; the streaming completion carries the assembled
text but no per-token usage.
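The parameter list and finish reasons suggest a mapping that forwards only what the caller set and translates raw finish reasons back into langchain4j terms. A minimal sketch under stated assumptions: the output field names are modelled on OpenAI-style chat-completion requests, `SamplingParams` is a hypothetical record, and `FinishReason` is a local enum mirroring a subset of langchain4j's `FinishReason` values. `LangChain4jMapping` itself is not shown in this diff and may work differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real LangChain4jMapping is not shown in this diff.
public class RequestMappingSketch {

    /** Subset of request parameters; a null field means "not set by the caller". */
    record SamplingParams(Double temperature, Double topP, Integer topK, Integer maxOutputTokens,
                          Double frequencyPenalty, Double presencePenalty, List<String> stopSequences) {}

    /** Copies only the parameters the caller set, so unset ones keep the model's defaults. */
    static Map<String, Object> toRequestFields(SamplingParams p) {
        Map<String, Object> fields = new LinkedHashMap<>();
        // field names are assumptions modelled on OpenAI-style chat-completion requests
        putIfSet(fields, "temperature", p.temperature());
        putIfSet(fields, "top_p", p.topP());
        putIfSet(fields, "top_k", p.topK());
        putIfSet(fields, "max_tokens", p.maxOutputTokens());
        putIfSet(fields, "frequency_penalty", p.frequencyPenalty());
        putIfSet(fields, "presence_penalty", p.presencePenalty());
        if (p.stopSequences() != null && !p.stopSequences().isEmpty()) {
            fields.put("stop", p.stopSequences());
        }
        return fields;
    }

    private static void putIfSet(Map<String, Object> fields, String key, Object value) {
        if (value != null) {
            fields.put(key, value);
        }
    }

    /** Local mirror of a subset of langchain4j's FinishReason values. */
    enum FinishReason { STOP, LENGTH, TOOL_EXECUTION, OTHER }

    /** Maps a raw finish reason string; anything unrecognised becomes OTHER. */
    static FinishReason toFinishReason(String raw) {
        if (raw == null) {
            return FinishReason.OTHER;
        }
        switch (raw) {
            case "stop":
                return FinishReason.STOP;
            case "length":
                return FinishReason.LENGTH;
            case "tool_calls":
                return FinishReason.TOOL_EXECUTION;
            default:
                return FinishReason.OTHER;
        }
    }

    public static void main(String[] args) {
        SamplingParams params = new SamplingParams(0.2, null, 40, 128, null, null, List.of("</s>"));
        System.out.println(toRequestFields(params));
        System.out.println(toFinishReason("length"));
    }
}
```

Skipping unset fields, rather than sending hard-coded defaults, leaves the model's own sampling defaults in charge whenever the caller is silent.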

Requires Java 17+ (langchain4j 1.x baseline). Targets `langchain4j-core` 1.17.1.

langchain4j-jllama/pom.xml

Lines changed: 94 additions & 0 deletions
<!--
SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>

SPDX-License-Identifier: MIT
-->

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>net.ladenthin</groupId>
    <artifactId>langchain4j-jllama</artifactId>
    <version>5.0.4-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>${project.groupId}:${project.artifactId}</name>
    <description>LangChain4j integration for java-llama.cpp: in-process ChatModel,
        StreamingChatModel, EmbeddingModel and ScoringModel adapters backed by a
        llama.cpp model over JNI (no HTTP hop).</description>
    <url>https://github.com/bernardladenthin/java-llama.cpp</url>

    <licenses>
        <license>
            <name>MIT License</name>
            <url>https://www.opensource.org/licenses/mit-license.php</url>
            <distribution>repo</distribution>
        </license>
    </licenses>

    <developers>
        <developer>
            <name>Bernard Ladenthin</name>
            <organizationUrl>https://github.com/bernardladenthin</organizationUrl>
        </developer>
    </developers>

    <scm>
        <connection>scm:git:https://github.com/bernardladenthin/java-llama.cpp.git</connection>
        <developerConnection>scm:git:https://github.com/bernardladenthin/java-llama.cpp.git</developerConnection>
        <url>https://github.com/bernardladenthin/java-llama.cpp/tree/main</url>
    </scm>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.release>17</maven.compiler.release>
        <!-- Keep in lockstep with the core java-llama.cpp artifact version. -->
        <jllama.version>5.0.4-SNAPSHOT</jllama.version>
        <langchain4j.version>1.17.1</langchain4j.version>
        <junit.version>6.1.1</junit.version>
        <hamcrest.version>3.0</hamcrest.version>
        <surefire.version>3.5.5</surefire.version>
    </properties>

    <dependencies>
        <!-- The JNI binding we adapt. Provided-by-the-consumer in spirit, but compile
             scope so a consumer that only declares langchain4j-jllama still gets it. -->
        <dependency>
            <groupId>net.ladenthin</groupId>
            <artifactId>llama</artifactId>
            <version>${jllama.version}</version>
        </dependency>

        <!-- The interfaces we implement (ChatModel/StreamingChatModel/EmbeddingModel/ScoringModel). -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-core</artifactId>
            <version>${langchain4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>${junit.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.hamcrest</groupId>
            <artifactId>hamcrest</artifactId>
            <version>${hamcrest.version}</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>${surefire.version}</version>
            </plugin>
        </plugins>
    </build>
</project>
Lines changed: 44 additions & 0 deletions
// SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>
//
// SPDX-License-Identifier: MIT

package net.ladenthin.llama.langchain4j;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import java.util.Objects;
import net.ladenthin.llama.LlamaModel;

/**
 * langchain4j {@link ChatModel} backed by an in-process java-llama.cpp model (over JNI, no HTTP).
 *
 * <p>The model is <em>borrowed</em>: this adapter never loads or closes it. Construct it from a
 * {@link LlamaModel} you already own and keep managing that model's lifecycle (try-with-resources or
 * an explicit {@code close()}). One {@code LlamaModel} can back several adapters at once.
 *
 * <p>Mapped today: messages (system/user/assistant/tool-result) and the sampling parameters
 * {@code temperature}/{@code topP}/{@code topK}/{@code maxOutputTokens}/{@code stopSequences}.
 * Tool <em>specifications</em> on the request are not yet forwarded, so this returns assistant text,
 * not tool calls; see the module README for the planned tool-calling bridge.
 */
public final class JllamaChatModel implements ChatModel {

    private final LlamaModel model;

    /**
     * Creates a chat model over a borrowed {@link LlamaModel}.
     *
     * @param model the loaded model to drive; not closed by this adapter
     */
    public JllamaChatModel(LlamaModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    @Override
    public ChatResponse doChat(ChatRequest chatRequest) {
        net.ladenthin.llama.value.ChatResponse response =
                model.chat(LangChain4jMapping.toJllamaRequest(chatRequest));
        return LangChain4jMapping.toLangChainResponse(response);
    }
}
Lines changed: 44 additions & 0 deletions
// SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>
//
// SPDX-License-Identifier: MIT

package net.ladenthin.llama.langchain4j;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.output.Response;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import net.ladenthin.llama.LlamaModel;

/**
 * langchain4j {@link EmbeddingModel} backed by an in-process java-llama.cpp model.
 *
 * <p>The backing {@link LlamaModel} must be loaded in embedding mode
 * ({@code ModelParameters.enableEmbedding()}). The model is <em>borrowed</em> (never closed here);
 * see {@link JllamaChatModel}.
 */
public final class JllamaEmbeddingModel implements EmbeddingModel {

    private final LlamaModel model;

    /**
     * Creates an embedding model over a borrowed {@link LlamaModel}.
     *
     * @param model the loaded embedding-mode model to drive; not closed by this adapter
     */
    public JllamaEmbeddingModel(LlamaModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    @Override
    public Response<List<Embedding>> embedAll(List<TextSegment> textSegments) {
        List<Embedding> embeddings = new ArrayList<>(textSegments.size());
        for (TextSegment segment : textSegments) {
            embeddings.add(Embedding.from(model.embed(segment.text())));
        }
        return Response.from(embeddings);
    }
}
Lines changed: 49 additions & 0 deletions
// SPDX-FileCopyrightText: 2026 Bernard Ladenthin <bernard.ladenthin@gmail.com>
//
// SPDX-License-Identifier: MIT

package net.ladenthin.llama.langchain4j;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.scoring.ScoringModel;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import net.ladenthin.llama.LlamaModel;

/**
 * langchain4j {@link ScoringModel} (re-ranker) backed by an in-process java-llama.cpp model.
 *
 * <p>Maps onto java-llama.cpp's native rerank endpoint, so the backing {@link LlamaModel} must be
 * loaded in reranking mode ({@code ModelParameters.enableReranking()}). Scores are returned in the
 * same order as the input segments. The model is <em>borrowed</em> (never closed here); see
 * {@link JllamaChatModel}.
 */
public final class JllamaScoringModel implements ScoringModel {

    private final LlamaModel model;

    /**
     * Creates a scoring model over a borrowed {@link LlamaModel}.
     *
     * @param model the loaded reranking-mode model to drive; not closed by this adapter
     */
    public JllamaScoringModel(LlamaModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    @Override
    public Response<List<Double>> scoreAll(List<TextSegment> segments, String query) {
        String[] documents = new String[segments.size()];
        for (int i = 0; i < segments.size(); i++) {
            documents[i] = segments.get(i).text();
        }
        double[] scores = LangChain4jMapping.parseRerankScores(
                model.handleRerank(query, documents), documents.length);
        List<Double> result = new ArrayList<>(scores.length);
        for (double score : scores) {
            result.add(score);
        }
        return Response.from(result);
    }
}
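`scoreAll` relies on `LangChain4jMapping.parseRerankScores(...)` to return one score per input document, in input order, and that helper is not part of this diff. Assuming the native rerank result arrives as (document index, score) pairs that may be sorted by relevance rather than by input position, restoring input order could look like the hypothetical sketch below; `RerankResult` and `toInputOrder` are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the real parseRerankScores input format is not shown in this diff.
public class RerankOrderSketch {

    /** One parsed rerank entry: which input document it refers to and its relevance score. */
    record RerankResult(int index, double score) {}

    /** Writes each score to its document's input position; rejects gaps, duplicates and bad indices. */
    static double[] toInputOrder(List<RerankResult> results, int documentCount) {
        double[] scores = new double[documentCount];
        boolean[] seen = new boolean[documentCount];
        for (RerankResult r : results) {
            int i = r.index();
            if (i < 0 || i >= documentCount) {
                throw new IllegalArgumentException("index out of range: " + i);
            }
            if (seen[i]) {
                throw new IllegalArgumentException("duplicate score for document " + i);
            }
            scores[i] = r.score();
            seen[i] = true;
        }
        for (int i = 0; i < documentCount; i++) {
            if (!seen[i]) {
                throw new IllegalArgumentException("no score for document " + i);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // results sorted by relevance, as a reranker might return them
        List<RerankResult> byRelevance = List.of(
                new RerankResult(2, 0.9), new RerankResult(1, 0.5), new RerankResult(0, 0.1));
        System.out.println(Arrays.toString(toInputOrder(byRelevance, 3)));
    }
}
```

Failing loudly on a missing or duplicate index is a choice made for this sketch: it prevents silently reporting a default `0.0` for a document the reranker never scored.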
