CHANGELOG.md
+3 lines changed: 3 additions & 0 deletions
@@ -13,13 +13,16 @@ from version 5.0.0 onward. Pre-fork releases (`1.x`–`4.2.0`) were authored by
 - `CODE_OF_CONDUCT.md` (Contributor Covenant 2.0).
 - `docs/RELEASE.md` capturing the maintainer-facing release procedure (moved out of CHANGELOG).
 - OpenSSF Best Practices badge (project 12862) on README.
+- OpenAI-compatible `parallel_tool_calls` support: `ChatRequest.withParallelToolCalls(Boolean)` / `getParallelToolCalls()`, `InferenceParameters.withParallelToolCalls(boolean)`, and pass-through in the `/v1/chat/completions` server mapper.
+- Real-model tool-calling integration tests for blocking and streaming required tool calls (`ToolCallingIntegrationTest`, Qwen2.5-1.5B-Instruct), wired into CI and `validate-models`.

 ### Changed
 - Unified `CONTRIBUTING.md` and `SECURITY.md` structure with sibling repositories in the project family.
 - Reconciled Java baseline to **11+** across `pom.xml`, README badge, `CLAUDE.md`, and `CONTRIBUTING.md`.
 - README license badge corrected from "Apache 2.0" to "MIT" (matches `LICENSE` file and `pom.xml`).
+- Extracted the `chatWithTools` agent loop into `ToolCallingAgent`; tool-result errors (unknown tool / handler exception) are now JSON-serialized so tool names containing special characters remain valid JSON.
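The point of JSON-serializing tool-result errors can be illustrated with a small sketch. This is not the project's `ToolCallingAgent` code: the class `ToolErrorJson`, its methods, and the `error` field name are hypothetical, and a real implementation would more likely delegate to a JSON library than escape by hand.

```java
final class ToolErrorJson {

    /** Escapes a string for use inside a JSON string literal (RFC 8259 rules). */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a numeric escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Naive concatenation: the payload is malformed once the name contains a quote. */
    static String naiveUnknownTool(String toolName) {
        return "{\"error\":\"unknown tool: " + toolName + "\"}";
    }

    /** Escaped variant: the payload stays well-formed for any tool name. */
    static String unknownTool(String toolName) {
        return "{\"error\":\"unknown tool: " + escape(toolName) + "\"}";
    }

    public static void main(String[] args) {
        String name = "get\"weather"; // hypothetical tool name with a special character
        System.out.println(naiveUnknownTool(name)); // unescaped quote ends the string early
        System.out.println(unknownTool(name));      // quote is escaped, JSON remains valid
    }
}
```

The naive variant fails only for unusual names, which is why this kind of bug tends to surface late; escaping (or a proper serializer) removes the dependence on what the model chooses to call a tool.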
README.md
+36 -1 lines changed: 36 additions & 1 deletion
@@ -259,7 +259,8 @@ Every `net.ladenthin.llama.*` system property recognised by the library, deep-sc
 | `net.ladenthin.llama.lib.path` | unset (falls back to `java.library.path`) | runtime | `LlamaLoader` | Directory containing the native `jllama` shared library. Checked first, before `java.library.path`. Set with `-Dnet.ladenthin.llama.lib.path=/path/to/dir`. |
 | `net.ladenthin.llama.tmpdir` | unset (falls back to `java.io.tmpdir`) | runtime | `LlamaLoader` | Custom temporary directory used when extracting the native library from the JAR. |
 | `net.ladenthin.llama.osinfo.architecture` | unset (uses `os.arch`) | runtime | `OSInfo` | Override for the architecture string used to locate the bundled library inside the JAR. Useful when `os.arch` reports an unexpected value (e.g. inside dockcross / chrooted environments). |
-| `net.ladenthin.llama.test.ngl` | `43` | test | `LlamaModelTest`, `RerankingModelTest`, `ChatScenarioTest`, `ChatAdvancedTest`, `ErrorHandlingTest`, `SessionConcurrencyTest`, `ConfigureParallelInferenceTest`, `MultimodalIntegrationTest` (via `Integer.getInteger(TestConstants.PROP_TEST_NGL, TestConstants.DEFAULT_TEST_NGL)`) | Number of GPU layers used during testing. Pin to `0` on CPU-only hosts: `mvn test -Dnet.ladenthin.llama.test.ngl=0`. |
+| `net.ladenthin.llama.test.ngl` | `43` for the general suite; `0` for `ToolCallingIntegrationTest` | test | Model-backed integration tests | Number of GPU layers used during testing. Pin to `0` on CPU-only hosts: `mvn test -Dnet.ladenthin.llama.test.ngl=0`. The tool test also selects device `none` at zero layers so Metal/CUDA is not initialized. |
+| `net.ladenthin.llama.tool.model` | `models/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf` (test self-skips if missing) | test | `ToolCallingIntegrationTest` | Path to a tool-capable GGUF used to verify required blocking and streaming tool calls. The default matches the Qwen2.5 model in upstream llama.cpp's tool-call test matrix. |
 | `net.ladenthin.llama.nomic.path` | unset (test self-skips) | test | `LlamaEmbeddingsTest#testNomicEmbedLoads` | Path to a Nomic embedding model (`nomic-embed-text-v1.5.f16.gguf` or a compatible BERT-family encoder). Regression test for upstream issue #98 (BERT-encoder `result_output` assertion). |
 | `net.ladenthin.llama.vision.model` | unset (test self-skips) | test | `MultimodalIntegrationTest` (closes #103 / #34) | Path to a vision-capable model GGUF. Any vision-capable GGUF works; CI default is `SmolVLM-500M-Instruct-Q8_0.gguf`. |
 | `net.ladenthin.llama.vision.mmproj` | unset (test self-skips) | test | `MultimodalIntegrationTest` | Matching mmproj GGUF for the vision model. |
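How the test-scoped properties in this table might be resolved can be sketched with plain JDK calls. The property names and defaults are taken from the table; the class `TestProps` and its methods are hypothetical, and the project's real tests presumably use their own constants and a test-framework assumption to skip rather than a boolean helper.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TestProps {

    static final String PROP_TEST_NGL = "net.ladenthin.llama.test.ngl";
    static final String PROP_TOOL_MODEL = "net.ladenthin.llama.tool.model";
    static final String DEFAULT_TOOL_MODEL = "models/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf";

    /** Reads the GPU layer count, falling back to the caller's default when unset or unparsable. */
    static int gpuLayers(int defaultLayers) {
        return Integer.getInteger(PROP_TEST_NGL, defaultLayers);
    }

    /** Resolves the tool-calling model path, using the documented default when unset. */
    static Path toolModelPath() {
        return Paths.get(System.getProperty(PROP_TOOL_MODEL, DEFAULT_TOOL_MODEL));
    }

    /** A test would self-skip when the model file is not present. */
    static boolean shouldSkip(Path model) {
        return !Files.isRegularFile(model);
    }

    public static void main(String[] args) {
        // General suite default is 43; the tool-calling test defaults to 0.
        System.out.println("general ngl = " + gpuLayers(43));
        System.out.println("tool ngl    = " + gpuLayers(0));
        Path model = toolModelPath();
        System.out.println("tool model  = " + model + (shouldSkip(model) ? " (missing, skip)" : ""));
    }
}
```

Run with `-Dnet.ladenthin.llama.test.ngl=0` to see both lookups return `0`, which mirrors the CPU-only pinning described in the table.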
@@ -368,6 +369,40 @@ try (LlamaModel model = new LlamaModel(modelParams)) {
 Reasoning/thinking models can receive custom Jinja template variables via
 `ModelParameters#setChatTemplateKwargs(Map)`.

+### Tool Calling
+
+Use a tool-aware instruct model and enable Jinja when loading it. A typed request can either return
+the model's tool calls through `chat`, or execute registered handlers until the model produces a
+normal assistant response through `chatWithTools`:
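As a library-independent illustration of the loop described for `chatWithTools` (run registered handlers until the model returns a normal assistant response), a minimal sketch follows. Every name in it (`AgentLoopSketch`, `Turn`, `Model`, `run`) is hypothetical, the model is a scripted stand-in, and none of it is the binding's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class AgentLoopSketch {

    /** Hypothetical model turn: a tool call when toolName is non-null, otherwise final text. */
    static final class Turn {
        final String toolName;
        final String arguments;
        final String text;

        Turn(String toolName, String arguments, String text) {
            this.toolName = toolName;
            this.arguments = arguments;
            this.text = text;
        }

        boolean isToolCall() {
            return toolName != null;
        }
    }

    /** Hypothetical stand-in for a model: maps the transcript so far to the next turn. */
    interface Model {
        Turn next(List<String> transcript);
    }

    /** Executes handlers for each requested tool until the model answers in plain text. */
    static String run(Model model, Map<String, Function<String, String>> handlers,
                      String userPrompt, int maxRounds) {
        List<String> transcript = new ArrayList<>();
        transcript.add("user: " + userPrompt);
        for (int round = 0; round < maxRounds; round++) {
            Turn turn = model.next(transcript);
            if (!turn.isToolCall()) {
                return turn.text; // normal assistant response ends the loop
            }
            Function<String, String> handler = handlers.get(turn.toolName);
            String result;
            if (handler == null) {
                result = "error: unknown tool";
            } else {
                try {
                    result = handler.apply(turn.arguments);
                } catch (RuntimeException e) {
                    result = "error: " + e.getMessage();
                }
            }
            // Feed the tool result back so the next turn can use it.
            transcript.add("tool(" + turn.toolName + "): " + result);
        }
        throw new IllegalStateException("no final answer after " + maxRounds + " rounds");
    }

    public static void main(String[] args) {
        // Scripted fake model: requests a tool first, then answers from the tool output.
        Model scripted = transcript -> transcript.size() == 1
                ? new Turn("get_time", "{}", null)
                : new Turn(null, null,
                        "It is " + transcript.get(1).substring("tool(get_time): ".length()));
        Map<String, Function<String, String>> handlers = Map.of("get_time", json -> "12:00");
        System.out.println(run(scripted, handlers, "What time is it?", 4)); // prints "It is 12:00"
    }
}
```

The `maxRounds` bound is a design choice of this sketch: without it, a model that keeps requesting tools would never terminate the loop.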