README.md (36 additions & 1 deletion)
@@ -259,7 +259,8 @@ Every `net.ladenthin.llama.*` system property recognised by the library, deep-sc
 |`net.ladenthin.llama.lib.path`| unset (falls back to `java.library.path`) | runtime |`LlamaLoader`| Directory containing the native `jllama` shared library. Checked first, before `java.library.path`. Set with `-Dnet.ladenthin.llama.lib.path=/path/to/dir`. |
 |`net.ladenthin.llama.tmpdir`| unset (falls back to `java.io.tmpdir`) | runtime |`LlamaLoader`| Custom temporary directory used when extracting the native library from the JAR. |
 |`net.ladenthin.llama.osinfo.architecture`| unset (uses `os.arch`) | runtime |`OSInfo`| Override for the architecture string used to locate the bundled library inside the JAR. Useful when `os.arch` reports an unexpected value (e.g. inside dockcross / chrooted environments). |
-|`net.ladenthin.llama.test.ngl`|`43`| test |`LlamaModelTest`, `RerankingModelTest`, `ChatScenarioTest`, `ChatAdvancedTest`, `ErrorHandlingTest`, `SessionConcurrencyTest`, `ConfigureParallelInferenceTest`, `MultimodalIntegrationTest` (via `Integer.getInteger(TestConstants.PROP_TEST_NGL, TestConstants.DEFAULT_TEST_NGL)`) | Number of GPU layers used during testing. Pin to `0` on CPU-only hosts: `mvn test -Dnet.ladenthin.llama.test.ngl=0`. |
+|`net.ladenthin.llama.test.ngl`|`43` for the general suite; `0` for `ToolCallingIntegrationTest`| test | Model-backed integration tests | Number of GPU layers used during testing. Pin to `0` on CPU-only hosts: `mvn test -Dnet.ladenthin.llama.test.ngl=0`. The tool test also selects device `none` at zero layers so Metal/CUDA is not initialized. |
+|`net.ladenthin.llama.tool.model`|`models/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf` (test self-skips if missing) | test |`ToolCallingIntegrationTest`| Path to a tool-capable GGUF used to verify required blocking and streaming tool calls. The default matches the Qwen2.5 model in upstream llama.cpp's tool-call test matrix. |
 |`net.ladenthin.llama.nomic.path`| unset (test self-skips) | test |`LlamaEmbeddingsTest#testNomicEmbedLoads`| Path to a Nomic embedding model (`nomic-embed-text-v1.5.f16.gguf` or a compatible BERT-family encoder). Regression test for upstream issue #98 (BERT-encoder `result_output` assertion). |
 |`net.ladenthin.llama.vision.model`| unset (test self-skips) | test |`MultimodalIntegrationTest` (closes #103 / #34) | Path to a vision-capable model GGUF. Any vision-capable GGUF works; CI default is `SmolVLM-500M-Instruct-Q8_0.gguf`. |
 |`net.ladenthin.llama.vision.mmproj`| unset (test self-skips) | test |`MultimodalIntegrationTest`| Matching mmproj GGUF for the vision model. |
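
The table rows describe properties that are read with a default or a fallback. As a rough illustration of that lookup pattern, the sketch below uses only standard JDK calls (`Integer.getInteger` and `System.getProperty`). The class name `PropertyDefaults` and its methods are hypothetical and are not part of the library; the search-order method is a simplification of what the table says about `LlamaLoader`, not its actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library: shows how a -D system
// property with a default or fallback can be resolved with plain JDK calls.
class PropertyDefaults {
    // Property names and the default of 43 are copied from the table.
    static final String PROP_TEST_NGL = "net.ladenthin.llama.test.ngl";
    static final int DEFAULT_TEST_NGL = 43;
    static final String PROP_LIB_PATH = "net.ladenthin.llama.lib.path";

    /** GPU layer count: the -D value if it parses as an integer, otherwise the default. */
    static int gpuLayers() {
        return Integer.getInteger(PROP_TEST_NGL, DEFAULT_TEST_NGL);
    }

    /** Search order assumed from the table: the override directory first, then java.library.path. */
    static List<String> librarySearchPath() {
        List<String> dirs = new ArrayList<>();
        String override = System.getProperty(PROP_LIB_PATH);
        if (override != null) {
            dirs.add(override);
        }
        String javaLibraryPath = System.getProperty("java.library.path");
        if (javaLibraryPath != null) {
            dirs.add(javaLibraryPath);
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println("ngl = " + gpuLayers());
        System.out.println("search path = " + librarySearchPath());
    }
}
```

Running it with `-Dnet.ladenthin.llama.test.ngl=0` makes `gpuLayers()` return `0`; an unset or non-numeric value falls back to the default.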
@@ -368,6 +369,40 @@ try (LlamaModel model = new LlamaModel(modelParams)) {
 Reasoning/thinking models can receive custom Jinja template variables via
 `ModelParameters#setChatTemplateKwargs(Map)`.
 
+### Tool Calling
+
+Use a tool-aware instruct model and enable Jinja when loading it. A typed request can either return
+the model's tool calls through `chat`, or execute registered handlers until the model produces a
+normal assistant response through `chatWithTools`:
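
The second mode described above (run registered handlers until the model returns a normal assistant response) can be pictured as a simple loop. The sketch below is library-independent and makes no claim about the actual signatures of `chat` or `chatWithTools`: `ToolLoopSketch`, `Model`, `Reply`, `ToolCall`, the scripted fake model, and the round limit are all hypothetical names and assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a tool-execution loop; none of these types come from the library.
class ToolLoopSketch {
    /** Hypothetical tool call: a tool name plus raw argument text. */
    static final class ToolCall {
        final String name;
        final String arguments;
        ToolCall(String name, String arguments) { this.name = name; this.arguments = arguments; }
    }

    /** Hypothetical model turn: tool calls to run, or final text when the list is empty. */
    static final class Reply {
        final String text;
        final List<ToolCall> toolCalls;
        Reply(String text, List<ToolCall> toolCalls) { this.text = text; this.toolCalls = toolCalls; }
    }

    /** Hypothetical stand-in for the model: produces the next turn from the transcript. */
    interface Model {
        Reply next(List<String> transcript);
    }

    /** Runs handlers for each requested tool call until the model answers in plain text. */
    static String runWithTools(Model model, Map<String, Function<String, String>> handlers,
                               String userMessage, int maxRounds) {
        List<String> transcript = new ArrayList<>();
        transcript.add("user: " + userMessage);
        for (int round = 0; round < maxRounds; round++) {
            Reply reply = model.next(transcript);
            if (reply.toolCalls.isEmpty()) {
                return reply.text; // a normal assistant response ends the loop
            }
            for (ToolCall call : reply.toolCalls) {
                Function<String, String> handler = handlers.get(call.name);
                String result = handler == null
                        ? "error: unknown tool " + call.name
                        : handler.apply(call.arguments);
                transcript.add("tool " + call.name + ": " + result); // feed the result back
            }
        }
        throw new IllegalStateException("no final response after " + maxRounds + " rounds");
    }

    /** Scripted fake model: requests "add" once, then answers using the tool result. */
    static Model scriptedModel() {
        return transcript -> {
            String last = transcript.get(transcript.size() - 1);
            if (last.startsWith("tool add: ")) {
                return new Reply("The sum is " + last.substring("tool add: ".length()),
                        Collections.emptyList());
            }
            return new Reply(null, Collections.singletonList(new ToolCall("add", "2,3")));
        };
    }

    /** Demo handler registry with a single "add" tool taking "a,b". */
    static Map<String, Function<String, String>> demoHandlers() {
        Map<String, Function<String, String>> handlers = new HashMap<>();
        handlers.put("add", args -> {
            String[] parts = args.split(",");
            return String.valueOf(Integer.parseInt(parts[0].trim()) + Integer.parseInt(parts[1].trim()));
        });
        return handlers;
    }

    public static void main(String[] args) {
        System.out.println(runWithTools(scriptedModel(), demoHandlers(), "What is 2 + 3?", 4));
    }
}
```

The round limit is a design choice of this sketch: it keeps a model that never stops requesting tools from looping forever. Whether the library applies a similar bound is not stated in the diff.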