Group: sk.ainet.transformers
High-performance LLM application layer on top of the SKaiNET engine. Provides model-specific inference, agentic chat with tool calling, and a unified CLI for transformer-based models, all in Kotlin Multiplatform.
- Multi-model support. Llama 3 / 3.1 / 3.2, Gemma 2 / 3 / 4, Qwen 2 / 3, Apertus (Swiss AI), Mistral, BERT.
- Native CPU performance. Auto-discovers SKaiNET's priority-100 FFM (Foreign Function & Memory) native kernel provider when present (4–6× faster Q4_K matmul, 1.5–1.8× faster FP32 SGEMM vs the priority-50 Panama Vector path; Linux x86_64 / macOS ARM64 / Windows x86_64 in the published JAR — no manual setup).
- Native tool calling. Family-specific chat templates and tool-call parsers for Llama 3, Gemma 4, Qwen, Apertus, and ChatML/Hermes. Includes a Java surface (`KLlamaJava`, `JavaTools.definition`, `JavaAgentLoop`) for plain-Java consumers.
- GGUF + SafeTensors loading. Streaming reader for any model size; the `NATIVE_OPTIMIZED` quant policy keeps weights in their packed SIMD-friendly form.
- Kotlin Multiplatform. JVM, Android, Kotlin/Native (Linux x64/ARM64, macOS ARM64, iOS arm64/sim arm64), JS, and Wasm targets where applicable.
The current release is 0.21.1. Coordinates:
```kotlin
dependencies {
    implementation("sk.ainet.transformers:llm-core:0.21.1")
    implementation("sk.ainet.transformers:llm-runtime-kllama:0.21.1") // or kgemma, etc.
    implementation("sk.ainet.transformers:llm-agent:0.21.1")          // chat templates + tool calling
}
```

The matching SKaiNET engine release is 0.22.1. To opt in to the native FFM CPU provider (recommended for JVM consumers):
```kotlin
dependencies {
    implementation("sk.ainet.core:skainet-backend-cpu:0.22.1")        // priority-50 Panama Vector
    implementation("sk.ainet.core:skainet-backend-native-cpu:0.22.1") // priority-100 FFM (auto-discovered)
}
```

`KernelRegistry` picks the highest-priority available provider; on hosts where the native library doesn't load (sandboxed JDKs, unsupported architectures), it cleanly falls back to Panama with no functional regression.
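The selection rule can be pictured with a simplified sketch. The types and method below are hypothetical, not the real `KernelRegistry` API; they only illustrate "highest available priority wins, lower priorities are fallbacks":

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of priority-based provider selection; the real
// KernelRegistry API may differ.
public class ProviderSelection {
    record KernelProvider(String name, int priority, boolean available) {}

    // Highest-priority provider whose native library actually loaded;
    // lower-priority providers act as fallbacks instead of a hard failure.
    static Optional<KernelProvider> select(List<KernelProvider> providers) {
        return providers.stream()
                .filter(KernelProvider::available)
                .max((a, b) -> Integer.compare(a.priority(), b.priority()));
    }

    public static void main(String[] args) {
        var providers = List.of(
                new KernelProvider("native-ffm", 100, false),  // e.g. sandboxed JDK
                new KernelProvider("panama-vector", 50, true));
        // Clean fallback: the priority-50 provider is chosen.
        System.out.println(select(providers).map(KernelProvider::name).orElse("none"));
    }
}
```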
| Module | Purpose |
|---|---|
| `llm-api` | Framework-neutral interfaces (`ChatModel`, `EmbeddingModel`, `ToolDefinition`) — Spring AI-shaped. |
| `llm-core` | `OptimizedLLMRuntime`, `ModelRegistry`, `UnifiedModelLoader`, shared abstractions. |
| `llm-inference/<arch>` | Per-architecture network DSLs and weight loaders (llama, gemma, qwen, apertus, bert). |
| `llm-runtime/<arch>` | Per-architecture runtime facades (kllama, kgemma, kqwen, kapertus). |
| `llm-agent` | Chat templates, tool-call parsers, agent loops; Java surface. |
| `llm-apps` | CLIs: `skainet-cli` (unified), `kllama-cli`, `kbert-cli`, plus `kllama-java-sample`. |
| `llm-test/llm-test-java` | JUnit 5 end-to-end tests for the Java surface (gated on `TINYLLAMA_MODEL_PATH`). |
- JDK 21 or higher
- Gradle 8.10+
```shell
# Plain generation
./gradlew :llm-apps:skainet-cli:shadowJar
java -jar llm-apps/skainet-cli/build/libs/skainet-all.jar \
  -m /path/to/model.gguf "The capital of France is"

# Tool-calling demo (calculator + file-listing tools auto-registered)
java -jar skainet-all.jar -m model.gguf --demo --template=llama3 "What is 17 * 23?"

# Interactive agent
java -jar skainet-all.jar -m model.gguf --agent --template=apertus
```

`--template` accepts `llama3`, `chatml`, `qwen`, `gemma`, `apertus` (auto-detected from GGUF metadata if omitted).
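Conceptually, auto-detection is a lookup from GGUF metadata to a template family. The sketch below is illustrative only: the metadata key and the exact mapping are assumptions, not the CLI's actual detection code.

```java
import java.util.Map;

// Illustrative only: map a GGUF architecture string to a chat-template name,
// defaulting to ChatML when the family is unknown. The key name
// "general.architecture" and the mapping are assumptions for this sketch.
public class TemplateDetect {
    static String detectTemplate(Map<String, String> ggufMetadata) {
        String arch = ggufMetadata.getOrDefault("general.architecture", "");
        return switch (arch) {
            case "llama" -> "llama3";
            case "gemma" -> "gemma";
            case "qwen2", "qwen3" -> "qwen";
            case "apertus" -> "apertus";
            default -> "chatml";
        };
    }

    public static void main(String[] args) {
        System.out.println(detectTemplate(Map.of("general.architecture", "llama")));
    }
}
```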
```java
try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ null)) {
    JavaTool calc = new JavaTool() {
        @Override public ToolDefinition getDefinition() {
            return JavaTools.definition(
                "calculator", "Evaluate an arithmetic expression.",
                "{\"type\":\"object\",\"properties\":{\"expression\":{\"type\":\"string\"}},\"required\":[\"expression\"]}"
            );
        }
        @Override public String execute(Map<String, ?> args) { /* ... */ }
    };

    JavaAgentLoop agent = JavaAgentLoop.builder()
        .session(session).tool(calc).template("llama3").build();
    String response = agent.chat("What is 17 * 23?");
}
```

See `llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java` for a runnable reference.
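The agent-loop pattern itself is simple: generate, detect a tool call, execute the tool, feed the result back, repeat. Here is a self-contained toy version with an invented `<tool>name:args</tool>` wire format (the real parsers are family-specific and not reproduced here):

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy agent loop: if the "model" emits <tool>name:args</tool>, run the tool
// and re-prompt with the result appended. The tag format is invented for
// this sketch; real chat templates and parsers differ per model family.
public class ToyAgentLoop {
    static final Pattern TOOL_CALL = Pattern.compile("<tool>(\\w+):(.*?)</tool>");

    static String run(Function<String, String> model,
                      Map<String, Function<String, String>> tools,
                      String prompt) {
        String output = model.apply(prompt);
        Matcher m = TOOL_CALL.matcher(output);
        while (m.find()) {                        // tool call requested
            String result = tools.get(m.group(1)).apply(m.group(2));
            output = model.apply(prompt + "\n[tool result: " + result + "]");
            m = TOOL_CALL.matcher(output);        // loop until a plain answer
        }
        return output;
    }

    public static void main(String[] args) {
        // Fake "model": asks for the calculator once, then answers.
        Function<String, String> model = p ->
                p.contains("[tool result:") ? "The answer is 391."
                                            : "<tool>calculator:17*23</tool>";
        var tools = Map.<String, Function<String, String>>of(
                "calculator", expr -> "391");     // stub evaluator
        System.out.println(run(model, tools, "What is 17 * 23?"));
        // prints "The answer is 391."
    }
}
```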
- Apertus support. Routing fix, chat template, and tool calling are all merged on `develop`; see `APERTUS_ROLLOUT.md`. Real-checkpoint loading has known gaps tracked separately.
- Gemma 4 chat-model JVM facade (`Gemma4ChatModel`) for embedded text-only deployments.
- Sharded SafeTensors `loadTensorStorageMapped` for >2 GB models (consumed by the Gemma 4 PLE mmap path).
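The >2 GB limit exists because a single `MappedByteBuffer` is int-indexed and so caps out at `Integer.MAX_VALUE` bytes; mapping the file in chunks is the standard workaround. A self-contained illustration of the technique (not the library's actual loader):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Standard chunked-mmap workaround for MappedByteBuffer's int-sized limit;
// not the library's actual loadTensorStorageMapped implementation.
public class ChunkedMmap {
    // Map the file as a list of read-only regions, each at most chunkSize bytes.
    static List<MappedByteBuffer> mapChunks(Path file, long chunkSize) throws IOException {
        var chunks = new ArrayList<MappedByteBuffer>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long off = 0; off < size; off += chunkSize) {
                long len = Math.min(chunkSize, size - off);
                chunks.add(ch.map(FileChannel.MapMode.READ_ONLY, off, len));
            }
        }
        return chunks; // mappings stay valid after the channel is closed
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("weights", ".bin");
        Files.write(tmp, new byte[10]);     // tiny stand-in for a model shard
        var chunks = mapChunks(tmp, 4);     // 4-byte chunks over 10 bytes
        System.out.println(chunks.size());  // prints 3
        Files.delete(tmp);
    }
}
```

In practice the chunk size would be a large power of two well under 2 GB; tensor offsets then resolve to a chunk index plus an in-chunk offset.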
This project uses SKaiNET as its underlying execution engine — tensor ops, neural-network DSL, kernel SPI, GGUF / SafeTensors I/O.
MIT — see LICENCE.