
SKaiNET-transformers


Group: sk.ainet.transformers

High-performance LLM application layer on top of the SKaiNET engine. Provides model-specific inference, agentic chat with tool calling, and a unified CLI for transformer-based models, all in Kotlin Multiplatform.

Key features

  • Multi-model support. Llama 3 / 3.1 / 3.2, Gemma 2 / 3 / 4, Qwen 2 / 3, Apertus (Swiss AI), Mistral, BERT.
  • Native CPU performance. Auto-discovers SKaiNET's priority-100 FFM (Foreign Function & Memory) native kernel provider when present: 4–6× faster Q4_K matmul and 1.5–1.8× faster FP32 SGEMM versus the priority-50 Panama Vector path. Prebuilt binaries for Linux x86_64, macOS ARM64, and Windows x86_64 ship in the published JAR; no manual setup required.
  • Native tool calling. Family-specific chat templates and tool-call parsers for Llama 3, Gemma 4, Qwen, Apertus, and ChatML/Hermes. Includes a Java surface (KLlamaJava, JavaTools.definition, JavaAgentLoop) for plain-Java consumers.
  • GGUF + SafeTensors loading. Streaming reader for any model size; NATIVE_OPTIMIZED quant policy keeps weights in their packed SIMD-friendly form.
  • Kotlin Multiplatform. JVM, Android, Kotlin/Native (Linux x64/ARM64, macOS ARM64, iOS arm64/sim arm64), JS, Wasm targets where applicable.

Current release

The current release is 0.23.1, version-aligned with the corresponding SKaiNET engine release. Coordinates:

dependencies {
    implementation("sk.ainet.transformers:llm-core:0.23.1")
    implementation("sk.ainet.transformers:llm-runtime-kllama:0.23.1") // or kgemma, kqwen, kapertus
    implementation("sk.ainet.transformers:llm-agent:0.23.1")          // chat templates + tool calling
}

To opt in to the native FFM CPU provider (recommended for JVM consumers):

dependencies {
    implementation("sk.ainet.core:skainet-backend-cpu:0.23.1")        // priority-50 Panama Vector
    implementation("sk.ainet.core:skainet-backend-native-cpu:0.23.1") // priority-100 FFM (auto-discovered)
}

KernelRegistry picks the highest-priority available provider; on hosts where the native lib doesn't load (sandboxed JDKs, unsupported arches), it cleanly falls back to Panama with no functional regression.
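The selection behavior can be sketched as follows. This is a minimal illustration of "highest available priority wins, with fallback": the `KernelProvider` interface, its `priority()`/`isAvailable()` methods, and the provider names are hypothetical stand-ins, not SKaiNET's actual SPI.

```java
import java.util.Comparator;
import java.util.List;

public class ProviderSelection {

    // Hypothetical stand-in for a kernel provider SPI.
    interface KernelProvider {
        String name();
        int priority();
        boolean isAvailable(); // e.g. the native library loaded successfully
    }

    record Simple(String name, int priority, boolean isAvailable) implements KernelProvider {}

    // Pick the highest-priority provider that actually works on this host.
    static KernelProvider select(List<KernelProvider> discovered) {
        return discovered.stream()
                .filter(KernelProvider::isAvailable)
                .max(Comparator.comparingInt(KernelProvider::priority))
                .orElseThrow(() -> new IllegalStateException("no kernel provider available"));
    }

    public static void main(String[] args) {
        // The native FFM provider (priority 100) wins when its library loads...
        KernelProvider picked = select(List.of(
                new Simple("panama-vector", 50, true),
                new Simple("native-cpu-ffm", 100, true)));
        System.out.println(picked.name()); // native-cpu-ffm

        // ...and selection cleanly falls back to Panama when it does not.
        KernelProvider fallback = select(List.of(
                new Simple("panama-vector", 50, true),
                new Simple("native-cpu-ffm", 100, false)));
        System.out.println(fallback.name()); // panama-vector
    }
}
```

In the real stack the discovery step is ServiceLoader-based, which is why the shadow-jar merge fix mentioned below matters: merged JARs must preserve the service descriptor for the native provider to be discoverable at all.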

Project structure

| Module | Purpose |
| --- | --- |
| llm-api | Framework-neutral interfaces (ChatModel, EmbeddingModel, ToolDefinition), Spring AI-shaped. |
| llm-core | OptimizedLLMRuntime, ModelRegistry, UnifiedModelLoader, shared abstractions. |
| llm-inference/<arch> | Per-architecture network DSLs and weight loaders (llama, gemma, qwen, apertus, bert). |
| llm-runtime/<arch> | Per-architecture runtime facades (kllama, kgemma, kqwen, kapertus). |
| llm-agent | Chat templates, tool-call parsers, agent loops; Java surface. |
| llm-apps | CLIs: skainet-cli (unified), kllama-cli, kbert-cli, plus kllama-java-sample. |
| llm-test/llm-test-java | JUnit 5 end-to-end tests for the Java surface (gated on TINYLLAMA_MODEL_PATH). |

Getting started

Prerequisites

  • JDK 21 or higher
  • Gradle 8.10+

CLI: unified skainet-cli

# Plain generation
./gradlew :llm-apps:skainet-cli:shadowJar
java -jar llm-apps/skainet-cli/build/libs/skainet-all.jar \
  -m /path/to/model.gguf "The capital of France is"

# Tool-calling demo (calculator + file-listing tools auto-registered)
java -jar skainet-all.jar -m model.gguf --demo --template=llama3 "What is 17 * 23?"

# Interactive agent
java -jar skainet-all.jar -m model.gguf --agent --template=apertus

--template accepts llama3, chatml, qwen, gemma, apertus (auto-detected from GGUF metadata if omitted).
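`general.architecture` is the standard GGUF metadata key naming the model family, so auto-detection amounts to mapping that value to a template name. The sketch below is an illustrative assumption, not the CLI's actual detection code: `detectTemplate` and the exact architecture-to-template mapping are hypothetical.

```java
import java.util.Map;

public class TemplateDetect {
    // Hypothetical mapping from the GGUF "general.architecture" metadata value
    // to a chat-template name; the real CLI's logic may differ.
    static String detectTemplate(Map<String, String> ggufMetadata) {
        String arch = ggufMetadata.getOrDefault("general.architecture", "");
        return switch (arch) {
            case "llama" -> "llama3";
            case "gemma", "gemma2" -> "gemma";
            case "qwen2", "qwen3" -> "qwen";
            case "apertus" -> "apertus";
            default -> "chatml"; // ChatML/Hermes as a generic fallback
        };
    }

    public static void main(String[] args) {
        System.out.println(detectTemplate(Map.of("general.architecture", "llama"))); // llama3
    }
}
```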

Java consumers

try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ null)) {
    JavaTool calc = new JavaTool() {
        @Override public ToolDefinition getDefinition() {
            return JavaTools.definition(
                "calculator", "Evaluate an arithmetic expression.",
                "{\"type\":\"object\",\"properties\":{\"expression\":{\"type\":\"string\"}},\"required\":[\"expression\"]}"
            );
        }
        @Override public String execute(Map<String, ?> args) { /* ... */ return ""; }
    };
    JavaAgentLoop agent = JavaAgentLoop.builder()
        .session(session).tool(calc).template("llama3").build();
    String response = agent.chat("What is 17 * 23?");
}

See llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java for a runnable reference.

What's new in 0.23.1

  • Apertus end-to-end. Routing fix (now goes through OptimizedLLMRuntime + apertusNetwork()), chat template + tool calling, and real-GGUF loading on top of SKaiNET 0.23.1's block-major Q4_K TensorData wiring. See APERTUS_ROLLOUT.md.
  • Gemma 4 chat-model JVM facade (Gemma4ChatModel) for embedded text-only deployments, with close() propagating to the mmap arena and the PLE mmap path consuming upstream loadTensorStorageMapped.
  • Multi-id EOS / stop-token support in the chat layer — required for templates that emit several end markers.
  • Tokenizer auto-detect for SentencePiece in fromTokenizerJson, fixing decoding for models that omit the explicit marker.
  • End-to-end smoke test in llm-test/llm-test-java that wires LEAF (KBertJava) and Llama 3 (KLlamaJava) in one JVM.
  • skainet-cli and kllama-cli shadow-jar ServiceLoader fix-up so the priority-100 native-cpu provider is picked up post-merge.
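The multi-id stop-token support listed above amounts to checking each sampled token against a set of stop ids rather than a single EOS id. A minimal sketch, with hypothetical names rather than the project's chat-layer API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MultiStop {
    // Collect sampled token ids until any of several stop ids appears.
    // Illustrative only; the actual chat layer streams tokens rather than
    // consuming a pre-sampled list.
    static List<Integer> takeUntilStop(List<Integer> sampled, Set<Integer> stopIds) {
        List<Integer> out = new ArrayList<>();
        for (int id : sampled) {
            if (stopIds.contains(id)) break; // any stop id ends generation
            out.add(id);
        }
        return out;
    }

    public static void main(String[] args) {
        // A template may emit several end markers (e.g. end-of-turn and
        // end-of-text), each with its own token id.
        System.out.println(takeUntilStop(List.of(5, 6, 128009, 7), Set.of(128001, 128009))); // [5, 6]
    }
}
```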

See CHANGELOG.md for the full set of changes.

Engine

This project uses SKaiNET as its underlying execution engine — tensor ops, neural-network DSL, kernel SPI, GGUF / SafeTensors I/O.

License

MIT — see LICENCE.