
SKaiNET-transformers


Group: sk.ainet.transformers

High-performance LLM application layer on top of the SKaiNET engine. Provides model-specific inference, agentic chat with tool calling, and a unified CLI for transformer-based models, all in Kotlin Multiplatform.

Key features

  • Multi-model support. Llama 3 / 3.1 / 3.2, Gemma 2 / 3 / 4, Qwen 2 / 3, Apertus (Swiss AI), Mistral, BERT.
  • Native CPU performance. Auto-discovers SKaiNET's priority-100 FFM (Foreign Function & Memory) native kernel provider when present (4–6× faster Q4_K matmul, 1.5–1.8× faster FP32 SGEMM vs the priority-50 Panama Vector path; Linux x86_64 / macOS ARM64 / Windows x86_64 in the published JAR — no manual setup).
  • Native tool calling. Family-specific chat templates and tool-call parsers for Llama 3, Gemma 4, Qwen, Apertus, and ChatML/Hermes. Includes a Java surface (KLlamaJava, JavaTools.definition, JavaAgentLoop) for plain-Java consumers.
  • GGUF + SafeTensors loading. Streaming reader for any model size; NATIVE_OPTIMIZED quant policy keeps weights in their packed SIMD-friendly form.
  • Kotlin Multiplatform. JVM, Android, Kotlin/Native (Linux x64/ARM64, macOS ARM64, iOS arm64/sim arm64), JS, Wasm targets where applicable.
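The GGUF streaming reader mentioned above starts from the format's fixed header. As a minimal sketch (independent of SKaiNET's actual reader API, whose names are not shown here), the GGUF format opens with the 4-byte magic "GGUF", a uint32 version, then uint64 tensor and metadata-KV counts, all little-endian — enough to validate a file before streaming tensor data:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Parsed GGUF header fields (sketch; SKaiNET's real reader type is not shown).
data class GgufHeader(val version: Int, val tensorCount: Long, val metadataKvCount: Long)

fun parseGgufHeader(bytes: ByteArray): GgufHeader {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val magic = buf.int // "GGUF" in ASCII, read little-endian
    require(magic == 0x46554747) { "Not a GGUF file (bad magic 0x${Integer.toHexString(magic)})" }
    val version = buf.int
    require(version >= 2) { "Unsupported GGUF version $version" }
    return GgufHeader(version, buf.long, buf.long)
}

fun main() {
    // Synthetic header: version 3, 2 tensors, 5 metadata entries.
    val bytes = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN)
        .putInt(0x46554747).putInt(3).putLong(2).putLong(5).array()
    println(parseGgufHeader(bytes)) // GgufHeader(version=3, tensorCount=2, metadataKvCount=5)
}
```

Validating the header up front is what lets a streaming loader work for any model size: tensor payloads are read (or mapped) lazily afterwards rather than materialized eagerly.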

Current release

The current release is 0.21.1. Coordinates:

dependencies {
    implementation("sk.ainet.transformers:llm-core:0.21.1")
    implementation("sk.ainet.transformers:llm-runtime-kllama:0.21.1") // or kgemma, etc.
    implementation("sk.ainet.transformers:llm-agent:0.21.1")          // chat templates + tool calling
}

The matching SKaiNET engine is 0.22.1. To opt in to the native FFM CPU provider (recommended for JVM consumers):

dependencies {
    implementation("sk.ainet.core:skainet-backend-cpu:0.22.1")        // priority-50 Panama Vector
    implementation("sk.ainet.core:skainet-backend-native-cpu:0.22.1") // priority-100 FFM (auto-discovered)
}

KernelRegistry picks the highest-priority available provider; on hosts where the native lib doesn't load (sandboxed JDKs, unsupported arches), it cleanly falls back to Panama with no functional regression.
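The selection rule can be sketched as follows. The names here (KernelProvider, selectProvider) are illustrative, not SKaiNET's actual SPI — the point is only the "highest available priority wins, unavailable providers are skipped" behavior described above:

```kotlin
// Illustrative provider SPI: a real implementation would probe native-library
// loading in isAvailable() (e.g. a System.loadLibrary attempt).
interface KernelProvider {
    val name: String
    val priority: Int
    fun isAvailable(): Boolean
}

// Highest-priority available provider wins; unavailable ones are skipped,
// so a failed native load degrades cleanly to the next provider.
fun selectProvider(providers: List<KernelProvider>): KernelProvider =
    providers.filter { it.isAvailable() }
        .maxByOrNull { it.priority }
        ?: error("No kernel provider available")

fun main() {
    val panama = object : KernelProvider {
        override val name = "panama-vector"; override val priority = 50
        override fun isAvailable() = true
    }
    val ffm = object : KernelProvider {
        override val name = "native-ffm"; override val priority = 100
        override fun isAvailable() = false // native lib failed to load on this host
    }
    println(selectProvider(listOf(panama, ffm)).name) // panama-vector
}
```

With both providers loadable, the priority-100 FFM provider would be chosen; here the failed native load silently falls back to the Panama path, matching the no-functional-regression guarantee.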

Project structure

Module                    Purpose
llm-api                   Framework-neutral interfaces (ChatModel, EmbeddingModel, ToolDefinition) — Spring AI-shaped.
llm-core                  OptimizedLLMRuntime, ModelRegistry, UnifiedModelLoader, shared abstractions.
llm-inference/<arch>      Per-architecture network DSLs and weight loaders (llama, gemma, qwen, apertus, bert).
llm-runtime/<arch>        Per-architecture runtime facades (kllama, kgemma, kqwen, kapertus).
llm-agent                 Chat templates, tool-call parsers, agent loops; Java surface.
llm-apps                  CLIs: skainet-cli (unified), kllama-cli, kbert-cli, plus kllama-java-sample.
llm-test/llm-test-java    JUnit 5 end-to-end tests for the Java surface (gated on TINYLLAMA_MODEL_PATH).

Getting started

Prerequisites

  • JDK 21 or higher
  • Gradle 8.10+

CLI: unified skainet-cli

# Plain generation
./gradlew :llm-apps:skainet-cli:shadowJar
java -jar llm-apps/skainet-cli/build/libs/skainet-all.jar \
  -m /path/to/model.gguf "The capital of France is"

# Tool-calling demo (calculator + file-listing tools auto-registered)
java -jar skainet-all.jar -m model.gguf --demo --template=llama3 "What is 17 * 23?"

# Interactive agent
java -jar skainet-all.jar -m model.gguf --agent --template=apertus

--template accepts llama3, chatml, qwen, gemma, apertus (auto-detected from GGUF metadata if omitted).
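Auto-detection can key off the standard general.architecture field in GGUF metadata. The mapping below is an assumption for illustration — the exact table skainet-cli uses is not shown here:

```kotlin
// Sketch of --template auto-detection from GGUF metadata.
// general.architecture is a standard GGUF metadata key; this particular
// architecture-to-template mapping is hypothetical.
fun detectTemplate(metadata: Map<String, String>): String? =
    when (metadata["general.architecture"]) {
        "llama" -> "llama3"
        "gemma", "gemma2", "gemma3" -> "gemma"
        "qwen2", "qwen3" -> "qwen"
        "apertus" -> "apertus"
        else -> null // unknown architecture: caller must pass --template explicitly
    }

fun main() {
    println(detectTemplate(mapOf("general.architecture" to "llama"))) // llama3
    println(detectTemplate(mapOf("general.architecture" to "phi3")))  // null
}
```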

Java consumers

try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ null)) {
    JavaTool calc = new JavaTool() {
        @Override public ToolDefinition getDefinition() {
            return JavaTools.definition(
                "calculator", "Evaluate an arithmetic expression.",
                "{\"type\":\"object\",\"properties\":{\"expression\":{\"type\":\"string\"}},\"required\":[\"expression\"]}"
            );
        }
        @Override public String execute(Map<String, ?> args) { /* ... */ }
    };
    JavaAgentLoop agent = JavaAgentLoop.builder()
        .session(session).tool(calc).template("llama3").build();
    String response = agent.chat("What is 17 * 23?");
}

See llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java for a runnable reference.

On develop, not yet in 0.21.1

  • Apertus support. Routing fix, chat template, and tool calling are all merged on develop. See APERTUS_ROLLOUT.md. Real-checkpoint loading has known gaps, tracked separately.
  • Gemma 4 chat-model JVM facade (Gemma4ChatModel) for embedded text-only deployments.
  • Sharded SafeTensors loadTensorStorageMapped for >2 GB models (consumed by Gemma 4 PLE mmap path).
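The mmap path rests on a standard JVM technique: a MappedByteBuffer lets the OS page tensor data in on demand instead of copying a whole shard onto the heap. A single mapping is capped at 2 GB (MappedByteBuffer uses int offsets), which is why >2 GB models need one mapping per shard or region. A minimal sketch, independent of the actual loadTensorStorageMapped signature:

```kotlin
import java.nio.channels.FileChannel
import java.nio.file.Files
import java.nio.file.StandardOpenOption

fun main() {
    // Stand-in for one SafeTensors shard on disk (synthetic bytes).
    val path = Files.createTempFile("shard", ".safetensors")
    Files.write(path, ByteArray(16) { it.toByte() })

    FileChannel.open(path, StandardOpenOption.READ).use { ch ->
        // Map the whole shard read-only; pages are faulted in lazily by the OS.
        val buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
        println(buf.get(3)) // reads directly from the mapping, no heap copy
    }
    Files.delete(path)
}
```

Sharded loading then amounts to keeping one such mapping per shard and resolving each tensor's (shard, offset, length) triple from the SafeTensors index.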

Engine

This project uses SKaiNET as its underlying execution engine — tensor ops, neural-network DSL, kernel SPI, GGUF / SafeTensors I/O.

License

MIT — see LICENCE.
