GitHub - SKaiNET-developers/SKaiNET: SKaiNET makes local AI practical for developers: simple to build with, multiplatform by design, and optimized for native performance without compromises.

For architecture details see ARCHITECTURE.md.

Start in 5 minutes

SKaiNET is a Kotlin Multiplatform AI framework. New here? Choose the path that matches what you want to try first.

Goal	Start here	Time
Run tensor operations	Quickstart (below)	2–5 min
Build and train a neural net	Hello Neural Net (below)	5 min
Run a local GGUF model	SKaiNET Transformers starter	5 min after model setup
Export a secure MCU bundle	Minerva getting started	10 min without firmware flashing

Working in Java? SKaiNET ships first-class Java support — see the Java getting-started guide.

Use the version shown in this README as the source of truth for first-run snippets. If another page shows a different version, please open an issue or PR.

Quickstart

Add the core dependencies (Gradle Kotlin DSL):

dependencies {
    // Recommended: import the umbrella BOM and drop versions on the engine modules.
    implementation(platform("sk.ainet:skainet-bom:0.31.2"))

    implementation("sk.ainet.core:skainet-lang-core")
    implementation("sk.ainet.core:skainet-backend-cpu")
}

The BOM was first correctly published to Maven Central in 0.22.2 — earlier versions shipped at the wrong coordinates and could not be imported. Pin versions directly if you need an older release.

Hello Neural Net

val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}

Core Tensor Ops

val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }

val c = a matMul b
val d = c.relu()

GGUF Model Loading

// Recommended: streaming reader — memory-efficient, supports quantized types
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    
    // Load specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}

More examples: SKaiNET-examples | SKaiNET-notebook

Ecosystem

SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:

Project	Description
SKaiNET-transformers	Pre-built transformer architectures and layers
SKaiNET-examples	Sample projects and integration demos

Explore

Goal	Start here
Examples and sample projects	SKaiNET-examples
Interactive notebooks	SKaiNET-notebook
Eager backends & kernels (what runs where)	Backends & kernels mindmap

Official Benchmarks

SKaiNET ships an official Phoronix-Test-Suite-compatible benchmark program for the compute engine. See the methodology and replay docs, the release manifest, and the CI workflow. Smoke runs fire on every PR via ubuntu-latest; full publishable runs fire on a self-hosted Linux x86 runner on release.

Quick local replay:

./gradlew :skainet-backends:benchmarks:jvm-cpu-publish:shadowJar
./scripts/run_engine_smoke.sh

Architecture goal

SKaiNET is built around one path: a model is defined once in the Kotlin DSL, then either compiled to native code or executed eagerly — without rewriting it.

Define the model with the DSL (nn { } / dag { }).
Capture it as a tape (traced execution) or a DAG (explicit graph).
Run it one of two ways:
- Compile — lower the graph to MLIR / StableHLO (HloGenerator) and compile to native code (IREE-compatible) for native / edge targets.
- Eager — execute directly on an available backend. On the JVM this is the primary, go-to path.

flowchart LR
    DSL["Model — Kotlin DSL"] --> Graph["Tape / DAG"]
    Graph --> HLO["MLIR / StableHLO"]
    Graph --> Eager["Eager backend (JVM, …)"]
    HLO --> Native["Native code"]

The same DSL model feeds both paths — eager execution for development and JVM deployment, the StableHLO path for native and edge targets.

Important Addition: Minerva Secure MCU Export

SKaiNET now includes a Minerva export backend for secure MCU deployment. It is a sibling to StableHLO and Arduino/C99 export: it starts from a supported ComputeGraph, lowers static MLPs to a Minerva compiler input, invokes libminerva when configured, and packages generated weights, host fixtures, firmware skeletons, and a fingerprinted manifest.json.

Start here:

Minerva getting started — run the maintained tiny MLP dry sample, then the real libminerva runtime profile.
Minerva export how-to — configure compiler paths, keys, calibration, CMake/CTest host verification, and troubleshooting.
How Minerva secure MCU export fits — understand why Minerva is not an Arduino replacement and when to choose StableHLO instead.

Runnable examples:

./gradlew :skainet-compile:skainet-compile-minerva:runMinervaSecureMcuExamples
./gradlew :skainet-compile:skainet-compile-minerva:runMinervaSecureMcuExamples \
  -Pminerva.example=sensor-classifier

Features

Kotlin Multiplatform

Targets: JVM, macOS (Native), JS, WASM (Browser + WasmWasi)
Single codebase shared across all platforms via Kotlin Multiplatform

Optimized Execution

ComputeGraphExecutor: Optimized engine with fusion passes and trace-to-DAG bridging.
SDPA & Gather: High-performance Scaled Dot-Product Attention and indexing operations.
TurboQuant: Runtime KV-cache compression (~8x at 4-bit) for long-context LLM inference. Presets: safe-lowbit, balanced, experimental-max. See TurboQuantUsage for integration guide.

Neural Network DSL

Sequential: nn { input(); dense(); relu(); dense() }
DAG / Graph: arbitrary wiring with dag { } for ResNet, YOLO-style architectures
Layers: Dense, Conv1d/2d/3d, MaxPool, AvgPool, BatchNorm, Dropout, LeakyReLU, ELU
KAN (Kolmogorov–Arnold Networks) layer (experimental)
Autograd engine with reverse-mode gradients, SGD and Adam/AdamW optimizers

Data and I/O

Built-in loaders: MNIST, Fashion-MNIST, CIFAR-10
Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG)
Type-safe transform DSL: resize, crop, normalize, toTensor

Edge AI: Arduino / C99 Export

Export trained models to standalone, optimized C99 with static memory allocation
Ready-to-use Arduino library output

Edge AI: Minerva Secure MCU Export

Export supported static MLP graphs to Minerva project bundles for secure MCU inference
Emits compiler NPZ input, libminerva weights, a fingerprinted manifest, host harness, firmware example, and host verification results
Start with the Minerva getting started guide

Compiler: MLIR / StableHLO

Lower Kotlin DSL to MLIR StableHLO dialect
Optimization passes: constant folding, operation fusion, dead code elimination
Valid IREE-compilable output with streaming API and public HloGenerator

Choosing an Export Path

Use StableHLO when you want portable MLIR/IREE-compatible graphs for native, accelerator, or ecosystem compiler flows.
Use Arduino / C99 export when you want standalone generated C with static memory allocation and no external secure runtime.
Use Minerva export when you need a secure MCU project bundle that goes through libminerva packaging and host verification.

What's New in 0.31.2

RowDequantSource + ops.gather row-dequant. A TensorData can now mark itself RowDequantSource (dequantRow(rowIdx): FloatArray); ops.gather then dequantises only the rows it touches instead of materialising the whole table (and instead of the get() path, which such tensors don't support). The table presents as logical FP32, so a packed/oversized embedding (a Q-quantised token_embd) can stay packed and be looked up via ops.gather directly — moving the per-row-dequant trick out of model code into the engine. (PR #741)

What's New in 0.31.0

ops.transpose lazily handles every packed matmul dtype. The CPU backend rewraps packed bytes with a flipped shape (metadata-only "lazy transpose") so a packed weight survives linearProject's matmul(x, transpose(W)) instead of inflating to FP32 — but Q8_0 and Q4_0 were missing and threw Byte → Float ClassCastException. Now the full dispatch set (Q4_K/Q5_K/Q6_K/Q5_0/Q5_1/Q8_0/Q4_0) transposes lazily, so a packed Q8_0/Q4_0 matmul weight (e.g. a tied Q8_0 lm_head) stays packed end-to-end on its NEON/SIMD kernel. Regression-tested across all seven packed types. (PRs #736, #737)
Dependency: com.networknt:json-schema-validator → 3.0.4. (PR #733)

Recent releases

0.30.0 — First-class Q5_K packed in-kernel dequant-matmul across the CPU backends (Q5_KBlockTensorData + Q5KMatmulKernel SPI: scalar / Panama Vector / native-C), hand-written ARM NEON kernels (fp32/q8_0/q4k/q5k, -march=armv8.2-a+fp16+dotprod), and Kotlin/Native consumption of the C kernels via cinterop (skainet-backend-native-cpu static archive + linuxX64/linuxArm64 KernelProvider). (PR #734)
0.29.1 — sk.ainet.core:skainet-compile-minerva now publishes to Maven Central (packaging fix for the Minerva export module shipped in 0.29.0).
0.29.0 — Minerva secure-MCU export module: an end-to-end pipeline that lowers a SKaiNET model through shared graph-export contracts → Minerva IR → an .npz compiler input → a libminerva-packaged secure MCU project bundle, with host-side runtime verification and fingerprinted manifest artifacts (runnable sample, examples, ONNX workflow, getting-started docs). Plus packed-quant matmul kernels with Kotlin/Native parity (Q5_0/Q5_1/Q4_K/Q6_K — commonMain scalar + SPI, packed-quant dispatch in DefaultCpuOpsBase, Panama Vector for Q5_1/Q5_0 and Q6_K via the KernelRegistry), and an auto-generated, CI-gated kernel × platform support matrix. (PRs #697–#726)
0.28.1 — Kotlin DSL → StableHLO → IREE is green end-to-end for the whole conformance suite (7/7 models, 27/27 ops compile to a vmfb): inferDagOutputSpecs now infers correct output shapes for shape-changing ops, and reduce_window (pooling) emits IREE's generic region form. (PRs #674, #676)
0.28.0 — Four StableHLO export bugs fixed (reshape #666, concatenate #667, constants/reductions #663, HloGenerator tracing #668) plus non-JVM image runtime support (#671). (PRs #664, #670, #671)
0.27.0 — A full gemma3 network lowers to StableHLO and compiles to an IREE vmfb (zero op gaps, verified by GemmaTraceTest): new scaledDotProductAttention (with causal + explicit additive mask), permute, narrow, and multi-output split converters, plus boxing-free FloatArray weight externalization for .irpa baking. (PRs #661 et al.)
0.26.0 — Q4_0 promoted to a first-class quantized format across the provider stack, tanh as a first-class activation primitive, and a CPU tensor convert op, plus test/build/CI hygiene. (PRs #648–#651, #631, #636)
0.25.0 — BF16 and Q8_0 matmul kernels end-to-end across the provider stack, autograd completeness for pow/log and the conv/pool/upsample/split family, the hybrid adaptive dtype-constraint DSL, the @DarcValidated operator-doc flag, and the SentencePiece special-token splitter. (PRs #595, #605–#628)
0.23.0 — Real-model GGUFs no longer OOM at network construction (lazy TensorDataFactory.placeholder(...)); Kotlin/Native can finally load GGUFs over 2 GiB via the new POSIX-pread-backed PosixPreadRandomAccessSource. (Issues #587, #589; PRs #588, #591)
0.22.2 — sk.ainet:skainet-bom now resolves from Maven Central (earlier versions shipped at the wrong coordinates). (Issue #584)
0.22.1 — StreamingShardedSafeTensorsReader.loadTensorStorageMapped for zero-copy reads of multi-shard tensors above the 2 GB JVM ByteArray limit. (PR #582)
0.22.0 — Native (FFM) CPU kernel provider: 4–6× faster Q4_K matmul, 1.5–1.8× FP32 SGEMM vs Panama Vector; auto-selected via KernelRegistry.bestAvailable(). (PR #571)

See CHANGELOG.md for the full release history.

Roadmap

Q1 2026: Comprehensive documentation ✅
Q2 2026: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.20.0)
Q3 2026: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
Q4 2026: Federated learning support for multi-device training

Contributing & Community

We love contributions! Whether it's a new operator, documentation, or a bug fix:

Read our Contribution Guide.
Check the Good First Issues.
Open a discussion or issue on GitHub.

Browse the full codebase documentation on DeepWiki.

Contributors (0.14.0)

Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)

License

MIT — see LICENCE.

Name		Name	Last commit message	Last commit date
Latest commit History 1,464 Commits
.github		.github
benchmarks		benchmarks
build-logic		build-logic
docs		docs
gradle		gradle
kotlin-js-store		kotlin-js-store
scripts		scripts
skainet-apps		skainet-apps
skainet-backends		skainet-backends
skainet-bom		skainet-bom
skainet-compile		skainet-compile
skainet-data		skainet-data
skainet-docs-samples		skainet-docs-samples
skainet-io		skainet-io
skainet-lang		skainet-lang
skainet-models/skainet-model-yolo		skainet-models/skainet-model-yolo
skainet-pipeline		skainet-pipeline
skainet-test		skainet-test
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
FAQ.md		FAQ.md
GITFLOW.adoc		GITFLOW.adoc
LICENCE		LICENCE
README.md		README.md
SECURITY.md		SECURITY.md
build.gradle.kts		build.gradle.kts
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle.kts		settings.gradle.kts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Start in 5 minutes

Quickstart

Hello Neural Net

Core Tensor Ops

GGUF Model Loading

Ecosystem

Explore

Official Benchmarks

Architecture goal

Important Addition: Minerva Secure MCU Export

Features

Kotlin Multiplatform

Optimized Execution

Neural Network DSL

Data and I/O

Edge AI: Arduino / C99 Export

Edge AI: Minerva Secure MCU Export

Compiler: MLIR / StableHLO

Choosing an Export Path

What's New in 0.31.2

What's New in 0.31.0

Recent releases

Roadmap

Contributing & Community

Contributors (0.14.0)

License

About

Uh oh!

Releases

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Start in 5 minutes

Quickstart

Hello Neural Net

Core Tensor Ops

GGUF Model Loading

Ecosystem

Explore

Official Benchmarks

Architecture goal

Important Addition: Minerva Secure MCU Export

Features

Kotlin Multiplatform

Optimized Execution

Neural Network DSL

Data and I/O

Edge AI: Arduino / C99 Export

Edge AI: Minerva Secure MCU Export

Compiler: MLIR / StableHLO

Choosing an Export Path

What's New in 0.31.2

What's New in 0.31.0

Recent releases

Roadmap

Contributing & Community

Contributors (0.14.0)

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Uh oh!

Contributors

Uh oh!

Languages