SKaiNET

Vision

SKaiNET aims to democratize "Edge AI / On-device AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.

For architecture details see ARCHITECTURE.md.


Quickstart

Add the core dependencies (Gradle Kotlin DSL):

dependencies {
    // Recommended: import the umbrella BOM and drop versions on the engine modules.
    implementation(platform("sk.ainet:skainet-bom:0.23.0"))

    implementation("sk.ainet.core:skainet-lang-core")
    implementation("sk.ainet.core:skainet-backend-cpu")
}

The BOM was first correctly published to Maven Central in 0.22.2 — earlier versions shipped at the wrong coordinates and could not be imported. Pin versions directly if you need an older release.
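For releases predating the BOM (or if you prefer explicit versions), put the version on each module directly; a sketch, where 0.21.0 stands in for whichever release you need:

```kotlin
dependencies {
    // Explicit pinning — replace 0.21.0 with the actual release you target.
    implementation("sk.ainet.core:skainet-lang-core:0.21.0")
    implementation("sk.ainet.core:skainet-backend-cpu:0.21.0")
}
```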

Hello Neural Net

val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
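Running data through the model might look like the sketch below. The call convention (invoking the model directly on a tensor) and the zero-filled input are assumptions for illustration, not confirmed by this README:

```kotlin
// Hypothetical forward pass — assumes the model is callable on a tensor.
val input = tensor(shape(1, 28 * 28)) { float(FloatArray(28 * 28)) }
val logits = model(input)  // expected shape: (1, 10), one score per class
```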

Core Tensor Ops

val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }  // [[1, 2], [3, 4]], assuming row-major fill
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }  // [[5, 6], [7, 8]]

val c = a matMul b  // [[19, 22], [43, 50]]
val d = c.relu()    // unchanged here: every entry of c is already positive

GGUF Model Loading

// Recommended: streaming reader — memory-efficient, supports quantized types
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    
    // Load specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}

More examples: SKaiNET-examples | SKaiNET-notebook


Ecosystem

SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:

Project               Description
SKaiNET-transformers  Pre-built transformer architectures and layers
SKaiNET-examples      Sample projects and integration demos

Explore

Goal                                Start here
Examples and sample projects        SKaiNET-examples
Interactive notebooks               SKaiNET-notebook
LLM inference (Llama, Gemma, Qwen)  SKaiNET-transformers

Features

Kotlin Multiplatform

  • Targets: JVM, macOS (Native), JS, WASM (Browser + WasmWasi)
  • Single codebase shared across all platforms via Kotlin Multiplatform

Optimized Execution

  • ComputeGraphExecutor: Optimized engine with fusion passes and trace-to-DAG bridging.
  • SDPA & Gather: High-performance Scaled Dot-Product Attention and indexing operations.
  • TurboQuant: Runtime KV-cache compression (~8x at 4-bit) for long-context LLM inference. Presets: safe-lowbit, balanced, experimental-max. See TurboQuantUsage for integration guide.
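Enabling TurboQuant might look roughly like the sketch below. Only the preset names come from the list above; the configuration DSL itself is an assumption (see TurboQuantUsage for the real integration guide):

```kotlin
// Hypothetical configuration — only the preset names ("safe-lowbit",
// "balanced", "experimental-max") come from the docs; the DSL is assumed.
val kvConfig = turboQuant {
    preset("balanced")
    // 32-bit floats stored as 4-bit codes is an 8x reduction,
    // which matches the ~8x compression figure above.
}
```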

Agentic AI Infrastructure

  • ComputeGraph: Unified framework for defining agentic workflows and tool-calling loops.
  • Java facade: JavaAgentLoop (in skainet-lang-java)

Neural Network DSL

  • Sequential: nn { input(); dense(); relu(); dense() }
  • DAG / Graph: arbitrary wiring with dag { } for ResNet, YOLO-style architectures
  • Layers: Dense, Conv1d/2d/3d, MaxPool, AvgPool, BatchNorm, Dropout, LeakyReLU, ELU
  • KAN (Kolmogorov–Arnold Networks) layer (experimental)
  • Autograd engine with reverse-mode gradients, SGD and Adam/AdamW optimizers
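A DAG-style residual block might be wired as sketched below; dag { } appears in the list above, but the builder vocabulary inside it (input, dense, output, operator +) is an assumption:

```kotlin
// Hypothetical ResNet-style skip connection — arbitrary wiring is the
// point of dag { }; the exact node/edge API is assumed.
val block = dag {
    val x = input(64)
    val h = x.dense(out = 64).relu()
    val y = h + x   // skip connection: not expressible in a pure sequential nn { }
    output(y)
}
```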

Data and I/O

  • Built-in loaders: MNIST, Fashion-MNIST, CIFAR-10
  • Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG)
  • Type-safe transform DSL: resize, crop, normalize, toTensor
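A preprocessing pipeline built from those transforms might look like this; the operation names come from the feature list, while the transforms { } wrapper and parameter names are assumptions (the mean/std values are the well-known MNIST statistics, used for illustration):

```kotlin
// Hypothetical pipeline — resize/crop/normalize/toTensor come from the
// feature list; the surrounding DSL shape is assumed.
val pipeline = transforms {
    resize(width = 32, height = 32)
    crop(28, 28)
    normalize(mean = 0.1307f, std = 0.3081f)  // standard MNIST stats
    toTensor()
}
```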

Java 21+ Support

  • SKaiNET entry point, TensorJavaOps, builder-pattern model definition
  • Maven BOM (sk.ainet:skainet-bom) for one-line version management

Edge AI: Arduino / C99 Export

  • Export trained models to standalone, optimized C99 with static memory allocation
  • Ready-to-use Arduino library output

Compiler: MLIR / StableHLO

  • Lower Kotlin DSL to MLIR StableHLO dialect
  • Optimization passes: constant folding, operation fusion, dead code elimination
  • Valid IREE-compilable output with streaming API and public HloGenerator
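Lowering a DSL model to StableHLO might look like the sketch below. HloGenerator is public per the list above, but its constructor and method names here are assumptions:

```kotlin
// Hypothetical usage — HloGenerator is public per the docs, but the
// generate(...) signature is assumed.
val mlir: String = HloGenerator().generate(model)  // StableHLO dialect text
// The output is intended to be IREE-compilable as-is.
```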

What's New in 0.23.0

  • Real-model GGUFs no longer OOM at network construction. The DSL used to pre-allocate a zero-filled FloatArray(shape.volume) for every Linear / Conv weight at module-creation time, even though the downstream loaders overwrite those zeros immediately. For an Apertus-8B Q4_K_S GGUF (4.7 GB on disk) that meant ~27 GB of FP32 zeros allocated and thrown away, OOMing a 12 GB heap. A new TensorDataFactory.placeholder(...) API now backs every former eager zeros(...) call site in the network builders; lazy materialization fires only if a caller actually reads the tensor, which the load path never does. Verified end-to-end against unsloth/Apertus-8B-Instruct-2509-GGUF: the model now loads in a 12 GB heap. The same fix benefits the Gemma / Llama / Qwen / Voxtral DSL paths transparently. (Issue #587, PR #588)
  • Kotlin/Native: GGUFs over ~2 GiB now load. createRandomAccessSource(filePath) had no native actual, so K/N consumers fell through to the legacy slurp-into-ByteArray reader, which is capped at Int.MAX_VALUE bytes (~2 GiB). In practice, macOS / Linux / iOS native builds couldn't open Q8 models above ~1B parameters or Q4 models above ~3B. A new POSIX-pread-backed PosixPreadRandomAccessSource covers macosArm64, linuxX64, linuxArm64, iosArm64, and iosSimulatorArm64. (Issue #589, PR #591)
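The ~27 GB figure in the first item is just element count times 4 bytes (FP32). A back-of-the-envelope check, where ~6.7 billion weight elements is an assumption consistent with an 8B-parameter model once non-Linear/Conv parameters are excluded:

```kotlin
// Back-of-the-envelope for the eager-zeros allocation (assumed element count).
val elements = 6.7e9            // ~6.7B Linear/Conv weight elements (assumption)
val bytesPerFloat = 4           // FP32
val gb = elements * bytesPerFloat / 1e9
println("%.1f GB".format(gb))   // ≈ 26.8 GB, matching the ~27 GB figure
```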

Recent releases

  • 0.22.2 — sk.ainet:skainet-bom now resolves from Maven Central (earlier versions shipped at the wrong coordinates). (Issue #584)
  • 0.22.1 — StreamingShardedSafeTensorsReader.loadTensorStorageMapped for zero-copy reads of multi-shard tensors above the 2 GB JVM ByteArray limit. (PR #582)
  • 0.22.0 — Native (FFM) CPU kernel provider: 4–6× faster Q4_K matmul, 1.5–1.8× FP32 SGEMM vs Panama Vector; auto-selected via KernelRegistry.bestAvailable(). (PR #571)

See CHANGELOG.md for the full release history.


Roadmap

  • Q1 2026: Comprehensive documentation ✅
  • Q2 2026: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.20.0)
  • Q3 2026: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
  • Q4 2026: Federated learning support for multi-device training

Contributing & Community

We love contributions! Whether it's a new operator, documentation, or a bug fix:

  1. Read our Contribution Guide.
  2. Check the Good First Issues.
  3. Open a discussion or issue on GitHub.

Browse the full codebase documentation on DeepWiki.

Contributors (0.14.0)

  • Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)

License

MIT — see LICENCE.

About

SKaiNET makes local AI practical for developers: simple to build with, multiplatform by design, and optimized for native performance without compromises.
