
Commit 5072593

Merge pull request #539 from SKaiNET-developers/release/0.19.0
Release 0.19.0

2 parents: 0aa9ba3 + d9b64ae

5 files changed: 84 additions & 16 deletions

CHANGELOG.md (71 additions & 3 deletions)
## [Unreleased]

## [0.19.0] - 2026-04-20

### Added

#### Tokenizers
- **Qwen / GPT-2 Byte-Level BPE Tokenizer**: `QwenByteLevelBpeTokenizer` implements the full GPT-2-style pipeline — byte-to-unicode mapping, GPT-2 pretokenization regex, merge-rank BPE, and atomic special-token splitting. Builds from either GGUF metadata (`fromGgufFields`) or a HuggingFace `tokenizer.json` (`fromTokenizerJson`). Verified against Qwen2.5-0.5B reference token IDs from HuggingFace `transformers`. (#463)
- **LLaMA / SentencePiece Tokenizer**: `SentencePieceTokenizer` implements the llama.cpp SPM pipeline — whitespace escape (`▁`), code-point symbol split, **score-priority** BPE (the SPM rule, opposite of the merge-rank rule used for GPT-2 BPE), and `<0xNN>` byte fallback for unknown characters. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace `tokenizer.json` (`model.type == "Unigram"`). Verified against TinyLlama-1.1B reference token IDs from HuggingFace `transformers`. (#464)
- **`TokenizerFactory` with Per-Architecture Dispatch**: Tokenizer selection is now **per-architecture, not per file format**. `TokenizerFactory.fromGguf(fields)` and `.fromTokenizerJson(json)` inspect `tokenizer.ggml.model` / `model.type` and dispatch to the right implementation — Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece — regardless of whether weights come from GGUF or SafeTensors. (#463)
- **`Tokenizer` Interface**: Common surface implemented by `TekkenTokenizer`, `QwenByteLevelBpeTokenizer`, and `SentencePieceTokenizer` (`encode`, `decode`, `vocabSize`, `bosTokenId`, `eosTokenId`).
- **GGUF Tokenizer Metadata**: `GgufModelMetadata` now exposes `tokenizerModel`, `tokenizerTokens`, `tokenizerMerges`, `tokenizerTokenTypes`, `bosTokenId`, and `eosTokenId` so callers can build a tokenizer without re-parsing the raw field map.
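The two tokenizers above differ in exactly one place: which adjacent pair gets merged next. A self-contained sketch of the algorithms (function names are mine, not the library's; this is illustrative, not the `QwenByteLevelBpeTokenizer` / `SentencePieceTokenizer` source):

```kotlin
// GPT-2 byte-to-unicode: printable bytes map to themselves; the rest are
// shifted into the U+0100+ range so every byte has a visible stand-in
// (this is why a leading space becomes 'Ġ' in GPT-2-style vocabularies).
fun byteToUnicode(): Map<Int, Char> {
    val printable = ('!'.code..'~'.code) + ('¡'.code..'¬'.code) + ('®'.code..'ÿ'.code)
    val mapping = mutableMapOf<Int, Char>()
    var shift = 0
    for (b in 0..255) {
        mapping[b] = if (b in printable) b.toChar() else (256 + shift++).toChar()
    }
    return mapping
}

// Merge-rank BPE (GPT-2 rule): repeatedly merge the adjacent pair with the
// LOWEST rank in the merges table.
fun mergeRankBpe(pieces: MutableList<String>, ranks: Map<Pair<String, String>, Int>): List<String> {
    while (true) {
        val pairs = pieces.zipWithNext()
        val best = pairs.filter { it in ranks }.minByOrNull { ranks.getValue(it) } ?: break
        val i = pairs.indexOf(best)
        pieces[i] = best.first + best.second
        pieces.removeAt(i + 1)
    }
    return pieces
}

// Score-priority BPE (SentencePiece rule): repeatedly merge the adjacent
// pair whose MERGED piece has the HIGHEST vocabulary score -- the opposite
// selection rule.
fun scorePriorityBpe(pieces: MutableList<String>, scores: Map<String, Double>): List<String> {
    while (true) {
        val candidates = pieces.zipWithNext().map { (a, b) -> a + b }
        val best = candidates.filter { it in scores }.maxByOrNull { scores.getValue(it) } ?: break
        val i = candidates.indexOf(best)
        pieces[i] = best
        pieces.removeAt(i + 1)
    }
    return pieces
}
```

Both loops stop when no adjacent pair is known to the table, which is what makes the selection rule — rank vs. score — the whole behavioral difference.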

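The per-architecture dispatch can be sketched as a small decision on the metadata fields the changelog names. This is a hedged illustration, not the real `TokenizerFactory`: the `"llama"` / `"Unigram"` key values come from the bullets above, while `"gpt2"` and `"BPE"` for the Qwen/GPT-2 path are assumptions here.

```kotlin
// Illustrative sketch only. The common surface the changelog lists,
// trimmed to its essentials:
interface Tokenizer {
    val vocabSize: Int
    val bosTokenId: Int?
    val eosTokenId: Int?
    fun encode(text: String): List<Int>
    fun decode(ids: List<Int>): String
}

// Dispatch key is tokenizer metadata (`tokenizer.ggml.model` in GGUF,
// `model.type` in tokenizer.json) -- never the weight file format.
fun tokenizerKindFor(ggufModel: String? = null, hfModelType: String? = null): String = when {
    ggufModel == "gpt2" || hfModelType == "BPE" -> "byte-level-bpe"      // Qwen / GPT-2
    ggufModel == "llama" || hfModelType == "Unigram" -> "sentencepiece"  // LLaMA / Gemma / TinyLlama
    else -> throw IllegalArgumentException("Unsupported tokenizer: $ggufModel / $hfModelType")
}
```

The point of routing on metadata is that the same model shipped as GGUF or as SafeTensors lands on the same tokenizer implementation.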
#### StableHLO → IREE compilation
- **Whisper Encoder E2E**: Whisper encoder now compiles end-to-end via SKaiNET → StableHLO → IREE.
- **Real StableHLO Lowerings**: `softmax`, `layerNorm`, and `rmsnorm` now lower to real StableHLO ops (reductions, `broadcast_in_dim`, standard ops) instead of `custom_call` stubs. (#467, #479, #480)
- **New Op Converters**: `gather` / `embedding`, and `concat` / `slice` / `cast` StableHLO converters. (#483, #489)
- **Activation Alias**: `silu` / `SiLU` registered as an alias for `swish` in `ActivationOperationsConverter`. (#484)
- **`ConstantMaterializationPolicy`**: Seam for externalizing large weight tensors out of the StableHLO module (enables `.irpa` externalization). (#524)
- **Splat Constant Folding**: Uniform-value tensor constants collapsed to `dense<v>` splat instead of fully materialized arrays. (#522)
- **SSA Value Type Tracking**: Tracks SSA value types so `reshape` emits the operand's declared type, producing valid MLIR. (#521)
- **Tensor Encoding in Output**: `tensor_encoding` comments in StableHLO output and a top-level `skainet.tensor_encodings` module attribute. (#473, #477)
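Two of the lowering changes above lend themselves to small sketches. These are hedged illustrations with hypothetical helper names, not the converter source: splat detection for uniform constants, and the primitive-op decomposition a "real ops" softmax lowering performs (reduce-max for stability, broadcast subtract, exp, reduce-add, divide).

```kotlin
import kotlin.math.exp

// Splat folding: a uniform-value constant collapses to `dense<v>` instead
// of a fully materialized element list.
fun emitConstant(values: FloatArray, type: String): String {
    val splat = values.takeIf { arr -> arr.isNotEmpty() && arr.all { it == arr[0] } }?.get(0)
    return if (splat != null) "stablehlo.constant dense<$splat> : $type"
    else "stablehlo.constant dense<[${values.joinToString(", ")}]> : $type"
}

// Softmax written as the primitives a reduction-based lowering uses:
// reduce(max), subtract, exp, reduce(add), divide.
fun softmax(row: DoubleArray): DoubleArray {
    val m = row.maxOrNull() ?: return row      // reduce(max) for numerical stability
    val e = DoubleArray(row.size) { exp(row[it] - m) }  // broadcast subtract + exp
    val s = e.sum()                            // reduce(add)
    return DoubleArray(e.size) { e[it] / s }   // broadcast divide
}
```

For a weight tensor of millions of identical elements (e.g. a zero-initialized bias), the splat form keeps the emitted module size constant instead of linear in element count.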
#### IREE `.irpa` weight files
- **`skainet-io-iree-params` Module**: New module with `IrpaWriter` for writing IREE Parameter Archive (`.irpa`) files. Accepts `FileBacked` handles via mmap on JVM / Android for zero-copy weight export. (#523, #525, #528, #529)
#### Backend API
- **`skainet-backend-api` Module**: New module cleanly separating backend contracts; CPU backend now depends on it. (#468)
- **`TensorEncoding` Metadata**: Accessor for `TensorSpec.metadata` and propagation through `TraceToGraphBuilder.finalize`, keeping quantization encoding visible end-to-end. (#469)
#### Java API (0.19.0 surface polish)
- Annotated `StableHloConverterFactory` and `TokenizerFactory` for idiomatic Java call sites. (#400)
- Renamed `TensorSpecEncoding.kt` class for Java callers. (#400)
- Added `skainet-backend-api` to the BOM. (#400)
- New `ReleaseApiJavaTest` covering the 0.19.0 Java surface. (#400)
#### Docs (Antora migration)
- **Antora + Diátaxis**: Migrated docs to Antora with Divio / Diátaxis layout (tutorials, how-tos, reference, explanation). (#494)
- **`skainet-docs-ui` v1.1.1**: Adopted the new theme with Diátaxis card-grid landing page. (#501)
- **Operator Coverage Matrix**: Emit cross-backend Operator Coverage Matrix generated from `TensorOps` surface scan. (#494, #511)
- **Ops Docs**: KDoc `@param` extraction, real version stamps, LaTeX rendering, fixed partials, and dropped void backend. (#511, #513)
- **Dokka API Bundle**: Wired into the Antora site build. (#494)
- **Local Mermaid**: Drop kroki, render Mermaid locally via `mmdc`. (#496)
#### Platform targets
- **`androidNativeArm32`**: Added across core modules. (#503)
### Fixed
- **Byte-Level BPE Broken for Qwen/GPT-2 Models**: Previously there was no GPT-2-style byte-level BPE tokenizer in the repo, and `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely — so any Qwen / GPT-2 / Mistral-Nemo model encoded text into garbage tokens (byte-level chars instead of merged vocab IDs), blocking chat mode and tool calling. The new `QwenByteLevelBpeTokenizer` + `TokenizerFactory` dispatch fix the issue for both GGUF and SafeTensors sources. (#463)
- **No SentencePiece Path for LLaMA-Family GGUF Models**: `TokenizerFactory` previously threw `UnsupportedTokenizerException` for `tokenizer.ggml.model == "llama"`, leaving LLaMA / TinyLlama / Gemma / Mistral-v0.1 GGUFs untokenizable. The new `SentencePieceTokenizer` closes that gap. (#464)
- **GGUF UInt Fields Silently Dropped**: GGUF UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) arrive from `StreamingGGUFReader` as `kotlin.UInt`, which is a value class — *not* a subclass of `kotlin.Number` — so a plain `as? Number` cast was returning null. The new `toIntFlexible` helper handles every signed and unsigned numeric type GGUF can produce, restoring the BOS/EOS/UNK ids on the tokenizer builders.
- **Graph Conv Output Shape Inference**: `conv1d` / `conv2d` / `conv3d` operations in graph inference previously produced placeholder output shapes, breaking downstream shape-dependent passes. Graph ops now compute real output shapes. (#536, #537)
- **Conv1d/Conv3d Not Recorded**: `conv1d` and `conv3d` were not routed through the recording decorator, so they disappeared from traced computation graphs. (#532, #533)
- **Static Conv1d HLO Shape Crash**: Conv1d StableHLO lowering crashed when trace attributes were missing; now falls back to `TensorRef` shape / dtype. (#530, #531)
- **Flatten Hardcoded to MNIST Shape**: `NetworkBuilder.flatten()` returned a hardcoded `lastDimension = 1568` (the MNIST CNN value); any other architecture — e.g. a 64-channel CNN over 32×32 inputs — crashed with `ArrayIndexOutOfBoundsException` in the following `dense()` layer. The DSL now tracks per-sample shape through a new `input(IntArray)` overload, `conv1d` / `conv2d` / `conv3d`, `maxPool2d`, `avgPool2d`, and `upsample2d`, reusing the `ConvShapeUtils` arithmetic introduced in #537; `flatten()` reads the tracked shape and honors `startDim` / `endDim`, and `Conv*` layers can auto-infer `inChannels` from the declared input. (#535, #538)
- **StableHLO `transpose` / `dot_general` MLIR Emission**: Fixed malformed MLIR produced by `stablehlo.transpose` and `stablehlo.dot_general` that blocked IREE compilation. (#520)
- **WasmJS / JS / Native Compile**: Replaced JVM-only `putIfAbsent` with a common-stdlib idiom. (#485)
- **Antora Container**: `HOME=/tmp` so Chromium crashpad can launch during Mermaid rendering in CI. (#534)
- **`bundleDokkaIntoSite` CI Permission Failure**: Fixed docs pipeline permission error. (#496)
- **Pandoc Artifacts in Docs**: Stripped pandoc anchors and demoted heading levels in migrated pages. (#496)
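Two of the fixes above are concrete enough to sketch. These are simplified stand-ins, not the project's own `toIntFlexible` or `ConvShapeUtils`: Kotlin's unsigned types are value classes, not `Number` subclasses, so `value as? Number` on a `UInt` yields null; and the conv/flatten fix rests on the standard per-dimension output-size arithmetic.

```kotlin
// Stand-in for the toIntFlexible idea: accept every signed and unsigned
// numeric type a GGUF field can arrive as, instead of `as? Number`
// (which silently returns null for UInt and friends).
fun toIntFlexible(value: Any?): Int? = when (value) {
    is Number -> value.toInt()
    is UInt -> value.toInt()
    is ULong -> value.toLong().toInt()
    is UShort -> value.toInt()
    is UByte -> value.toInt()
    else -> null
}

// Conv output size per spatial dimension. With this tracked through the
// DSL, flatten() can compute e.g. a 64-channel conv over 32x32 input
// (kernel 3, stride 1, padding 1) -> 32x32 feature maps, and after a 2x2
// pool a flatten length of 64 * 16 * 16 = 16384 -- instead of a
// hardcoded MNIST value.
fun convOutDim(inSize: Int, kernel: Int, stride: Int = 1, padding: Int = 0): Int =
    (inSize + 2 * padding - kernel) / stride + 1
```

The `when` branches matter: each unsigned type needs its own `is` check precisely because none of them satisfies `is Number`.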
### Changed
- **`compile-hlo` Dependencies**: Dropped vestigial `skainet-backend-cpu` dependency from `compile-hlo` jvmMain. (#472)
- **Moved-LLM Docs**: Replaced relocated LLM pages with redirect stubs pointing at the standalone repo. (#499)
- **Maven Group / Version Refs**: Bumped stale version references and fixed Maven group coordinates. (#499)
### Removed
- Stale `TURBOQUANT_ISSUES.md` tracker at the repo root. (#490)
### Dependencies
- agp: 9.1.0 → 9.1.1.
- com.networknt:json-schema-validator: 3.0.1 → 3.0.2.
- org.jetbrains.kotlinx:kotlinx-serialization-json: bumped to 1.11.0.
- actions/checkout: 4 → 6.
- actions/upload-pages-artifact: 3 → 5.
- actions/cache: 4 → 5.
- actions/setup-java: 4 → 5.
- actions/deploy-pages: 4 → 5.
- actions/github-script: 8 → 9.
- docker/build-push-action: 5 → 7.
- docker/setup-buildx-action: 3 → 4.

## [0.18.0] - 2026-04-08

README.md (10 additions & 10 deletions)
Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.19.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.19.0")
}
```

---

## What's New in 0.19.0

- **Qwen / GPT-2 Byte-Level BPE Tokenizer**: Full GPT-2-style pipeline (byte-to-unicode, pretokenization regex, merge-rank BPE, atomic special-token splitting). Builds from GGUF metadata or HuggingFace `tokenizer.json`; verified against Qwen2.5-0.5B reference token IDs.
- **LLaMA / SentencePiece Tokenizer**: llama.cpp SPM pipeline with whitespace escape, **score-priority** BPE (SPM rule, opposite of GPT-2 merge-rank), and `<0xNN>` byte fallback. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace Unigram `tokenizer.json`.
- **`TokenizerFactory` Per-Architecture Dispatch**: Tokenizer selection is now per-architecture, not per file format. Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece, regardless of whether weights come from GGUF or SafeTensors.
- **Byte-Level BPE Fix for Qwen/GPT-2**: Previously these models encoded text into garbage tokens because `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely, blocking chat mode and tool calling. (#463)
- **LLaMA GGUF Tokenization Fix**: `TokenizerFactory` previously threw `UnsupportedTokenizerException` for LLaMA-family GGUFs; the new SentencePiece path closes that gap. (#464)
- **GGUF UInt Field Fix**: UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) are Kotlin `UInt` value classes, not subclasses of `Number`, and were silently dropped by `as? Number` casts. Fixed via a `toIntFlexible` helper that handles every signed and unsigned numeric type GGUF can produce.

See [CHANGELOG.md](CHANGELOG.md) for the full release history.

## Roadmap

- **Q1 2026**: Comprehensive documentation ✅
- **Q2 2026**: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.19.0)
- **Q3 2026**: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
- **Q4 2026**: Federated learning support for multi-device training

docs/modules/ROOT/pages/reference/operators/generated/index.adoc (1 addition & 1 deletion)

= AI-NET Operators Reference

Generated from version `0.19.0` on 2026-04-15

== Operators by Modality

docs/modules/ROOT/pages/reference/ops-status-matrix.adoc (1 addition & 1 deletion)

= Operator Coverage Matrix
:description: Cross-backend status for every operator function in SKaiNET.

Generated from `operators.json` version `0.19.0` on 2026-04-15.
Rows are `Operator.function` pairs; columns are backends that appear in any function's `statusByBackend` map. A missing entry means the backend makes no claim about the function — treat it as "unknown", not "not supported".

gradle.properties (1 addition & 1 deletion)

GROUP=sk.ainet.core
VERSION_NAME=0.19.0
POM_DESCRIPTION=SKaiNET

POM_URL=https://github.com/SKaiNET-developers/skainet/
