
Commit 5072593

Merge pull request #539 from SKaiNET-developers/release/0.19.0
Release 0.19.0

2 parents: 0aa9ba3 + d9b64ae

5 files changed: 84 additions & 16 deletions

CHANGELOG.md (71 additions & 3 deletions)
## [Unreleased]

## [0.19.0] - 2026-04-20

### Added

#### Tokenizers
- **Qwen / GPT-2 Byte-Level BPE Tokenizer**: `QwenByteLevelBpeTokenizer` implements the full GPT-2-style pipeline — byte-to-unicode mapping, GPT-2 pretokenization regex, merge-rank BPE, and atomic special-token splitting. Builds from either GGUF metadata (`fromGgufFields`) or a HuggingFace `tokenizer.json` (`fromTokenizerJson`). Verified against Qwen2.5-0.5B reference token IDs from HuggingFace `transformers`. (#463)
- **LLaMA / SentencePiece Tokenizer**: `SentencePieceTokenizer` implements the llama.cpp SPM pipeline — whitespace escape (`▁`), code-point symbol split, **score-priority** BPE (the SPM rule, opposite of the merge-rank rule used for GPT-2 BPE), and `<0xNN>` byte fallback for unknown characters. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace `tokenizer.json` (`model.type == "Unigram"`). Verified against TinyLlama-1.1B reference token IDs from HuggingFace `transformers`. (#464)
- **`TokenizerFactory` with Per-Architecture Dispatch**: Tokenizer selection is now **per-architecture, not per file format**. `TokenizerFactory.fromGguf(fields)` and `.fromTokenizerJson(json)` inspect `tokenizer.ggml.model` / `model.type` and dispatch to the right implementation — Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece — regardless of whether weights come from GGUF or SafeTensors. (#463)
- **`Tokenizer` Interface**: Common surface implemented by `TekkenTokenizer`, `QwenByteLevelBpeTokenizer`, and `SentencePieceTokenizer` (`encode`, `decode`, `vocabSize`, `bosTokenId`, `eosTokenId`).
- **GGUF Tokenizer Metadata**: `GgufModelMetadata` now exposes `tokenizerModel`, `tokenizerTokens`, `tokenizerMerges`, `tokenizerTokenTypes`, `bosTokenId`, and `eosTokenId` so callers can build a tokenizer without re-parsing the raw field map.
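The two tokenizers above differ in exactly one place: which adjacent pair gets merged next. A self-contained sketch of the algorithms (function names are mine, not the library's; this is illustrative, not the `QwenByteLevelBpeTokenizer` / `SentencePieceTokenizer` source):

```kotlin
// GPT-2 byte-to-unicode: printable bytes map to themselves; the rest are
// shifted into the U+0100+ range so every byte has a visible stand-in
// (this is why a leading space becomes 'Ġ' in GPT-2-style vocabularies).
fun byteToUnicode(): Map<Int, Char> {
    val printable = ('!'.code..'~'.code) + ('¡'.code..'¬'.code) + ('®'.code..'ÿ'.code)
    val mapping = mutableMapOf<Int, Char>()
    var shift = 0
    for (b in 0..255) {
        mapping[b] = if (b in printable) b.toChar() else (256 + shift++).toChar()
    }
    return mapping
}

// Merge-rank BPE (GPT-2 rule): repeatedly merge the adjacent pair with the
// LOWEST rank in the merges table.
fun mergeRankBpe(pieces: MutableList<String>, ranks: Map<Pair<String, String>, Int>): List<String> {
    while (true) {
        val pairs = pieces.zipWithNext()
        val best = pairs.filter { it in ranks }.minByOrNull { ranks.getValue(it) } ?: break
        val i = pairs.indexOf(best)
        pieces[i] = best.first + best.second
        pieces.removeAt(i + 1)
    }
    return pieces
}

// Score-priority BPE (SentencePiece rule): repeatedly merge the adjacent
// pair whose MERGED piece has the HIGHEST vocabulary score -- the opposite
// selection rule.
fun scorePriorityBpe(pieces: MutableList<String>, scores: Map<String, Double>): List<String> {
    while (true) {
        val candidates = pieces.zipWithNext().map { (a, b) -> a + b }
        val best = candidates.filter { it in scores }.maxByOrNull { scores.getValue(it) } ?: break
        val i = candidates.indexOf(best)
        pieces[i] = best
        pieces.removeAt(i + 1)
    }
    return pieces
}
```

Both loops stop when no adjacent pair is known to the table, which is what makes the selection rule — rank vs. score — the whole behavioral difference.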

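The per-architecture dispatch can be sketched as a small decision on the metadata fields the changelog names. This is a hedged illustration, not the real `TokenizerFactory`: the `"llama"` / `"Unigram"` key values come from the bullets above, while `"gpt2"` and `"BPE"` for the Qwen/GPT-2 path are assumptions here.

```kotlin
// Illustrative sketch only. The common surface the changelog lists,
// trimmed to its essentials:
interface Tokenizer {
    val vocabSize: Int
    val bosTokenId: Int?
    val eosTokenId: Int?
    fun encode(text: String): List<Int>
    fun decode(ids: List<Int>): String
}

// Dispatch key is tokenizer metadata (`tokenizer.ggml.model` in GGUF,
// `model.type` in tokenizer.json) -- never the weight file format.
fun tokenizerKindFor(ggufModel: String? = null, hfModelType: String? = null): String = when {
    ggufModel == "gpt2" || hfModelType == "BPE" -> "byte-level-bpe"      // Qwen / GPT-2
    ggufModel == "llama" || hfModelType == "Unigram" -> "sentencepiece"  // LLaMA / Gemma / TinyLlama
    else -> throw IllegalArgumentException("Unsupported tokenizer: $ggufModel / $hfModelType")
}
```

The point of routing on metadata is that the same model shipped as GGUF or as SafeTensors lands on the same tokenizer implementation.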
#### StableHLO → IREE compilation
- **Whisper Encoder E2E**: Whisper encoder now compiles end-to-end via SKaiNET → StableHLO → IREE.
- **Real StableHLO Lowerings**: `softmax`, `layerNorm`, and `rmsnorm` now lower to real StableHLO ops (reductions, `broadcast_in_dim`, standard ops) instead of `custom_call` stubs. (#467, #479, #480)
- **New Op Converters**: `gather` / `embedding`, and `concat` / `slice` / `cast` StableHLO converters. (#483, #489)
- **Activation Alias**: `silu` / `SiLU` registered as an alias for `swish` in `ActivationOperationsConverter`. (#484)
- **`ConstantMaterializationPolicy`**: Seam for externalizing large weight tensors out of the StableHLO module (enables `.irpa` externalization). (#524)
- **Splat Constant Folding**: Uniform-value tensor constants collapsed to `dense<v>` splat instead of fully materialized arrays. (#522)
- **SSA Value Type Tracking**: Tracks SSA value types so `reshape` emits the operand's declared type, producing valid MLIR. (#521)
- **Tensor Encoding in Output**: `tensor_encoding` comments in StableHLO output and a top-level `skainet.tensor_encodings` module attribute. (#473, #477)
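Two of the lowering changes above lend themselves to small sketches. These are hedged illustrations with hypothetical helper names, not the converter source: splat detection for uniform constants, and the primitive-op decomposition a "real ops" softmax lowering performs (reduce-max for stability, broadcast subtract, exp, reduce-add, divide).

```kotlin
import kotlin.math.exp

// Splat folding: a uniform-value constant collapses to `dense<v>` instead
// of a fully materialized element list.
fun emitConstant(values: FloatArray, type: String): String {
    val splat = values.takeIf { arr -> arr.isNotEmpty() && arr.all { it == arr[0] } }?.get(0)
    return if (splat != null) "stablehlo.constant dense<$splat> : $type"
    else "stablehlo.constant dense<[${values.joinToString(", ")}]> : $type"
}

// Softmax written as the primitives a reduction-based lowering uses:
// reduce(max), subtract, exp, reduce(add), divide.
fun softmax(row: DoubleArray): DoubleArray {
    val m = row.maxOrNull() ?: return row      // reduce(max) for numerical stability
    val e = DoubleArray(row.size) { exp(row[it] - m) }  // broadcast subtract + exp
    val s = e.sum()                            // reduce(add)
    return DoubleArray(e.size) { e[it] / s }   // broadcast divide
}
```

For a weight tensor of millions of identical elements (e.g. a zero-initialized bias), the splat form keeps the emitted module size constant instead of linear in element count.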
#### IREE `.irpa` weight files
- **`skainet-io-iree-params` Module**: New module with `IrpaWriter` for writing IREE Parameter Archive (`.irpa`) files. Accepts `FileBacked` handles via mmap on JVM / Android for zero-copy weight export. (#523, #525, #528, #529)
#### Backend API
- **`skainet-backend-api` Module**: New module cleanly separating backend contracts; CPU backend now depends on it. (#468)
- **`TensorEncoding` Metadata**: Accessor for `TensorSpec.metadata` and propagation through `TraceToGraphBuilder.finalize`, keeping quantization encoding visible end-to-end. (#469)
#### Java API (0.19.0 surface polish)
- Annotated `StableHloConverterFactory` and `TokenizerFactory` for idiomatic Java call sites. (#400)
- Renamed `TensorSpecEncoding.kt` class for Java callers. (#400)
- Added `skainet-backend-api` to the BOM. (#400)
- New `ReleaseApiJavaTest` covering the 0.19.0 Java surface. (#400)
#### Docs (Antora migration)
- **Antora + Diátaxis**: Migrated docs to Antora with Divio / Diátaxis layout (tutorials, how-tos, reference, explanation). (#494)
- **`skainet-docs-ui` v1.1.1**: Adopted the new theme with Diátaxis card-grid landing page. (#501)
- **Operator Coverage Matrix**: Emit cross-backend Operator Coverage Matrix generated from `TensorOps` surface scan. (#494, #511)
- **Ops Docs**: KDoc `@param` extraction, real version stamps, LaTeX rendering, fixed partials, and dropped void backend. (#511, #513)
- **Dokka API Bundle**: Wired into the Antora site build. (#494)
- **Local Mermaid**: Drop kroki, render Mermaid locally via `mmdc`. (#496)
#### Platform targets
- **`androidNativeArm32`**: Added across core modules. (#503)
### Fixed
- **Byte-Level BPE Broken for Qwen/GPT-2 Models**: Previously there was no GPT-2-style byte-level BPE tokenizer in the repo, and `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely — so any Qwen / GPT-2 / Mistral-Nemo model encoded text into garbage tokens (byte-level chars instead of merged vocab IDs), blocking chat mode and tool calling. The new `QwenByteLevelBpeTokenizer` + `TokenizerFactory` dispatch fix the issue for both GGUF and SafeTensors sources. (#463)
- **No SentencePiece Path for LLaMA-Family GGUF Models**: `TokenizerFactory` previously threw `UnsupportedTokenizerException` for `tokenizer.ggml.model == "llama"`, leaving LLaMA / TinyLlama / Gemma / Mistral-v0.1 GGUFs untokenizable. The new `SentencePieceTokenizer` closes that gap. (#464)
- **GGUF UInt Fields Silently Dropped**: GGUF UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) arrive from `StreamingGGUFReader` as `kotlin.UInt`, which is a value class — *not* a subclass of `kotlin.Number` — so a plain `as? Number` cast was returning null. The new `toIntFlexible` helper handles every signed and unsigned numeric type GGUF can produce, restoring the BOS/EOS/UNK ids on the tokenizer builders.
- **Graph Conv Output Shape Inference**: `conv1d` / `conv2d` / `conv3d` operations in graph inference previously produced placeholder output shapes, breaking downstream shape-dependent passes. Graph ops now compute real output shapes. (#536, #537)
- **Conv1d/Conv3d Not Recorded**: `conv1d` and `conv3d` were not routed through the recording decorator, so they disappeared from traced computation graphs. (#532, #533)
- **Static Conv1d HLO Shape Crash**: Conv1d StableHLO lowering crashed when trace attributes were missing; now falls back to `TensorRef` shape / dtype. (#530, #531)
- **Flatten Hardcoded to MNIST Shape**: `NetworkBuilder.flatten()` returned a hardcoded `lastDimension = 1568` (the MNIST CNN value); any other architecture — e.g. a 64-channel CNN over 32×32 inputs — crashed with `ArrayIndexOutOfBoundsException` in the following `dense()` layer. The DSL now tracks per-sample shape through a new `input(IntArray)` overload, `conv1d` / `conv2d` / `conv3d`, `maxPool2d`, `avgPool2d`, and `upsample2d`, reusing the `ConvShapeUtils` arithmetic introduced in #537; `flatten()` reads the tracked shape and honors `startDim` / `endDim`, and `Conv*` layers can auto-infer `inChannels` from the declared input. (#535, #538)
- **StableHLO `transpose` / `dot_general` MLIR Emission**: Fixed malformed MLIR produced by `stablehlo.transpose` and `stablehlo.dot_general` that blocked IREE compilation. (#520)
- **WasmJS / JS / Native Compile**: Replaced JVM-only `putIfAbsent` with a common-stdlib idiom. (#485)
- **Antora Container**: `HOME=/tmp` so Chromium crashpad can launch during Mermaid rendering in CI. (#534)
- **`bundleDokkaIntoSite` CI Permission Failure**: Fixed docs pipeline permission error. (#496)
- **Pandoc Artifacts in Docs**: Stripped pandoc anchors and demoted heading levels in migrated pages. (#496)
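Two of the fixes above are concrete enough to sketch. These are simplified stand-ins, not the project's own `toIntFlexible` or `ConvShapeUtils`: Kotlin's unsigned types are value classes, not `Number` subclasses, so `value as? Number` on a `UInt` yields null; and the conv/flatten fix rests on the standard per-dimension output-size arithmetic.

```kotlin
// Stand-in for the toIntFlexible idea: accept every signed and unsigned
// numeric type a GGUF field can arrive as, instead of `as? Number`
// (which silently returns null for UInt and friends).
fun toIntFlexible(value: Any?): Int? = when (value) {
    is Number -> value.toInt()
    is UInt -> value.toInt()
    is ULong -> value.toLong().toInt()
    is UShort -> value.toInt()
    is UByte -> value.toInt()
    else -> null
}

// Conv output size per spatial dimension. With this tracked through the
// DSL, flatten() can compute e.g. a 64-channel conv over 32x32 input
// (kernel 3, stride 1, padding 1) -> 32x32 feature maps, and after a 2x2
// pool a flatten length of 64 * 16 * 16 = 16384 -- instead of a
// hardcoded MNIST value.
fun convOutDim(inSize: Int, kernel: Int, stride: Int = 1, padding: Int = 0): Int =
    (inSize + 2 * padding - kernel) / stride + 1
```

The `when` branches matter: each unsigned type needs its own `is` check precisely because none of them satisfies `is Number`.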
### Changed
- **`compile-hlo` Dependencies**: Dropped vestigial `skainet-backend-cpu` dependency from `compile-hlo` jvmMain. (#472)
- **Moved-LLM Docs**: Replaced relocated LLM pages with redirect stubs pointing at the standalone repo. (#499)
- **Maven Group / Version Refs**: Bumped stale version references and fixed Maven group coordinates. (#499)
### Removed
- Stale `TURBOQUANT_ISSUES.md` tracker at the repo root. (#490)
### Dependencies
- agp: 9.1.0 → 9.1.1.
- com.networknt:json-schema-validator: 3.0.1 → 3.0.2.
- org.jetbrains.kotlinx:kotlinx-serialization-json: bumped to 1.11.0.
- actions/checkout: 4 → 6.
- actions/upload-pages-artifact: 3 → 5.
- actions/cache: 4 → 5.
- actions/setup-java: 4 → 5.
- actions/deploy-pages: 4 → 5.
- actions/github-script: 8 → 9.
- docker/build-push-action: 5 → 7.
- docker/setup-buildx-action: 3 → 4.

## [0.18.0] - 2026-04-08

README.md (10 additions & 10 deletions)
Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.19.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.19.0")
}
```

---

## What's New in 0.19.0

- **Qwen / GPT-2 Byte-Level BPE Tokenizer**: Full GPT-2-style pipeline (byte-to-unicode, pretokenization regex, merge-rank BPE, atomic special-token splitting). Builds from GGUF metadata or HuggingFace `tokenizer.json`; verified against Qwen2.5-0.5B reference token IDs.
- **LLaMA / SentencePiece Tokenizer**: llama.cpp SPM pipeline with whitespace escape, **score-priority** BPE (SPM rule, opposite of GPT-2 merge-rank), and `<0xNN>` byte fallback. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace Unigram `tokenizer.json`.
- **`TokenizerFactory` Per-Architecture Dispatch**: Tokenizer selection is now per-architecture, not per file format. Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece, regardless of whether weights come from GGUF or SafeTensors.
- **Byte-Level BPE Fix for Qwen/GPT-2**: Previously these models encoded text into garbage tokens because `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely, blocking chat mode and tool calling. (#463)
- **LLaMA GGUF Tokenization Fix**: `TokenizerFactory` previously threw `UnsupportedTokenizerException` for LLaMA-family GGUFs; the new SentencePiece path closes that gap. (#464)
- **GGUF UInt Field Fix**: UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) are Kotlin `UInt` value classes, not subclasses of `Number`, and were silently dropped by `as? Number` casts. Fixed via a `toIntFlexible` helper that handles every signed and unsigned numeric type GGUF can produce.

See [CHANGELOG.md](CHANGELOG.md) for the full release history.

## Roadmap

- **Q1 2026**: Comprehensive documentation ✅
- **Q2 2026**: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.19.0)
- **Q3 2026**: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
- **Q4 2026**: Federated learning support for multi-device training

docs/modules/ROOT/pages/reference/operators/generated/index.adoc (1 addition & 1 deletion)

= AI-NET Operators Reference

Generated from version `0.19.0` on 2026-04-15

== Operators by Modality

docs/modules/ROOT/pages/reference/ops-status-matrix.adoc (1 addition & 1 deletion)

= Operator Coverage Matrix
:description: Cross-backend status for every operator function in SKaiNET.

Generated from `operators.json` version `0.19.0` on 2026-04-15.
Rows are `Operator.function` pairs; columns are backends that appear in any function's `statusByBackend` map. A missing entry means the backend makes no claim about the function — treat it as "unknown", not "not supported".

gradle.properties (1 addition & 1 deletion)

GROUP=sk.ainet.core
VERSION_NAME=0.19.0
POM_DESCRIPTION=SKaiNET

POM_URL=https://github.com/SKaiNET-developers/skainet/
