Commit e4625da

michalharakal and claude committed
chore(release): prepare SKaiNET-transformers 0.30.0
Version-aligned with the released SKaiNET 0.30.0 (Q5_K packed matmul, NEON native kernels, Kotlin/Native cinterop), already pinned in the catalog.

- gradle.properties: VERSION_NAME 0.28.1 -> 0.30.0.
- settings.gradle.kts: revert the mavenLocal()-first dev shim (0.30.0 is on Maven Central; the -PuseLocalSkainet composite build stays for local work).
- CHANGELOG.md: add the [0.30.0] entry (Q5_K packed eager runtime, K/N-ready NATIVE_OPTIMIZED Gemma path, kernel-less/Q4_1 dequant fixes) + tag link.
- README.md: bump "Current release" + BOM snippet to 0.30.0; add "What's new in 0.30.0".
- docs tutorials: bump BOM coordinates 0.28.1 -> 0.30.0.

No merge, no tag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 0406dc6 commit e4625da

6 files changed

Lines changed: 141 additions & 21 deletions

CHANGELOG.md

Lines changed: 106 additions & 0 deletions
@@ -7,6 +7,110 @@ version line is kept in lock-step with the underlying SKaiNET engine
 The format roughly follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.30.0] — 2026-06-14
+
+Version-aligned with **SKaiNET 0.30.0**. Skips 0.29.x — SKaiNET-transformers
+tracked the engine internally across that window (the in-progress Q5_K kernel
+shipped as a local `0.29.1`) without a tagged release. The headline is
+**Q5_K stays packed in the eager Gemma runtime** and the **Gemma
+`NATIVE_OPTIMIZED` packed-weight path is now Kotlin/Native–ready** — the board
+binary can keep K-quant weights packed without the JVM's `java.lang.foreign`
+MemSeg path.
+
+### Added
+
+- **Q5_K packed in-kernel dequant in the eager Gemma runtime.** FunctionGemma-270M
+ships as `Q5_K_M`, but `GemmaMemSegConverter` previously dequantized Q5_K
+weights to FP32 on load ("no native matmul kernel yet for Q5_K"), giving up
+both the memory saving and the in-kernel dequant. SKaiNET 0.30.0 provides a
+first-class Q5_K packed matmul (`Q5_KBlockTensorData` + `Q5KMatmulKernel`:
+scalar / Panama / native), so the converter now relayouts the GGUF bytes to
+block-major and wraps them as `Q5_KBlockTensorData` (176 B/block). Dispatch and
+the lazy transpose reach the kernel through `DefaultCpuOps`. Verified by
+`GemmaQ5KPackedParityTest` (`-PincludeIntegration`): the Q5_K packed path
+decodes FunctionGemma byte-identically to the FP32 baseline —
+`[262146, 236769, 3255, 718, 498, 1373, 262152, 106]`
+`<tool_0>(state="on")<end>` for *"Turn the light on."*
+- **Kotlin/Native–ready Gemma packed-weight path.** The `NATIVE_OPTIMIZED`
+packed conversion was `jvmMain`-only (it built `MemSeg`/`Arena`-backed tensors
+via `java.lang.foreign`), so the Kotlin/Native board binary couldn't keep
+K-quant weights packed. The platform-neutral pieces now live in `commonMain`:
+- **`GemmaQuantLayout.kt`** (`commonMain`) — `logicalShapeFor`,
+`relayoutKSeriesRowMajorToBlockMajor` (KMP-safe `copyInto`), and
+`packGemmaKQuant<T>()`, which builds heap-packed Q4_K/Q5_K/Q6_K
+`BlockTensorData` directly with no `MemSeg`/`Arena`.
+- **`GemmaPackedWeights.kt`** (`commonMain`) — `convertGemmaWeightsPacked`
+packs Q4/Q5/Q6_K matmul weights to heap `Q*_KBlockTensorData`, dequants
+`token_embd`/`output` to FP32 (gathered, no transpose) and any other quant
+type to FP32 `[out, in]`. `extractRawBytes` reads the loader's bytes back
+across both backings (JVM `IntArrayTensorData` / native `Byte`-typed).
+- **`GemmaNetworkLoader.load()`** now runs `convertGemmaWeightsPacked` before
+`applyWeightsToNetwork` under `NATIVE_OPTIMIZED`, so `load(NATIVE_OPTIMIZED)`
+yields a runnable network on the board *and* the JVM (previously it could not
+be built from raw-byte weights at all). `GemmaMemSegConverter` (`jvmMain`)
+now shares the `commonMain` helpers; only the `MemSeg`/FFM conversion and the
+FP32 fallbacks stay JVM-only.
+Verified on JVM and `linuxX64` (`GemmaQuantLayoutTest`): relayout, packing, and
+the native byte-extraction round-trip run on every target, and
+`GemmaQ5KPackedParityTest` confirms all three paths (FP32 baseline, `jvmMain`
+MemSeg-packed, `load()` packed) produce the identical token sequence.
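
As a rough illustration of the 176 B/block figure and the block-major relayout mentioned in the "Added" entries, the Kotlin sketch below derives the Q5_K block size from the GGML super-block layout (256 weights per block) and transposes a grid of packed blocks using `copyInto`. The names `q5kBlockBytes` and `relayoutRowMajorToBlockMajorSketch` are hypothetical, and reading "block-major" as a `[row][block]` to `[block][row]` transpose is an assumption drawn from the changelog wording. It is not the library's `relayoutKSeriesRowMajorToBlockMajor`.

```kotlin
// Q5_K super-block as defined by GGML: 256 weights per block.
val qkK = 256

// d + dmin (one fp16 each), 12 bytes of packed 6-bit scales/mins,
// 1 high bit per weight, 4 low bits per weight.
val q5kBlockBytes = 2 + 2 + 12 + qkK / 8 + qkK / 2  // 176

// Hypothetical helper (assumption): move each fixed-size block from
// row-major order [row][blockInRow] to block-major order [blockInRow][row].
fun relayoutRowMajorToBlockMajorSketch(
    src: ByteArray,
    rows: Int,
    blocksPerRow: Int,
    blockBytes: Int,
): ByteArray {
    require(src.size == rows * blocksPerRow * blockBytes) {
        "expected ${rows * blocksPerRow * blockBytes} bytes, got ${src.size}"
    }
    val dst = ByteArray(src.size)
    for (r in 0 until rows) {
        for (b in 0 until blocksPerRow) {
            val from = (r * blocksPerRow + b) * blockBytes
            val to = (b * rows + r) * blockBytes
            // copyInto is available in the common stdlib, so this runs on JVM and Native.
            src.copyInto(dst, destinationOffset = to, startIndex = from, endIndex = from + blockBytes)
        }
    }
    return dst
}
```

Because the bytes inside each block are copied untouched, a relayout like this only changes the order in which a kernel visits blocks; the packed payload stays the same size as in the GGUF file.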
+
+### Changed
+
+- **`gradle/libs.versions.toml` `skainet` pin: 0.28.1 → 0.30.0.** Picks up the
+released Q5_K packed matmul, the NEON native kernels, and the Kotlin/Native
+cinterop. Downstream consumers get the upstream SKaiNET BOM transparently via
+`:llm-bom`, so no per-consumer migration is needed.
+- **`gradle.properties` `VERSION_NAME=0.30.0`.** Lock-step with the engine.
+- **`settings.gradle.kts` reverts the `mavenLocal()`-first dev shim.** The
+ordering added while consuming the in-progress local SKaiNET `0.29.1` is no
+longer needed now that 0.30.0 is on Maven Central; the release resolves the
+engine purely from Central. The opt-in `-PuseLocalSkainet` composite build is
+unchanged for local engine work.
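
For orientation, the repository ordering after this change and one possible way to wire an opt-in composite build are sketched below in Gradle Kotlin DSL. The `repositories` block mirrors the `settings.gradle.kts` diff in this commit. The `useLocalSkainet` check and the `../SKaiNET` path are assumptions for illustration (the path is taken from the removed comment), since the commit does not show how `-PuseLocalSkainet` is actually implemented.

```kotlin
// settings.gradle.kts (sketch)
dependencyResolutionManagement {
    repositories {
        // No mavenLocal() here: release builds resolve the engine from Maven Central only.
        google()
        mavenCentral()
    }
}

// Assumption: opt in to a sibling engine checkout with -PuseLocalSkainet.
// The real project may wire this flag differently.
if (providers.gradleProperty("useLocalSkainet").isPresent) {
    includeBuild("../SKaiNET")
}
```

A composite build substitutes the included project's modules for matching binary dependencies, so the version pin in the catalog does not need to change for local engine work.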
+
+### Fixed
+
+- **`fix(gemma): dequant kernel-less quant types in `NATIVE_OPTIMIZED` instead of
+leaving raw bytes`.** Loading a Gemma GGUF whose attention/FFN weights used a
+quant type with no packed SIMD kernel (e.g. Q5_1) under
+`QuantPolicy.NATIVE_OPTIMIZED` crashed at the first decode step
+(`Transpose requires at least 2 dimensions` in `MultiHeadAttention`
+`linearProject`): `GemmaMemSegConverter.convertOne` left every unhandled quant
+type as raw 1-D bytes. Kernel-less types now dequantize to a correct FP32
+`[out, in]` weight via a new `dequantPackedToFp32` helper (mirroring the proven
+`Gemma4WeightLoader.createTensor` column-major → row-major transpose). The
+supported packed types (Q4_0/Q8_0/Q4_K/Q6_K) keep their fast SIMD form; only
+kernel-less types pay the FP32 dequant.
+- **`fix(llama): dequantize Q4_1 (and all non-packed quant types) in
+`DecoderGgufMemSegConverter``.** The converter handled only Q4_0/Q8_0 (packed)
+and Q4_K/Q5_K/Q6_K (dequant); every other quant type fell through an `else`
+branch that logged a warning and passed the raw quant bytes through unchanged,
+crashing deep inside matmul (e.g. `unsupported quant type Q4_1 for
+blk.0.ffn_down.weight` on Q4_1 Qwen3 models). The `else` branch now routes
+through `DequantOps.dequantFromBytes` to FP32, covering Q4_1, Q5_0, Q5_1, Q8_1,
+IQ4_NL/XS, TQ1/2_0, etc.; genuinely unknown types now fail explicitly at load
+time instead of crashing later inside matmul. Closes
+[#654](https://github.com/SKaiNET-developers/SKaiNET-transformers/issues/654).
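
The fallback policy both fixes describe (keep a type packed when a kernel exists, dequantize to FP32 otherwise, fail at load time for unknown types) can be sketched as follows. `QuantKind`, `WeightPlan`, `planFor`, and `dequantQ4_1Block` are hypothetical names for illustration, not the converters' real API. The Q4_1 arithmetic follows the GGML block definition (32 weights sharing a scale `d` and a minimum `m`, stored as 16 bytes of nibbles); `d` and `m` are passed as already-decoded floats to keep the example short.

```kotlin
// Hypothetical types for illustration only.
enum class QuantKind { Q4_0, Q8_0, Q4_K, Q5_K, Q6_K, Q4_1, Q5_0, Q5_1, Q8_1, UNKNOWN }
enum class WeightPlan { KEEP_PACKED, DEQUANT_TO_FP32 }

// Decide at load time what to do with a weight, so nothing reaches matmul as raw 1-D bytes.
fun planFor(kind: QuantKind, packedKernels: Set<QuantKind>): WeightPlan = when {
    kind == QuantKind.UNKNOWN ->
        throw IllegalArgumentException("unsupported quant type $kind")  // fail early, not inside matmul
    kind in packedKernels -> WeightPlan.KEEP_PACKED
    else -> WeightPlan.DEQUANT_TO_FP32
}

// Dequantize one Q4_1 block: weight = nibble * d + m.
// Low nibbles fill the first 16 outputs, high nibbles the last 16.
fun dequantQ4_1Block(d: Float, m: Float, qs: ByteArray): FloatArray {
    require(qs.size == 16) { "a Q4_1 block carries 16 bytes of packed nibbles" }
    val out = FloatArray(32)
    for (j in 0 until 16) {
        val byte = qs[j].toInt() and 0xFF
        out[j] = (byte and 0x0F) * d + m
        out[j + 16] = (byte ushr 4) * d + m
    }
    return out
}
```

After a dequant like this, a converter would still need to reshape the flat FP32 values to the weight's logical 2-D `[out, in]` shape, which is the step whose absence caused the rank-1 transpose crash described above.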
+
+### Tests / CI
+
+- **`GemmaQ5KPackedParityTest`** — byte-identical decode parity across the FP32
+baseline, the `jvmMain` MemSeg-packed path, and the `load(NATIVE_OPTIMIZED)`
+`commonMain` packed path.
+- **`GemmaQuantLayoutTest`** (`commonTest`) — block-transpose relayout, packing,
+and the byte-extraction round-trip; runs on JVM and `linuxX64`.
+- **`DecoderGgufMemSegConverterTest`** — regression that a Q4_1 weight is
+dequantized to its logical 2-D FP32 shape rather than passed through as 1-D
+bytes.
+- **`fix(gemma): macosArm64 target for `gemma-iree``** and CI parity fixes:
+MLIR-dump tests write to a portable build dir instead of a hardcoded local
+path; browser Mocha gets a 60 s timeout (parity with the engine repo).
+- **`test(gemma): repoint stale FunctionGemma GGUF path`** — six real-model
+integration tests now point at the in-repo
+`sl2610-function-calling/models/` location, matching
+`GemmaQ5KPackedParityTest`; all pass against the published SKaiNET 0.30.0
+(`-PincludeIntegration`).
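
A decode-parity check of the kind `GemmaQ5KPackedParityTest` is described as performing could look roughly like the sketch below. `firstDivergence` and `assertTokenParity` are hypothetical helpers, not the test's actual code; the only data assumed from the source is the token sequence quoted in the 0.30.0 entry.

```kotlin
// Index of the first differing token, or null when the sequences are identical.
fun firstDivergence(baseline: List<Int>, candidate: List<Int>): Int? {
    val common = minOf(baseline.size, candidate.size)
    for (i in 0 until common) {
        if (baseline[i] != candidate[i]) return i
    }
    return if (baseline.size == candidate.size) null else common
}

// Hypothetical assertion: every named path must reproduce the FP32 baseline exactly.
fun assertTokenParity(baseline: List<Int>, candidates: Map<String, List<Int>>) {
    for ((name, tokens) in candidates) {
        val at = firstDivergence(baseline, tokens)
        check(at == null) { "path '$name' diverges from the FP32 baseline at token index $at" }
    }
}
```

Reporting the first divergent index, rather than only a pass or fail result, makes it easier to tell an early numerical drift from a late stop-token mismatch.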
+
 ## [0.28.1] — 2026-06-06
 
 Version-aligned with **SKaiNET 0.28.1**. Skips 0.26.x / 0.27.x —
@@ -385,6 +489,8 @@ Version-aligned with **SKaiNET 0.21.0**.
 Last published transformers release before the engine-aligned version line.
 See `git log v0.16.0..0.18.0` for details.
 
+[0.30.0]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.30.0
+[0.28.1]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.28.1
 [0.23.1]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.23.1
 [0.21.1]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.21.1
 [0.21.0]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.21.0

README.md

Lines changed: 31 additions & 11 deletions
@@ -103,22 +103,21 @@ Honest status — see the project-status note at the top of this README.
 
 ## Current release
 
-The current release is **0.28.1** — version-aligned with **SKaiNET 0.28.1**.
-Skips 0.26.x / 0.27.x: SKaiNET-transformers tracked the engine internally across
-that window without a tagged release. The headline is that the engine's
-**Kotlin DSL → StableHLO → IREE export path is now complete** — a full gemma3
-graph traces and lowers to StableHLO that `iree-compile`s to a `vmfb`
-(`GemmaMlirDumpTest` / `GemmaTraceTest` are green against 0.28.1). SKaiNET
-0.28.0/0.28.1 fixed the remaining export bugs: result-type inference for
-`reshape`/`matmul`/`concatenate` ([#673](https://github.com/SKaiNET-developers/SKaiNET/issues/673))
-and `conv1d`/`gather`/pooling/`flatten` shapes plus the `reduce_window` emission
-form ([#675](https://github.com/SKaiNET-developers/SKaiNET/issues/675)).
+The current release is **0.30.0** — version-aligned with **SKaiNET 0.30.0**.
+Skips 0.29.x: SKaiNET-transformers tracked the engine internally across that
+window without a tagged release. The headline is that **Q5_K weights now stay
+packed in the eager Gemma runtime** (SKaiNET 0.30.0 ships a first-class Q5_K
+packed matmul) and the Gemma `NATIVE_OPTIMIZED` packed-weight path is now
+**Kotlin/Native–ready** — the board binary can keep K-quant weights packed
+without the JVM's `java.lang.foreign` MemSeg path. FunctionGemma-270M (`Q5_K_M`)
+decodes byte-identically across the FP32 baseline and both packed paths
+(`GemmaQ5KPackedParityTest`).
 
 The recommended way to consume is via the BOM. It pins every published `skainet-transformers-*` artifact and re-exports the upstream `sk.ainet:skainet-bom`, so the engine-side `sk.ainet.core:skainet-*` artifacts get the matching version too — you only need to declare the BOM version in one place.
 
 ```kotlin
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.28.1"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0"))
 
     // Versions resolved from the BOM:
     implementation("sk.ainet.transformers:skainet-transformers-core")
@@ -195,6 +194,27 @@ try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ n
 
 See `llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java` for a runnable reference.
 
+## What's new in 0.30.0
+
+- **Q5_K stays packed in the eager Gemma runtime.** `GemmaMemSegConverter` used to
+dequantize Q5_K weights to FP32 on load; SKaiNET 0.30.0 provides a first-class
+Q5_K packed matmul (`Q5_KBlockTensorData` + `Q5KMatmulKernel`), so the converter
+now relayouts the GGUF bytes to block-major and keeps them packed (176 B/block).
+FunctionGemma-270M (`Q5_K_M`) decodes byte-identically to the FP32 baseline
+(`GemmaQ5KPackedParityTest`).
+- **Gemma `NATIVE_OPTIMIZED` path is Kotlin/Native–ready.** The reusable layout +
+packing helpers (`GemmaQuantLayout.kt`, `GemmaPackedWeights.kt`) moved to
+`commonMain`, and `GemmaNetworkLoader.load()` now runs `convertGemmaWeightsPacked`
+under `NATIVE_OPTIMIZED` — so the board binary keeps K-quant weights packed with
+no `java.lang.foreign` MemSeg dependency. Verified on JVM and `linuxX64`.
+- **Engine pin `skainet 0.28.1 → 0.30.0`** — released Q5_K packed matmul, NEON
+native kernels, and Kotlin/Native cinterop. The `mavenLocal()`-first dev shim is
+reverted; the release resolves the engine from Maven Central.
+- **Fixes.** Kernel-less quant types under `NATIVE_OPTIMIZED` now dequant to FP32
+`[out, in]` instead of crashing on a rank-1 transpose; `DecoderGgufMemSegConverter`
+dequantizes Q4_1 and every other non-packed quant type instead of passing raw
+bytes through to a matmul crash ([#654](https://github.com/SKaiNET-developers/SKaiNET-transformers/issues/654)).
+
 ## What's new in 0.28.1
 
 - **Engine pin `skainet 0.27.0 → 0.28.1`.** Picks up the completed Kotlin DSL →

docs/modules/ROOT/pages/tutorials/getting-started-java.adoc

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ In your `build.gradle.kts`:
 [source,kotlin]
 ----
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.28.1"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0"))
 
     implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
     implementation("sk.ainet.transformers:skainet-transformers-agent")
@@ -41,7 +41,7 @@ Or in Maven (Maven needs the `-jvm` classifier suffix on platform artifacts):
 <dependency>
     <groupId>sk.ainet.transformers</groupId>
     <artifactId>skainet-transformers-bom</artifactId>
-    <version>0.28.1</version>
+    <version>0.30.0</version>
     <type>pom</type>
     <scope>import</scope>
 </dependency>

docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ The pieces you need live in three modules:
 [source,kotlin]
 ----
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.28.1"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0"))
 
     implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
     implementation("sk.ainet.transformers:skainet-transformers-agent")

gradle.properties

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 GROUP=sk.ainet.transformers
-VERSION_NAME=0.28.1
+VERSION_NAME=0.30.0
 
 POM_DESCRIPTION=SKaiNET-transformers
 

settings.gradle.kts

Lines changed: 0 additions & 6 deletions
@@ -8,12 +8,6 @@ pluginManagement {
 
 dependencyResolutionManagement {
     repositories {
-        // mavenLocal first so a locally-published upstream SKaiNET (same
-        // coordinates/version, e.g. sk.ainet.core:*:0.29.1 from a sibling
-        // ../SKaiNET `publishToMavenLocal`) shadows Maven Central. Lets the
-        // transformers build consume in-progress SKaiNET changes without the
-        // composite build. Maven Central remains the fallback.
-        mavenLocal()
         google()
         mavenCentral()
     }
