Commit b25d7e2

Merge pull request #580 from SKaiNET-developers/release/0.22.0
Prepare 0.22.0
2 parents 38fceca + 09caa00 commit b25d7e2

4 files changed

Lines changed: 147 additions & 7 deletions

.github/workflows/publish.yml

Lines changed: 110 additions & 1 deletion
@@ -1,22 +1,106 @@
 name: release
 
+# Tag-triggered Maven Central release.
+#
+# Two-phase flow so the published JAR for skainet-backend-native-cpu
+# carries every supported native lib (.so / .dylib / .dll) regardless
+# of which OS hosts the publish step:
+#
+#   1. build-native — matrix job: each runner builds its own host's
+#      libskainet_kernels via CMake, uploads the resulting binary as
+#      an artifact named `native-<arch>`. fail-fast stays on so a
+#      missing arch aborts the release rather than shipping a partial
+#      fat JAR.
+#
+#   2. publish — runs on macOS (signing tooling is wired up there),
+#      downloads every native artifact, stages them into the native
+#      module's resources tree (`build/native/resources/native/<arch>/`),
+#      then runs `./gradlew publish`. Gradle's own CMake step rebuilds
+#      for the macOS host into native/macos-arm64/; the pre-staged libs
+#      for the other arches sit in their own subdirs and survive.
+#      resources.srcDir(nativeResourcesRoot) on jvmMain picks them all
+#      up into the published JAR.
+#
+# Linux ARM64 is intentionally absent: Kotlin/Native plugin 2.3.21
+# doesn't support `linux aarch64` as a HOST target ("Unknown host
+# target" — see SKaiNET PR #577). Linux ARM64 consumers fall back
+# cleanly to the Panama priority-50 provider.
+
 on:
   push:
     tags:
       - '**'
 
 jobs:
+  build-native:
+    name: native ${{ matrix.arch_label }}
+    strategy:
+      fail-fast: true
+      matrix:
+        include:
+          - os: ubuntu-latest
+            arch_label: linux-x86_64
+            lib_name: libskainet_kernels.so
+          - os: macos-14
+            arch_label: macos-arm64
+            lib_name: libskainet_kernels.dylib
+          - os: windows-latest
+            arch_label: windows-x86_64
+            lib_name: skainet_kernels.dll
+    runs-on: ${{ matrix.os }}
+    timeout-minutes: 30
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+
+      - name: Set up JDK 25
+        uses: actions/setup-java@v5
+        with:
+          distribution: 'zulu'
+          java-version: 25
+
+      - name: Verify cmake
+        run: cmake --version
+
+      - name: Build native lib (Unix)
+        if: runner.os != 'Windows'
+        env:
+          GRADLE_OPTS: -Dorg.gradle.jvmargs=-Xmx4g -Dfile.encoding=UTF-8
+        run: |
+          ./gradlew --no-daemon --stacktrace --no-configuration-cache \
+            :skainet-backends:skainet-backend-native-cpu:packageNativeKernels
+
+      - name: Build native lib (Windows)
+        if: runner.os == 'Windows'
+        shell: pwsh
+        env:
+          GRADLE_OPTS: -Dorg.gradle.jvmargs=-Xmx4g -Dfile.encoding=UTF-8
+        run: |
+          .\gradlew.bat --no-daemon --stacktrace --no-configuration-cache `
+            :skainet-backends:skainet-backend-native-cpu:packageNativeKernels
+
+      - name: Upload native artifact
+        uses: actions/upload-artifact@v7
+        with:
+          name: native-${{ matrix.arch_label }}
+          path: skainet-backends/skainet-backend-native-cpu/build/native/resources/native/${{ matrix.arch_label }}/${{ matrix.lib_name }}
+          if-no-files-found: error
+          retention-days: 14
+
   publish:
     name: Release build and publish
+    needs: build-native
     runs-on: macOS-latest
     steps:
       - name: Check out code
         uses: actions/checkout@v6
+
       - name: Set up JDK 25
         uses: actions/setup-java@v5
         with:
           distribution: 'zulu'
           java-version: 25
+
       - name: Validate signing configuration
         run: |
           if ! grep -Eq '^[[:space:]]*signAllPublications[[:space:]]*=[[:space:]]*true[[:space:]]*$' gradle.properties; then
@@ -25,10 +109,35 @@ jobs:
             grep -n 'signAllPublications' gradle.properties || echo "No signAllPublications property found" >&2
             exit 1
           fi
+
+      - name: Download cross-arch native artifacts
+        uses: actions/download-artifact@v4
+        with:
+          path: native-artifacts
+          # All artifacts named `native-*` from the build-native matrix.
+          pattern: native-*
+          merge-multiple: false
+
+      - name: Stage cross-arch native libs into module resources
+        run: |
+          set -euo pipefail
+          DEST="skainet-backends/skainet-backend-native-cpu/build/native/resources/native"
+          for arch in linux-x86_64 macos-arm64 windows-x86_64; do
+            src_dir="native-artifacts/native-${arch}"
+            if [ ! -d "$src_dir" ]; then
+              echo "Missing native artifact for ${arch}" >&2
+              exit 1
+            fi
+            mkdir -p "${DEST}/${arch}"
+            cp -v "${src_dir}"/* "${DEST}/${arch}/"
+          done
+          echo "--- Staged tree ---"
+          find "$DEST" -type f
+
       - name: Publish to MavenCentral
         run: ./gradlew publish --no-configuration-cache --stacktrace
         env:
           ORG_GRADLE_PROJECT_mavenCentralUsername: ${{ secrets.MAVEN_CENTRAL_USERNAME }}
           ORG_GRADLE_PROJECT_mavenCentralPassword: ${{ secrets.MAVEN_CENTRAL_PASSWORD }}
           ORG_GRADLE_PROJECT_signingInMemoryKey: ${{ secrets.GPG_PRIVATE_KEY }}
-          ORG_GRADLE_PROJECT_signingInMemoryKeyPassword: ${{ secrets.SIGNING_PASSWORD }}
+          ORG_GRADLE_PROJECT_signingInMemoryKeyPassword: ${{ secrets.SIGNING_PASSWORD }}
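
The workflow pairs each `arch_label` with a `lib_name` and stages the binaries as `native/<arch_label>/<lib_name>`. As a minimal sketch of how a JVM consumer might map the running host onto that layout, consider the Java helper below. `NativeLibLocator` and its methods are hypothetical names used for illustration only; the project's actual loader is not shown in this commit. Only the three labels and the three file names are taken from the workflow.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of SKaiNET. */
final class NativeLibLocator {

    /** Maps os.name / os.arch style strings to a workflow arch_label, or null if unsupported. */
    static String archLabel(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        boolean arm64 = arch.equals("aarch64") || arch.equals("arm64");
        if (os.contains("linux") && x64) return "linux-x86_64";
        if (os.contains("mac") && arm64) return "macos-arm64";
        if (os.contains("windows") && x64) return "windows-x86_64";
        return null; // e.g. Linux ARM64: nothing bundled, the caller falls back
    }

    /** File name the build-native matrix uploads for a given arch_label. */
    static String libName(String archLabel) {
        if (archLabel.startsWith("linux-")) return "libskainet_kernels.so";
        if (archLabel.startsWith("macos-")) return "libskainet_kernels.dylib";
        return "skainet_kernels.dll";
    }

    /** Classpath-style resource path, or null when no native lib is bundled for the host. */
    static String resourcePath(String osName, String osArch) {
        String label = archLabel(osName, osArch);
        return label == null ? null : "native/" + label + "/" + libName(label);
    }

    public static void main(String[] args) {
        System.out.println(resourcePath(
                System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

Returning `null` for an unknown host mirrors the fallback described in the workflow comment: a Linux ARM64 consumer finds no bundled library and stays on the Panama provider.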

CHANGELOG.md

Lines changed: 30 additions & 0 deletions
@@ -2,6 +2,36 @@
 
 ## [Unreleased]
 
+## [0.22.0] - 2026-04-30
+
+### Added
+
+#### Native (FFM) CPU kernel provider — M5 milestone closed
+
+This release closes milestone M5 of the JVM inference performance roadmap with a priority-100 native kernel provider that wraps a bundled C shared library via Java's Foreign Function & Memory API. Plugs into the existing `KernelProvider` SPI so `KernelRegistry.bestAvailable()` automatically routes Q4_K and FP32 matmul through native when the lib loads, falling back cleanly to the priority-50 Panama Vector kernels otherwise.
+
+- **`skainet-backend-native-cpu` module** — new JVM-only KMP module wrapping a CMake-built shared library (`libskainet_kernels.{so,dylib,dll}`). Bundled into the JAR resources at `native/<os>-<arch>/`, extracted at runtime to a process-scoped temp dir, loaded via `System.load`, and accessed via `Linker.nativeLinker().downcallHandle(...)`. ServiceLoader auto-registers `NativeKernelProviderFactory` via `META-INF/services/sk.ainet.backend.api.kernel.KernelProvider`. (PR #571)
+- **Native Q4_K matmul** — single-source scalar C kernel (`-O3 -ffast-math -funroll-loops`); the inner 32-iteration loop auto-vectorizes cleanly into `vfmadd231ps` (AVX2) / `fmla` (NEON). Mirrors `PanamaVectorQ4KMatmulKernel` byte-for-byte on the canonical ggml super-block layout (256 elements / 144 bytes, FP16 d/dMin, 12-byte `get_scale_min_k4` packed sub-scales, 128 bytes of strided 4-bit codes, lazy-`dmin` accumulation). Microbench (Linux x86_64, JDK 21.0.10): **5.87× / 4.71× / 4.17× faster than Panama Vector at 1024² / 2048² / 4096² Q4_K matmul shapes** — single-threaded native beating Panama's `parallelChunks` multi-threaded path on every measured shape. Numerical parity vs Panama within `1e-4` relative tolerance. (PR #572)
+- **`Q4KMemSegMatmulKernel` SPI sibling + zero-copy native variant** — JVM-only sibling kernel interface in `skainet-backend-api/jvmMain` taking weights as `MemorySegment` instead of `ByteArray`, plus a JVM-only `MemSegKernelProvider` provider interface that providers can implement alongside `KernelProvider` for the smart-cast lookup pattern at the call site. Reuses the same C symbol as the heap-input kernel — the bytes just don't round-trip through the JVM heap. **+20% wall-clock at 4096²** vs the heap-copy path (9 MB weight transfer eliminated); noise-level at smaller shapes. Bit-identical output to the heap variant. (PR #573)
+- **Cross-arch CI matrix** — new `.github/workflows/native-cpu-multiarch.yml` builds and tests the native module on `ubuntu-latest`, `macos-14` (Apple Silicon), and `windows-latest` for every push/PR that touches the native module. Catches portability regressions (linker, alignment, compiler-specific syntax) at PR time rather than after release. C portability tightened: `SKAINET_RESTRICT` macro maps to `__restrict__` on GCC/Clang and `__restrict` on MSVC; CMake grows an MSVC compile-flag branch (`/O2 /fp:fast /W3`) alongside the existing GCC/Clang one. Linux ARM64 was attempted but Kotlin/Native plugin 2.3.21 doesn't support `linux aarch64` as a HOST target ("Unknown host target") — left out for now. (PRs #574, #577)
+- **Native FP32 SGEMM** — row-major `C(m,n) = A(m,k) * B(k,n)` with stride support, i-p-j outer-product order so the inner `c[j] += a*b[j]` loop streams two contiguous arrays and auto-vectorizes into FMA. Wired into the existing `matmulFp32()` SPI accessor. Microbench at 256³ / 512³ / 1024³: **1.77× / 1.58× / 1.55× faster than `PanamaVectorMatmulKernel`**. The narrower margin vs Q4_K reflects Panama's already-polished FP32 path (tile-blocking + B-pack + `parallelChunks`); native still wins on every measured shape. Numerical parity within `1e-5 * k` relative tolerance. (PR #575)
+- **Multi-arch fat JAR publishing** — `.github/workflows/publish.yml` extended to a two-phase flow: a matrix `build-native` job builds `libskainet_kernels` on each supported host (linux-x86_64, macos-arm64, windows-x86_64), and the `publish` job downloads all three artifacts, stages them into the native module's resources tree, and publishes with every supported arch bundled. Consumers on any of the three arches get a working native path out of the box — no manual side-loading.
+
+#### Module + publishing infrastructure
+
+- **`skainet-backend-native-cpu` registered in BOM** — `skainet-bom` now constrains the new module alongside `skainet-backend-api` and `skainet-backend-cpu`. Consumers depending on the BOM get a constrained version without a separate pin. (PR #576)
+- **Publishing config wired** — `vanniktech.mavenPublish` plugin + per-module `gradle.properties` (POM_ARTIFACT_ID + POM_NAME) on the new module. Composite-build consumers (e.g. SKaiNET-transformers via `includeBuild`) substitute the published coordinates with the local project ref through the same path every other SKaiNET module uses. (PR #576)
+
+### Documentation
+
+- **`NativeKernelProvider` consumption kdoc** — covers two gotchas downstream consumers hit on first wiring: (1) the module is JVM-only (FFM has no Native/JS/Wasm equivalents) so KMP consumers must add the dep to `jvmMain.dependencies`, never `commonMain`; (2) `com.gradleup.shadow:9.4.x` `mergeServiceFiles()` silently drops the `NativeKernelProviderFactory` entry when both `skainet-backend-cpu` and `skainet-backend-native-cpu` are on a shadow JAR's classpath — workaround pointer to the `kllama-cli` `doLast` fix in SKaiNET-transformers PR #88. (PR #579)
+- **`docs/.../perf/native-ffm-plan.adoc`** — design baseline for the native FFM provider (recovered from the 0.21.0-cycle PRD that was dropped from the repo root and rehomed as asciidoc). Documents module layout, FFM binding pattern, staged delivery, success metrics, and risks.
+
+### Limitations
+
+- **Linux ARM64 native lib is not in the published JAR.** Kotlin/Native plugin 2.3.21 doesn't support `linux aarch64` as a HOST target on the runners GitHub provides, so the cross-arch CI matrix excludes it. Linux ARM64 consumers (Raspberry Pi, AWS Graviton) cleanly fall back to the priority-50 Panama Vector provider — no functional regression, just no native speedup. Re-add when either the Kotlin/Native plugin gains the host or a self-hosted ARM64 runner is wired in.
+- **Shadow-jar consumers** using `com.gradleup.shadow:9.4.x` still need a `doLast` workaround to merge the `META-INF/services/sk.ainet.backend.api.kernel.KernelProvider` entries — see SKaiNET-transformers PR #88's `kllama-cli`/`skainet-cli` fix for the canonical implementation. Spring Boot apps consuming via Maven (BOOT-INF/lib/) are unaffected.
+
 ## [0.21.0] - 2026-04-28
 
 ### Added
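
The Q4_K super-block figures quoted in the changelog (256 elements in 144 bytes) can be checked with simple arithmetic. The Java sketch below is illustrative only: `Q4KLayout` and `weightBytes` are hypothetical names, and it assumes super-blocks never span rows, so the column count must be a multiple of 256. Under that assumption a 4096² weight matrix occupies exactly 9 MiB, which lines up with the roughly 9 MB transfer mentioned in the zero-copy entry.

```java
/** Hypothetical size calculator for illustration; not part of SKaiNET. */
final class Q4KLayout {
    static final int ELEMENTS_PER_BLOCK = 256;
    static final int SCALE_D_BYTES = 2;      // FP16 d
    static final int SCALE_DMIN_BYTES = 2;   // FP16 dMin
    static final int SUB_SCALE_BYTES = 12;   // packed sub-scales
    static final int CODE_BYTES = ELEMENTS_PER_BLOCK * 4 / 8; // 4-bit codes: 128 bytes
    static final int BLOCK_BYTES =
            SCALE_D_BYTES + SCALE_DMIN_BYTES + SUB_SCALE_BYTES + CODE_BYTES; // 144 bytes

    /** Bytes needed for a rows x cols Q4_K matrix (assumption: blocks do not span rows). */
    static long weightBytes(long rows, long cols) {
        if (rows < 0 || cols < 0 || cols % ELEMENTS_PER_BLOCK != 0) {
            throw new IllegalArgumentException("cols must be a non-negative multiple of 256");
        }
        return rows * (cols / ELEMENTS_PER_BLOCK) * BLOCK_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(BLOCK_BYTES);             // prints 144
        System.out.println(weightBytes(4096, 4096)); // prints 9437184 (9 MiB)
    }
}
```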

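The i-p-j loop order described in the FP32 SGEMM entry can be sketched as follows. This is plain Java for illustration, not the project's C kernel: `SgemmSketch`, its signature, and the `lda`/`ldb`/`ldc` row strides are assumptions standing in for the "stride support" the changelog mentions. The point is that the innermost loop walks one row of `B` and one row of `C` contiguously while a single element of `A` stays in a local.

```java
/** Hypothetical reference loop for illustration; not the SKaiNET native kernel. */
final class SgemmSketch {

    /** Row-major C(m,n) = A(m,k) * B(k,n); lda/ldb/ldc are row strides in elements. */
    static void sgemm(int m, int n, int k,
                      float[] a, int lda,
                      float[] b, int ldb,
                      float[] c, int ldc) {
        for (int i = 0; i < m; i++) {
            int cRow = i * ldc;
            for (int j = 0; j < n; j++) {
                c[cRow + j] = 0f;              // overwrite semantics: C = A * B
            }
            for (int p = 0; p < k; p++) {
                float aip = a[i * lda + p];    // scalar reused across the whole inner loop
                int bRow = p * ldb;
                for (int j = 0; j < n; j++) {  // streams b[bRow..] and c[cRow..] contiguously
                    c[cRow + j] += aip * b[bRow + j];
                }
            }
        }
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6};        // 2 x 3
        float[] b = {7, 8, 9, 10, 11, 12};     // 3 x 2
        float[] c = new float[4];              // 2 x 2
        sgemm(2, 2, 3, a, 3, b, 2, c, 2);
        System.out.println(java.util.Arrays.toString(c)); // prints [58.0, 64.0, 139.0, 154.0]
    }
}
```

Compared with the textbook i-j-p order, this arrangement avoids striding down a column of `B` in the inner loop, which is what lets a compiler vectorize it without a gather.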
README.md

Lines changed: 6 additions & 5 deletions
@@ -19,8 +19,8 @@ Add the core dependencies (Gradle Kotlin DSL):
 
 ```kotlin
 dependencies {
-    implementation("sk.ainet.core:SKaiNET-lang-core:0.21.0")
-    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.21.0")
+    implementation("sk.ainet.core:SKaiNET-lang-core:0.22.0")
+    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.22.0")
 }
 ```
 
@@ -137,10 +137,11 @@ SKaiNET is a modular ecosystem. While this repository contains the core engine,
 
 ---
 
-## What's New in 0.21.0
+## What's New in 0.22.0
 
-- **JVM CPU performance — Vector API SIMD across the board.** Pluggable `KernelProvider` SPI with priority-ordered lookup; FP32 matmul tile-blocked at **8.6×–10.8× over scalar**, Q4_K matmul fully SIMD-fused with inline dequant at **~30–73 GFLOPS** on Apple Silicon. Every quantized format we support (Q4_0, Q4_K, Q4_K MemSeg, Q6_K, Q8_0) is now SIMD'd to some degree.
-- **`ScratchPool` SPI and `TensorOps.permute(axes)`** — runtime workspace allocator for transient tensors and arbitrary-axis permutation.
+- **Native (FFM) CPU kernel provider — M5 milestone closed.** New `skainet-backend-native-cpu` module bundles a hand-tuned C shared library (`-O3 -ffast-math` auto-vectorized into AVX2 / NEON FMA) reachable via FFM downcalls. **4.17×–5.87× faster than Panama Vector on Q4_K matmul** at LLM-typical 1024²–4096² shapes; **1.55×–1.77× faster on FP32 SGEMM** at 256³–1024³. Auto-registers via ServiceLoader; `KernelRegistry.bestAvailable()` routes through native when the lib loads, falls through cleanly to the priority-50 Panama provider otherwise.
+- **Zero-copy MemSeg path for mmap'd Q4_K weights** — JVM-only `Q4KMemSegMatmulKernel` SPI sibling skips the staged `ByteArray → MemorySegment` copy that costs +20% wall-clock at 4096² shapes.
+- **Cross-arch shipping** — published JAR carries native libs for `linux-x86_64`, `macos-arm64`, and `windows-x86_64`. Linux ARM64 consumers cleanly fall back to Panama (Kotlin/Native host limitation tracked).
 
 See [CHANGELOG.md](CHANGELOG.md) for the full release history.
 
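
The fallback behaviour described for `KernelRegistry.bestAvailable()` amounts to picking the highest-priority provider that is actually usable. A minimal Java sketch of that selection rule follows, assuming a provider can be described by a name, a priority, and an availability flag. `ProviderSelection` and `Provider` are hypothetical stand-ins; the real `KernelProvider` and `KernelRegistry` types are not shown in this commit, and only the priorities 100 and 50 come from the text.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical selection logic for illustration; not the SKaiNET registry. */
final class ProviderSelection {

    /** Stand-in for a kernel provider: name, priority, and whether it loaded successfully. */
    record Provider(String name, int priority, boolean available) {}

    /** Returns the available provider with the highest priority, if any. */
    static Optional<Provider> bestAvailable(List<Provider> providers) {
        return providers.stream()
                .filter(Provider::available)
                .max(Comparator.comparingInt(Provider::priority));
    }

    public static void main(String[] args) {
        Provider panama = new Provider("panama-vector", 50, true);
        Provider nativeMissing = new Provider("native-ffm", 100, false); // lib failed to load
        // With the native lib unavailable, selection falls back to the priority-50 provider.
        System.out.println(bestAvailable(List.of(panama, nativeMissing)).orElseThrow().name());
        // prints panama-vector
    }
}
```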

gradle.properties

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 GROUP=sk.ainet.core
-VERSION_NAME=0.22.0-SNAPSHOT
+VERSION_NAME=0.22.0
 POM_DESCRIPTION=SKaiNET
 
 POM_URL=https://github.com/SKaiNET-developers/skainet/
