Merge pull request #186 from SKaiNET-developers/release/0.31.1

michalharakal · web-flow · commit a49fe3066c3f · 2026-06-17T19:48:02.000+02:00
Release 0.31.1 — transformer-core + publish guardrail
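
The publish guardrail in this release fails at configuration time when a published module lacks `POM_ARTIFACT_ID` or `POM_NAME`. The check can be sketched outside Gradle as a minimal standalone illustration. `ModuleProps`, `requiredPomKeys`, and the `:new-module` path are hypothetical names used only here; the real plugin reads the values through `Project.findProperty`, as the `BomCoveragePlugin.kt` hunk below shows.

```kotlin
// Hypothetical stand-in for a Gradle project's properties; not the plugin's real types.
data class ModuleProps(val path: String, val props: Map<String, String?>)

// The two properties the guardrail requires on every published module.
val requiredPomKeys = listOf("POM_ARTIFACT_ID", "POM_NAME")

// Returns one message per module that is missing (or has blank) required properties.
fun pomProblems(modules: List<ModuleProps>): List<String> =
    modules.mapNotNull { module ->
        val missing = requiredPomKeys.filter { module.props[it].isNullOrBlank() }
        if (missing.isEmpty()) null else "${module.path}: missing ${missing.joinToString(" + ")}"
    }

fun main() {
    val modules = listOf(
        // Values taken from transformer-core/gradle.properties in this PR.
        ModuleProps(
            ":transformer-core",
            mapOf(
                "POM_ARTIFACT_ID" to "skainet-transformers-transformer-core",
                "POM_NAME" to "skainet transformers transformer-core"
            )
        ),
        // Assumed example of a new module that forgot its gradle.properties (blank counts as missing).
        ModuleProps(":new-module", mapOf("POM_ARTIFACT_ID" to " "))
    )
    // Prints "  - :new-module: missing POM_ARTIFACT_ID + POM_NAME"
    pomProblems(modules).forEach { println("  - $it") }
}
```

In the plugin itself a non-empty problem list is turned into a `GradleException`, so the build stops before any deploy is attempted.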
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,33 @@ version line is kept in lock-step with the underlying SKaiNET engine
 The format roughly follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.31.1] — 2026-06-17
+
+Adds **`transformer-core`** — the framework NN primitives (attention, the KV-cache family, embedding,
+norms, RoPE, SwiGLU/GeGLU FFN, residual, linear projection) extracted from `llm-core` so they build on the
+**full Kotlin target matrix including `androidNative`** (32-bit + 64-bit ARM). `llm-core` re-exports it, so
+existing consumers are unaffected; ARM-native downstreams (e.g. on-device whisper) can now reuse the
+primitives instead of reimplementing them.
+
+### Added
+
+- **`transformer-core` module** (`sk.ainet.transformers:skainet-transformers-transformer-core`) — the
+  lang-core-only NN primitives, reusable on every target incl. `androidNativeArm32`/`androidNativeArm64`.
+  Depends only on `skainet-lang-core`. Added to the BOM. (#183)
+
+### Changed
+
+- **`llm-core` now `api`-depends on `transformer-core` and re-exports it** (no behaviour change). The NN
+  primitive sources moved out of `llm-core` into `transformer-core`; `dsl/decoder/*` stayed (it needs the
+  compile-opt-coupled `HybridTransformerBlock`). `MultiHeadAttention`'s diagnostic `dumpStats` is decoupled
+  via a settable `mhaStatSink` that `HybridTransformerBlock` wires to llm-core's platform `dumpStats`.
+
+### Notes
+
+- **Engine pin unchanged (`skainet = 0.31.0`).** `transformer-core` needs nothing new from the engine (only
+  `skainet-lang-core`, already in 0.31.0), so this patch ships against engine **0.31.0** — the one case the
+  transformers-`X.Y.Z` ↔ engine-`X.Y.Z` alignment is intentionally relaxed (additive + engine-independent).
+
 ## [0.31.0] — 2026-06-15
 
 Version-aligned with **SKaiNET 0.31.0**. Completes the eager board-decode path
diff --git a/README.md b/README.md
@@ -103,8 +103,13 @@ Honest status — see the project-status note at the top of this README.
 
 ## Current release
 
-The current release is **0.31.0** — version-aligned with **SKaiNET 0.31.0**.
-The headline is that the eager `NATIVE_OPTIMIZED` Gemma path now keeps the
+The current release is **0.31.1** (against **SKaiNET 0.31.0**). It adds
+**`transformer-core`** — the framework NN primitives (attention, KV-cache family,
+embedding, norms, RoPE, FFNs, linear projection) extracted out of `llm-core` so they
+build on the **full target matrix including `androidNative`** (32-bit + 64-bit ARM);
+`llm-core` re-exports it, so nothing changes for existing consumers, and ARM-native
+downstreams (e.g. on-device whisper) can reuse the primitives instead of reimplementing
+them. The 0.31.0 highlights still apply: the eager `NATIVE_OPTIMIZED` Gemma path keeps the
 **tied Q8_0 lm_head packed** (paired with SKaiNET 0.31.0's `ops.transpose` fix
 for all packed dtypes), and `GemmaNetworkLoader.load()` takes an optional
 `maxInferenceLen` to cap the KV cache for constrained devices — together
@@ -116,7 +121,7 @@ The recommended way to consume is via the BOM. It pins every published `skainet-
 
 ```kotlin
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.1"))
 
     // Versions resolved from the BOM:
     implementation("sk.ainet.transformers:skainet-transformers-core")
@@ -141,6 +146,7 @@ dependencies {
 | Module               | Purpose                                                                 |
 | -------------------- | ----------------------------------------------------------------------- |
 | `llm-api`            | Framework-neutral interfaces (`ChatModel`, `EmbeddingModel`, `ToolDefinition`) — Spring AI-shaped. |
+| `transformer-core`   | Framework NN primitives (attention, KV-cache family, embedding, norms, RoPE, FFNs, linear projection). `lang-core`-only → **all targets incl. `androidNative`**; re-exported by `llm-core`. |
 | `llm-core`           | `OptimizedLLMRuntime`, `ModelRegistry`, `UnifiedModelLoader`, shared abstractions. |
 | `llm-inference/<arch>` | Per-architecture network DSLs and weight loaders (`llama`, `gemma`, `qwen`, `apertus`, `bert`). |
 | `llm-runtime/<arch>` | Per-architecture runtime facades (`kllama`, `kgemma`, `kqwen`, `kapertus`). |
@@ -193,6 +199,15 @@ try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ n
 
 See `llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java` for a runnable reference.
 
+## What's new in 0.31.1
+
+- **`transformer-core` module — NN primitives reusable on all targets incl. `androidNative`.** The
+  attention / KV-cache / embedding / norm / RoPE / FFN / linear-projection primitives were trapped in
+  `llm-core` (whose io/compile/backend deps lack `androidNative`); they only need `skainet-lang-core`
+  (which has it), so they're extracted into `transformer-core` and `llm-core` re-exports them. Existing
+  consumers are unaffected; ARM-native downstreams (on-device whisper, future models) reuse them instead of
+  reimplementing. Ships against engine **0.31.0** (additive, no engine change). (#183)
+
 ## What's new in 0.31.0
 
 - **Tied Q8_0 lm_head stays packed (eager `NATIVE_OPTIMIZED`).** FunctionGemma's
diff --git a/buildSrc/src/main/kotlin/sk/ainet/transformers/gradle/BomCoveragePlugin.kt b/buildSrc/src/main/kotlin/sk/ainet/transformers/gradle/BomCoveragePlugin.kt
@@ -37,6 +37,28 @@ class BomCoveragePlugin : Plugin<Project> {
                 )
             }
 
+            // Fail fast (at configuration time, not at Maven Central deploy time) when a NEW published
+            // module forgot its gradle.properties. Without POM_ARTIFACT_ID the artifact silently defaults
+            // to the bare project name (wrong coordinates / not the skainet-transformers-* convention);
+            // without POM_NAME, Maven Central rejects the deploy. This recurs on every new module — catch it.
+            val pomProblems = publishedPaths.mapNotNull { path ->
+                val p = project.project(path)
+                val missing = buildList {
+                    if (p.findProperty("POM_ARTIFACT_ID")?.toString().isNullOrBlank()) add("POM_ARTIFACT_ID")
+                    if (p.findProperty("POM_NAME")?.toString().isNullOrBlank()) add("POM_NAME")
+                }
+                if (missing.isEmpty()) null else "$path — missing ${missing.joinToString(" + ")}"
+            }
+            if (pomProblems.isNotEmpty()) {
+                throw GradleException(
+                    "[bom-coverage] Published module(s) are missing required POM properties — the Maven " +
+                        "Central deploy would fail:\n" +
+                        pomProblems.joinToString("\n") { "  - $it" } +
+                        "\nAdd a `gradle.properties` to each module with POM_ARTIFACT_ID + POM_NAME " +
+                        "(see `llm-core/gradle.properties`)."
+                )
+            }
+
             project.dependencies.constraints {
                 publishedPaths.forEach { add("api", project.project(it)) }
             }
diff --git a/docs/modules/ROOT/pages/reference/architecture.adoc b/docs/modules/ROOT/pages/reference/architecture.adoc
@@ -4,7 +4,8 @@
 == Module Structure
 
 ----
-llm-core                    Core abstractions (Tokenizer, InferenceRuntime, ModelRegistry)
+transformer-core            NN primitives (attention, KV-cache, embedding, norms, RoPE, FFNs) — all targets incl. androidNative
+llm-core                    Core abstractions (Tokenizer, InferenceRuntime, ModelRegistry); re-exports transformer-core
 llm-agent                   Chat templates, tool calling, AgentLoop, ChatSession
 llm-inference/
   llama/                    LLaMA/Qwen network definition and weight loading
diff --git a/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc b/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
@@ -25,7 +25,7 @@ In your `build.gradle.kts`:
 [source,kotlin]
 ----
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.1"))
 
     implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
     implementation("sk.ainet.transformers:skainet-transformers-agent")
@@ -41,7 +41,7 @@ Or in Maven (Maven needs the `-jvm` classifier suffix on platform artifacts):
     <dependency>
       <groupId>sk.ainet.transformers</groupId>
       <artifactId>skainet-transformers-bom</artifactId>
-      <version>0.31.0</version>
+      <version>0.31.1</version>
       <type>pom</type>
       <scope>import</scope>
     </dependency>
diff --git a/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc b/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc
@@ -52,7 +52,7 @@ The pieces you need live in three modules:
 [source,kotlin]
 ----
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.1"))
 
     implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
     implementation("sk.ainet.transformers:skainet-transformers-agent")
diff --git a/gradle.properties b/gradle.properties
@@ -1,5 +1,5 @@
 GROUP=sk.ainet.transformers
-VERSION_NAME=0.31.0
+VERSION_NAME=0.31.1
 
 POM_DESCRIPTION=SKaiNET-transformers
 
diff --git a/transformer-core/gradle.properties b/transformer-core/gradle.properties
@@ -0,0 +1,2 @@
+POM_ARTIFACT_ID=skainet-transformers-transformer-core
+POM_NAME=skainet transformers transformer-core
