# transformer-core

Framework NN primitives (attention, the KV-cache family, embedding, norms, RoPE, SwiGLU/GeGLU FFN,
residual add, linear projection), extracted from `llm-core` so that they build on the **full Kotlin target
matrix, including `androidNativeArm32`/`androidNativeArm64`** (the on-device ARM path). The module depends
only on `skainet-lang-core` (which supports androidNative) and has no io/compile/backend dependencies.
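As a rough sketch, the module's build script might declare its targets and its single dependency as below. The project path `:skainet-lang-core` is hypothetical, and only the jvm and androidNative targets are named in this README; the rest of the matrix is elided.

```kotlin
// transformer-core/build.gradle.kts (sketch; plugin versions and the rest of the matrix elided)
plugins {
    kotlin("multiplatform")
}

kotlin {
    jvm()
    androidNativeArm32() // on-device 32-bit ARM
    androidNativeArm64() // on-device 64-bit ARM
    // ...remaining targets of the full matrix

    sourceSets {
        commonMain.dependencies {
            // Hypothetical project path. `api` is assumed because the primitives'
            // public signatures presumably expose lang-core types (tensors, ops).
            api(project(":skainet-lang-core"))
        }
    }
}
```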

`llm-core` depends on this module with `api` and **re-exports** it, so existing consumers are unaffected.
ARM-native consumers (e.g. `skainet-whisper-kmp`) depend on `transformer-core` directly and reuse its
KV-cache and attention instead of reimplementing them.
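In Gradle terms, the re-export comes from declaring the dependency with `api` rather than `implementation`: an `api` dependency is also placed on the compile classpath of `llm-core`'s own consumers. A sketch of `llm-core`'s side, assuming a standard multiplatform source-set layout and with its other dependencies elided:

```kotlin
// llm-core/build.gradle.kts (sketch; plugins, targets and other dependencies elided)
kotlin {
    sourceSets {
        commonMain.dependencies {
            // Re-export: anything that depends on llm-core also sees the moved
            // classes, so existing imports keep compiling (assuming packages were kept).
            api(project(":transformer-core"))
        }
    }
}
```

An ARM-native consumer such as `skainet-whisper-kmp` would instead declare its dependency on `transformer-core` directly, avoiding `llm-core`'s non-androidNative dependencies.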

## Why
`llm-core`'s primitives need only `lang-core`, but they were trapped there: `llm-core`'s *other*
dependencies (`io-gguf`, `io-core`, `compile-*`, `backend-cpu`) lack androidNative targets, so ARM-native
consumers couldn't depend on it. The primitives are **dtype-agnostic** (they only call `ops.*`), so this
target generalization is orthogonal to the quant/dtype generalization (issue #178); the two meet cleanly
at these primitives.
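To illustrate what *dtype-agnostic* means here, the toy sketch below writes a primitive purely against an ops interface, so the same code runs unchanged for two element types. `Ops`, `ScaledResidual`, `FloatOps` and `DoubleOps` are hypothetical stand-ins operating on plain lists, not the framework's real `ops` or tensor API.

```kotlin
// Hypothetical, minimal stand-in for the framework's `ops` surface.
interface Ops<T> {
    fun add(a: List<T>, b: List<T>): List<T>
    fun mulScalar(a: List<T>, s: Float): List<T>
}

// A primitive written only against Ops<T>: it never inspects T, so the same
// code serves every element type that has an Ops implementation.
class ScaledResidual<T>(private val ops: Ops<T>, private val gamma: Float) {
    // out = residual + gamma * x (a scalar multiply followed by a residual add)
    fun forward(x: List<T>, residual: List<T>): List<T> =
        ops.add(residual, ops.mulScalar(x, gamma))
}

// Two toy "dtypes" backed by different element types.
object FloatOps : Ops<Float> {
    override fun add(a: List<Float>, b: List<Float>): List<Float> = a.zip(b) { x, y -> x + y }
    override fun mulScalar(a: List<Float>, s: Float): List<Float> = a.map { it * s }
}

object DoubleOps : Ops<Double> {
    override fun add(a: List<Double>, b: List<Double>): List<Double> = a.zip(b) { x, y -> x + y }
    override fun mulScalar(a: List<Double>, s: Float): List<Double> = a.map { it * s }
}
```

Because the primitive never looks at `T`, supporting another dtype means adding an `Ops` implementation, not touching the primitive; in this toy model, that is the sense in which the target split and the dtype work are orthogonal.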

## What moved (15 files, lang-core-only)
`transformer/*` (KVCache, RoPE, ResidualAdd, MultiHeadAttention, GeGLUFFN, SwiGLUFFN, XIELUActivation,
LayerScalarMul, LinearProjection, VoidDense), `layers/*` (Embedding*), `normalization/RMSNormalization`,
and `dsl/TransformerDsl`. **Kept in `llm-core`:** `dsl/decoder/*`, because `DecoderTransformerNetwork`
needs `apps.llm.HybridTransformerBlock`, which is coupled to compile-opt.

One back-reference was decoupled: `MultiHeadAttention`'s diagnostic `dumpStats` call became a settable
`mhaStatSink` (a no-op by default), which `HybridTransformerBlock` wires to `llm-core`'s platform
`dumpStats`, so no behaviour is lost.
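A minimal sketch of that decoupling, assuming the sink is a top-level mutable function value. The real signature of `mhaStatSink` and what `dumpStats` receives are not specified here, so the `(tag, values)` parameters, the `"attn.scores"` tag and the `*Sketch` classes are hypothetical.

```kotlin
// transformer-core side: settable, defaults to a no-op, holds no llm-core reference.
// Hypothetical shape: a tag plus the values being summarized.
var mhaStatSink: (tag: String, values: FloatArray) -> Unit = { _, _ -> }

class MultiHeadAttentionSketch {
    fun forward(scores: FloatArray): FloatArray {
        mhaStatSink("attn.scores", scores) // formerly a direct dumpStats call
        return scores                      // real attention math omitted
    }
}

// llm-core side: the block that owns the platform dumpStats installs it.
class HybridTransformerBlockSketch(dumpStats: (String, FloatArray) -> Unit) {
    init {
        mhaStatSink = dumpStats
    }
}
```

The dependency now points only from `llm-core` to `transformer-core`: the attention code calls whatever sink is installed, and the caller that owns the platform-specific `dumpStats` installs it.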

## Verified
`:transformer-core:` compiles for jvm, androidNativeArm32 and androidNativeArm64; `:llm-core:jvmTest` is
green (5/5) via the re-export.

## Landing (for the maintainer)
Branch `feature/transformer-core` was cut from `release/0.31.0`. To land it on `develop` (which already has
#178's merged #179/#180):
1. `git fetch origin && git rebase origin/develop`. **No conflicts are expected on the moved files**: #178's
   merged work is in the model layer (`GemmaPackedWeights`) and the engine (`ops.transpose` for Q8_0/Q4_0),
   not in these primitives. (Verified against local refs; re-check against a fresh `develop`.)
2. Build the full target matrix and run the `:llm-core:` tests; open the PR; publish via CI; bump the
   `skainet`/transformers pins.
3. **Note for future quant work:** the pre-transpose marker (#178 "Solution C") will land in
   `LinearProjection.kt`, which now lives **here**, not in `llm-core`. `RowDequantSource` and the
   packed-weight packing (currently in `sk.ainet.models.gemma`) are the next candidates to hoist into a
   shared `quant` layer or into this module; that is what makes quant reusable across models *and* whisper.