Commit 2d3cdd9

Browse files
michalharakal and claude committed
docs: note K/N pread random-access fix in 0.23.0; drop dead SKaiNET-LLM links
CHANGELOG: add 0.23.0 entries for PR #591 — `PosixPreadRandomAccessSource` under Added and the GGUF >2 GiB load failure under Fixed.

README: remove the SKaiNET-LLM ecosystem-table row and the matching Explore row; the SKaiNET-LLM repo is 404. Reroute the LLM-inference entry to SKaiNET-transformers, which is the active LLM application layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2c9642f commit 2d3cdd9

2 files changed

Lines changed: 3 additions & 2 deletions
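The positional-read contract that the new `PosixPreadRandomAccessSource` relies on can be illustrated on the JVM, where `FileChannel.read(buffer, position)` plays the same role as POSIX `pread(2)`: it reads at an absolute offset without touching any shared file position, so concurrent readers need no lock. The names below (`PositionalSource`, `openOrNull`, `readAt`) are illustrative, not the SKaiNET API:

```kotlin
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Minimal JVM analogue of a pread(2)-backed random-access source.
// Positional FileChannel reads leave the channel's own position untouched,
// so concurrent readers at different offsets are safe without locking.
class PositionalSource private constructor(private val channel: FileChannel) : AutoCloseable {
    val size: Long get() = channel.size()

    /** Read up to [length] bytes at absolute [offset]; returns bytes read, or -1 at EOF. */
    fun readAt(offset: Long, dest: ByteArray, length: Int = dest.size): Int {
        require(offset >= 0 && length in 0..dest.size) { "bad offset/length" }
        return channel.read(ByteBuffer.wrap(dest, 0, length), offset)
    }

    override fun close() = channel.close()

    companion object {
        /** Null on open failure, so callers can fall back to a sequential reader. */
        fun openOrNull(path: Path): PositionalSource? = runCatching {
            PositionalSource(FileChannel.open(path, StandardOpenOption.READ))
        }.getOrNull()
    }
}
```

The null-returning `openOrNull` mirrors the contract the commit message describes for the companion `open(path)`: callers test for `null` and drop back to the legacy sequential reader instead of handling exceptions.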

File tree

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -7,9 +7,11 @@
 ### Added

 - **`TensorDataFactory.placeholder(shape, dtype)`** — returns a `TensorData` whose underlying primitive array materializes lazily on first read, instead of allocating a `FloatArray(shape.volume)` eagerly. The default interface implementation falls back to `zeros`, preserving behavior for any custom factory; `DenseTensorDataFactory` overrides with `LazyZeroFloatArrayTensorData` / `LazyZeroIntArrayTensorData`. `ExecutionContext.placeholder(...)` exposes the same path at the `Tensor` level. (PR #588)
+- **`PosixPreadRandomAccessSource` for Kotlin/Native** — new public class in `skainet-io-core`'s `nativeMain` source set wrapping POSIX `pread(2)`. `pread` is positional and atomic, so concurrent reads from different positions are safe without locking. The companion `open(path)` returns `null` on open/stat failure to match the JVM `JvmRandomAccessSource.open(...)` behaviour, letting callers cleanly fall back to the legacy sequential reader if needed. Covers `macosArm64`, `linuxX64`, `linuxArm64`, `iosArm64`, `iosSimulatorArm64` — every target in the default `nativeMain` source set of this module. 11 `nativeTest` cases pin the contract (size, partial reads, offset/length variants, EOF/argument validation, idempotent close, missing-file null return). (PR #591)

 ### Fixed

+- **Kotlin/Native consumers couldn't load GGUFs larger than ~2 GiB** — `sk.ainet.io.gguf.createRandomAccessSource(filePath)` on the native target was a placeholder `actual fun … = null`, forcing every K/N caller (`StreamingGGUFReader.open(...)` via the gguf-specific factory, every `*NetworkLoader.fromGguf(...)` path, `LlamaWeightLoader`) to fall through to the legacy reader, which slurps the entire file into a single `ByteArray`. Kotlin arrays cap at `Int.MAX_VALUE` bytes (~2 GiB), so any GGUF over ~1.9 GiB threw `IllegalStateException: Can't create an array of size 2147483648`. Practical impact: macOS / Linux / iOS native builds couldn't open Q8 models above ~1B parameters or Q4 models above ~3B; the JVM target had no such cap because `JvmRandomAccessSource` was already implemented. The `skainet-io-gguf` factory's native actual now delegates to the new `PosixPreadRandomAccessSource` (see *Added* above) and returns the same `null` sentinel on open/stat failure, so existing fall-back code paths remain valid. Verified on macOS arm64 against `Qwen3-1.7B-Q8_0.gguf` (~1.8 GiB), which previously OOMed at construction time. (Issue #589, PR #591)
 - **DSL eagerly allocated zero tensors for every Linear / Conv1d / Conv2d, OOMing real-model loaders** — `NetworkBuilder.kt`'s `createLinear`, `DenseImpl`, `Conv1dImpl`, and `Conv2dImpl` paths called `tensorDataFactory.zeros<T, V>(shape, kClass)` eagerly to satisfy each module's constructor whenever the user had not provided initial weights or bias. Downstream loaders always build the network first and only then substitute weights via `WeightMapper.applyWeights`, so the eager zeros were always immediately discarded — but they determined the JVM's peak heap footprint. For `unsloth/Apertus-8B-Instruct-2509-GGUF` (Q4_K_S, 4.7 GB on disk) that was ~27 GB of FP32 zeros allocated and thrown away. Switched every eager-init call site to the new `placeholder(...)` API; the lazy allocation fires only if a caller actually reads the tensor, which never happens on the substitution path because `parameter.value =` swaps the entire `Tensor`. Verified against the real Apertus-8B Q4_K_S GGUF: `ApertusNetworkLoader.fromGguf().load<FP32, Float>(ctx)` now succeeds in a 12 GB heap (previously OOMed at 12 GB) and constructs all 35 top-level modules in 13 s. Same fix benefits the Gemma / Llama / Qwen / Voxtral DSL paths transparently. (Issue #587, PR #588)

 ## [0.22.2] - 2026-05-02
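The lazy-placeholder mechanism described in the Added and Fixed entries above can be sketched in a few lines. This is a minimal, hypothetical illustration of the idea (a zero-filled backing array allocated only on first element access), not the actual `LazyZeroFloatArrayTensorData` implementation:

```kotlin
// Sketch of the lazy-zero idea behind placeholder(...): the backing
// FloatArray is allocated only on first read or write, so tensors that
// are immediately replaced by real weights never touch the heap.
// Names are illustrative, not the SKaiNET API.
class LazyZeroFloatData(private val size: Int) {
    private var backing: FloatArray? = null
    val materialized: Boolean get() = backing != null

    // FloatArray(size) is zero-initialized, matching the zeros contract.
    private fun array(): FloatArray =
        backing ?: FloatArray(size).also { backing = it }

    operator fun get(i: Int): Float = array()[i]
    operator fun set(i: Int, v: Float) { array()[i] = v }
}

fun main() {
    val data = LazyZeroFloatData(1_000_000)
    println(data.materialized) // false: nothing allocated yet
    println(data[0])           // 0.0: first access materializes zeros
    println(data.materialized) // true
}
```

On the weight-substitution path the changelog describes, the whole tensor is replaced before anything reads it, so the allocation in `array()` never runs and the placeholder costs only a small object header instead of the full zero array.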

README.md

Lines changed: 1 addition & 2 deletions
@@ -78,7 +78,6 @@ SKaiNET is a modular ecosystem. While this repository contains the core engine,

 | Project | Description |
 |---|---|
-| [SKaiNET-LLM](https://github.com/SKaiNET-developers/SKaiNET-LLM) | Llama, Gemma, and BERT inference runtimes |
 | [SKaiNET-transformers](https://github.com/SKaiNET-developers/SKaiNET-transformers) | Pre-built transformer architectures and layers |
 | [SKaiNET-examples](https://github.com/SKaiNET-developers/SKaiNET-examples) | Sample projects and integration demos |

@@ -90,7 +89,7 @@
 |---|---|
 | Examples and sample projects | [SKaiNET-examples](https://github.com/SKaiNET-developers/SKaiNET-examples) |
 | Interactive notebooks | [SKaiNET-notebook](https://github.com/SKaiNET-developers/SKaiNET-notebook) |
-| LLM inference (Llama, Gemma) | [SKaiNET-LLM](https://github.com/SKaiNET-developers/SKaiNET-LLM) |
+| LLM inference (Llama, Gemma, Qwen) | [SKaiNET-transformers](https://github.com/SKaiNET-developers/SKaiNET-transformers) |

 ---
