12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,18 @@

## [Unreleased]

## [0.23.0] - 2026-05-02

### Added

- **`TensorDataFactory.placeholder(shape, dtype)`** — returns a `TensorData` whose underlying primitive array materializes lazily on first read, instead of allocating a `FloatArray(shape.volume)` eagerly. The default interface implementation falls back to `zeros`, preserving behavior for any custom factory; `DenseTensorDataFactory` overrides with `LazyZeroFloatArrayTensorData` / `LazyZeroIntArrayTensorData`. `ExecutionContext.placeholder(...)` exposes the same path at the `Tensor` level. (PR #588)
- **`PosixPreadRandomAccessSource` for Kotlin/Native** — new public class in `skainet-io-core`'s `nativeMain` source set wrapping POSIX `pread(2)`. `pread` is positional and atomic, so concurrent reads from different positions are safe without locking. Companion `open(path)` returns `null` on open/stat failure to match the JVM `JvmRandomAccessSource.open(...)` behaviour, letting callers cleanly fall back to the legacy sequential reader if needed. Covers `macosArm64`, `linuxX64`, `linuxArm64`, `iosArm64`, `iosSimulatorArm64` — every target in the default `nativeMain` source set on this module. 11 `nativeTest` cases pin the contract (size, partial reads, offset/length variants, EOF/argument validation, idempotent close, missing-file null return). (PR #591)
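The lazy `placeholder` behaviour described in the first item above can be illustrated with a minimal sketch. It assumes nothing about SKaiNET's real classes: `LazyZeroFloats` and its `materialized` flag are hypothetical names used only to show how Kotlin's `lazy` delegate defers a `FloatArray` allocation until the first read.

```kotlin
// Hypothetical stand-in for a lazily materialized zero buffer; not SKaiNET's actual class.
class LazyZeroFloats(val size: Int) {
    // Tracks whether the backing array has been allocated yet.
    var materialized: Boolean = false
        private set

    // `lazy` defers FloatArray(size) until `data` is first accessed.
    private val data: FloatArray by lazy {
        materialized = true
        FloatArray(size) // FloatArray is zero-initialized
    }

    operator fun get(index: Int): Float = data[index]
}

fun main() {
    val weights = LazyZeroFloats(1_000_000)
    println(weights.materialized) // false: nothing allocated yet
    println(weights[42])          // 0.0: the first read triggers allocation
    println(weights.materialized) // true
}
```

If the holder is replaced before anything reads it, the array is never allocated, which is the property the changelog entry relies on.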

### Fixed

- **Kotlin/Native consumers couldn't load GGUFs larger than ~2 GiB** — `sk.ainet.io.gguf.createRandomAccessSource(filePath)` on the native target was a placeholder `actual fun … = null`, forcing every K/N caller (`StreamingGGUFReader.open(...)` via the gguf-specific factory, every `*NetworkLoader.fromGguf(...)` path, `LlamaWeightLoader`) to fall through to the legacy reader, which slurps the entire file into a single `ByteArray`. Kotlin arrays cap at `Int.MAX_VALUE` bytes (~2 GiB), so any GGUF over ~1.9 GiB threw `IllegalStateException: Can't create an array of size 2147483648`. Practical impact: macOS / Linux / iOS native builds couldn't open Q8 models above ~1B parameters or Q4 models above ~3B — the JVM target had no such cap because `JvmRandomAccessSource` was already implemented. The `skainet-io-gguf` factory's native actual now delegates to the new `PosixPreadRandomAccessSource` (see *Added* above) and returns the same `null` sentinel on open/stat failure, so existing fall-back code paths remain valid. Verified on macOS arm64 against `Qwen3-1.7B-Q8_0.gguf` (~1.8 GiB), which previously OOMed at construction time. (Issue #589, PR #591)
- **DSL eagerly allocated zero tensors for every Linear / Conv1d / Conv2d, OOMing real-model loaders** — `NetworkBuilder.kt`'s `createLinear`, `DenseImpl`, `Conv1dImpl`, and `Conv2dImpl` paths called `tensorDataFactory.zeros<T, V>(shape, kClass)` eagerly to satisfy each module's constructor whenever the user had not provided initial weights or bias. Downstream loaders always build the network first and only then substitute weights via `WeightMapper.applyWeights`, so the eager zeros were always immediately discarded — but they determined the JVM's peak heap footprint. For `unsloth/Apertus-8B-Instruct-2509-GGUF` (Q4_K_S, 4.7 GB on disk) that was ~27 GB of FP32 zeros allocated and thrown away. Switched every eager-init call site to the new `placeholder(...)` API; the lazy fires only if a caller actually reads the tensor, which never happens on the substitution path because `parameter.value =` swaps the entire `Tensor`. Verified against the real Apertus-8B Q4_K_S GGUF: `ApertusNetworkLoader.fromGguf().load<FP32, Float>(ctx)` now succeeds in 12 GB heap (previously OOMed at 12 GB), constructs all 35 top-level modules in 13 s. Same fix benefits Gemma / Llama / Qwen / Voxtral DSL paths transparently. (Issue #587, PR #588)
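The array size in the exception message quoted in the first item above, 2147483648, is exactly 2 GiB, one more than `Int.MAX_VALUE`. A small sketch of that arithmetic follows; `fitsInSingleByteArray` is a hypothetical helper, not part of the library, and a real runtime may cap array sizes slightly below `Int.MAX_VALUE`.

```kotlin
// Hypothetical helper: can a file of `fileSize` bytes be indexed by one ByteArray?
// Kotlin arrays are indexed by Int, so the upper bound is Int.MAX_VALUE elements.
fun fitsInSingleByteArray(fileSize: Long): Boolean =
    fileSize <= Int.MAX_VALUE.toLong()

fun main() {
    val twoGiB = 2L * 1024 * 1024 * 1024       // 2_147_483_648 bytes
    println(Int.MAX_VALUE)                     // 2147483647
    println(fitsInSingleByteArray(twoGiB - 1)) // true
    println(fitsInSingleByteArray(twoGiB))     // false
}
```

A reader that works from file offsets (`Long`) rather than array indices (`Int`) does not hit this bound, which is why a positional-read source removes the limit.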

## [0.22.2] - 2026-05-02

### Fixed
18 changes: 9 additions & 9 deletions README.md
@@ -20,7 +20,7 @@ Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
// Recommended: import the umbrella BOM and drop versions on the engine modules.
- implementation(platform("sk.ainet:skainet-bom:0.22.2"))
+ implementation(platform("sk.ainet:skainet-bom:0.23.0"))

implementation("sk.ainet.core:skainet-lang-core")
implementation("sk.ainet.core:skainet-backend-cpu")
@@ -78,7 +78,6 @@ SKaiNET is a modular ecosystem. While this repository contains the core engine,

| Project | Description |
|---|---|
| [SKaiNET-LLM](https://github.com/SKaiNET-developers/SKaiNET-LLM) | Llama, Gemma, and BERT inference runtimes |
| [SKaiNET-transformers](https://github.com/SKaiNET-developers/SKaiNET-transformers) | Pre-built transformer architectures and layers |
| [SKaiNET-examples](https://github.com/SKaiNET-developers/SKaiNET-examples) | Sample projects and integration demos |

@@ -90,7 +89,7 @@ SKaiNET is a modular ecosystem. While this repository contains the core engine,
|---|---|
| Examples and sample projects | [SKaiNET-examples](https://github.com/SKaiNET-developers/SKaiNET-examples) |
| Interactive notebooks | [SKaiNET-notebook](https://github.com/SKaiNET-developers/SKaiNET-notebook) |
| LLM inference (Llama, Gemma) | [SKaiNET-LLM](https://github.com/SKaiNET-developers/SKaiNET-LLM) |
| LLM inference (Llama, Gemma, Qwen) | [SKaiNET-transformers](https://github.com/SKaiNET-developers/SKaiNET-transformers) |

---

@@ -144,15 +143,16 @@ SKaiNET is a modular ecosystem. While this repository contains the core engine,

---

## What's New in 0.22.2
## What's New in 0.23.0

- **`sk.ainet:skainet-bom` now resolves from Maven Central.** The umbrella BOM was previously published at the wrong coordinates (`sk.ainet.core:skainet-bom`), so consumers following the standard `platform(...)` import pattern — and downstream BOMs like `sk.ainet.transformers:skainet-transformers-bom` that import it transitively — got 404s from Central. Hotfix; no API or behavior changes. (Issue #584)
- **Real-model GGUFs no longer OOM at network construction.** The DSL pre-allocated zero-filled `FloatArray(shape.volume)` for every Linear / Conv weight at module-creation time, even though downstream loaders overwrite those zeros immediately. For an Apertus-8B Q4_K_S GGUF (4.7 GB on disk) that was ~27 GB of FP32 zeros allocated and thrown away — OOMed at 12 GB heap. New `TensorDataFactory.placeholder(...)` API; every eager `zeros(...)` call site in the network builders routes through it. Lazy materialization fires only if a caller actually reads the tensor (which the load path never does). Verified end-to-end against `unsloth/Apertus-8B-Instruct-2509-GGUF`: now loads in 12 GB heap. Same fix benefits Gemma / Llama / Qwen / Voxtral DSL paths transparently. (Issue #587, PR #588)
- **Kotlin/Native: GGUFs over ~2 GiB now load.** `createRandomAccessSource(filePath)` had no native actual; K/N consumers fell through to the legacy slurp-into-`ByteArray` reader, which capped at `Int.MAX_VALUE` bytes (~2 GiB). Practical impact: macOS / Linux / iOS native couldn't open Q8 models above ~1B parameters or Q4 above ~3B. New POSIX-`pread`-backed `PosixPreadRandomAccessSource` covers `macosArm64`, `linuxX64`, `linuxArm64`, `iosArm64`, `iosSimulatorArm64`. (Issue #589, PR #591)

## What's New in 0.22.0
### Recent releases

- **Native (FFM) CPU kernel provider — M5 milestone closed.** New `skainet-backend-native-cpu` module bundles a hand-tuned C shared library (`-O3 -ffast-math` auto-vectorized into AVX2 / NEON FMA) reachable via FFM downcalls. **4.17×–5.87× faster than Panama Vector on Q4_K matmul** at LLM-typical 1024²–4096² shapes; **1.55×–1.77× faster on FP32 SGEMM** at 256³–1024³. Auto-registers via ServiceLoader; `KernelRegistry.bestAvailable()` routes through native when the lib loads, falls through cleanly to the priority-50 Panama provider otherwise.
- **Zero-copy MemSeg path for mmap'd Q4_K weights** — JVM-only `Q4KMemSegMatmulKernel` SPI sibling skips the staged `ByteArray → MemorySegment` copy that costs +20% wall-clock at 4096² shapes.
- **Cross-arch shipping** — published JAR carries native libs for `linux-x86_64`, `macos-arm64`, and `windows-x86_64`. Linux ARM64 consumers cleanly fall back to Panama (Kotlin/Native host limitation tracked).
- **0.22.2** — `sk.ainet:skainet-bom` now resolves from Maven Central (earlier versions shipped at the wrong coordinates). (Issue #584)
- **0.22.1** — `StreamingShardedSafeTensorsReader.loadTensorStorageMapped` for zero-copy reads of multi-shard tensors above the 2 GB JVM `ByteArray` limit. (PR #582)
- **0.22.0** — Native (FFM) CPU kernel provider: **4–6× faster Q4_K matmul, 1.5–1.8× FP32 SGEMM** vs Panama Vector; auto-selected via `KernelRegistry.bestAvailable()`. (PR #571)

See [CHANGELOG.md](CHANGELOG.md) for the full release history.

2 changes: 1 addition & 1 deletion gradle.properties
@@ -1,5 +1,5 @@
GROUP=sk.ainet.core
- VERSION_NAME=0.22.2
+ VERSION_NAME=0.23.0
POM_DESCRIPTION=SKaiNET

POM_URL=https://github.com/SKaiNET-developers/skainet/
@@ -0,0 +1,104 @@
package sk.ainet.io

import kotlinx.cinterop.ExperimentalForeignApi
import kotlinx.cinterop.addressOf
import kotlinx.cinterop.alloc
import kotlinx.cinterop.convert
import kotlinx.cinterop.memScoped
import kotlinx.cinterop.ptr
import kotlinx.cinterop.toKString
import kotlinx.cinterop.usePinned
import platform.posix.O_RDONLY
import platform.posix.errno
import platform.posix.fstat
import platform.posix.pread
import platform.posix.stat
import platform.posix.strerror

/**
* Native [RandomAccessSource] backed by POSIX `pread(2)`.
*
* `pread` is positional and atomic — it does not advance any shared seek
* pointer — so concurrent reads from different positions are safe without
* locking. [close] is single-shot.
*
* Used on macOS, iOS, and Linux native targets (which all share the
* `nativeMain` source set in this module). Android uses a separate JNI
* actual; JS / Wasm don't have a viable `pread` equivalent and continue
* to fall back to the legacy GGUF reader.
*/
@OptIn(ExperimentalForeignApi::class)
public class PosixPreadRandomAccessSource private constructor(
private val fd: Int,
override val size: Long
) : RandomAccessSource {

private var closed = false

override fun readAt(position: Long, length: Int): ByteArray {
require(position >= 0) { "Position must be non-negative: $position" }
require(length >= 0) { "Length must be non-negative: $length" }
require(position + length <= size) {
"Read beyond end of file: position=$position, length=$length, size=$size"
}
if (length == 0) return ByteArray(0)

val buffer = ByteArray(length)
val bytesRead = readAt(position, buffer, 0, length)
return if (bytesRead < length) buffer.copyOf(bytesRead) else buffer
}

override fun readAt(position: Long, buffer: ByteArray, offset: Int, length: Int): Int {
require(position >= 0) { "Position must be non-negative: $position" }
require(offset >= 0) { "Offset must be non-negative: $offset" }
require(length >= 0) { "Length must be non-negative: $length" }
// Compare against the remaining capacity so that offset + length cannot overflow Int.
require(length <= buffer.size - offset) {
"Buffer overflow: offset=$offset, length=$length, buffer.size=${buffer.size}"
}
check(!closed) { "Source is closed" }
if (length == 0) return 0

return buffer.usePinned { pinned ->
var totalRead = 0
while (totalRead < length) {
val n = pread(
fd,
pinned.addressOf(offset + totalRead),
(length - totalRead).convert(),
(position + totalRead).convert()
).toInt()
if (n < 0) {
val cause = strerror(errno)?.toKString() ?: "errno=$errno"
error("pread failed at offset ${position + totalRead}: $cause")
}
if (n == 0) break // EOF
totalRead += n
}
totalRead
}
}

override fun close() {
if (closed) return
closed = true
platform.posix.close(fd)
}

public companion object {
/**
* Open [path] for read-only random access. Returns `null` if the
* file cannot be opened or stat'd — matches [JvmRandomAccessSource]
* behaviour, letting consumers fall back to the legacy reader.
*/
public fun open(path: String): PosixPreadRandomAccessSource? = memScoped {
val fd = platform.posix.open(path, O_RDONLY)
if (fd < 0) return@memScoped null
val st = alloc<stat>()
if (fstat(fd, st.ptr) != 0) {
platform.posix.close(fd)
return@memScoped null
}
PosixPreadRandomAccessSource(fd, st.st_size.toLong())
}
}
}
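The retry loop inside `readAt` exists because a positional read may legally return fewer bytes than requested. The sketch below shows the same loop shape in plain Kotlin so it can run without cinterop. `PositionalReader`, `readFully`, and `chunkedReader` are hypothetical names introduced for illustration; the fake reader deliberately returns at most `chunk` bytes per call to simulate short reads.

```kotlin
// Hypothetical abstraction over a positional read such as pread(2):
// returns the number of bytes copied (possibly fewer than requested), or 0 at EOF.
fun interface PositionalReader {
    fun read(position: Long, dst: ByteArray, offset: Int, length: Int): Int
}

// Keeps calling the reader until `length` bytes have arrived or EOF is reported.
fun readFully(reader: PositionalReader, position: Long, dst: ByteArray, offset: Int, length: Int): Int {
    var total = 0
    while (total < length) {
        val n = reader.read(position + total, dst, offset + total, length - total)
        check(n >= 0) { "read failed at ${position + total}" }
        if (n == 0) break // EOF
        total += n
    }
    return total
}

// Fake source that never returns more than `chunk` bytes per call.
fun chunkedReader(data: ByteArray, chunk: Int) = PositionalReader { pos, dst, off, len ->
    val available = (data.size - pos).coerceAtLeast(0L).toInt()
    val n = minOf(len, chunk, available)
    if (n > 0) data.copyInto(dst, off, pos.toInt(), pos.toInt() + n)
    n
}

fun main() {
    val data = ByteArray(10) { it.toByte() }
    val out = ByteArray(10)
    val n = readFully(chunkedReader(data, chunk = 3), 2L, out, 0, 6)
    println(n)           // 6
    println(out.take(6)) // [2, 3, 4, 5, 6, 7]
}
```

Because each call passes an explicit position, the loop keeps no shared cursor, so several such loops can run against the same source at the same time.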
@@ -0,0 +1,126 @@
package sk.ainet.io

import kotlinx.io.buffered
import kotlinx.io.files.Path
import kotlinx.io.files.SystemFileSystem
import kotlinx.io.files.SystemTemporaryDirectory
import kotlinx.io.write
import kotlin.test.AfterTest
import kotlin.test.BeforeTest
import kotlin.test.Test
import kotlin.test.assertContentEquals
import kotlin.test.assertEquals
import kotlin.test.assertFailsWith
import kotlin.test.assertNull
import kotlin.test.assertTrue

class PosixPreadRandomAccessSourceTest {

private val expected = ByteArray(8192) { (it and 0xFF).toByte() } // 0..255 repeating
private lateinit var path: Path

@BeforeTest
fun setUp() {
path = Path(SystemTemporaryDirectory, "pread-test-${kotlin.random.Random.nextLong()}.bin")
SystemFileSystem.sink(path).buffered().use { it.write(expected) }
}

@AfterTest
fun tearDown() {
if (SystemFileSystem.exists(path)) SystemFileSystem.delete(path)
}

@Test
fun open_reports_correct_size() {
val src = PosixPreadRandomAccessSource.open(path.toString())!!
try {
assertEquals(expected.size.toLong(), src.size)
} finally {
src.close()
}
}

@Test
fun read_at_zero_returns_prefix() {
PosixPreadRandomAccessSource.open(path.toString())!!.use { src ->
val got = src.readAt(0, 16)
assertContentEquals(expected.copyOfRange(0, 16), got)
}
}

@Test
fun read_at_arbitrary_offset_returns_slice() {
PosixPreadRandomAccessSource.open(path.toString())!!.use { src ->
val got = src.readAt(1234, 256)
assertContentEquals(expected.copyOfRange(1234, 1234 + 256), got)
}
}

@Test
fun read_at_end_returns_suffix() {
PosixPreadRandomAccessSource.open(path.toString())!!.use { src ->
val got = src.readAt(expected.size - 32L, 32)
assertContentEquals(expected.copyOfRange(expected.size - 32, expected.size), got)
}
}

@Test
fun read_into_buffer_reports_bytes_read() {
PosixPreadRandomAccessSource.open(path.toString())!!.use { src ->
val buf = ByteArray(64)
val n = src.readAt(100L, buf, 0, 64)
assertEquals(64, n)
assertContentEquals(expected.copyOfRange(100, 164), buf)
}
}

@Test
fun read_into_buffer_with_offset() {
PosixPreadRandomAccessSource.open(path.toString())!!.use { src ->
val buf = ByteArray(128)
val n = src.readAt(50L, buf, offset = 32, length = 64)
assertEquals(64, n)
assertContentEquals(expected.copyOfRange(50, 114), buf.copyOfRange(32, 96))
// Bytes outside the requested window must remain zero.
for (i in 0 until 32) assertEquals(0, buf[i])
for (i in 96 until 128) assertEquals(0, buf[i])
}
}

@Test
fun read_past_end_throws() {
PosixPreadRandomAccessSource.open(path.toString())!!.use { src ->
assertFailsWith<IllegalArgumentException> { src.readAt(expected.size - 1L, 16) }
}
}

@Test
fun negative_position_throws() {
PosixPreadRandomAccessSource.open(path.toString())!!.use { src ->
assertFailsWith<IllegalArgumentException> { src.readAt(-1L, 4) }
}
}

@Test
fun read_after_close_throws() {
val src = PosixPreadRandomAccessSource.open(path.toString())!!
src.close()
assertFailsWith<IllegalStateException> {
src.readAt(0L, ByteArray(4), 0, 4)
}
}

@Test
fun close_is_idempotent() {
val src = PosixPreadRandomAccessSource.open(path.toString())!!
src.close()
src.close() // must not throw
assertTrue(true)
}

@Test
fun open_missing_file_returns_null() {
val missing = Path(SystemTemporaryDirectory, "definitely-does-not-exist-${kotlin.random.Random.nextLong()}.bin")
assertNull(PosixPreadRandomAccessSource.open(missing.toString()))
}
}
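The validation tests above cover negative positions and reads past the end of the file. One further edge case may be worth a test: a bounds check that adds two `Int` values can wrap around and accept an out-of-range request. The sketch below contrasts the two forms; `naiveInBounds` and `safeInBounds` are hypothetical helpers, and the safe variant assumes `offset` and `length` were already checked to be non-negative.

```kotlin
// Naive check: `offset + length` is Int arithmetic and can wrap to a negative value.
fun naiveInBounds(offset: Int, length: Int, size: Int): Boolean =
    offset + length <= size

// Overflow-safe variant: assumes offset >= 0 and length >= 0 were validated first,
// so `size - offset` cannot overflow.
fun safeInBounds(offset: Int, length: Int, size: Int): Boolean =
    offset <= size && length <= size - offset

fun main() {
    val size = 64
    println(naiveInBounds(Int.MAX_VALUE, 1, size)) // true: the sum wrapped to Int.MIN_VALUE
    println(safeInBounds(Int.MAX_VALUE, 1, size))  // false
    println(safeInBounds(32, 32, size))            // true
}
```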
@@ -1,13 +1,14 @@
package sk.ainet.io.gguf

+ import sk.ainet.io.PosixPreadRandomAccessSource
import sk.ainet.io.RandomAccessSource

/**
- * Native implementation of [createRandomAccessSource].
+ * Native implementation of [createRandomAccessSource] using POSIX `pread(2)`.
*
- * Returns null as native random file access is not yet implemented.
- * Callers should fall back to legacy GGUFReader which loads the full file.
- *
- * Future: Could implement using POSIX pread() for efficient random access.
+ * Returns `null` if the file cannot be opened (missing, permission denied,
+ * etc.), matching the JVM actual's contract so callers can fall back to the
+ * legacy sequential reader.
*/
- public actual fun createRandomAccessSource(filePath: String): RandomAccessSource? = null
+ public actual fun createRandomAccessSource(filePath: String): RandomAccessSource? =
+ PosixPreadRandomAccessSource.open(filePath)
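The nullable return type lets callers select a fall-back with the Elvis operator. The sketch below shows that calling pattern under stated assumptions: `ByteSource`, `StreamingSource`, `LegacySource`, `openStreaming`, and the `canOpen` predicate are all hypothetical stand-ins so the example runs on its own, and the library's real reader types may look different.

```kotlin
// Hypothetical stand-ins; the real interfaces live in the library and may differ.
interface ByteSource { fun describe(): String }

class StreamingSource(private val path: String) : ByteSource {
    override fun describe() = "streaming:$path"
}

class LegacySource(private val path: String) : ByteSource {
    override fun describe() = "legacy:$path"
}

// Mirrors the nullable-factory contract: null means "could not open for random access".
fun openStreaming(path: String, canOpen: (String) -> Boolean): ByteSource? =
    if (canOpen(path)) StreamingSource(path) else null

// The Elvis operator picks the fall-back only when the factory returned null.
fun openWithFallback(path: String, canOpen: (String) -> Boolean): ByteSource =
    openStreaming(path, canOpen) ?: LegacySource(path)

fun main() {
    println(openWithFallback("model.gguf") { true }.describe())  // streaming:model.gguf
    println(openWithFallback("model.gguf") { false }.describe()) // legacy:model.gguf
}
```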