
feat(dsl): lazy zero-init for parameter placeholders #588

Merged
michalharakal merged 1 commit into develop from feature/dsl-lazy-zero-init on May 2, 2026

Conversation

@michalharakal
Contributor

Summary

Closes #587.

The DSL eagerly allocates zero FloatArrays for every Linear / Conv1d / Conv2d / DenseImpl parameter at module construction time. Any downstream loader (LlamaNetworkLoader, GemmaNetworkLoader, ApertusNetworkLoader, …) builds the network first and only then substitutes weights via WeightMapper.applyWeights, so the eager zeros are always immediately discarded — but they determine the JVM's peak heap footprint. For Apertus-8B (32 layers × 6 projections × ~14k × 4k FP32 + 131k × 4k embed + …) that's ~27 GB of zeros allocated and thrown away — anything under 32 GB heap OOMs at NetworkBuilder.kt:652 before a single weight is loaded.
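
For a sense of scale, here is a back-of-envelope sketch of what a single eagerly zero-filled FFN projection costs, assuming a 14336 × 4096 FP32 weight (illustrative arithmetic only):

```kotlin
fun main() {
    // One FFN projection weight (gate/up/down shape for this model family), FP32:
    val rows = 14_336
    val cols = 4_096
    val bytes = rows.toLong() * cols * Float.SIZE_BYTES
    println("one %dx%d FP32 weight = %.0f MiB of zeros".format(rows, cols, bytes / (1024.0 * 1024.0)))
    // ~224 MiB per projection; hundreds of such throwaway allocations across
    // 32 layers (plus the 131k x 4096 embedding) are what set the peak heap.
}
```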

Fix

Add TensorDataFactory.placeholder(shape, dtype), returning a TensorData whose underlying primitive array materializes lazily on first read. DenseTensorDataFactory overrides it with LazyZeroFloatArrayTensorData / LazyZeroIntArrayTensorData, which implement FloatArrayTensorData<T> / IntArrayTensorData<T> with the backing array held behind a by lazy { ... } delegate. The default interface implementation falls back to zeros, preserving behavior for any custom factory.
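
A minimal sketch of the mechanism, with simplified, hypothetical signatures (the real TensorDataFactory and TensorData interfaces carry dtype and generic parameters that are omitted here):

```kotlin
// Sketch only: names and signatures are simplified stand-ins for the real SKaiNET types.
interface TensorDataFactory {
    fun zeros(shape: IntArray): FloatTensorData

    // Default implementation falls back to eager zeros, so any custom factory
    // keeps its existing behavior without implementing anything new.
    fun placeholder(shape: IntArray): FloatTensorData = zeros(shape)
}

class FloatTensorData(val shape: IntArray, private val backing: Lazy<FloatArray>) {
    val volume: Int get() = shape.fold(1) { acc, dim -> acc * dim }
    val data: FloatArray get() = backing.value   // materializes (and caches) on first read
}

object DenseTensorDataFactory : TensorDataFactory {
    override fun zeros(shape: IntArray): FloatTensorData =
        FloatTensorData(shape, lazyOf(FloatArray(shape.fold(1) { acc, dim -> acc * dim })))

    // Lazy zero-init: no FloatArray exists until someone actually reads `data`.
    override fun placeholder(shape: IntArray): FloatTensorData =
        FloatTensorData(shape, lazy { FloatArray(shape.fold(1) { acc, dim -> acc * dim }) })
}
```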

Switch every eager-init call site in NetworkBuilder.kt (createLinear, DenseImpl.create, Conv1dImpl.create, Conv2dImpl.create) and the matching ExecutionContext.zeros(...) paths to call placeholder(...) instead. Behavior is strictly unchanged for any caller that reads the tensor — the lazy materializes to zeros on first access and is cached. For the WeightMapper substitution path, the placeholder's lazy never fires because parameter.value = swaps the entire Tensor, GC'ing the placeholder unread.
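
A simplified view of the call-site change and of why the substitution path never materializes anything, reusing the hypothetical types from the sketch above (WeightParameter, createLinear, and applyWeights here are stand-ins, not the real signatures):

```kotlin
// Hypothetical, condensed call site; the real createLinear / WeightMapper APIs differ.
class WeightParameter(var value: FloatTensorData)

fun createLinear(factory: TensorDataFactory, inFeatures: Int, outFeatures: Int): WeightParameter =
    // Before this change, factory.zeros(...) eagerly allocated outFeatures * inFeatures floats here.
    WeightParameter(factory.placeholder(intArrayOf(outFeatures, inFeatures)))

fun applyWeights(parameter: WeightParameter, loaded: FloatTensorData) {
    // The whole tensor is swapped out: the placeholder's lazy never fires and the
    // unread placeholder becomes garbage immediately.
    parameter.value = loaded
}
```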

Verification

  • ✅ :skainet-lang:skainet-lang-core:jvmTest — all 614 tests across 80 suites green.
  • ✅ New PlaceholderTensorDataTest (8 cases) pins the contract: shape-only access without materialization, materialize-to-zeros on first read with parity to zeros(), write-through, buffer caching, instance independence, FP32/FP16/Int32/Int8 paths.
  • ✅ End-to-end against unsloth/Apertus-8B-Instruct-2509-GGUF (Q4_K_S, 4.7 GB on disk) via the downstream ApertusRealGgufLoadingTest.kt: ApertusNetworkLoader.fromGguf().load<FP32, Float>(ctx) now succeeds in 12 GB heap (previously OOMed at 12 GB), constructs all 35 top-level modules in 13 s.

Knock-on impact

Since the SKaiNET-transformers cleanup commit 8a7e0ff removed ApertusQuantizedRuntime, there has been no working memory-efficient path to run Apertus-8B Q4_K_S end-to-end on a normal-sized JVM. This PR unblocks OptimizedLLMRuntime + apertusNetwork() as the canonical quantized-Apertus path.

The same fix applies transparently to Gemma, Llama, Qwen, and Voxtral — every downstream model that uses the DSL benefits.

Test plan

  • Unit-test contract for placeholder (PlaceholderTensorDataTest, 8 cases); a condensed sketch of the contract follows this list
  • Existing skainet-lang-core suite (614 tests, 0 regressions)
  • Real-model integration test against Apertus-8B Q4_K_S — ApertusNetworkLoader.fromGguf().load() no longer OOMs
  • Reviewer sanity-check: confirm WeightParameter.value = tensor setter is the only path WeightMapper uses (i.e. the lazy never accidentally fires before substitution)
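
A condensed, hypothetical sketch of the contract's core assertion, reusing the sketch types above (the real PlaceholderTensorDataTest also covers FP16/Int32/Int8, write-through, and buffer caching):

```kotlin
import kotlin.test.Test
import kotlin.test.assertFalse
import kotlin.test.assertTrue

class PlaceholderSketchTest {
    @Test
    fun shapeAccessDoesNotMaterializeAndFirstReadYieldsZeros() {
        var allocated = false
        val tensor = FloatTensorData(intArrayOf(4, 8), lazy { allocated = true; FloatArray(32) })

        assertTrue(tensor.volume == 32)   // shape-only access...
        assertFalse(allocated)            // ...has not allocated the backing array

        assertTrue(tensor.data.all { it == 0f })   // first read materializes zeros
        assertTrue(allocated)
    }
}
```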

🤖 Generated with Claude Code

The DSL's createLinear / Conv1d / Conv2d / DenseImpl construction paths
called `tensorDataFactory.zeros<T, V>(shape, kClass)` to satisfy each
module's constructor whenever the user had not provided initial weights
or bias. The allocation was eager — a full `FloatArray(shape.volume)`
materialized at module-construction time. For real-world transformers
loaded via downstream weight loaders the call sequence is always:

  1. Build the empty network (Llama / Gemma / Apertus / Qwen / ...
     `*NetworkLoader → *Network(metadata)`), eagerly allocating zeros
     for every Linear's weights and bias.
  2. Load weights from disk (~5 GB raw bytes for an 8B Q4_K_S model).
  3. Substitute via `WeightMapper.applyWeights`, which sets
     `parameter.value = loadedTensor`. The eager zeros are now garbage.

For Apertus-8B (32 layers, 4096 hidden, ~14k FFN, 131k vocab) the eager
zeros amount to ~27 GB of FP32 — peak heap ~32 GB just to construct +
populate the model. Anything under that OOMs at NetworkBuilder.kt:652
during step 1, before weights are even read.

Fix: introduce `TensorDataFactory.placeholder(shape, dtype)`, returning
a `TensorData` whose underlying primitive array materializes lazily on
first read. The default interface implementation falls back to `zeros`
(any custom factory keeps existing behavior); `DenseTensorDataFactory`
overrides with `LazyZeroFloatArrayTensorData` / `LazyZeroIntArrayTensorData`
which back `FloatArrayTensorData<T>` / `IntArrayTensorData<T>` with a
`by lazy { ... }` delegate. Int8 falls back to `zeros` (eager byte
allocation is rarely the dominant cost on real models).

Switch every eager-init call site in `NetworkBuilder.kt`
(`createLinear`, `DenseImpl.create`, `Conv1dImpl.create`, `Conv2dImpl.create`)
plus the matching `ExecutionContext.zeros(...)` paths to call
`placeholder(...)` instead. Behavior is strictly unchanged for any
caller that *reads* the tensor — the lazy materializes to zeros on
first access and is cached. For the WeightMapper substitution path,
the placeholder's lazy never fires because `parameter.value =` swaps
the entire `Tensor`, GC'ing the placeholder unread.

Verified end-to-end against unsloth/Apertus-8B-Instruct-2509-GGUF
(Q4_K_S, 4.7 GB on disk) via the downstream
`SKaiNET-transformers/llm-inference/apertus/.../ApertusRealGgufLoadingTest.kt`:
`ApertusNetworkLoader.fromGguf().load<FP32, Float>(ctx)` now succeeds
in 12 GB heap (previously OOMed at 12 GB), constructs all 35 top-level
modules in 13 s.

Tests:
- New `PlaceholderTensorDataTest` (8 cases) pins the contract:
  shape-only access, materialize-to-zeros on first read, write-through,
  buffer caching, instance independence, FP32 / FP16 / Int32 paths,
  Int8 fallback.
- Full `:skainet-lang:skainet-lang-core:jvmTest` (614 tests) green.

Closes #587.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 2, 2026

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-588 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

michalharakal merged commit 75b82e2 into develop on May 2, 2026
10 checks passed
michalharakal deleted the feature/dsl-lazy-zero-init branch on May 2, 2026 at 16:35
michalharakal added a commit that referenced this pull request May 2, 2026
- Quickstart import now pins skainet-bom:0.23.0.
- "What's New" rewritten for 0.23.0: placeholder API + DSL OOM fix (PR #588)
  and the K/N pread random-access fix (PR #591). Older 0.22.0 / 0.22.2
  highlights moved out of the README; CHANGELOG.md remains the canonical
  full history (link already in place).
- BOM caveat about 0.22.2 being the first correctly-coordinated publish is
  retained — still actionable for anyone trying to import older BOMs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

DSL eagerly allocates zero tensors for every Linear, OOMs on real models
