Phase 5b consumer migration. Mirrors #122 / #123 / #125 for the wasm
browser entry point.
- Replaces `loadLlamaRuntimeWeights` + `LlamaRuntime` + `CpuAttentionBackend`
with `DecoderGgufWeightLoader.loadToMap` (sequential `Source` variant)
→ `LlamaNetworkLoader.fromWeights` → `OptimizedLLMRuntime` DIRECT mode.
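A minimal sketch of the shape of the new path, assuming nothing about the real loader internals: weights are first read sequentially into a name-to-tensor map, and the network is then resolved from that map by tensor name. All type and function names below are hypothetical stand-ins, not the project's actual API.

```kotlin
// Illustrative stand-ins only (hypothetical names, not the real loader types).
class FloatTensor(val data: FloatArray)

// Stand-in for a loadToMap-style step: read tensors sequentially into a map
// keyed by GGUF tensor name.
fun loadToMap(entries: List<Pair<String, FloatArray>>): Map<String, FloatTensor> =
    entries.associate { (name, values) -> name to FloatTensor(values) }

// Stand-in for a fromWeights-style step: the network pulls the tensors it
// needs out of the map by name.
class Network(weights: Map<String, FloatTensor>) {
    val embedding = weights.getValue("token_embd.weight")
}
```

The map-based split keeps the sequential read (friendly to a streaming `Source`) separate from network construction.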
- Wasm has no `MemorySegment`, so the converter step is skipped — the
loader uses `QuantPolicy.DEQUANTIZE_TO_FP32` (no change from before;
packed Q4/Q8 don't have a wasm-side fast path).
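For context, dequantize-to-FP32 for a GGUF Q8_0 block amounts to multiplying each signed 8-bit weight by the block's scale (Q8_0 stores an fp16 scale plus 32 int8 values per block). A simplified stand-alone sketch, with the fp16 decode replaced by a plain `Float` scale for brevity; this is not the project's actual converter code:

```kotlin
// Dequantize one Q8_0-style block: each int8 weight times the block scale.
// Simplification: the scale is taken as a Float rather than decoded from fp16.
fun dequantizeQ8Block(scale: Float, quants: ByteArray): FloatArray =
    FloatArray(quants.size) { i -> scale * quants[i].toFloat() }
```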
- Tokenizer load via `GGUFTokenizer.fromSource(source)` is unchanged
(sequential Source-friendly; not migrated to upstream byte-BPE in
this PR — wasm browser is FP32 + Llama by default, byte-BPE is a
separate concern).
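To illustrate the byte-BPE concern deferred here, a single merge step replaces every adjacent occurrence of a learned pair with the fused token. This is a generic sketch of that step, not the `GGUFTokenizer` implementation:

```kotlin
// Apply one BPE merge rule: fuse each adjacent (first, second) pair,
// scanning left to right without overlapping matches.
fun applyMerge(tokens: List<String>, pair: Pair<String, String>): List<String> {
    val out = mutableListOf<String>()
    var i = 0
    while (i < tokens.size) {
        if (i + 1 < tokens.size && tokens[i] == pair.first && tokens[i + 1] == pair.second) {
            out += tokens[i] + tokens[i + 1]
            i += 2
        } else {
            out += tokens[i]
            i += 1
        }
    }
    return out
}
```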
- Return type loosened from `Pair<LlamaRuntimeInterface<*>, Tokenizer>`
to `Pair<InferenceRuntime<FP32>, Tokenizer>`.
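The loosening works because `Pair` is covariant in both type parameters, so a pair holding the concrete runtime is assignable where a pair of the interface is expected. A tiny illustration with hypothetical stand-in types (not the real `InferenceRuntime` hierarchy):

```kotlin
// Stand-in interface and implementation to show the widened return type.
interface Runtime { fun generate(prompt: String): String }
class ConcreteRuntime : Runtime {
    override fun generate(prompt: String) = "ok:$prompt"
}

// The factory now declares the interface in its return type; because Pair is
// declared as Pair<out A, out B>, Pair<ConcreteRuntime, String> is accepted.
fun loadRuntime(): Pair<Runtime, String> = ConcreteRuntime() to "tokenizer"
```

Call sites compiled against the interface keep working if the concrete runtime changes again later.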
`:llm-runtime:kllama:compileKotlinWasmJs` compiles cleanly. JVM and core tests
are unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>