Problem
ModelManager.importModel() and the model loading pipeline read entire files into memory via file.arrayBuffer(). For 2-8 GB LLM models, this spikes JS heap memory and can cause OOM crashes on constrained devices.
Current State
- importModel() calls new Uint8Array(await file.arrayBuffer()), so the full file is held in memory
- ModelLoadContext.data: Uint8Array forces loaders to receive the full file
- Double-buffering: JS heap copy + WASM linear memory copy
Proposed Solution
- Add a streaming interface to ModelLoadContext: dataStream?: ReadableStream<Uint8Array>
- Update importModel() to use file.stream() and pipe chunks to storage
- When LocalFileStorage is active, avoid copy entirely by passing the File handle
- Update backend loaders to support chunked writes to their WASM FS
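A minimal sketch of the streaming path described above, not the actual implementation. `ChunkSink`, `contextFromFile`, and `importModelStreaming` are hypothetical names introduced for illustration, and the real storage and loader APIs may look different. The only platform calls used are `Blob.stream()` and the standard `ReadableStream` reader.

```typescript
// Hypothetical storage sink; the project's real storage API may differ.
interface ChunkSink {
  write(chunk: Uint8Array): Promise<void>;
  close(): Promise<void>;
}

// Assumed shape of the proposed context: `data` stays for existing loaders,
// `dataStream` is the new optional streaming field.
interface ModelLoadContext {
  data?: Uint8Array;
  dataStream?: ReadableStream<Uint8Array>;
}

// Builds a context that exposes the file as a stream instead of a buffer.
function contextFromFile(file: Blob): ModelLoadContext {
  return { dataStream: file.stream() };
}

// Pipes the file to the sink one chunk at a time and returns the number of
// bytes written. Only one chunk is referenced by this function at any moment.
async function importModelStreaming(file: Blob, sink: ChunkSink): Promise<number> {
  const reader = file.stream().getReader();
  let total = 0;
  try {
    for (;;) {
      const result = await reader.read();
      if (result.done) break;
      const chunk: Uint8Array = result.value;
      await sink.write(chunk); // wait for the sink before pulling more data
      total += chunk.byteLength;
    }
    await sink.close();
  } finally {
    reader.releaseLock();
  }
  return total;
}
```

Awaiting each `sink.write()` before the next `read()` is a deliberate choice here: it applies backpressure, so a slow storage backend should not cause chunks to pile up in the JS heap. A backend loader could consume `dataStream` with the same loop and write each chunk into its WASM FS.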
Impact
- High for users downloading large LLMs (2+ GB)
- Medium complexity — requires interface changes across core + backends
From PR #370 review comments (greptile + coderabbit).