Commit dc521ef
TimDettmers and claude committed
feat: Add RAM strategy auto-detection (pinned/hybrid/mmap)
Automatically choose the best streaming backend based on available system RAM.

Three strategies:
- Pinned: pre-load all streamed layers to CPU pinned memory (fast)
- Hybrid: pin as many layers as fit, mmap the rest from safetensors
- Mmap: load all layers on demand from safetensors through staging buffers

Key changes:
- get_available_ram_bytes() reads MemAvailable from /proc/meminfo
- _init_weight_streaming() detects the strategy and initializes accordingly
- _stream_load_layer() dispatches to the pinned or mmap path
- _mmap_load_to_gpu() loads from safetensors → staging → GPU
- from_quantized() builds tensor name maps for mmap lookups
- Staging buffers are CPU pinned and sized for the largest streamed layer

6 new tests verify: default pinned, forced mmap, forced hybrid, mmap forward/backward, hybrid forward/backward, and gradient consistency between the pinned and mmap paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
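The commit message lists the pieces but not the code. The following is a minimal Python sketch, under stated assumptions, of how reading MemAvailable and choosing between pinned, hybrid, and mmap could fit together. Everything here is hypothetical and not the commit's actual API: the names `parse_mem_available`, `read_available_ram_bytes`, `choose_strategy`, `StreamingPlan`, the `reserve_percent` safety margin, and the greedy "pin a prefix of layers" policy. The real entry points named above are `get_available_ram_bytes()` and `_init_weight_streaming()`.

```python
from dataclasses import dataclass
from typing import List


def parse_mem_available(meminfo_text: str) -> int:
    """Return MemAvailable in bytes from /proc/meminfo contents.

    /proc/meminfo reports values in kB (kibibytes), e.g.
    "MemAvailable:   16316412 kB".
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            fields = line.split()  # ["MemAvailable:", "<value>", "kB"]
            return int(fields[1]) * 1024
    raise ValueError("MemAvailable not found in meminfo text")


def read_available_ram_bytes(path: str = "/proc/meminfo") -> int:
    """Linux-only helper: read /proc/meminfo and parse MemAvailable."""
    with open(path) as f:
        return parse_mem_available(f.read())


@dataclass
class StreamingPlan:
    strategy: str             # "pinned", "hybrid", or "mmap"
    pinned_layers: List[int]  # indices of layers preloaded into pinned CPU memory
    staging_bytes: int        # size of the pinned staging buffer (0 if unused)


def choose_strategy(layer_bytes: List[int], available_ram: int,
                    reserve_percent: int = 20) -> StreamingPlan:
    """Hypothetical policy: keep reserve_percent of RAM free, pin what fits."""
    # Integer arithmetic avoids float rounding in the budget.
    budget = available_ram * (100 - reserve_percent) // 100

    # Everything fits: preload every streamed layer into pinned memory.
    if sum(layer_bytes) <= budget:
        return StreamingPlan("pinned", list(range(len(layer_bytes))), 0)

    # Some layers must be mmapped, so reserve one pinned staging buffer
    # sized for the largest streamed layer before pinning anything else.
    staging = max(layer_bytes)
    pin_budget = budget - staging

    # Greedily pin a contiguous prefix of layers while it fits.
    pinned, used = [], 0
    for i, size in enumerate(layer_bytes):
        if used + size > pin_budget:
            break
        pinned.append(i)
        used += size

    strategy = "hybrid" if pinned else "mmap"
    return StreamingPlan(strategy, pinned, staging)


if __name__ == "__main__":
    plan = choose_strategy([10, 10, 10, 10], available_ram=40)
    print(plan)
```

Reserving the staging buffer before pinning layers keeps the hybrid case honest: the staging buffer is itself pinned memory, so it has to come out of the same budget. Pinning a prefix (rather than any subset that fits) is just one simple choice; a real implementation might prefer the layers that are accessed most often.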
1 parent 48c2eca commit dc521ef

2 files changed (+425, -51 lines)