Commit dc521ef


feat: Add RAM strategy auto-detection (pinned/hybrid/mmap)

Automatically choose the best streaming backend based on available system RAM.

Three strategies:
- Pinned: pre-load all streamed layers into CPU pinned memory (fast)
- Hybrid: pin as many layers as fit and mmap the rest from safetensors
- Mmap: load all layers on demand from safetensors through staging buffers

Key changes:
- get_available_ram_bytes() reads MemAvailable from /proc/meminfo
- _init_weight_streaming() detects the strategy and initializes accordingly
- _stream_load_layer() dispatches to the pinned or mmap path
- _mmap_load_to_gpu() loads from safetensors → staging → GPU
- from_quantized() builds tensor-name maps for mmap lookups
- Staging buffers are CPU pinned and sized for the largest streamed layer

6 new tests verify: default pinned, forced mmap, forced hybrid, mmap forward/backward, hybrid forward/backward, and gradient consistency between the pinned and mmap paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
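The commit reads MemAvailable from /proc/meminfo but does not show the function body. A minimal sketch of that step, assuming a Linux host and assuming that a missing file should yield `None` rather than raise; the `parse_mem_available` helper and the `path` parameter are hypothetical and not part of the bitsandbytes API.

```python
from typing import Optional


def parse_mem_available(meminfo_text: str) -> Optional[int]:
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None (hypothetical helper)."""
    for line in meminfo_text.splitlines():
        # Lines look like "MemAvailable:   16318480 kB"
        if line.startswith("MemAvailable:"):
            fields = line.split()
            # The kernel reports these values in KiB despite the "kB" label.
            return int(fields[1]) * 1024
    return None


def get_available_ram_bytes(path: str = "/proc/meminfo") -> Optional[int]:
    """Read MemAvailable on Linux; return None where the file is absent (assumed fallback)."""
    try:
        with open(path) as f:
            return parse_mem_available(f.read())
    except OSError:
        return None
```

MemAvailable is a better signal than MemFree here because it includes reclaimable page cache, which matters when the mmap path itself fills the page cache with safetensors data.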
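The message does not spell out the selection rule beyond "pin as many layers as fit". One possible policy is sketched below, assuming per-layer byte sizes are known up front and that only a fraction of MemAvailable may be used. The `headroom` factor (default 0.8), the names `plan_streaming` and `StreamingPlan`, and the choice to reserve staging space before pinning are all hypothetical illustrations, not the commit's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamingPlan:
    strategy: str                                          # "pinned", "hybrid" or "mmap"
    pinned_layers: List[int] = field(default_factory=list)  # pre-loaded into pinned CPU memory
    mmap_layers: List[int] = field(default_factory=list)    # loaded on demand from safetensors
    staging_bytes: int = 0                                 # size of one pinned staging buffer


def plan_streaming(layer_bytes: List[int], available_ram: int,
                   headroom: float = 0.8) -> StreamingPlan:
    """Choose pinned/hybrid/mmap for the streamed layers (hypothetical policy)."""
    budget = int(available_ram * headroom)  # leave slack for the OS and other processes

    # Everything fits: pre-load all streamed layers into pinned memory.
    if sum(layer_bytes) <= budget:
        return StreamingPlan("pinned", pinned_layers=list(range(len(layer_bytes))))

    # Otherwise reserve room for a staging buffer as large as the biggest layer
    # (conservative), then pin layers in order until the budget runs out.
    remaining = budget - max(layer_bytes)
    pinned: List[int] = []
    for i, nbytes in enumerate(layer_bytes):
        if nbytes > remaining:
            break
        pinned.append(i)
        remaining -= nbytes

    # At least one layer is always left for mmap here, so max() is well defined.
    mmap_layers = list(range(len(pinned), len(layer_bytes)))
    staging = max(layer_bytes[i] for i in mmap_layers)  # sized for largest mmap'd layer
    return StreamingPlan("hybrid" if pinned else "mmap", pinned, mmap_layers, staging)
```

For example, four 100-byte layers with 400 bytes available give a 320-byte budget: 100 bytes are set aside for staging, layers 0 and 1 are pinned, and layers 2 and 3 fall back to mmap. The forced-strategy modes exercised by the new tests are not modeled in this sketch.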

1 parent 48c2eca, commit dc521ef

2 files changed (+425, -51 lines)

- bitsandbytes/kbit_lora.py
- tests/test_checkpoint.py
