Commit 1416ac6

simba authored and committed
feat: add memory-aware model partitioning framework
- New ModelProfiler.swift: reads config.json, measures weight files (follows HF Hub symlinks), computes memory requirements (weights + KV cache + 20% overhead), and outputs a PartitionPlan with strategy (fullGPU/swapAssisted/layerPartitioned/tooLarge)
- New --info flag: dry-run profiler prints formatted memory analysis report and exits without loading the model
- New --gpu-layers option: accepts 'auto' or integer, ready for future GPU/CPU layer splitting (Phase 2)
- Pre-load profiling: automatically detects overcommit ratio and sets MLX cache limits (2MB cache for swap-assisted mode to let OS manage page caching, inspired by Flash-MoE research)
- Enhanced /health endpoint: includes partition data (strategy, overcommit_ratio, weight/kv/total GB, GPU layers, estimated tok/s)
- Ready event JSON: includes partition data for downstream integration
- Rename main.swift -> Server.swift (required by Swift compiler when adding second source file with @main attribute)
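
As a rough illustration of the memory estimate and strategy selection described above, here is a minimal sketch. It is not the code from ModelProfiler.swift. The strategy names come from the commit message, but the `MemoryEstimate` type, the `chooseStrategy` function, the ratio thresholds (1.0, 2.0, 4.0), and the choice to apply the 20% overhead to weights and KV cache combined are all assumptions made for the example.

```swift
import Foundation

// Strategy names are taken from the commit message.
enum PartitionStrategy: String {
    case fullGPU, swapAssisted, layerPartitioned, tooLarge
}

// Hypothetical container for the profiler's measurements (sizes in bytes).
struct MemoryEstimate {
    let weightBytes: Double
    let kvCacheBytes: Double

    // Assumption: the 20% overhead is applied to weights + KV cache together.
    var totalBytes: Double { (weightBytes + kvCacheBytes) * 1.2 }
}

// Picks a strategy from the overcommit ratio (required / available memory).
// The thresholds below are illustrative; the commit does not state the real ones.
func chooseStrategy(
    estimate: MemoryEstimate,
    availableBytes: Double
) -> (strategy: PartitionStrategy, overcommitRatio: Double) {
    let ratio = estimate.totalBytes / availableBytes
    let strategy: PartitionStrategy
    switch ratio {
    case ..<1.0: strategy = .fullGPU          // fits entirely in available memory
    case ..<2.0: strategy = .swapAssisted     // modest overcommit, lean on OS paging
    case ..<4.0: strategy = .layerPartitioned // heavier overcommit, split layers
    default:     strategy = .tooLarge
    }
    return (strategy, ratio)
}
```

With these assumed thresholds, a model with 8 GB of weights and a 2 GB KV cache needs about 12 GB after overhead, so on a machine with 16 GB available the overcommit ratio is 0.75 and the sketch selects `fullGPU`.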
1 parent 4d4ade2 commit 1416ac6

2 files changed

Lines changed: 677 additions & 13 deletions
