DYNAMIC_UNBOUND support for portable runtime: lazy KV cache allocation
Enable DYNAMIC_UNBOUND tensors in the portable runtime, allowing KV cache
buffers to be dynamically managed rather than statically memory-planned.
This is the architectural foundation for pay-as-you-go memory allocation
in ExecuTorch LLM inference.
Core changes:
- DynamicAllocator interface with allocate/reallocate/free
- PalDynamicAllocator default impl (PAL-backed, 2x growth policy)
- TrackingDynamicAllocator for memory stats observability
- MemoryManager gains 4th slot for DynamicAllocator (backward compatible)
- TensorImpl gains dynamic_allocator_ and capacity_bytes_ fields
- TensorImpl::internal_resize_contiguous handles DYNAMIC_UNBOUND resize
- tensor_parser_portable.cpp: stop rejecting DYNAMIC_UNBOUND tensors and
  wire up the allocator at load time for tensors with no memory-planned data
- method.cpp: FreeCall frees dynamic memory; the destructor frees any
  remaining dynamic allocations
- Module API auto-creates PalDynamicAllocator (DYNAMIC_UNBOUND just works)
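
To illustrate how the pieces listed above could fit together, here is a minimal,
self-contained sketch. It is not the code in this commit: the method signatures,
the malloc-backed stand-in for PalDynamicAllocator, the TrackingAllocator
bookkeeping, and the DynamicBuffer/ensure_capacity/release helpers are all
assumptions made for illustration. Only the allocate/reallocate/free shape, the
2x growth policy, and the idea of a tracked capacity come from the list above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical interface; the real DynamicAllocator signatures may differ.
class DynamicAllocator {
 public:
  virtual ~DynamicAllocator() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void* reallocate(void* ptr, std::size_t old_bytes,
                           std::size_t new_bytes) = 0;
  virtual void free(void* ptr) = 0;
};

// Stand-in for PalDynamicAllocator, backed by malloc instead of the PAL.
class MallocDynamicAllocator final : public DynamicAllocator {
 public:
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void* reallocate(void* ptr, std::size_t /*old_bytes*/,
                   std::size_t new_bytes) override {
    return std::realloc(ptr, new_bytes);
  }
  void free(void* ptr) override { std::free(ptr); }
};

// Wraps another allocator and records live and peak requested bytes.
class TrackingAllocator final : public DynamicAllocator {
 public:
  explicit TrackingAllocator(DynamicAllocator& inner) : inner_(inner) {}

  void* allocate(std::size_t bytes) override {
    void* p = inner_.allocate(bytes);
    if (p != nullptr) add(p, bytes);
    return p;
  }
  void* reallocate(void* ptr, std::size_t old_bytes,
                   std::size_t new_bytes) override {
    auto it = sizes_.find(ptr);  // look up before ptr may be invalidated
    void* p = inner_.reallocate(ptr, old_bytes, new_bytes);
    if (p == nullptr) return nullptr;  // the old block is still live
    if (it != sizes_.end()) {
      live_ -= it->second;
      sizes_.erase(it);
    }
    add(p, new_bytes);
    return p;
  }
  void free(void* ptr) override {
    auto it = sizes_.find(ptr);
    if (it != sizes_.end()) {
      live_ -= it->second;
      sizes_.erase(it);
    }
    inner_.free(ptr);
  }
  std::size_t live_bytes() const { return live_; }
  std::size_t peak_bytes() const { return peak_; }

 private:
  void add(void* p, std::size_t bytes) {
    sizes_[p] = bytes;
    live_ += bytes;
    if (live_ > peak_) peak_ = live_;
  }
  DynamicAllocator& inner_;
  std::unordered_map<void*, std::size_t> sizes_;
  std::size_t live_ = 0;
  std::size_t peak_ = 0;
};

// Hypothetical analogue of the data pointer plus capacity_bytes_ field.
struct DynamicBuffer {
  void* data = nullptr;
  std::size_t capacity_bytes = 0;
};

// Grows buf to hold at least `needed` bytes using a 2x growth policy.
// Returns false if allocation fails; overflow of the doubling is not handled.
bool ensure_capacity(DynamicBuffer& buf, std::size_t needed,
                     DynamicAllocator& alloc) {
  assert(buf.data != nullptr || buf.capacity_bytes == 0);
  if (needed <= buf.capacity_bytes) return true;  // fits, nothing to do
  std::size_t new_cap = buf.capacity_bytes * 2;
  if (new_cap < needed) new_cap = needed;
  void* p = (buf.data == nullptr)
                ? alloc.allocate(new_cap)
                : alloc.reallocate(buf.data, buf.capacity_bytes, new_cap);
  if (p == nullptr) return false;
  buf.data = p;
  buf.capacity_bytes = new_cap;
  return true;
}

// Returns the buffer's memory to the allocator and resets it to empty.
void release(DynamicBuffer& buf, DynamicAllocator& alloc) {
  if (buf.data != nullptr) alloc.free(buf.data);
  buf.data = nullptr;
  buf.capacity_bytes = 0;
}
```

In this sketch a buffer starts with no memory, grows only when a resize needs
more bytes than its current capacity, and is handed back through release(),
which loosely mirrors the load-time wiring and FreeCall/destructor cleanup
described above.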
Export changes:
- MarkDynamicUnboundPass marks KV cache buffers as DYNAMIC_UNBOUND
- --lazy_kv_cache flag for Llama export
Co-authored-by: Claude <noreply@anthropic.com>