title	KV Cache Offloading
subtitle	CPU and disk offloading integrations for vLLM in Dynamo

KV Cache Offloading

Dynamo supports multiple KV cache offloading backends for vLLM, allowing you to extend effective KV cache capacity beyond GPU memory using CPU RAM and disk storage. Each backend integrates through vLLM's connector interface and works with both aggregated and disaggregated serving.
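As a rough illustration of what offloading means here, the sketch below models a small "GPU" tier that spills least-recently-used blocks into a larger "CPU" tier and brings them back on reuse. It is a toy model written for this page, not the implementation used by any of the backends; the `TieredKVCache` class, its method names, and the LRU policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier LRU cache (hypothetical, for illustration only)."""

    def __init__(self, gpu_blocks: int, cpu_blocks: int):
        self.gpu_blocks = gpu_blocks
        self.cpu_blocks = cpu_blocks
        self.gpu = OrderedDict()  # fast, small tier
        self.cpu = OrderedDict()  # slower, larger tier

    def put(self, block_id, data):
        # A block lives in exactly one tier at a time.
        self.cpu.pop(block_id, None)
        self.gpu[block_id] = data
        self.gpu.move_to_end(block_id)
        # Spill least-recently-used blocks instead of discarding them.
        while len(self.gpu) > self.gpu_blocks:
            victim, victim_data = self.gpu.popitem(last=False)
            self._offload(victim, victim_data)

    def _offload(self, block_id, data):
        self.cpu[block_id] = data
        self.cpu.move_to_end(block_id)
        # Once the CPU tier is full too, the oldest block is dropped.
        while len(self.cpu) > self.cpu_blocks:
            self.cpu.popitem(last=False)

    def get(self, block_id):
        """Return (data, tier), where tier is 'gpu', 'cpu', or 'miss'."""
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id], "gpu"
        if block_id in self.cpu:
            data = self.cpu.pop(block_id)
            self.put(block_id, data)  # onboard back into the GPU tier
            return data, "cpu"
        return None, "miss"
```

A block served from the CPU tier avoids recomputing its prefill, which is the benefit offloading is meant to provide. A disk tier would follow the same pattern one level further down.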

Backend	Source
KVBM	Dynamo
LMCache	GitHub
FlexKV	GitHub

KVBM

KVBM (KV Block Manager) is Dynamo's built-in KV cache offloading system. It provides a three-layer architecture (LLM runtime, logical block management, NIXL transport) with support for CPU and disk cache tiers, and integrates natively with Dynamo's KV-aware routing and disaggregated serving.

Deployment	Launch Script
Aggregated	`agg_kvbm.sh`
Aggregated + KV routing	`agg_kvbm_router.sh`
Disaggregated (1P1D)	`disagg_kvbm.sh`
Disaggregated (2P2D)	`disagg_kvbm_2p2d.sh`
Disaggregated + KV routing	`disagg_kvbm_router.sh`

For configuration details, see the KVBM Guide.

LMCache

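LMCache is an open-source KV cache engine that provides prefill-once, reuse-everywhere caching: KV entries computed during one prefill can be reused by later requests. It supports multi-level storage backends (CPU RAM, local storage, Redis, GPUDirect Storage (GDS), InfiniStore/Mooncake).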
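One common way to make KV entries reusable across requests is to split the token sequence into fixed-size chunks and key each chunk by a hash of the entire prefix up to that point. The sketch below shows that general technique only. It is not taken from LMCache's code, and the function names, the chunk size of 4, and the use of SHA-256 are assumptions chosen for the example.

```python
import hashlib


def chunk_keys(token_ids, chunk_size=4):
    """Return one key per full chunk; each key covers the whole prefix so far."""
    keys = []
    running = hashlib.sha256()
    full_len = len(token_ids) - len(token_ids) % chunk_size
    for start in range(0, full_len, chunk_size):
        chunk = token_ids[start:start + chunk_size]
        # Feed the chunk into the running hash so the key depends on
        # every token before it, not just on this chunk.
        running.update(",".join(map(str, chunk)).encode() + b";")
        keys.append(running.copy().hexdigest())
    return keys


def reusable_tokens(store, token_ids, chunk_size=4):
    """Count leading tokens whose KV entries are already in the store."""
    hits = 0
    for key in chunk_keys(token_ids, chunk_size):
        if key not in store:
            break  # reuse must be a contiguous prefix
        hits += 1
    return hits * chunk_size
```

With this keying, two requests that share a system prompt produce identical keys for the shared chunks, so only the differing suffix needs a fresh prefill.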

Deployment	Launch Script
Aggregated	`agg_lmcache.sh`
Aggregated (multiprocess metrics)	`agg_lmcache_multiproc.sh`
Disaggregated	`disagg_lmcache.sh`

For configuration details, see the LMCache Integration Guide.

FlexKV

FlexKV is a scalable, distributed KV cache runtime developed by Tencent Cloud's TACO team. It supports multi-level caching (GPU, CPU, SSD), distributed KV cache reuse across nodes, and high-performance I/O via io_uring and GPUDirect Storage.

Deployment	Launch Script
Aggregated	`agg_flexkv.sh`
Aggregated + KV routing	`agg_flexkv_router.sh`
Disaggregated	`disagg_flexkv.sh`

For configuration details, see the FlexKV Integration Guide.
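For quick reference, the three deployment tables above can be collected into a single lookup. The sketch below only restates the script names listed on this page; the `LAUNCH_SCRIPTS` dictionary and the `launch_script` helper are hypothetical names, and the directory that holds the scripts is not stated here, so the helper returns the file name only.

```python
# Script names copied from the deployment tables on this page.
LAUNCH_SCRIPTS = {
    ("KVBM", "Aggregated"): "agg_kvbm.sh",
    ("KVBM", "Aggregated + KV routing"): "agg_kvbm_router.sh",
    ("KVBM", "Disaggregated (1P1D)"): "disagg_kvbm.sh",
    ("KVBM", "Disaggregated (2P2D)"): "disagg_kvbm_2p2d.sh",
    ("KVBM", "Disaggregated + KV routing"): "disagg_kvbm_router.sh",
    ("LMCache", "Aggregated"): "agg_lmcache.sh",
    ("LMCache", "Aggregated (multiprocess metrics)"): "agg_lmcache_multiproc.sh",
    ("LMCache", "Disaggregated"): "disagg_lmcache.sh",
    ("FlexKV", "Aggregated"): "agg_flexkv.sh",
    ("FlexKV", "Aggregated + KV routing"): "agg_flexkv_router.sh",
    ("FlexKV", "Disaggregated"): "disagg_flexkv.sh",
}


def launch_script(backend: str, deployment: str) -> str:
    """Return the launch script name for a backend and deployment mode."""
    try:
        return LAUNCH_SCRIPTS[(backend, deployment)]
    except KeyError:
        # List the valid modes for the backend, if it is known at all.
        known = sorted(d for b, d in LAUNCH_SCRIPTS if b == backend)
        raise ValueError(
            f"no script for {backend!r} / {deployment!r}; known modes: {known}"
        ) from None
```

For example, `launch_script("FlexKV", "Disaggregated")` returns `disagg_flexkv.sh`, while asking for a combination that is not listed (such as LMCache with KV routing) raises a `ValueError`.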
