
[Feat] Add UCM store compression module and compression config parameters #938

Open

xwLearnsLLM wants to merge 14 commits into develop from feature_store_compress

Conversation

@xwLearnsLLM
Contributor

Purpose

This PR introduces a configurable compression module for the UCM store, designed to reduce TTFT (Time-To-First-Token) on SSD cache hits. By compressing stored BF16 tensors and supporting multi-threaded decompression, it aims to lower I/O overhead and speed up cache access, improving end-to-end inference latency.
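
To illustrate how multi-threaded decompression can hide decode latency behind I/O, here is a minimal Python sketch of a thread-pooled block decompressor. `decompress_block`, `parallel_decompress`, and the per-block layout are hypothetical stand-ins for illustration, not this PR's actual API (the real decoder implements KVfold coding):

```python
# Illustrative sketch only; the function names and block layout are hypothetical.
from concurrent.futures import ThreadPoolExecutor


def decompress_block(block: bytes) -> bytes:
    """Placeholder for the real KVfold block decoder (identity here)."""
    return block


def parallel_decompress(blocks: list[bytes], decompress_thread_num: int) -> list[bytes]:
    """Decompress independent KV-cache blocks across a thread pool,
    mirroring the PR's decompress_thread_num parameter."""
    with ThreadPoolExecutor(max_workers=decompress_thread_num) as pool:
        # map() preserves block order, which matters when decompressed
        # blocks are reassembled into contiguous KV-cache tensors.
        return list(pool.map(decompress_block, blocks))
```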

Modifications

  • Added three new configuration parameters under ucm_connector_config to control compression behavior:
    • compress_ratio: Compression ratio (currently only 2.0x is supported)
    • data_type: Target data type (currently only BF16 is supported)
    • decompress_thread_num: Number of threads used for parallel decompression
  • Implemented the full compression/decompression pipeline based on KVfold coding
  • Integrated the compression module into the existing UCM store pipeline
  • Added YAML configuration support for toggling and tuning the compression feature (a minimal config sketch follows this list)
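
To make the new parameters concrete, here is a minimal YAML sketch. Only `compress_ratio`, `data_type`, and `decompress_thread_num` come from this PR; the surrounding nesting and example values are assumptions for illustration:

```yaml
# Hypothetical nesting; only the three parameters below are introduced by this PR.
ucm_connector_config:
  compress_ratio: 2.0        # compression ratio; currently only 2.0x is supported
  data_type: "BF16"          # target data type; currently only BF16 is supported
  decompress_thread_num: 8   # threads for parallel decompression (example value)
```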

Test

  • Model: Qwen2.5-14B-Instruct
  • Hardware: Kunpeng 920 5250 + 4 × Ascend 910B4 (tensor parallel = 4)
  • Test Config: 0% hit on DRAM cache, 50% / 80% / 100% hit on local SSD.

Layer-wise (`use_layerwise: true`):

| input tokens | output tokens | batch size | odirect | 50% hit, w/o comp. (ms) | 50% hit, w/ comp. (ms) | TTFT reduction | 80% hit, w/o comp. (ms) | 80% hit, w/ comp. (ms) | TTFT reduction | 100% hit, w/o comp. (ms) | 100% hit, w/ comp. (ms) | TTFT reduction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4000 | 1 | 1 | TRUE | 357.99 | 341.97 | 4.47% | 207.02 | 225.74 | -9.04% | 208.37 | 193.22 | 7.27% |
| 8000 | 1 | 1 | TRUE | 531.6 | 578.86 | -8.89% | 304.06 | 321.29 | -5.67% | 350.01 | 331.22 | 5.37% |
| 16000 | 1 | 1 | TRUE | 1236.44 | 1284.19 | -3.86% | 800.25 | 720.4 | 9.98% | 611.74 | 532.57 | 12.94% |
| 32000 | 1 | 1 | TRUE | 3278.62 | 3152.37 | 3.85% | 2077.46 | 1807.59 | 12.99% | 1204.62 | 959.57 | 20.34% |
| 4000 | 1 | 8 | TRUE | 1339.06 | 1396.03 | -4.25% | 868.46 | 839.52 | 3.33% | 1056.99 | 829.89 | 21.49% |
| 8000 | 1 | 8 | TRUE | 2353.23 | 2373.6 | -0.87% | 1436.33 | 1356.39 | 5.57% | 2024.04 | 1421.53 | 29.77% |
| 16000 | 1 | 8 | TRUE | 5311.28 | 5548.89 | -4.47% | 3433.49 | 2829.16 | 17.60% | 3808.22 | 2906.72 | 23.67% |
| 32000 | 1 | 8 | TRUE | 13796.04 | 13660.42 | 0.98% | 8353.08 | 7851.27 | 6.01% | 6939.55 | 5274.13 | 24.00% |
| 4000 | 1 | 16 | TRUE | 2280.48 | 2343.98 | -2.78% | 1428.27 | 1232.16 | 13.73% | 2154.65 | 1342.75 | 37.68% |
| 8000 | 1 | 16 | TRUE | 4414.05 | 4436.29 | -0.50% | 2639.71 | 2388.07 | 9.53% | 4170.57 | 2646 | 36.56% |
| 16000 | 1 | 16 | TRUE | 10144.97 | 9960.42 | 1.82% | 6167.56 | 5084.08 | 17.57% | 7170.8 | 4005.21 | 44.15% |
| 32000 | 1 | 16 | TRUE | 25967.02 | 24714.92 | 4.82% | 15550.23 | 12950.47 | 16.72% | 12261.77 | 7504.62 | 38.80% |

Block-wise (`use_layerwise: false`):

| input tokens | output tokens | batch size | odirect | 50% hit, w/o comp. (ms) | 50% hit, w/ comp. (ms) | TTFT reduction | 80% hit, w/o comp. (ms) | 80% hit, w/ comp. (ms) | TTFT reduction | 100% hit, w/o comp. (ms) | 100% hit, w/ comp. (ms) | TTFT reduction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4000 | 1 | 1 | TRUE | 423.87 | 406.85 | 4.02% | 307.49 | 253.9 | 17.43% | 260.84 | 198.13 | 24.04% |
| 8000 | 1 | 1 | TRUE | 688.38 | 665.83 | 3.28% | 507.52 | 419.18 | 17.41% | 460.22 | 352.85 | 23.33% |
| 16000 | 1 | 1 | TRUE | 1528.11 | 1479.79 | 3.16% | 1083.55 | 874.61 | 19.28% | 617.35 | 446.64 | 27.65% |
| 32000 | 1 | 1 | TRUE | 3678.05 | 3426.42 | 6.84% | 2327.42 | 2039.48 | 12.37% | 1387.85 | 832.33 | 40.03% |
| 4000 | 1 | 8 | TRUE | 1801.28 | 1673.41 | 7.10% | 1261.3 | 1025.12 | 18.73% | 1117.65 | 745.11 | 33.33% |
| 8000 | 1 | 8 | TRUE | 3262.38 | 2952.3 | 9.50% | 2344.01 | 1880.82 | 19.76% | 2381.43 | 1314.63 | 44.80% |
| 16000 | 1 | 8 | TRUE | 6982.66 | 6343.26 | 9.16% | 4897.66 | 3862.16 | 21.14% | 4157.12 | 2060.17 | 50.44% |
| 32000 | 1 | 8 | TRUE | 16615.58 | 14960.37 | 9.96% | 10797.82 | 8443.65 | 21.80% | 7872.14 | 3896.58 | 50.50% |
| 4000 | 1 | 16 | TRUE | 3181.24 | 2928.33 | 7.95% | 2228.06 | 1775.39 | 20.32% | 2402.1 | 1152.3 | 52.03% |
| 8000 | 1 | 16 | TRUE | 6082.47 | 5437.54 | 10.60% | 4213.28 | 3197.17 | 24.12% | 4381.51 | 2386.19 | 45.54% |
| 16000 | 1 | 16 | TRUE | 12832.2 | 11805.53 | 8.00% | 8489.46 | 6721.34 | 20.83% | 6231.51 | 3555.04 | 42.95% |
| 32000 | 1 | 16 | TRUE | 29728.23 | 28228.75 | 5.04% | 18745.21 | 15293.23 | 18.42% | 13077.52 | 6831.96 | 47.76% |

@xwLearnsLLM force-pushed the feature_store_compress branch from fffdb66 to 73bc0c7 on April 28, 2026 07:21