【Hackathon 10th Spring No.53】RFC: Head-wise SWA Recycle Design [cfx] #1362

Open
jonny-cloudforge wants to merge 1 commit into PaddlePaddle:master from CloudForge-Solutions:rfc/h10-053-headwise-swa-cfx3

@jonny-cloudforge

PR types

RFC

PR changes

Others

Description

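This PR submits the design document for Hackathon 10th Spring Individual Challenge No.53: discrete KV Cache management in FastDeploy and AppendAttention performance optimization.

Original task: 【Hackathon 10th】Task Collection §No.53

The design is split into two PRs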

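| PR | Content | Acceptance (verbatim from the task rulebook) |
| --- | --- | --- |
| PR1 | CacheManagerV1 supports head-wise discrete block_idx + timely SWA recycling | ERNIE-4.5-21B-A3B-Paddle, same GPU memory and fixed IO, throughput +30% with recycle ON vs OFF |
| PR2 | AppendAttention fused kernel for the discrete layout (eliminates dual-pass) | On H/B cards, 1D vs 2D block_idx, TTFT and TBT both +5% |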

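Relationship to existing PRs

RFC structure aligned with the community template

All 8 sections are present: Overview / Current state in Paddle / Survey of industry solutions (three-way comparison of vLLM v1, SGLang, and TRT-LLM) / Comparative analysis / Design approach and implementation plan (pseudocode) / Testing and acceptance / Impact / Schedule.

Survey : design ≈ 30:60. Key modules are presented as pseudocode (CacheManager free list, ResourceManagerV1 recycle interface, AppendAttention fused kernel).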
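For readers unfamiliar with the idea, the head-wise free list with SWA recycling named above can be sketched roughly as follows. This is a toy illustration only, not the RFC's pseudocode and not FastDeploy's CacheManagerV1 API: the class `HeadwiseBlockPool`, its methods, and the window semantics (the next token attends to the last `window` cached tokens) are all assumptions made for the example.

```python
from collections import deque


class HeadwiseBlockPool:
    """Hypothetical toy pool: one block list per (request, KV head)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = deque(range(num_blocks))  # free list of physical block ids
        # block_table[req_id][head] -> block ids; None marks a recycled slot
        self.block_table = {}

    def append_tokens(self, req_id: str, num_heads: int, seq_len: int) -> None:
        """Grow every head's block list so it covers seq_len tokens."""
        table = self.block_table.setdefault(
            req_id, [[] for _ in range(num_heads)]
        )
        needed = (seq_len + self.block_size - 1) // self.block_size
        for blocks in table:
            while len(blocks) < needed:
                if not self.free:
                    raise RuntimeError("KV cache pool exhausted")
                blocks.append(self.free.popleft())

    def recycle_swa(self, req_id: str, head: int, seq_len: int, window: int) -> int:
        """Return to the free list all blocks of one sliding-window head
        whose tokens lie entirely before the attention window."""
        first_needed_token = max(0, seq_len - window)
        first_needed_block = first_needed_token // self.block_size
        blocks = self.block_table[req_id][head]
        freed = 0
        for i in range(min(first_needed_block, len(blocks))):
            if blocks[i] is not None:
                self.free.append(blocks[i])
                blocks[i] = None
                freed += 1
        return freed


# Usage: head 0 is treated as a sliding-window head, head 1 as full attention.
pool = HeadwiseBlockPool(num_blocks=16, block_size=4)
pool.append_tokens("r0", num_heads=2, seq_len=12)
freed = pool.recycle_swa("r0", head=0, seq_len=12, window=4)
```

Because each head keeps its own block list, a sliding-window head can give blocks back while a full-attention head of the same request keeps all of its blocks, which is the property a single shared 1D block table cannot express.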

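Implementation progress

The implementation PRs are being developed in CloudForge-Solutions/FastDeploy. PR1 and PR2 will be submitted to PaddlePaddle/FastDeploy in turn once this RFC lands.

Author's statement

The design and implementation of this RFC and the subsequent PR1/PR2 are the submitter's independent work. #6702 is listed only as community reference background; this design is a rewrite on the V1 path and carries no unauthorized Co-authored-by trailers. If other contributors later take part in the code, attribution will be added at commit granularity.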

cc @luotao1 @CSWYF3634076

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot Bot commented May 2, 2026

Your PR has been submitted. Thanks for your contribution to the open source project!
Please check that its format and content are complete. For reference, see the Template and Demo.
