
[codex] Optimize DeepSeek V4 UCM hit path #942

Draft: wuhuxiao wants to merge 1 commit into develop from codex/deepseek-v4-ucm-hit-latency-modelengine

wuhuxiao commented on Apr 29, 2026

Summary

  • Add DeepSeek V4 packed/group0 UCM stores so that, on a multi-block external hit, only the boundary block reads the full tail state.
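  • Add DeepSeek V4 capture validation tooling (scripts/verify_deepseek_v4_ucm_capture.py) and a developer validation guide (docs/source/developer-guide/deepseek_v4_ucm_validation.md).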
  • Keep timing diagnostics opt-in: they are emitted only when UCM_DEEPSEEK_V4_TIMING=1 is set.
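The read pattern in the first summary bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the connector's implementation: the store names, BlockRead, plan_hit_reads, timing_enabled, and timed are all hypothetical, and only the UCM_DEEPSEEK_V4_TIMING=1 switch comes from this PR. For an N-block hit, the plan schedules the full tail-state read once (for the boundary block) rather than once per block.

```python
import os
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical store names; the connector's real identifiers may differ.
COMPACT_STORES: Tuple[str, ...] = ("group0", "packed")
FULL_TAIL_STORE = "full_tail_state"


@dataclass(frozen=True)
class BlockRead:
    """Which stores to read for one externally hit block (illustrative only)."""
    block_index: int
    stores: Tuple[str, ...]


def plan_hit_reads(num_hit_blocks: int) -> List[BlockRead]:
    """Plan reads for an external hit spanning num_hit_blocks blocks.

    Every hit block is loaded from the compact group0/packed stores; only the
    last (boundary) block additionally loads the full tail state.
    """
    plan: List[BlockRead] = []
    for i in range(num_hit_blocks):
        is_boundary = i == num_hit_blocks - 1
        stores = COMPACT_STORES + ((FULL_TAIL_STORE,) if is_boundary else ())
        plan.append(BlockRead(block_index=i, stores=stores))
    return plan


def timing_enabled() -> bool:
    # Assumed to be read from the environment, mirroring the PR's opt-in switch.
    return os.environ.get("UCM_DEEPSEEK_V4_TIMING") == "1"


@contextmanager
def timed(label: str) -> Iterator[None]:
    """Print elapsed wall time for the wrapped block only when timing is enabled."""
    if not timing_enabled():
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"[ucm timing] {label}: {time.perf_counter() - start:.6f}s")


if __name__ == "__main__":
    with timed("plan external hit"):
        plan = plan_hit_reads(3)
    for read in plan:
        print(read.block_index, read.stores)
```

Run as a script without the flag, it prints one line per block, with "full_tail_state" appearing only for block 2; with UCM_DEEPSEEK_V4_TIMING=1 set, it also prints the elapsed time of the planning step. Reading the flag on each call (rather than once at import) keeps the sketch easy to toggle in tests.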

Validation

  • python3 -m py_compile ucm/integration/vllm/ucm_connector.py scripts/verify_deepseek_v4_ucm_capture.py
  • git diff --check origin/develop..HEAD -- ucm/integration/vllm/ucm_connector.py docs/source/index.md docs/source/developer-guide/deepseek_v4_ucm_validation.md scripts/verify_deepseek_v4_ucm_capture.py
  • 1k DeepSeek V4 UCM smoke test: the second request reported hit external: 3 (0.53 s); group0 load wait was at most ~0.0005 s and packed load wait at most ~0.0028 s.