Add DeepSeek prefix-cache stability mode #194
Open
sternelee wants to merge 1 commit into
Adapt Reasonix Pillar 1 design for jcode's profile=deepseek path:
- New prefix_cache_stable module with preflight checks, turn-end tool result truncation (3k token cap), and post-usage fold decisions at 50%/70%/80% thresholds.
- CacheTracker gains cache-hit-rate tracking (record_usage, cache_hit_summary) so sessions can monitor prefix-cache effectiveness.
- Agent messages_for_provider truncates oversized tool results when deepseek profile is active, without touching persisted session history.
- Turn loop emits preflight warnings before API calls and periodic cache-hit summaries every 5 turns.
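
The 50%/70%/80% post-usage thresholds could be expressed as a small pure function. The following is a minimal sketch, not the PR's code: the `FoldDecision` enum, its variant names, and the idea that the percentages are measured against a caller-supplied `context_window` are all assumptions made for illustration.

```rust
/// Hypothetical outcome of a post-usage check. The variant names are
/// illustrative and are not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FoldDecision {
    None,
    Light,      // usage at or above 50% of the window
    Medium,     // usage at or above 70%
    Aggressive, // usage at or above 80%
}

/// Map provider-reported prompt tokens to a fold decision.
/// `context_window` is an assumed per-model limit supplied by the caller.
pub fn post_usage_decision(prompt_tokens: u64, context_window: u64) -> FoldDecision {
    if context_window == 0 {
        return FoldDecision::None;
    }
    // Integer percentage avoids float comparisons; u128 avoids overflow.
    let pct = (prompt_tokens as u128 * 100) / context_window as u128;
    match pct {
        p if p >= 80 => FoldDecision::Aggressive,
        p if p >= 70 => FoldDecision::Medium,
        p if p >= 50 => FoldDecision::Light,
        _ => FoldDecision::None,
    }
}
```

Keeping the decision free of side effects makes each threshold easy to unit-test separately from the turn loop.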
Pull request overview
Adds a DeepSeek-specific “prefix-cache stability mode” intended to improve DeepSeek automatic prefix-cache hit rates by preflighting near-limit requests, truncating large tool results for provider requests (without mutating persisted history), and tracking provider-reported cache hit rates.
Changes:
- Introduces prefix_cache_stable module with DeepSeek activation detection, request token estimation + preflight warnings, and tool-result truncation for API-bound messages.
- Extends CacheTracker with cumulative cache hit/miss tracking and a hit-rate summary string.
- Hooks the new mode into the agent turn loop (preflight warnings + periodic cache-hit summaries) and message preparation (tool-result truncation).
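
To make the "request token estimation + preflight warnings" step concrete, here is a rough sketch under stated assumptions. It uses the chars / 4 heuristic that appears elsewhere in this PR; the function names, the `context_window` and `warn_pct` parameters, and the warning text are hypothetical and do not come from the diff.

```rust
/// Rough token estimate using a chars / 4 heuristic.
pub fn estimate_tokens(text: &str) -> usize {
    // Round up so a short non-empty string counts as at least one token.
    (text.chars().count() + 3) / 4
}

/// Return a warning when the estimated request size reaches `warn_pct`
/// percent of `context_window`. Both parameters are assumptions made
/// for this sketch; the PR does not show the real limits.
pub fn preflight_warning(
    messages: &[&str],
    context_window: usize,
    warn_pct: usize,
) -> Option<String> {
    let estimated: usize = messages.iter().map(|m| estimate_tokens(m)).sum();
    let threshold = context_window.saturating_mul(warn_pct) / 100;
    if estimated >= threshold {
        Some(format!(
            "preflight: ~{} tokens estimated, context window is {}",
            estimated, context_window
        ))
    } else {
        None
    }
}
```

The estimate is deliberately cheap, since it only needs to flag requests that are close to the limit before the API call is made.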
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/prefix_cache_stable.rs | New DeepSeek stability utilities: preflight estimation, post-usage decisions, tool-result truncation, and tests. |
| src/lib.rs | Exposes the new prefix_cache_stable module. |
| src/cache_tracker.rs | Adds cache hit/miss accounting and summary helpers to track prefix-cache effectiveness. |
| src/agent/turn_loops.rs | Emits DeepSeek preflight warnings and periodic cache-hit summaries; records cache usage per turn. |
| src/agent.rs | Truncates oversized tool results in messages_for_provider() when DeepSeek stability mode is active. |
Comment on lines +527 to +535

    let (truncated, truncate_count) =
        prefix_cache_stable::truncate_tool_results_for_api(&messages);
    if truncate_count > 0 {
        logging::info(&format!(
            "Prefix-cache mode: truncated {} tool results for API",
            truncate_count
        ));
    }
    truncated

Comment on lines +560 to +568

    let messages = if prefix_cache_stable::is_prefix_cache_stable_mode() {
        let (truncated, truncate_count) =
            prefix_cache_stable::truncate_tool_results_for_api(&messages);
        if truncate_count > 0 {
            logging::info(&format!(
                "Prefix-cache mode: truncated {} tool results for API (session path)",
                truncate_count
            ));
        }

Comment on lines +529 to +534

    // Record cache usage for prefix-cache hit-rate tracking
    self.cache_tracker.record_usage(usage_cache_read, usage_input.unwrap_or(0));
    if prefix_cache_stable::is_prefix_cache_stable_mode()
        && self.cache_tracker.usage_turn_count() % 5 == 0
    {
        logging::info(&format!("Prefix-cache stats: {}", self.cache_tracker.cache_hit_summary()));

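
The accounting behind `record_usage`, `usage_turn_count`, and `cache_hit_summary` might look roughly like the sketch below. `CacheHitStats` is a hypothetical stand-in for the fields added to `CacheTracker`; the argument types, the `hit_rate` helper, the summary wording, and the assumption that the input-token figure already includes cached tokens are guesses for illustration, not the PR's implementation.

```rust
/// Hypothetical stand-in for the cache-hit accounting added to CacheTracker.
#[derive(Debug, Default)]
pub struct CacheHitStats {
    cached_tokens: u64,
    input_tokens: u64,
    turns: u64,
}

impl CacheHitStats {
    /// Record one turn. `cache_read` is `None` when the provider omits it.
    /// Assumes `input_tokens` already includes the cached tokens.
    pub fn record_usage(&mut self, cache_read: Option<u64>, input_tokens: u64) {
        self.cached_tokens += cache_read.unwrap_or(0);
        self.input_tokens += input_tokens;
        self.turns += 1;
    }

    pub fn usage_turn_count(&self) -> u64 {
        self.turns
    }

    /// Cumulative hit rate, or `None` before any input tokens are seen.
    pub fn hit_rate(&self) -> Option<f64> {
        if self.input_tokens == 0 {
            None
        } else {
            Some(self.cached_tokens as f64 / self.input_tokens as f64)
        }
    }

    pub fn cache_hit_summary(&self) -> String {
        match self.hit_rate() {
            Some(rate) => format!(
                "{:.1}% cache hit ({} of {} input tokens, {} turns)",
                rate * 100.0,
                self.cached_tokens,
                self.input_tokens,
                self.turns
            ),
            None => "no usage recorded yet".to_string(),
        }
    }
}
```

In this sketch the turn counter is incremented inside `record_usage`, so a `% 5 == 0` check made right after recording would first fire on turn 5 rather than on turn 0.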
Comment on lines +35 to +36

    /// Max chars for a tool result after turn-end truncation (chars / 4 heuristic)
    const TURN_END_RESULT_CAP_CHARS: usize = TURN_END_RESULT_CAP_TOKENS * 4;

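
Because the cap is expressed in characters while Rust strings are indexed by bytes, a truncation helper has to cut on a character boundary to avoid a panic on multi-byte text. The sketch below shows one way to do that. The value 3_000 for `TURN_END_RESULT_CAP_TOKENS` is an assumption based on the "3k token cap" in the description, and `truncate_chars` and `cap_tool_result` are hypothetical helpers that do not appear in the diff.

```rust
/// Assumed value, based on the "3k token cap" in the PR description.
const TURN_END_RESULT_CAP_TOKENS: usize = 3_000;
/// Max chars for a tool result (chars / 4 heuristic).
const TURN_END_RESULT_CAP_CHARS: usize = TURN_END_RESULT_CAP_TOKENS * 4;

/// Return the longest prefix of `s` holding at most `max_chars` characters.
/// Slicing at an index from `char_indices` always lands on a char boundary.
pub fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is the start of the first char past the cap.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than or exactly `max_chars` chars: nothing to cut.
        None => s,
    }
}

/// Apply the turn-end cap to a single tool result.
pub fn cap_tool_result(s: &str) -> &str {
    truncate_chars(s, TURN_END_RESULT_CAP_CHARS)
}
```

Returning a borrowed slice avoids allocating when the result is already under the cap.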