fix(memory): close triage gaps and add proof tests#579

Open
vsumner wants to merge 2 commits into main from codex/working-memory-triage-proof

Conversation


@vsumner vsumner commented Apr 23, 2026

This pull request focuses on improving the safety, correctness, and testability of working memory and knowledge synthesis features. It addresses several bug reports and review findings, particularly around sensitive data handling, event triggers, and regeneration logic. Key changes include sanitizing persisted worker tasks, refactoring memory persistence triggers for clarity and testability, and improving the debounce logic for knowledge synthesis regeneration. Comprehensive tests have been added for new and existing logic.

Working Memory Safety and Sanitization:

  • Worker task text persisted in working memory is now redacted and length-bounded using shared scrubbing helpers, preventing secrets or other sensitive information from being stored. This applies to both standard and OpenCode worker spawns, with tests verifying secret redaction and truncation.
  • Cron error events now persist only scrubbed, bounded summaries to avoid leaking sensitive internals.
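The scrubbing described above can be sketched as a pure helper. This is a simplified illustration, not the actual src/secrets/scrub.rs code: the real encoded-leak detection presumably covers base64/hex/percent encodings while this sketch checks hex only, and the 500-character cap is assumed from the cron bullet.

```rust
/// Assumed cap on task text persisted to working memory.
const WORKING_MEMORY_TASK_MAX_CHARS: usize = 500;

/// Redact known secrets, fail closed on suspected encoded leaks, then
/// truncate on a UTF-8 character boundary.
fn scrub_working_memory_text(text: &str, tool_secrets: &[String]) -> String {
    let mut out = text.to_string();
    for secret in tool_secrets {
        if !secret.is_empty() {
            out = out.replace(secret.as_str(), "[REDACTED]");
        }
    }
    // Fail closed: if an encoded form of a secret survives, drop the text
    // entirely rather than risk leaking it.
    for secret in tool_secrets {
        let encoded = hex_encode(secret);
        if !encoded.is_empty() && out.contains(&encoded) {
            return "[WORKING_MEMORY_REDACTED:encoded-secret]".to_string();
        }
    }
    // chars() truncation never splits a multi-byte character.
    out.chars().take(WORKING_MEMORY_TASK_MAX_CHARS).collect()
}

/// Stand-in for the real encoded-leak detection (hex only, for illustration).
fn hex_encode(secret: &str) -> String {
    use std::fmt::Write;
    let mut hex = String::new();
    for b in secret.bytes() {
        let _ = write!(hex, "{b:02x}");
    }
    hex
}
```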

Memory Persistence Trigger Refactor:

  • Refactored the memory persistence trigger logic into a dedicated function (memory_persistence_trigger_kind) for clarity and reuse; the updated logic prefers message count, then time, then event density, and is now thoroughly unit tested.
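The priority order can be sketched as follows. The field names come from the review discussion of WorkingMemoryConfig; the struct and exact signature here are stand-ins, not the crate's real types.

```rust
/// Stand-in for WorkingMemoryConfig; field names follow the review comments.
struct WorkingMemoryConfig {
    persistence_message_threshold: usize,
    persistence_time_threshold_secs: u64,
    persistence_event_density_threshold: usize,
}

/// Pick the persistence trigger: message count first, then elapsed time,
/// then event density (None when the DB was not queried).
fn memory_persistence_trigger_kind(
    message_count: usize,
    elapsed_secs: u64,
    event_count_since_last: Option<usize>,
    config: &WorkingMemoryConfig,
) -> Option<&'static str> {
    if message_count >= config.persistence_message_threshold {
        return Some("message_count");
    }
    if elapsed_secs >= config.persistence_time_threshold_secs {
        return Some("time");
    }
    match event_count_since_last {
        Some(count) if count >= config.persistence_event_density_threshold => {
            Some("event_density")
        }
        _ => None,
    }
}
```

Because the first two checks are cheap, callers can evaluate them before paying for the event-density DB query.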

Knowledge Synthesis Regeneration Logic:

  • Extracted the debounce and version-check logic for knowledge synthesis regeneration into a pure function, with new tests covering version changes, debounce windows, and clock skew.
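A minimal sketch of such a predicate, using the boundary values the review discusses (debounce of 60 seconds, last change at 1_000). The parameter order and exact signature of the real should_regenerate_knowledge_synthesis_state are assumptions.

```rust
/// Pure gate for knowledge-synthesis regeneration: require a strictly newer
/// version and a debounce window measured in unix seconds, treating backward
/// clock skew (now < last_change) as "debounce not yet elapsed".
fn should_regenerate_knowledge_synthesis_state(
    current_version: u64,
    last_version: u64,
    last_change_unix_secs: u64,
    debounce_secs: u64,
    now_unix_secs: u64,
) -> bool {
    if current_version <= last_version {
        return false; // stricter check: equal versions do not regenerate
    }
    // saturating_sub clamps to 0 on clock skew instead of underflowing.
    now_unix_secs.saturating_sub(last_change_unix_secs) >= debounce_secs
}
```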

Bug Fixes and Reporting:

  • Fixed silent error swallowing in inspect_prompt by logging and propagating errors per guidelines.
  • Improved task update event summaries to accurately describe field changes, not just status changes.

Compactor and Miscellaneous:

  • Added tests to ensure the compactor summary path remains toolless and single-turn, and that markdown headers are correctly stripped from summaries.

These changes collectively enhance the reliability, security, and maintainability of the system's memory and synthesis features.


coderabbitai Bot commented Apr 23, 2026

Walkthrough

Refactors memory-persistence trigger logic; adds working-memory sanitization (redaction, encoded-secret fail-closed, truncation) and applies it to worker spawn, cron errors, and task messages; changes inspect_prompt to log and return errors on render failures; improves task-update summaries to report field-level deltas; expands tests.

Changes

Cohort summaries:

  • Working-memory sanitizers & helpers (src/secrets/scrub.rs): Adds WORKING_MEMORY_TASK_MAX_CHARS, scrub_working_memory_text, and scrub_worker_task_for_memory: redact tool secrets, detect encoded leaks (fail-closed), and enforce truncation with UTF-8 safety.
  • Worker spawn / dispatch uses sanitized task (src/agent/channel_dispatch.rs, src/tools/spawn_worker.rs): Loads tool_secret_pairs, computes sanitized_task via the scrubber, and uses the scrubbed task for WorkerSpawned/WorkerStarted events, status text, logs, and returned messages; unit tests added for redaction and truncation.
  • Cron error reporting sanitized (src/cron/scheduler.rs): Routes cron error text through the scrubber and caps it to 500 chars before persisting/emitting; tests verify redaction of plaintext and encoded secrets and truncation behavior.
  • Inspect prompt: fail fast and log (src/api/channels.rs): Replaces silent .ok()/unwrap_or_default fallbacks with tracing::warn! and returns INTERNAL_SERVER_ERROR on prompt/context render or metadata fetch failures; propagates errors instead of defaults.
  • Memory persistence trigger & checks (src/agent/channel.rs): Extracts a memory_persistence_trigger_kind helper returning trigger labels (message_count, time, event_density), computes cheap checks first (no DB access), queries event density only when needed, and early-exits when no trigger applies; tests for trigger priority and edge cases.
  • Working memory rendering limits & tests (src/memory/working.rs): Enforces a token budget for the “## Other Channels” section: returns early on zero budget, avoids writing when the header alone exceeds the budget, iteratively appends while within budget; adds async tests for persistence, channel prefixing, timezone rollover, and budget limiting.
  • Task update summaries & snapshot (src/tools/task_update.rs, src/tasks/store.rs): Adds previous_task to TaskUpdateResult (and derives PartialEq, Eq on TaskSubtask); task_update_memory_summary now diffs fields and emits the appropriate summary: no-delta, status-only, or detailed deltas; tests updated/added.
  • Knowledge synthesis gating & compactor (src/agent/cortex.rs, src/agent/compactor.rs): Adds a pure predicate should_regenerate_knowledge_synthesis_state with a stricter version check (current_version > last_version) and explicit clock-skew-safe debounce using unix secs; refactors compactor agent creation into build_compaction_summary_agent and adds tests for prompt policy and summary extraction.
  • Docs update (docs/design-docs/working-memory-triage.md): Marks checklist items R4, R12, R13, R16 as fixed; replaces remediation notes with implemented outcome descriptions.
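The token-budget rendering described for src/memory/working.rs can be sketched as below. The helper name, signature, and word-based token counting are illustrative assumptions; the real renderer presumably uses the model tokenizer.

```rust
/// Append channel lines under "## Other Channels" while staying within a
/// token budget, per the three rules above: zero budget emits nothing,
/// an over-budget header emits nothing, and lines append until the budget
/// would be exceeded.
fn render_other_channels(lines: &[String], budget: usize) -> String {
    if budget == 0 {
        return String::new(); // zero budget: emit nothing, not even the header
    }
    let header = "## Other Channels";
    let mut used = token_count(header);
    if used > budget {
        return String::new(); // header alone would blow the budget
    }
    let mut out = String::from(header);
    for line in lines {
        let cost = token_count(line);
        if used + cost > budget {
            break; // stop before exceeding the budget
        }
        out.push('\n');
        out.push_str(line);
        used += cost;
    }
    out
}

/// Crude stand-in for real tokenization: whitespace-split word count.
fn token_count(text: &str) -> usize {
    text.split_whitespace().count()
}
```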

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • jamiepine
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: Passed. The title accurately summarizes the main objective: closing triage gaps (fixing identified issues) and adding comprehensive proof/tests for working memory safety and knowledge synthesis logic.
  • Description check: Passed. The description is comprehensive and directly related to the changeset, covering all major areas including working memory sanitization, memory persistence triggers, knowledge synthesis regeneration, bug fixes, and test additions.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues check: Passed (check skipped because no linked issues were found for this pull request).
  • Out of Scope Changes check: Passed (check skipped because no linked issues were found for this pull request).




Copilot AI left a comment


Pull request overview

This PR improves the safety and correctness of the “working memory” + knowledge synthesis subsystems by ensuring sensitive/user-provided strings aren’t persisted verbatim, tightening regeneration/persistence triggering logic, and adding targeted unit tests to prevent regressions.

Changes:

  • Redacts + bounds worker task text and cron error summaries before persisting them into working memory.
  • Refactors memory persistence trigger selection and knowledge synthesis regeneration debounce/version logic into testable pure helpers.
  • Expands unit test coverage across working memory rendering, task update summaries, compactor invariants, and cron error persistence.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/tools/task_update.rs: Adds task update delta summaries for working memory events, plus tests for summary behavior and done/outcome behavior.
  • src/tools/spawn_worker.rs: Sanitizes worker task text before emitting “worker spawned (cortex)” working-memory events.
  • src/tasks/store.rs: Extends the task update result to include a “previous task” snapshot and adds tests validating snapshot pairing.
  • src/secrets/scrub.rs: Introduces scrub_working_memory_text with secret/leak scrubbing, encoded-leak fail-closed behavior, truncation, and tests.
  • src/memory/working.rs: Adds token-budget enforcement when rendering the channel activity map and adds coverage for new rendering behaviors.
  • src/cron/scheduler.rs: Scrubs and bounds cron error summaries before writing them to working memory, with tests covering redaction/truncation.
  • src/api/channels.rs: Stops swallowing errors in inspect_prompt; logs and propagates failures as 500s instead.
  • src/agent/cortex.rs: Extracts the knowledge synthesis regeneration decision into a pure helper and adds debounce/version tests.
  • src/agent/compactor.rs: Adds tests ensuring the compactor stays single-turn and toolless; tests summary header stripping.
  • src/agent/channel_dispatch.rs: Scrubs and bounds worker spawn task text persisted to working memory (builtin + opencode paths), with tests.
  • src/agent/channel.rs: Refactors memory persistence trigger selection into a helper and adds unit tests for trigger priority rules.
  • docs/design-docs/working-memory-triage.md: Marks triage items as resolved and documents the fixes shipped in this PR.
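The inspect_prompt fix in src/api/channels.rs follows a log-and-propagate pattern. This sketch uses plain Result types and eprintln! as stand-ins for the axum status codes and tracing::warn! in the real handler:

```rust
/// Log a render failure and surface a 500 instead of silently substituting
/// a default. The error/status types here are illustrative.
fn render_or_500(result: Result<String, String>) -> Result<String, u16> {
    match result {
        Ok(rendered) => Ok(rendered),
        Err(error) => {
            // Previously the equivalent of result.ok().unwrap_or_default()
            // hid this failure; now it is logged and propagated.
            eprintln!("prompt render failed: {error}");
            Err(500) // INTERNAL_SERVER_ERROR
        }
    }
}
```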



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
src/agent/compactor.rs (1)

457-474: Source-scan guard is reasonable, but slightly fragile.

Scanning the file's own source with include_str! to assert invariants works, but it has a couple of brittle edges worth noting:

  • source.find("#[cfg(test)]") assumes the tests module is the only/first such attribute in the file. Any future #[cfg(test)] helper placed above the tests module would silently shrink the scanned region and could hide a regression.
  • The assertion targets exact substrings (.tool_server_handle(, .default_max_turns(1)). A harmless formatting change (e.g., line break between .default_max_turns and (1), or a constant like const MAX_TURNS: usize = 1;) would fail the test without an actual behavior change.

A more robust alternative would be to factor the agent construction into a small helper returning the builder (or its config) and assert on that directly, but given this is an intentional belt-and-suspenders guard, the current form is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/compactor.rs` around lines 457 - 474, The current test
run_compaction_path_remains_toolless_and_one_turn relies on a brittle scan of
the file source with include_str! and substring searches; instead extract the
compactor/summary builder creation into a small helper (e.g., a function like
build_compactor_summary_builder or get_run_compaction_builder) that returns the
builder or its config, then update the test to call that helper and assert on
the builder/config (check absence of tool_server_handle and that
default_max_turns equals 1) rather than string-matching the source; update
references to run_compaction_source and include_str! to use the new helper in
assertions.
src/agent/cortex.rs (1)

5062-5100: Tests cover the intended invariants.

Newer-version requirement, debounce boundary (1_050 rejects, 1_060 accepts with debounce=60), and clock-skew safety (now < last_change) are all exercised. Consider adding one more case for debounce_secs == 0 with now == last_change if that's a reachable config, since the clamp-then->= path returns true there — useful to pin down intentionally vs. accidentally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/cortex.rs` around lines 5062 - 5100, Add a test that covers the
edge-case when debounce_secs == 0 and now == last_change to assert the function
returns true; specifically, in the test module that contains
knowledge_synthesis_trigger_* tests, add a case calling
should_regenerate_knowledge_synthesis_state(current_version > last_version,
last_version, last_change, 0, last_change) and assert true so the clamp-then->=
path is explicitly exercised and the behavior is pinned down.
src/agent/channel.rs (1)

3734-3766: Nit: double-call pattern could be folded into one evaluation.

The helper is called twice — once with None to decide whether to query the DB, then again with the resolved count. Since the message/time triggers don't depend on event_count_since_last, you can compute the trigger once and only consult the DB when the first result is None:

♻️ Proposed simplification
-        let event_count_since_last = if memory_persistence_trigger_kind(
-            self.message_count,
-            elapsed.as_secs(),
-            None,
-            &wm_config,
-        )
-        .is_none()
-        {
-            let since = chrono::Utc::now() - chrono::Duration::seconds(elapsed.as_secs() as i64);
-            match self
-                .deps
-                .working_memory
-                .count_events_since(self.id.as_ref(), since)
-                .await
-            {
-                Ok(count) => Some(count as usize),
-                Err(error) => {
-                    tracing::debug!(%error, "event density check failed, skipping");
-                    None
-                }
-            }
-        } else {
-            None
-        };
-
-        let Some(trigger) = memory_persistence_trigger_kind(
-            self.message_count,
-            elapsed.as_secs(),
-            event_count_since_last,
-            &wm_config,
-        ) else {
-            return;
-        };
+        let elapsed_secs = elapsed.as_secs();
+        let trigger = match memory_persistence_trigger_kind(
+            self.message_count,
+            elapsed_secs,
+            None,
+            &wm_config,
+        ) {
+            Some(kind) => kind,
+            None => {
+                let since = chrono::Utc::now() - chrono::Duration::seconds(elapsed_secs as i64);
+                let event_count = match self
+                    .deps
+                    .working_memory
+                    .count_events_since(self.id.as_ref(), since)
+                    .await
+                {
+                    Ok(count) => Some(count as usize),
+                    Err(error) => {
+                        tracing::debug!(%error, "event density check failed, skipping");
+                        None
+                    }
+                };
+                let Some(kind) = memory_persistence_trigger_kind(
+                    self.message_count,
+                    elapsed_secs,
+                    event_count,
+                    &wm_config,
+                ) else {
+                    return;
+                };
+                kind
+            }
+        };

Functionally equivalent; just avoids recomputing the message/time predicates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel.rs` around lines 3734 - 3766, The code calls
memory_persistence_trigger_kind twice (once with None to decide whether to call
deps.working_memory.count_events_since, then again with the resolved
event_count_since_last); fold this into one evaluation by first calling
memory_persistence_trigger_kind(self.message_count, elapsed.as_secs(), None,
&wm_config) and if that returns None then await
deps.working_memory.count_events_since(self.id.as_ref(), since) to get
event_count_since_last and re-run memory_persistence_trigger_kind only once with
the actual count, assigning the resulting Some(trigger) to trigger (preserving
the existing tracing::debug on DB error and the early return when trigger is
None).
src/api/channels.rs (1)

776-795: Consider including project.id context in aggregate-level warnings too.

Per-project list_repos / list_worktrees_with_repos failures correctly include project_id in the warning. For symmetry and easier triage, the aggregate-level renders (render_projects_context at Line 829, top-level list_projects at Line 767) could log the number of projects being processed so an operator can correlate a template-render failure with the specific input size.

This is purely a log-ergonomics nit — the current logs are already sufficient for correctness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/channels.rs` around lines 776 - 795, The aggregate-level warning logs
in render_projects_context and list_projects should include the number of
projects being processed for better triage; update the tracing::warn calls that
currently report template-render or aggregate failures to add a context field
(e.g., projects_count or projects_len) with projects.len() (or the collection
length) so operators can correlate failures with input size while keeping
existing messages and error fields intact.
src/tasks/store.rs (1)

1094-1119: Good snapshot-pair coverage; consider also asserting an unchanged field.

The test verifies both mutated fields (title, priority) on the previous_task/task pair. As a low-effort strengthening, asserting an unchanged field too (e.g. previous_task.status == result.task.status == TaskStatus::Backlog, or assigned_agent_id) would lock in that the "before" snapshot isn't accidentally populated from the updated row in future refactors.

🧪 Optional tightening
         assert_eq!(result.previous_task.title, "old title");
         assert_eq!(result.previous_task.priority, TaskPriority::Medium);
+        assert_eq!(result.previous_task.status, TaskStatus::Backlog);
+        assert_eq!(result.task.status, TaskStatus::Backlog);
         assert_eq!(result.task.title, "new title");
         assert_eq!(result.task.priority, TaskPriority::High);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/store.rs` around lines 1094 - 1119, The test
update_result_returns_applied_snapshot_pair currently asserts changed fields but
not an unchanged field, so add an assertion that a field not modified by the
update (e.g., previous_task.status and task.status both equal
TaskStatus::Backlog, or previous_task.assigned_agent_id ==
result.task.assigned_agent_id) to ensure the "previous_task" snapshot isn't
accidentally updated; locate the test function
update_result_returns_applied_snapshot_pair and add a concise assert_eq!
comparing the unchanged field on result.previous_task and result.task to the
expected value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agent/channel_dispatch.rs`:
- Around line 806-813: The code currently sanitizes the WorkerSpawned memory
event via sanitize_worker_memory_task but still emits or logs the raw task
elsewhere; replace all uses of the unsanitized task in
ProcessEvent::WorkerStarted payloads and tracing::info! calls with the sanitized
value (e.g., use the memory_task or a newly returned sanitized_task from
sanitize_worker_memory_task) so no raw task ever flows to
ProcessEvent::WorkerStarted, tracing::info!, or any other emitted events; apply
the same change to the other occurrence around the channel worker spawn logic
(the second block analogous to this one) to ensure consistency.

In `@src/secrets/scrub.rs`:
- Around line 262-278: redact_working_memory_encoded_leaks currently only calls
scan_for_leaks, which misses encoded forms of repo/tool secrets; update it to
also detect encoded variants of each secret from tool_secrets passed into
scrub_working_memory_text (e.g., base64, hex, percent-URL encodings) and treat a
match as a redact trigger. In practice modify scrub_working_memory_text to pass
tool_secrets into redact_working_memory_encoded_leaks (or call a new helper) and
implement detection by generating common encodings for each tool secret and
checking if any appear in text (or by decoding suspicious substrings and running
scrub_secrets/scan_for_leaks on the decoded content); if any encoded secret is
found, return the same "[WORKING_MEMORY_REDACTED:encoded-secret]" result. Ensure
function names referenced: scrub_working_memory_text,
redact_working_memory_encoded_leaks, scrub_secrets, and scan_for_leaks.

In `@src/tools/spawn_worker.rs`:
- Around line 525-530: The surrounding code logs and persists the unsanitized
task string (args.task) even though you created a sanitized memory_task; update
all logging and persistence calls to use the sanitized value instead of the raw
args.task. Specifically, replace uses of args.task in calls like
run_logger.log_worker_started(...) and any tracing/logging or worker_runs
persistence with memory_task (or a clearly named sanitized_task) so that
WorkingMemoryEventType::WorkerSpawned emit(...) already uses the sanitized text
and nothing else writes the original un-sanitized task; also audit any other
detached-worker persistence paths to ensure they consume
sanitize_worker_memory_task(...) output rather than args.task.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8248863-fac4-439e-af6b-788ed8194c45

📥 Commits

Reviewing files that changed from the base of the PR and between 8fb07b8 and 1d95038.

📒 Files selected for processing (12)
  • docs/design-docs/working-memory-triage.md
  • src/agent/channel.rs
  • src/agent/channel_dispatch.rs
  • src/agent/compactor.rs
  • src/agent/cortex.rs
  • src/api/channels.rs
  • src/cron/scheduler.rs
  • src/memory/working.rs
  • src/secrets/scrub.rs
  • src/tasks/store.rs
  • src/tools/spawn_worker.rs
  • src/tools/task_update.rs


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/agent/channel.rs (1)

3724-3766: ⚠️ Potential issue | 🔴 Critical

Remove the stale message_interval == 0 check from the persistence trigger path.

check_memory_persistence() delegates trigger selection to memory_persistence_trigger_kind(), which uses WorkingMemoryConfig thresholds (persistence_message_threshold, persistence_time_threshold_secs, persistence_event_density_threshold). However, the early guard still returns when message_interval == 0, bypassing the new trigger system entirely. Users who set message_interval: 0 to disable the old cadence-based persistence will now lose all persistence triggers, including time and event-density checks. The config comments already acknowledge this field is legacy; remove the gate so the new thresholds control trigger behavior independently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel.rs` around lines 3724 - 3766, The early-return guard in
check_memory_persistence currently blocks the new trigger system by returning
when config.message_interval == 0; remove that condition so the guard only
checks config.enabled and self.resolved_settings.memory.persistence_enabled()
(i.e., stop using config.message_interval in the initial return), allowing
memory_persistence_trigger_kind(...) to handle message/time/event thresholds;
update the guard around self.deps.runtime_config.memory_persistence.load()
accordingly and leave the downstream call to memory_persistence_trigger_kind,
the fallback event-density check that uses
self.deps.working_memory.count_events_since, unchanged.
src/agent/channel_dispatch.rs (1)

758-781: ⚠️ Potential issue | 🟠 Major

Extract duplicate-detection key from task before sanitization.

The sanitization pipeline truncates tasks to 500 characters and redacts secrets/leaks, then stores this sanitized version in StatusBlock. However, duplicate detection still checks the original full task against the StatusBlock contents. For tasks longer than 500 characters, the truncated version in StatusBlock will not match the original task on the next duplicate check, allowing identical long tasks to spawn multiple workers.

This affects the builtin spawn path (line 781), OpenCode spawn (line 994), resumed OpenCode workers (line 1263), and resumed builtin workers (line 1393). The reservation mechanism uses the original task string but is released immediately after spawn succeeds, allowing the race condition to manifest.

Compute a stable duplicate-detection key (using the original task or a deterministic normalization) before sanitization, and pass both the key and sanitized display text to add_worker. Modify find_duplicate_worker_task to compare against the stored keys instead of the display text.
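The suggested fix could be sketched as follows. All names here (duplicate_key, WorkerEntry) are hypothetical, and DefaultHasher (stable only within a single process) stands in for whatever normalization the codebase would actually choose:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a stable duplicate-detection key from the ORIGINAL task, before
/// sanitization can truncate or redact it.
fn duplicate_key(original_task: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    original_task.trim().hash(&mut hasher);
    hasher.finish()
}

/// A worker entry stores both the key and the sanitized display text, so
/// duplicate checks no longer depend on the (possibly truncated) display.
struct WorkerEntry {
    key: u64,
    display: String, // sanitized, possibly truncated to 500 chars
}

fn find_duplicate_worker_task(workers: &[WorkerEntry], task: &str) -> bool {
    let key = duplicate_key(task);
    workers.iter().any(|w| w.key == key)
}
```

With this shape, a 1400-character task still matches itself on respawn even though its stored display text was cut to 500 characters.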

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_dispatch.rs` around lines 758 - 781, Compute a stable
duplicate-detection key from the original task BEFORE calling
crate::secrets::scrub::scrub_worker_task_for_memory (e.g., hash or normalized
string) and pass both that key and the sanitized display text into
StatusBlock::add_worker instead of only the sanitized text; update calls that
create workers (the builtin path around spawn_worker_task and the
OpenCode/resume paths mentioned) to supply the pre-sanitization key, and change
find_duplicate_worker_task to compare against the stored duplicate keys in
StatusBlock rather than the display text so truncated/redacted display text no
longer prevents duplicate detection.
🧹 Nitpick comments (3)
src/agent/compactor.rs (1)

34-43: tool_server_enabled is not wired into the builder.

CompactionPromptPolicy::tool_server_enabled is read only by the unit test; build_compaction_summary_agent never consumes it. If a future change flips the flag to true, the builder will silently keep producing a toolless agent, and the "toolless" test will still pass while giving a false sense of enforcement. Either drop the field and have the test assert the agent shape directly, or branch on the flag in the builder (e.g., attach a tool_server_handle when true).

♻️ Suggested tightening
 fn build_compaction_summary_agent(
     model: SpacebotModel,
     compactor_prompt: &str,
 ) -> rig::agent::Agent<SpacebotModel> {
     let policy = compaction_prompt_policy();
+    debug_assert!(
+        !policy.tool_server_enabled,
+        "compactor must remain toolless"
+    );
     AgentBuilder::new(model)
         .preamble(compactor_prompt)
         .default_max_turns(policy.max_turns)
         .build()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/compactor.rs` around lines 34 - 43, The builder in
build_compaction_summary_agent ignores
CompactionPromptPolicy::tool_server_enabled (returned by
compaction_prompt_policy()), so either wire that flag into the AgentBuilder or
remove the flag and update the test; specifically, fetch the policy via
compaction_prompt_policy(), check policy.tool_server_enabled, and if true attach
the tool server handle to the AgentBuilder (e.g., call the builder method that
sets tool_server_handle or equivalent) before calling .build(), otherwise leave
current toolless build; alternatively, remove tool_server_enabled from
CompactionPromptPolicy and update the unit test to assert the agent shape
directly.
src/tasks/store.rs (1)

1094-1121: Test covers the new snapshot pair; consider asserting unchanged status.

The test verifies title/priority deltas and that status remains Backlog. Optionally assert result.previous_status == result.task.status here to pin down the invariant that a non-status update leaves previous_status equal to the post-update status, preventing regressions in downstream diff logic that keys off previous_status.

♻️ Optional addition
         assert_eq!(result.task.priority, TaskPriority::High);
+        assert_eq!(result.previous_status, TaskStatus::Backlog);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/store.rs` around lines 1094 - 1121, The test
update_result_returns_applied_snapshot_pair should also assert that the previous
and post-update statuses are identical for non-status-only updates: add an
assertion comparing result.previous_task.status and result.task.status (e.g.,
assert_eq!(result.previous_task.status, result.task.status)) alongside the
existing status/title/priority checks to lock in the invariant used by
downstream diff logic.
src/agent/cortex.rs (1)

5062-5107: Good coverage of the four critical branches.

Tests exercise the equal/older-version case (no regen), the within/at-debounce-boundary cases, backward clock skew, and zero debounce at the current instant. One optional addition worth considering: a case where last_change_unix_secs == 0 (the atomic default) with a fresh version bump, to pin the behavior that a never-initialized timestamp with current > last triggers regeneration as soon as the debounce window has elapsed since the epoch (effectively "fire now"). Not a correctness issue, since bump_knowledge_synthesis_version always stores the timestamp before incrementing the version, but the guarantee is worth locking in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/cortex.rs` around lines 5062 - 5107, Add a test covering the case
when last_change_unix_secs == 0 to document behavior when the timestamp is
uninitialized: call should_regenerate_knowledge_synthesis_state with
current_version > last_version and last_change_unix_secs set to 0 (use a
debounce_secs value and a now_unix_secs such that now >= debounce_secs) and
assert that regen triggers appropriately; reference the function
should_regenerate_knowledge_synthesis_state and, optionally, mention
bump_knowledge_synthesis_version in the test name or comment to show this
scenario arises from the atomic default behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/agent/channel_dispatch.rs`:
- Around line 758-781: Compute a stable duplicate-detection key from the
original task BEFORE calling crate::secrets::scrub::scrub_worker_task_for_memory
(e.g., hash or normalized string) and pass both that key and the sanitized
display text into StatusBlock::add_worker instead of only the sanitized text;
update calls that create workers (the builtin path around spawn_worker_task and
the OpenCode/resume paths mentioned) to supply the pre-sanitization key, and
change find_duplicate_worker_task to compare against the stored duplicate keys
in StatusBlock rather than the display text so truncated/redacted display text
no longer prevents duplicate detection.

In `@src/agent/channel.rs`:
- Around line 3724-3766: The early-return guard in check_memory_persistence
currently blocks the new trigger system by returning when
config.message_interval == 0; remove that condition so the guard only checks
config.enabled and self.resolved_settings.memory.persistence_enabled() (i.e.,
stop using config.message_interval in the initial return), allowing
memory_persistence_trigger_kind(...) to handle message/time/event thresholds;
update the guard around self.deps.runtime_config.memory_persistence.load()
accordingly and leave the downstream call to memory_persistence_trigger_kind,
the fallback event-density check that uses
self.deps.working_memory.count_events_since, unchanged.
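The channel_dispatch finding above can be sketched as follows: derive a stable duplicate-detection key from the original task text before scrubbing, so that redaction or truncation of the display text cannot defeat duplicate detection. Everything here (normalize, dup_key) is illustrative, not the crate's API; the real code would pass the key into StatusBlock::add_worker alongside the sanitized display text.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Normalize the raw task text (whitespace and case) so trivially different
/// spellings of the same task map to the same key. Hypothetical helper.
fn normalize(task: &str) -> String {
    task.split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Hash the normalized ORIGINAL text, before scrub_worker_task_for_memory
/// redacts or truncates it; the sanitized text is stored only for display.
fn dup_key(task: &str) -> u64 {
    let mut h = DefaultHasher::new();
    normalize(task).hash(&mut h);
    h.finish()
}
```

Comparing stored keys instead of display text means a task whose secret got redacted still collides with a re-submission of the same original task.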
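The channel.rs finding above boils down to: a zero message_interval must disable only the message-count threshold, not the whole trigger system. A minimal sketch of the ordering described in the PR summary (message count, then time, then event density); all names and the input list are assumptions, and the real memory_persistence_trigger_kind may differ.

```rust
/// Which threshold fired, in priority order. Illustrative enum, not the
/// crate's actual type.
#[derive(Debug, PartialEq)]
enum TriggerKind {
    MessageCount,
    Elapsed,
    EventDensity,
}

/// A threshold of 0 disables that one check only; the remaining checks
/// still run, which is why the early return on message_interval == 0 in
/// check_memory_persistence had to go.
fn trigger_kind(
    messages_since_persist: u32,
    message_interval: u32,
    secs_since_persist: u64,
    time_interval_secs: u64,
    events_since_persist: u32,
    event_threshold: u32,
) -> Option<TriggerKind> {
    if message_interval > 0 && messages_since_persist >= message_interval {
        return Some(TriggerKind::MessageCount);
    }
    if time_interval_secs > 0 && secs_since_persist >= time_interval_secs {
        return Some(TriggerKind::Elapsed);
    }
    if event_threshold > 0 && events_since_persist >= event_threshold {
        return Some(TriggerKind::EventDensity);
    }
    None
}
```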

---

Nitpick comments:
In `@src/agent/compactor.rs`:
- Around line 34-43: The builder in build_compaction_summary_agent ignores
CompactionPromptPolicy::tool_server_enabled (returned by
compaction_prompt_policy()), so either wire that flag into the AgentBuilder or
remove the flag and update the test; specifically, fetch the policy via
compaction_prompt_policy(), check policy.tool_server_enabled, and if true attach
the tool server handle to the AgentBuilder (e.g., call the builder method that
sets tool_server_handle or equivalent) before calling .build(), otherwise leave
current toolless build; alternatively, remove tool_server_enabled from
CompactionPromptPolicy and update the unit test to assert the agent shape
directly.

In `@src/agent/cortex.rs`:
- Around line 5062-5107: Add a test covering the case when last_change_unix_secs
== 0 to document behavior when the timestamp is uninitialized: call
should_regenerate_knowledge_synthesis_state with current_version > last_version
and last_change_unix_secs set to 0 (use a debounce_secs value and a
now_unix_secs such that now >= debounce_secs) and assert that regen triggers
appropriately; reference the function
should_regenerate_knowledge_synthesis_state and, optionally, mention
bump_knowledge_synthesis_version in the test name or comment to show this
scenario arises from the atomic default behavior.

In `@src/tasks/store.rs`:
- Around line 1094-1121: The test update_result_returns_applied_snapshot_pair
should also assert that the previous and post-update statuses are identical for
non-status-only updates: add an assertion comparing result.previous_task.status
and result.task.status (e.g., assert_eq!(result.previous_task.status,
result.task.status)) alongside the existing status/title/priority checks to lock
in the invariant used by downstream diff logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3b0c05db-19da-440c-a9ce-897215062cbb

📥 Commits

Reviewing files that changed from the base of the PR and between 1d95038 and eda6807.

📒 Files selected for processing (8)
  • src/agent/channel.rs
  • src/agent/channel_dispatch.rs
  • src/agent/compactor.rs
  • src/agent/cortex.rs
  • src/api/channels.rs
  • src/secrets/scrub.rs
  • src/tasks/store.rs
  • src/tools/spawn_worker.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/tools/spawn_worker.rs
  • src/secrets/scrub.rs
  • src/api/channels.rs
