fix(CoSTEER): rebind RAG trace cursor to evolving_trace identity (#1398) #1399

Open. voidborne-d wants to merge 1 commit into microsoft:main from voidborne-d:fix/1398-costeer-rag-trace-cursor

voidborne-d commented Apr 28, 2026

Closes #1398.

Bug

`CoSTEERRAGStrategyV2.current_generated_trace_count` is an instance attribute on the strategy, but `generate_knowledge()` interprets it as an index into whichever `evolving_trace` list is passed in.

`CoSTEER.develop()` reuses a single strategy instance, constructed once at `rdagent/components/coder/CoSTEER/__init__.py:53`, across every call. Each call builds a fresh `RAGEvoAgent` whose `evolving_trace` is a brand-new list. When that fresh trace happens to reach the same length the cursor was advanced to in a prior call, line 366 short-circuits:

```python
if len(evolving_trace) == self.current_generated_trace_count:
    return None
```

This drops repair feedback that the loop body would otherwise have written into `success_task_to_knowledge_dict`. The next repair round then does not know which tasks already passed and may re-implement the whole group instead of repairing only the failed task, which wastes LLM budget and risks regressing previously successful code (issue #1398).
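The failure mode can be reproduced without any of the real CoSTEER machinery. The sketch below is a minimal stand-in, not the actual `CoSTEERRAGStrategyV2`: `StaleCursorStrategy` and its `ingested` list are hypothetical names, and the loop after the early return is only a guess at the shape of the real one. Only the early return itself is taken from the snippet above.

```python
class StaleCursorStrategy:
    """Hypothetical stand-in: one cursor shared by every trace it is handed."""

    def __init__(self) -> None:
        self.current_generated_trace_count = 0
        self.ingested: list[str] = []  # illustration only: records processed steps

    def generate_knowledge(self, evolving_trace: list[str]) -> None:
        # Same early return as on main: the cursor is compared to the length
        # of whichever list was passed in, regardless of which list it is.
        if len(evolving_trace) == self.current_generated_trace_count:
            return None
        # Assumed loop shape: process only the steps past the cursor.
        for step in evolving_trace[self.current_generated_trace_count:]:
            self.ingested.append(step)
        self.current_generated_trace_count = len(evolving_trace)
        return None


strategy = StaleCursorStrategy()  # built once, reused across develop() calls

first_trace = ["first call, step 0"]
strategy.generate_knowledge(first_trace)   # cursor advances to 1

second_trace = ["second call, step 0"]     # brand-new list, same length
strategy.generate_knowledge(second_trace)  # early return: this step is skipped

print(strategy.ingested)  # ['first call, step 0']
```

The second trace's only step never reaches `ingested`, which is the dropped repair feedback described above.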

Fix

Bind the cursor to `id(evolving_trace)` and reset it when (a) the trace identity changes (new `develop()` call) or (b) the cursor is greater than the current trace length (defensive against truncation / fresh-trace reuse).

```diff
 class CoSTEERRAGStrategyV2(CoSTEERRAGStrategy):
     def __init__(self, settings: CoSTEERSettings, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.current_generated_trace_count = 0
+        self._generated_trace_identity: int | None = None
         self.settings = settings

     def generate_knowledge(...):
+        trace_identity = id(evolving_trace)
+        if (
+            getattr(self, "_generated_trace_identity", None) != trace_identity
+            or self.current_generated_trace_count > len(evolving_trace)
+        ):
+            self._generated_trace_identity = trace_identity
+            self.current_generated_trace_count = 0
         if len(evolving_trace) == self.current_generated_trace_count:
             return None
```
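To see the rebinding in isolation, here is a runnable sketch of the same check on a toy class. `ReboundCursorStrategy` and `ingested` are hypothetical names, and the slice-based loop is an assumption about how the real method consumes the trace. Only the identity and length check follows the diff.

```python
class ReboundCursorStrategy:
    """Hypothetical stand-in for the patched cursor handling."""

    def __init__(self) -> None:
        self.current_generated_trace_count = 0
        self._generated_trace_identity: int | None = None
        self.ingested: list[str] = []  # illustration only

    def generate_knowledge(self, evolving_trace: list[str]) -> None:
        trace_identity = id(evolving_trace)
        if (
            getattr(self, "_generated_trace_identity", None) != trace_identity
            or self.current_generated_trace_count > len(evolving_trace)
        ):
            # New list object, or a cursor pointing past the end: start over.
            self._generated_trace_identity = trace_identity
            self.current_generated_trace_count = 0
        if len(evolving_trace) == self.current_generated_trace_count:
            return None
        # Assumed loop shape: consume only the steps past the cursor.
        self.ingested.extend(evolving_trace[self.current_generated_trace_count:])
        self.current_generated_trace_count = len(evolving_trace)
        return None


strategy = ReboundCursorStrategy()
first_trace = ["a0"]
strategy.generate_knowledge(first_trace)
second_trace = ["b0"]
strategy.generate_knowledge(second_trace)  # new identity: cursor resets, b0 is ingested
second_trace.append("b1")
strategy.generate_knowledge(second_trace)  # same list extended: only b1 is new
strategy.generate_knowledge(second_trace)  # nothing new: fast-path no-op
print(strategy.ingested)  # ['a0', 'b0', 'b1']
```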

`getattr(..., None)` handles instances rebuilt without going through `__init__` (resume / pickle paths).
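A small sketch of why the `getattr` default matters, under the assumption that a resumed instance can carry an attribute dict saved before `_generated_trace_identity` existed. `CursorState` and `needs_reset` are hypothetical names, and `object.__new__` plus a `__dict__` update imitates what default unpickling does rather than going through the project's real resume path.

```python
class CursorState:
    """Hypothetical holder for the two cursor attributes."""

    def __init__(self) -> None:
        self.current_generated_trace_count = 0
        self._generated_trace_identity = None

    def needs_reset(self, evolving_trace: list) -> bool:
        # getattr with a default avoids AttributeError when the identity
        # attribute was never set on this instance.
        return (
            getattr(self, "_generated_trace_identity", None) != id(evolving_trace)
            or self.current_generated_trace_count > len(evolving_trace)
        )


# Allocate without calling __init__, then restore a saved attribute dict.
# The dict imitates state written before _generated_trace_identity existed.
restored = object.__new__(CursorState)
restored.__dict__.update({"current_generated_trace_count": 2})

trace = ["step 0", "step 1"]
print(hasattr(restored, "_generated_trace_identity"))  # False
print(restored.needs_reset(trace))  # True
```

With a plain `self._generated_trace_identity` access, the same call would raise `AttributeError` on the restored instance instead of resetting the cursor.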

V1 is unchanged — it raises `NotImplementedError` immediately and is documented as deprecated.

Tests

`test/utils/coder/test_CoSTEER_RAG_cursor.py` adds four offline regressions:

  1. `test_fresh_trace_with_same_length_is_ingested`: exact reproducer of #1398 (CoSTEER RAG trace cursor can skip fresh repair feedback). Fails on `main` (early return drops the new trace), passes here.
  2. `test_extending_same_trace_only_processes_new_steps`: guards against re-ingesting `trace[0]` when the trace is extended in place.
  3. `test_truncated_trace_resets_cursor`: covers the `cursor > len(trace)` reset path. Fails on `main`, passes here.
  4. `test_idempotent_call_on_same_trace_returns_none`: keeps the fast-path no-op intact.

Tests are marked `@pytest.mark.offline` and avoid LLM/knowledge-base setup by passing trace steps with empty `sub_tasks`/`sub_workspace_list` so the inner loop is a no-op; visit counters on the steps prove the outer loop ran.
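The visit-counter idea can be sketched as follows. This is not the contents of `test_CoSTEER_RAG_cursor.py`: `CountingStep` and `CursorUnderTest` are hypothetical stand-ins, the loop shape is assumed, and the `@pytest.mark.offline` marker is left out so the block runs without pytest. Only the two test names are taken from the list above.

```python
class CountingStep:
    """Hypothetical trace step that counts how often sub_tasks is read."""

    def __init__(self) -> None:
        self.visits = 0
        self.sub_workspace_list: list = []

    @property
    def sub_tasks(self) -> list:
        self.visits += 1  # the outer loop reads sub_tasks once per step
        return []         # empty, so the inner loop has nothing to do


class CursorUnderTest:
    """Hypothetical stand-in for the patched strategy's outer loop."""

    def __init__(self) -> None:
        self.current_generated_trace_count = 0
        self._generated_trace_identity = None

    def generate_knowledge(self, evolving_trace: list) -> None:
        if (
            self._generated_trace_identity != id(evolving_trace)
            or self.current_generated_trace_count > len(evolving_trace)
        ):
            self._generated_trace_identity = id(evolving_trace)
            self.current_generated_trace_count = 0
        if len(evolving_trace) == self.current_generated_trace_count:
            return None
        for step in evolving_trace[self.current_generated_trace_count:]:
            for _task in step.sub_tasks:  # no-op: sub_tasks is empty
                pass
        self.current_generated_trace_count = len(evolving_trace)
        return None


def test_fresh_trace_with_same_length_is_ingested() -> None:
    strategy = CursorUnderTest()
    first_trace = [CountingStep()]
    strategy.generate_knowledge(first_trace)
    second_trace = [CountingStep()]  # fresh list, same length as the first
    strategy.generate_knowledge(second_trace)
    assert second_trace[0].visits == 1


def test_extending_same_trace_only_processes_new_steps() -> None:
    strategy = CursorUnderTest()
    trace = [CountingStep()]
    strategy.generate_knowledge(trace)
    trace.append(CountingStep())
    strategy.generate_knowledge(trace)
    assert [step.visits for step in trace] == [1, 1]


test_fresh_trace_with_same_length_is_ingested()
test_extending_same_trace_only_processes_new_steps()
```

Counting reads on the step keeps the assertion independent of any knowledge-base state: a count of 1 shows the step was visited exactly once, and a count of 0 would show the early return fired.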

Local gates

```text
$ pytest test/utils/coder/test_CoSTEER_RAG_cursor.py -v
4 passed, 2 warnings in 1.86s

# pytest without the fix (on main): 2 failed, 2 passed (confirms the regression tests bite)

$ ruff check --no-fix rdagent/components/coder/CoSTEER/knowledge_management.py
no new errors introduced (110 pre-existing → 110)

$ black --check --target-version py311 -l 120 test/utils/coder/test_CoSTEER_RAG_cursor.py
All done! ✨ 🍰 ✨

$ isort --check-only test/utils/coder/test_CoSTEER_RAG_cursor.py rdagent/components/coder/CoSTEER/knowledge_management.py
clean
```

Diff is +136 / -0 (13 in source, 123 in the new test file).


🤖 Worked on by an AI agent. The cursor logic is small but adversarial-looking; happy to fold in any review notes.


📚 Documentation preview 📚: https://RDAgent--1399.org.readthedocs.build/en/1399/

CoSTEERRAGStrategyV2.current_generated_trace_count is an instance-level
cursor, but generate_knowledge() interprets it as an index into whichever
evolving_trace list is passed in. CoSTEER.develop() reuses a single
strategy instance across calls (rdagent/components/coder/CoSTEER/__init__.py
line 53), so a fresh trace whose length happens to match the stale
cursor was silently short-circuited at the len()==cursor early-return,
dropping repair feedback that should have been ingested.

Bind the cursor to id(evolving_trace) and reset it when (a) the trace
identity changes, or (b) the cursor is greater than the current trace
length (defensive against truncation/resume edge cases).

Adds test/utils/coder/test_CoSTEER_RAG_cursor.py with four offline
regressions covering: fresh-trace-of-stale-length re-ingest, extension
only processing new steps, truncation reset, and idempotent same-trace
recall.

Closes microsoft#1398