fix(CoSTEER): rebind RAG trace cursor to evolving_trace identity (#1398) #1399
Open
voidborne-d wants to merge 1 commit into
CoSTEERRAGStrategyV2.current_generated_trace_count is an instance-level cursor, but generate_knowledge() interprets it as an index into whichever evolving_trace list is passed in. CoSTEER.develop() reuses a single strategy instance across calls (rdagent/components/coder/CoSTEER/__init__.py line 53), so a fresh trace whose length happens to match the stale cursor was silently short-circuited at the len()==cursor early-return, dropping repair feedback that should have been ingested. Bind the cursor to id(evolving_trace) and reset it when (a) the trace identity changes, or (b) the cursor is greater than the current trace length (defensive against truncation/resume edge cases). Adds test/utils/coder/test_CoSTEER_RAG_cursor.py with four offline regressions covering: fresh-trace-of-stale-length re-ingest, extension only processing new steps, truncation reset, and idempotent same-trace recall. Closes microsoft#1398
Closes #1398.
Bug
`CoSTEERRAGStrategyV2.current_generated_trace_count` is an instance attribute on the strategy, but `generate_knowledge()` interprets it as an index into whichever `evolving_trace` list is passed in.
`CoSTEER.develop()` constructs a strategy once (`rdagent/components/coder/CoSTEER/__init__.py:53`) and reuses it across every `develop()` call. Each call builds a fresh `RAGEvoAgent` whose `evolving_trace` is a brand-new list. When that fresh trace happens to reach the same length the cursor was advanced to in a prior call, line 366 short-circuits:
```python
if len(evolving_trace) == self.current_generated_trace_count:
    return None
```
…dropping repair feedback that the next loop body would have written into `success_task_to_knowledge_dict`. The next repair round then doesn't know which tasks already passed and may re-implement the whole group instead of only repairing the failed task — wasting LLM budget and risking regression of previously-successful code (issue #1398).
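The failure mode can be reproduced in isolation. The following is a minimal sketch, not the real `CoSTEERRAGStrategyV2`: the class name `StaleCursorStrategy`, the `ingested` list, and the string steps are hypothetical stand-ins, and only the early-return comparison is taken from the snippet above.

```python
class StaleCursorStrategy:
    """Toy stand-in for the pre-fix cursor behaviour (not the real RD-Agent class)."""

    def __init__(self) -> None:
        self.current_generated_trace_count = 0
        self.ingested: list[str] = []  # hypothetical record of processed steps

    def generate_knowledge(self, evolving_trace: list[str]) -> None:
        # Pre-fix early return: compares the cursor to whichever list is passed in.
        if len(evolving_trace) == self.current_generated_trace_count:
            return None
        # Otherwise process everything past the cursor and advance it.
        self.ingested.extend(evolving_trace[self.current_generated_trace_count:])
        self.current_generated_trace_count = len(evolving_trace)
        return None


strategy = StaleCursorStrategy()  # one instance reused across calls

first_trace = ["step-a"]  # trace from an earlier develop() call
strategy.generate_knowledge(first_trace)

second_trace = ["repair-feedback"]  # brand-new list, same length as the stale cursor
strategy.generate_knowledge(second_trace)

print(strategy.ingested)  # "repair-feedback" never shows up
```

The second call returns early because both lists have length 1, so the step in the new trace is never read.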
Fix
Bind the cursor to `id(evolving_trace)` and reset it when (a) the trace identity changes (new `develop()` call) or (b) the cursor is greater than the current trace length (defensive against truncation / fresh-trace reuse).
```diff
 class CoSTEERRAGStrategyV2(CoSTEERRAGStrategy):
     def __init__(self, settings: CoSTEERSettings, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.current_generated_trace_count = 0
     def generate_knowledge(...):
```
`getattr(..., None)` handles instances rebuilt without going through `__init__` (resume / pickle paths).
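As a rough illustration of the rebinding rule described in this section, here is a sketch on a toy class. It is not the code from this PR: the class `ReboundCursorStrategy`, the attribute name `_cursor_trace_id`, and the `ingested` list are assumptions made for the example; only `current_generated_trace_count`, `id(evolving_trace)`, and the two reset conditions come from the description.

```python
class ReboundCursorStrategy:
    """Toy stand-in illustrating the identity-bound cursor (not the real class)."""

    def __init__(self) -> None:
        self.current_generated_trace_count = 0
        self._cursor_trace_id = None  # hypothetical attribute name
        self.ingested: list[str] = []  # hypothetical record of processed steps

    def generate_knowledge(self, evolving_trace: list[str]) -> None:
        trace_id = id(evolving_trace)
        # getattr tolerates instances restored without __init__ having run.
        bound_id = getattr(self, "_cursor_trace_id", None)
        if bound_id != trace_id or self.current_generated_trace_count > len(evolving_trace):
            # (a) different list object, or (b) cursor points past the end: start over.
            self.current_generated_trace_count = 0
            self._cursor_trace_id = trace_id
        if len(evolving_trace) == self.current_generated_trace_count:
            return None
        self.ingested.extend(evolving_trace[self.current_generated_trace_count:])
        self.current_generated_trace_count = len(evolving_trace)
        return None


strategy = ReboundCursorStrategy()
first_trace = ["step-a"]
strategy.generate_knowledge(first_trace)

second_trace = ["repair-feedback"]  # new list, same length as the old cursor
strategy.generate_knowledge(second_trace)
second_trace.append("step-b")  # same list extended: only the new step is read
strategy.generate_knowledge(second_trace)
strategy.generate_knowledge(second_trace)  # repeat call is a no-op

# Simulate an instance restored from state saved before the new attribute existed.
restored = ReboundCursorStrategy.__new__(ReboundCursorStrategy)
restored.__dict__.update({"current_generated_trace_count": 1, "ingested": []})
restored_trace = ["after-resume"]
restored.generate_knowledge(restored_trace)
```

Note that Python only guarantees `id()` values to be unique among objects that are alive at the same time, which is why the sketch keeps references to both lists for the whole run.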
V1 is unchanged — it raises `NotImplementedError` immediately and is documented as deprecated.
Tests
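`test/utils/coder/test_CoSTEER_RAG_cursor.py` adds four offline regressions: re-ingest of a fresh trace whose length matches the stale cursor, extension of the same trace processing only the new steps, cursor reset after truncation, and idempotent recall on an unchanged trace.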
Tests are marked `@pytest.mark.offline` and avoid LLM/knowledge-base setup by passing trace steps with empty `sub_tasks`/`sub_workspace_list` so the inner loop is a no-op; visit counters on the steps prove the outer loop ran.
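The visit-counter technique might look roughly like the sketch below. It is an assumption-laden illustration rather than the contents of the new test file: `CountingStep`, `ToyStrategy`, `_cursor_trace_id`, and both test names are hypothetical, the strategy is a toy with the same cursor rule, and the `@pytest.mark.offline` marker is left out so the block runs without pytest.

```python
from dataclasses import dataclass, field


@dataclass
class CountingStep:
    """Hypothetical trace step: empty task list, plus a counter bumped on each visit."""

    visits: int = 0
    sub_workspace_list: list = field(default_factory=list)

    @property
    def sub_tasks(self) -> list:
        self.visits += 1  # the outer loop touched this step
        return []  # empty, so the inner loop body never runs


class ToyStrategy:
    """Stand-in with an identity-bound cursor; not the real CoSTEERRAGStrategyV2."""

    def __init__(self) -> None:
        self.current_generated_trace_count = 0
        self._cursor_trace_id = None  # hypothetical attribute name

    def generate_knowledge(self, evolving_trace: list) -> None:
        if (
            getattr(self, "_cursor_trace_id", None) != id(evolving_trace)
            or self.current_generated_trace_count > len(evolving_trace)
        ):
            self.current_generated_trace_count = 0
            self._cursor_trace_id = id(evolving_trace)
        if len(evolving_trace) == self.current_generated_trace_count:
            return None
        for step in evolving_trace[self.current_generated_trace_count:]:
            for _task in step.sub_tasks:  # no-op: sub_tasks is empty
                pass
        self.current_generated_trace_count = len(evolving_trace)
        return None


def test_fresh_trace_of_stale_length_is_reingested() -> None:
    strategy = ToyStrategy()
    old_trace = [CountingStep()]
    strategy.generate_knowledge(old_trace)
    new_trace = [CountingStep()]  # same length, different list object
    strategy.generate_knowledge(new_trace)
    assert new_trace[0].visits == 1


def test_same_trace_recall_is_idempotent() -> None:
    strategy = ToyStrategy()
    trace = [CountingStep()]
    strategy.generate_knowledge(trace)
    strategy.generate_knowledge(trace)  # second call should hit the early return
    assert trace[0].visits == 1


test_fresh_trace_of_stale_length_is_reingested()
test_same_trace_recall_is_idempotent()
```

Because the steps carry no tasks, no LLM or knowledge-base object is needed; the counter alone shows whether the outer loop reached a step.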
Local gates
```text
$ pytest test/utils/coder/test_CoSTEER_RAG_cursor.py -v
4 passed, 2 warnings in 1.86s
$ pytest (without the fix on main): 2 failed, 2 passed (confirms the regression test bites)
$ ruff check --no-fix rdagent/components/coder/CoSTEER/knowledge_management.py
no new errors introduced (110 pre-existing → 110)
$ black --check --target-version py311 -l 120 test/utils/coder/test_CoSTEER_RAG_cursor.py
All done! ✨ 🍰 ✨
$ isort --check-only test/utils/coder/test_CoSTEER_RAG_cursor.py rdagent/components/coder/CoSTEER/knowledge_management.py
clean
```
Diff is +136 / -0 (13 in source, 123 in the new test file).
🤖 Worked on by an AI agent. The cursor logic is small but adversarial-looking; happy to fold in any review notes.
📚 Documentation preview 📚: https://RDAgent--1399.org.readthedocs.build/en/1399/