Commit 93b798e

Add rich feedback mode to k_module_problem example
Introduces a RICH_FEEDBACK=1 mode that provides detailed feedback on which modules are correct or incorrect, along with actionable hints. Updates the evaluator and iterative agent to support and display this feedback, and documents the new mode and its impact in the README.
1 parent fda1963 commit 93b798e

3 files changed: +75 -3 lines changed

examples/k_module_problem/README.md

Lines changed: 19 additions & 0 deletions
@@ -166,6 +166,25 @@ This establishes the "no learning" baseline. Any method that beats this is demon
 
 **Key insight**: While OpenEvolve takes more iterations on average (52.3 vs 13), it has a **100% success rate** compared to iterative refinement's 33%. The evolutionary approach's population diversity ensures it eventually escapes local optima that trap single-trajectory methods.
 
+### Rich Feedback Mode: Proving Attribution Matters
+
+To verify that feedback attribution is the key factor, we added a `RICH_FEEDBACK=1` mode that tells the agent exactly which modules are correct/incorrect:
+
+```bash
+RICH_FEEDBACK=1 python run_iterative_trials.py --trials 3 --iterations 100
+```
+
+| Method | Success Rate | Avg Iterations |
+|--------|-------------|----------------|
+| **Iterative (no feedback)** | 33% | 13 (when found) |
+| **Iterative (rich feedback)** | **100%** | **3** |
+
+With rich feedback, iterative refinement achieves **100% success rate in only 3 iterations** - dramatically faster than OpenEvolve's 52 iterations! This proves that:
+
+1. **Feedback attribution is the key factor**, not the optimization method
+2. When feedback is attributable, iterative refinement is highly effective
+3. Evolution is necessary when feedback is NOT attributable (you can't tell which component is wrong)
+
 ## Why This Matters
 
 This example illustrates when you should prefer evolutionary approaches:
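The README's claim that attributable feedback makes iterative refinement fast can be checked with a toy model. The sketch below is not the repository's LLM agent: it assumes a hypothetical problem with 4 modules and 5 options each, and a simple refiner that changes only the modules named as incorrect and never retries an option it has already ruled out.

```python
import random


def refine_with_attribution(options, target, rng):
    """Count evaluations needed when feedback names the incorrect modules."""
    # Start from a random configuration.
    config = {m: rng.choice(opts) for m, opts in options.items()}
    # Options already shown to be wrong, per module.
    ruled_out = {m: set() for m in options}
    evaluations = 0
    while True:
        evaluations += 1
        # Rich feedback: the evaluator names every incorrect module.
        incorrect = [m for m in options if config[m] != target[m]]
        if not incorrect:
            return evaluations
        for m in incorrect:
            ruled_out[m].add(config[m])
            # The correct option is never ruled out, so this list is non-empty.
            remaining = [o for o in options[m] if o not in ruled_out[m]]
            config[m] = rng.choice(remaining)


# Hypothetical setup: 4 modules with 5 options each (625 combinations).
options = {f"module_{i}": [f"opt_{j}" for j in range(5)] for i in range(4)}
rng = random.Random(0)
target = {m: rng.choice(opts) for m, opts in options.items()}

runs = [refine_with_attribution(options, target, rng) for _ in range(1000)]
worst_case = max(runs)
print(f"worst case: {worst_case} evaluations, mean: {sum(runs) / len(runs):.2f}")
```

Under these assumptions the refiner never needs more evaluations than the number of options per module, because every module is searched independently and in parallel. With count-only feedback the same agent cannot tell which change helped, so the 625 combinations no longer decompose this way.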

examples/k_module_problem/evaluator.py

Lines changed: 30 additions & 2 deletions
@@ -9,13 +9,21 @@
 This creates a challenging landscape for iterative refinement but
 allows evolutionary crossover to combine good "building blocks"
 from different individuals.
+
+Set RICH_FEEDBACK=1 to enable rich feedback mode, which tells you
+exactly which modules are correct/incorrect. This demonstrates that
+iterative refinement works well when feedback is attributable.
 """
 
+import os
 import sys
 import time
 import traceback
 import importlib.util
 
+# Rich feedback mode - when enabled, reveals which modules are correct
+RICH_FEEDBACK = os.environ.get("RICH_FEEDBACK", "0") == "1"
+
 # The correct solution (hidden from the optimizer)
 # This represents the "optimal" pipeline configuration discovered through
 # extensive testing/domain expertise
@@ -141,14 +149,34 @@ def score_config(config: dict) -> tuple:
 
 def build_artifacts(config: dict, correct_count: int, module_results: dict, eval_time: float) -> dict:
     """
-    Build artifacts that provide useful feedback without revealing
-    exactly which modules are correct.
+    Build artifacts that provide useful feedback.
+
+    In normal mode: Only reveals how many modules are correct, not which ones.
+    In rich feedback mode (RICH_FEEDBACK=1): Reveals exactly which modules are correct/incorrect.
     """
     artifacts = {}
 
     # Configuration summary
     artifacts["configuration"] = str(config)
 
+    # Rich feedback mode - reveals which modules are correct/incorrect
+    if RICH_FEEDBACK:
+        correct_modules = [m for m, is_correct in module_results.items() if is_correct]
+        incorrect_modules = [m for m, is_correct in module_results.items() if not is_correct]
+
+        artifacts["module_feedback"] = {
+            "correct": correct_modules,
+            "incorrect": incorrect_modules,
+        }
+
+        if incorrect_modules:
+            hints = []
+            for module in incorrect_modules:
+                hints.append(f"'{module}' is WRONG - try a different option from {VALID_OPTIONS[module]}")
+            artifacts["actionable_hints"] = hints
+        else:
+            artifacts["actionable_hints"] = ["All modules are correct!"]
+
     # Score feedback - tells you how many are correct, but not which ones
     if correct_count == NUM_MODULES:
         artifacts["status"] = "PERFECT! All modules correctly configured!"
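To see what the new `RICH_FEEDBACK` branch emits, here is a minimal standalone sketch of it. The module names and the contents of `VALID_OPTIONS` are hypothetical stand-ins (the real values live in `evaluator.py` and are not shown in this diff), and `rich_feedback_artifacts` is an illustrative helper, not a function from the commit.

```python
# Hypothetical stand-ins for the real module names and options in evaluator.py.
VALID_OPTIONS = {
    "loader": ["csv_reader", "json_reader"],
    "transform": ["normalize", "standardize"],
}


def rich_feedback_artifacts(module_results: dict) -> dict:
    """Mirror the RICH_FEEDBACK branch added to build_artifacts."""
    correct = [m for m, ok in module_results.items() if ok]
    incorrect = [m for m, ok in module_results.items() if not ok]
    artifacts = {"module_feedback": {"correct": correct, "incorrect": incorrect}}
    if incorrect:
        # One hint per incorrect module, listing the options for that module.
        artifacts["actionable_hints"] = [
            f"'{m}' is WRONG - try a different option from {VALID_OPTIONS[m]}"
            for m in incorrect
        ]
    else:
        artifacts["actionable_hints"] = ["All modules are correct!"]
    return artifacts


example = rich_feedback_artifacts({"loader": True, "transform": False})
print(example["module_feedback"])
print(example["actionable_hints"][0])
```

If `VALID_OPTIONS[module]` holds the full option list, as assumed here, each hint also includes the option that was just tried, so the agent has to exclude it on its own.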

examples/k_module_problem/iterative_agent.py

Lines changed: 26 additions & 1 deletion
@@ -64,6 +64,26 @@ def write_program(program_path: str, code: str) -> None:
         f.write(code)
 
 
+def format_rich_feedback(artifacts: dict) -> str:
+    """Format rich feedback if available (RICH_FEEDBACK=1)."""
+    if "module_feedback" not in artifacts:
+        return ""
+
+    feedback = artifacts["module_feedback"]
+    hints = artifacts.get("actionable_hints", [])
+
+    result = "\n## DETAILED MODULE FEEDBACK (Rich Feedback Mode)\n"
+    result += f"- CORRECT modules: {feedback.get('correct', [])}\n"
+    result += f"- INCORRECT modules: {feedback.get('incorrect', [])}\n"
+
+    if hints:
+        result += "\n### Actionable Hints:\n"
+        for hint in hints:
+            result += f"- {hint}\n"
+
+    return result
+
+
 def create_improvement_prompt(
     current_code: str,
     metrics: dict,
@@ -108,6 +128,7 @@ def create_improvement_prompt(
 - Score: {metrics.get('combined_score', 0):.2%}
 - Status: {artifacts.get('status', 'N/A')}
 - Suggestion: {artifacts.get('suggestion', 'N/A')}
+{format_rich_feedback(artifacts)}
 {history_str}
 
 ## Your Task
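The text that `format_rich_feedback` adds to the prompt can be previewed in isolation. The function body below is copied from the hunk above. The sample artifacts are hypothetical and only follow the shape produced by the evaluator's rich-feedback branch.

```python
def format_rich_feedback(artifacts: dict) -> str:
    """Format rich feedback if available (RICH_FEEDBACK=1)."""
    if "module_feedback" not in artifacts:
        return ""

    feedback = artifacts["module_feedback"]
    hints = artifacts.get("actionable_hints", [])

    result = "\n## DETAILED MODULE FEEDBACK (Rich Feedback Mode)\n"
    result += f"- CORRECT modules: {feedback.get('correct', [])}\n"
    result += f"- INCORRECT modules: {feedback.get('incorrect', [])}\n"

    if hints:
        result += "\n### Actionable Hints:\n"
        for hint in hints:
            result += f"- {hint}\n"

    return result


# Hypothetical artifacts; module and option names are illustrative only.
sample = {
    "module_feedback": {"correct": ["loader"], "incorrect": ["transform"]},
    "actionable_hints": ["'transform' is WRONG - try a different option from ['a', 'b']"],
}
rendered = format_rich_feedback(sample)
print(rendered)

# Without RICH_FEEDBACK the key is absent and nothing is added to the prompt.
plain = format_rich_feedback({"status": "2/4 modules correct"})
```

Because the function returns an empty string when `module_feedback` is missing, the prompt template only gains a blank line in normal mode, so the baseline runs should see essentially the same prompt as before.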
@@ -205,7 +226,11 @@ def run_iterative_refinement(
 
         # Evaluate current program
         eval_result = evaluate(str(current_program_path))
-        metrics = eval_result.get("metrics", {})
+        # Handle both flat (success) and nested (error) return formats
+        if "metrics" in eval_result:
+            metrics = eval_result["metrics"]
+        else:
+            metrics = {k: v for k, v in eval_result.items() if k != "artifacts"}
         artifacts = eval_result.get("artifacts", {})
 
         score = metrics.get("combined_score", 0)
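The change above matters because the old `eval_result.get("metrics", {})` returns an empty dict for a flat result, so the score would always read as 0. A small sketch of the new normalization follows. `split_eval_result` is a hypothetical helper name, and both sample dicts are assumed shapes based on the comment in the diff ("flat (success) and nested (error)").

```python
def split_eval_result(eval_result: dict) -> tuple:
    """Return (metrics, artifacts) for either evaluator return shape."""
    if "metrics" in eval_result:
        # Nested shape: metrics live under their own key.
        metrics = eval_result["metrics"]
    else:
        # Flat shape: every key except "artifacts" is treated as a metric.
        metrics = {k: v for k, v in eval_result.items() if k != "artifacts"}
    artifacts = eval_result.get("artifacts", {})
    return metrics, artifacts


# Assumed example payloads; the real evaluator may include more keys.
flat = {"combined_score": 0.75, "artifacts": {"status": "3/4 modules correct"}}
nested = {"metrics": {"combined_score": 0.0}, "artifacts": {"error": "import failed"}}

flat_metrics, flat_artifacts = split_eval_result(flat)
nested_metrics, nested_artifacts = split_eval_result(nested)

# The pre-change lookup loses the score on the flat shape.
old_score = flat.get("metrics", {}).get("combined_score", 0)
new_score = flat_metrics.get("combined_score", 0)
print(old_score, new_score)
```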
