Global learnings 2#1
Conversation
Implements a global learnings aggregator that collects insights across all
iterations and islands, injecting them into prompts to help the LLM avoid
repeated mistakes and learn from successful patterns.
Features:
- Tracks common failures: syntax errors, runtime errors, performance regressions
- Optionally tracks successful patterns with improvement metrics
- Configurable aggregation window (default: last 50 iterations)
- Flexible injection (system/user prompts) with verbosity levels
- Full checkpoint save/load support
- Zero overhead when disabled (default: off)
Implementation Details:
- New GlobalLearningsConfig in config.py with 16+ configuration options
- GlobalLearnings class extracts patterns using regex on stderr/artifacts
- Pattern detection: NameError, IndexError, TypeError, SyntaxError, etc.
- Performance regression tracking (>10% metric decreases)
- Prompt integration via PromptSampler.build_prompt()
- ProcessParallelController passes learnings to workers and updates from results
- Checkpoint persistence to global_learnings.json
Configuration Example:
global_learnings:
enabled: true
track_failures: true
track_successes: false
window_size: 50
max_learnings: 5
inject_in_system_prompt: true
verbosity: "concise"
Output Format:
## Evolution Insights (Global Learnings)
### Common Pitfalls:
❌ Undefined variable: temp (seen 5x)
❌ IndexError: list index out of range (seen 3x)
⚠️ score decreased by 15.2% (seen 4x)
Bug Fixes:
- Fix worker process API key propagation by calling update_model_params()
with overwrite=True after LLMConfig reconstruction
Files Changed:
- openevolve/config.py: Add GlobalLearningsConfig
- openevolve/global_learnings.py: New core implementation
- openevolve/controller.py: Initialize and checkpoint management
- openevolve/prompt/sampler.py: Inject learnings into prompts
- openevolve/process_parallel.py: Worker integration and updates
- examples/config_with_global_learnings.yaml: Example config
- GLOBAL_LEARNINGS_IMPLEMENTATION.md: Complete documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
@claude can you review this? |
|
Claude finished @dtdannen's task —— View job PR Review: Global Learnings SystemI've completed a thorough review of the Global Learnings implementation. This is a well-designed feature that adds cross-island learning capabilities to OpenEvolve. Here's my comprehensive feedback: ✅ Overall AssessmentStrengths:
Architecture Grade: A- 🔍 Detailed Findings1. Core Implementation (openevolve/global_learnings.py)Strengths:
Issues Found: 🔴 Critical - Regex Escape Issue (openevolve/global_learnings.py:168) (r"NameError: name '(\w+)' is not defined", lambda m: f"Undefined variable: {m.group(1)}"),The single quotes in the regex pattern should be escaped. This will fail to match actual Python NameErrors which use single quotes. Should be: (r"NameError: name \'(\w+)\' is not defined", lambda m: f"Undefined variable: {m.group(1)}"),🟡 Medium - Performance Regression Threshold Hardcoded (openevolve/global_learnings.py:223) if parent_value > 0 and child_value < parent_value * 0.9:The 10% threshold (0.9) is hardcoded. Consider making this configurable via 🟡 Medium - Limited Pattern Normalization (openevolve/global_learnings.py:265-266) key = f"{pattern_type}:{description}"Similar errors might not group together well (e.g., "list index out of range" vs "tuple index out of range"). Consider more sophisticated normalization. 🟢 Minor - Improvement Calculation (openevolve/global_learnings.py:232-255) 2. Configuration (openevolve/config.py)Strengths:
Issues Found: 🟢 Minor - Documentation (openevolve/config.py:336-339) 3. Controller Integration (openevolve/controller.py)Strengths:
Issues Found: 🟡 Medium - No Cleanup on Disable (openevolve/controller.py:459-461) if self.config.global_learnings.enabled:
self.global_learnings.save(Path(checkpoint_path))If global learnings is disabled mid-run, old checkpoint files persist. Consider cleanup logic or documenting this behavior. 4. Parallel Processing (openevolve/process_parallel.py)Strengths:
Issues Found: 🔴 Critical - Potential None Reference (openevolve/process_parallel.py:525-526) parent_program = self.database.get(result.parent_id) if result.parent_id else None
parent_metrics = parent_program.metrics if parent_program else NoneIf parent_program = self.database.get(result.parent_id) if result.parent_id else None
parent_metrics = parent_program.metrics if parent_program and hasattr(parent_program, 'metrics') else None🟡 Medium - Update Frequency Not Honored (openevolve/process_parallel.py:527-529) if completed_iteration % self.global_learnings.config.update_interval == 0:
self.global_learnings.update_from_iteration(...)5. Prompt Integration (openevolve/prompt/sampler.py)Strengths:
Issues Found: 🟢 Minor - Duplicate Newlines (openevolve/prompt/sampler.py:115) system_message = f"{system_message}\n\n{global_learnings}"The 6. Example ConfigurationsStrengths:
Issues Found: 🟢 Minor - Blocksworld Config Missing Global Learnings (examples/blocksworld/config.yaml) 📝 Documentation ReviewGLOBAL_LEARNINGS_IMPLEMENTATION.md
CLAUDE.md Updates
🧪 Testing RecommendationsMissing Tests (should add):
🔧 Additional IssuesDocumentation Files🟢 Minor - Large Markdown Files Added (MULTI_BLOCK_IMPLEMENTATION_PLAN.md: 2018 lines, SYSTEM_PROMPT_EVOLUTION_PLAN.md: 1309 lines)
Papers Directory🟡 Medium - New Papers Download Scripts (papers/*.py) 🎯 RecommendationsBefore Merging:
✅ ConclusionThis is a solid implementation of a useful feature. The critical issues are minor and easily fixed. The architecture integrates well with the existing codebase. With the suggested fixes, this is ready to merge. Recommendation: Approve with minor changes requested for the critical issues. Review completed: 2025-10-08 |
Critical Fixes: - Fix regex pattern for NameError to match both single and double quotes Pattern now: r"NameError: name ['\"](\w+)['\"] is not defined" This will properly match Python NameError messages Medium Improvements: - Add configurable performance_regression_threshold (default: 0.10 = 10%) Allows tuning sensitivity for different problem domains - Improve error pattern normalization for better grouping: * Normalize container types (list/tuple/str/dict → sequence) * Normalize numeric values (123 → N) * Lowercase for consistency Example: "list index 5 out of range" → "sequence index n out of range" Minor Improvements: - Prioritize combined_score in improvement calculation (consistency with codebase) Falls back to averaging all metrics if combined_score unavailable - Add comprehensive docstring to GlobalLearningsConfig explaining track_both precedence Files Changed: - openevolve/config.py: Add performance_regression_threshold field, docstring - openevolve/global_learnings.py: All pattern detection and normalization improvements All changes are backward compatible with existing configurations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Review: Global Learnings SystemSummaryThis PR introduces a Global Learnings System (~500 lines) that aggregates insights across evolution iterations, plus new blocksworld examples and documentation. Strengths
Critical Issues1. Missing Test Coverage (HIGH)No unit tests found for the 510-line global_learnings.py module. Need tests for:
2. Security: Artifact Injection Risk (MEDIUM)Location: global_learnings.py:160-162 stderr is injected into prompts without sanitization. Malicious programs could craft stderr to manipulate LLM behavior. Recommend adding sanitization to truncate long errors, remove prompt injection patterns, and filter control characters. 3. Bug: Division Issues (MEDIUM)Location: global_learnings.py:223-229 Regression calculation checks parent_value > 0 but doesn't handle overflow or extreme values. Add try-except wrapper and clamp percentages. Medium Priority4. Unbounded Pattern Growth (LOW)failure_patterns dict can grow unbounded. Add pattern limit (~1000) with LRU eviction. 5. Code Smell: Nested Conditionals (LOW)Lines 238-243 have triple-nested ifs with no logging. Use early returns and add debug logs. 6. Documentation: Missing Performance Details (LOW)Excellent docs but missing runtime/memory impact, token cost, and guidance on when NOT to use. Other ChangesBlocksworld Example: Excellent quality (737 lines) with proper validation and comprehensive metrics. Checkpoint Files: 150+ checkpoint files add 10K+ lines. These should be removed and added to .gitignore. Summary
Must Fix Before Merge
Approval: ConditionalSolid design and valuable feature, but needs tests and cleanup. Estimate 3-4 hours to address critical issues. Great work on the implementation and documentation! |
- Remove 441 accidentally committed checkpoint files from global_learning_failures_only_window_50/ - Add global_learning*/ to .gitignore to prevent future checkpoint commits - Checkpoint files are local experiment outputs and should not be in version control This cleanup removes ~10K+ lines of JSON checkpoint data that were accidentally included in the previous commit.
Pull Request Review - Global Learnings 2SummaryThis PR introduces a Global Learnings System to help the evolution process learn from past failures and successes across all islands. Additionally, it includes new examples (blocksworld), planning documents, and paper download utilities. Overall, the core implementation is solid, but there are concerns about test coverage, security practices, and organizational aspects. 🟢 Strengths1. Well-Designed Global Learnings SystemThe
Code Quality: The error extraction regexes are robust and the formatting methods adapt well to different verbosity levels. 2. Comprehensive Documentation
3. Configuration DesignThe
🟡 Areas of Concern1. Test Coverage - CRITICALIssue: No unit tests were added for the global learnings system. Impact: This is a complex feature with pattern extraction, aggregation logic, and prompt injection. Lack of tests means:
Recommendation: Add at minimum: tests/test_global_learnings.py:
- test_extract_syntax_errors()
- test_extract_runtime_errors()
- test_performance_regression_detection()
- test_pattern_normalization()
- test_failure_aggregation()
- test_success_tracking()
- test_prompt_generation()
- test_checkpoint_persistence()2. Papers Directory - Security & PurposeIssues:
Security Concerns:
Questions:
Recommendation:
3. Large Planning DocumentsThree markdown files add 3,547 lines:
Issue: These are massive planning documents that:
Recommendation:
4. Blocksworld Example - Incomplete?Two config files for the same example:
Issue: Unclear which is canonical or if one is outdated. Recommendation: Remove duplicate or add comment explaining the difference. 🔴 Bugs & Code Issues1. Missing Type AnnotationsIn def update_from_iteration(
self,
iteration: int,
result: Any, # ← Should be typed
parent_metrics: Optional[Dict[str, float]] = None,
) -> None:Recommendation: Define a proper type or use Protocol/TypedDict for 2. Pattern Normalization Could Lose Information# openevolve/global_learnings.py:269-273
def _normalize_error_description(self, description: str) -> str:
description = re.sub(r'\b(list|tuple|str|dict)\b index', 'sequence index', description)
description = re.sub(r'\b\d+\b', 'N', description) # All numbers → N
return description.lower()Issue: This might over-normalize. For example:
The first case loses line number context that could be valuable. Recommendation: Consider preserving line numbers or adding configuration for normalization level. 3. Config.py - Missing @DataClass Decorator# openevolve/config.py:330
class GlobalLearningsConfig: # ← Missing @dataclass decorator!
"""Configuration for global learnings aggregation"""
enabled: bool = False
...Impact: This will fail at runtime - the class needs Fix Required: Add 🧪 Testing Strategy NeededGiven the lack of tests, here is a testing roadmap: Unit Tests (high priority): tests/test_global_learnings.py:
- Pattern extraction accuracy (various error formats)
- Aggregation with windowing
- Checkpoint serialization roundtrip
- Prompt formatting (all verbosity levels)
- Edge cases: empty patterns, missing metrics
tests/test_global_learnings_integration.py:
- Integration with controller
- Multi-iteration tracking
- Cross-island aggregation
- Resume from checkpoint with learningsIntegration Tests (medium priority): make test-integration # Verify learnings appear in prompts📝 Documentation & Code QualityPositives:
Improvements Needed:
🔒 Security AssessmentLow-to-Medium Risk:
Recommendations:
📊 Performance ConsiderationsGlobal Learnings:
Recommendation: Consider adding metrics:
✅ Recommendations SummaryMust Fix Before Merge:
Should Fix:
Nice to Have:
🎯 Overall AssessmentScore: 6.5/10 Strengths: The global learnings implementation is architecturally sound and well-integrated. Documentation improvements are excellent. Weaknesses: Critical lack of tests, unclear purpose of papers directory, missing Recommendation: Request Changes - Fix the critical issues (especially the missing decorator and test coverage) before merging. Consider splitting this into multiple PRs:
🤝 Final ThoughtsThis is ambitious work with good ideas, but needs polish before production readiness. The global learnings concept is valuable - seeing LLMs avoid repeated mistakes could significantly improve convergence. However, without tests, we cannot be confident it works correctly. Happy to discuss any of these points! The core work is solid - let us just ensure it is production-ready. References:
|
Implements automatic summarization of failure/success patterns using LLM to generate actionable, natural language guidance instead of raw metric changes.
Key changes:
- Enhanced FailurePattern and SuccessPattern dataclasses with code context fields (parent_code, child_code, code_diff, metrics)
- Added _summarize_pattern_with_llm_sync() that calls LLM with pattern + code context to generate 2-sentence summaries
- Handles full rewrites by including both parent and child code when code_diff="Full rewrite"
- Uses thread pool to handle async LLM calls when event loop is already running
- Caches LLM summaries to avoid redundant calls
- Logs warnings when code context is missing
- Updated prompt to ask LLM to: 1) describe the code change/pattern, 2) state what to avoid/apply
- Format changed to "(seen Nx) {summary}" for better readability
- Lowered min_failure_count to 1 for testing
- Changed log_level to DEBUG in circle packing config for visibility
- Updated circle packing evaluator to return EvaluationResult with artifacts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Review: Global Learnings 2SummaryThis PR adds significant new functionality (Global Learnings system) and a new example (blocksworld), but includes ~19,000 lines of test output/checkpoint files that should not be committed to the repository. 🚨 Critical Issues1. Checkpoint/Test Output Files Should Not Be in GitThe PR includes hundreds of checkpoint files and test outputs that should be gitignored:
Action Required:
2. Missing Core Code ChangesThe implementation docs reference changes to core files, but I cannot verify these are included:
Action Required: Confirm these core files were modified and are part of the PR. ✅ Positive AspectsGlobal Learnings System (
|
- Enable global learnings system with failure tracking - Set window_size to 100 iterations and max_learnings to 10 - Configure concise verbosity for learnings injection - Add DEBUG log level for detailed execution monitoring - Track syntax, runtime, and performance regression errors This configuration allows the blocksworld planner to learn from common failure patterns across islands and avoid repeating mistakes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Review: Global Learnings 2Thank you for this substantial contribution! I've reviewed the PR and have observations organized by category. Overall AssessmentThis PR focuses on planning and documentation for major OpenEvolve enhancements. The changes (953 files, ~20K additions) consist of:
Documentation & PlanningStrengths:
Concerns:
Repository Hygiene - MAJOR ISSUESTest Artifacts Committed (~900 files)Problem: examples/circle_packing/test_global_learnings_working/ contains hundreds of checkpoint JSON files Recommendation:
Code ChangesModified files:
Cannot review without diffs. Questions:
Security ConsiderationsBased on documentation:
Testing - MISSING
Recommendations:
Performance ConsiderationsFrom documentation:
RecommendationsCritical:
High Priority:
Nice to Have:
Positive Highlights
ConclusionThis PR demonstrates excellent planning and design work but needs cleanup before merging. Accept if: Design proposal PR + artifacts removed + .gitignore updated Request changes if: Intended as feature implementation (needs code + tests) OR test artifacts remain Overall: Request changes to remove test artifacts and clarify scope. Questions for Author
Review per CLAUDE.md conventions. Focused on code quality, security, testing, and best practices. |
No description provided.