Add plot tab by dtdannen · Pull Request #3 · dtdannen/openevolve

dtdannen · 2025-10-08T19:27:19Z

No description provided.

…r adding

This commit adds a fourth "Plot" tab to the evolution visualizer that displays how metrics evolve across checkpoint iterations. Changes: - Backend (visualizer.py): Scan sibling checkpoints and aggregate best program metrics per iteration - Frontend (index.html): Add Plot tab UI element and view container - Plot visualization (plot.js): D3.js line chart with zoom/pan support, tooltips, and metric dropdown integration - Tab management (mainUI.js): Add "plot" to tabs array for switching - Data loading (main.js): Pass iteration_metrics to plot renderer Features: - Line chart showing metric value vs iteration number - Responsive to metric dropdown changes - Zoom and pan support consistent with other tabs - Hover tooltips showing exact values - Grid lines for readability - Works with existing checkpoint_interval (default: 10) - Gracefully handles missing data (single checkpoint mode) This enables easy comparison of evolution progress across different runs and helps identify convergence speed and plateaus. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

dtdannen · 2025-10-08T19:27:31Z

@claude can you review this?

claude · 2025-10-08T19:27:47Z

Claude finished @dtdannen's task —— View job

Code Review for Plot Tab PR

I've completed a thorough review of this PR. Overall, the plot tab implementation is solid and the new examples/documentation add significant value. However, there are several issues that need attention before merging.

✅ Strengths

Plot Tab Implementation - Well-architected visualization with D3.js
- Clean separation of concerns in plot.js
- Proper zoom/pan behavior with state preservation
- Responsive design with window resize handling
- Good UX: tooltips, animations, empty state messaging
Backend Integration - Smart checkpoint scanning approach
- load_iteration_metrics_from_checkpoints() in visualizer.py:29-86 efficiently loads metrics from sibling checkpoints
- Proper error handling for missing/malformed data
Documentation - Significant CLAUDE.md improvements
- Much more comprehensive development guidance
- Clear architecture documentation
- Good troubleshooting section

🐛 Issues to Fix

Critical Issues

Broken mainUI.js - Dark mode initialization error (scripts/static/js/mainUI.js:6-8)

const darkToggleContainer = document.getElementById('darkmode-toggle').parentElement;
const darkToggleInput = document.getElementById('darkmode-toggle');
const darkToggleLabel = document.getElementById('darkmode-label');

Problem: Code references DOM elements before checking if they exist, then tries to create custom toggle at line 10-28. This will fail if the elements are missing on initial load.

Fix: Add null checks before accessing .parentElement and other properties:

const darkToggleInput = document.getElementById('darkmode-toggle');
const darkToggleLabel = document.getElementById('darkmode-label');
if (!darkToggleInput || !darkToggleLabel) return;
const darkToggleContainer = darkToggleInput.parentElement;

Memory leak in plot.js (scripts/static/js/plot.js:256-267)
The tooltip div is created every time updatePlot() is called but never removed. On repeated renders (metric changes, window resize), this creates multiple tooltip divs.

Fix: Reuse existing tooltip or remove old one:
```
// Remove old tooltip if exists
d3.select('#view-plot .plot-tooltip').remove();

const tooltip = d3.select('#view-plot')
    .append('div')
    .attr('class', 'plot-tooltip')
    // ... rest of styles
```
Race condition in plot rendering (scripts/static/js/main.js:116-119)
```
if (window.renderPlot && data.iteration_metrics) {
    window.renderPlot(data.iteration_metrics);
}
```
window.renderPlot is set at the end of plot.js (line 326) but may not be loaded yet when main.js runs. This is an ES6 module import issue.

Fix: Import properly at top of main.js:
```
import { renderPlot } from './plot.js';
// Then use: renderPlot(data.iteration_metrics);
```

Major Issues

Missing error handling in visualizer.py (scripts/visualizer.py:64-78)
The loop that loads best_program_info.json files catches json.JSONDecodeError and KeyError but not FileNotFoundError or PermissionError which could occur at line 64.

Fix: Broader exception handling:

try:
    with open(best_info_path, "r") as f:
        best_info = json.load(f)
    # ...
except (json.JSONDecodeError, KeyError, IOError) as e:
    logger.debug(f"Error loading best_program_info from {ckpt_dir}: {e}")
    continue

Inconsistent metric handling (scripts/static/js/plot.js:76-83)
The code filters for valid data but doesn't handle the case where d.metrics exists but is not an object (could be null, array, etc.).

Fix: Add type check:

const validData = data.filter(d =>
    d.metrics &&
    typeof d.metrics === 'object' &&
    !Array.isArray(d.metrics) &&
    typeof d.metrics[metric] === 'number' &&
    isFinite(d.metrics[metric])
);

Minor Issues

Code duplication in plot.js
- Lines 53-68 and 86-99 have nearly identical "no data" message rendering
- Should extract to shared function
Missing JSDoc comments
- plot.js exports and functions lack documentation
- Makes maintenance harder
Potential XSS in tooltip (scripts/static/js/plot.js:279-282)
While metric comes from a controlled dropdown, it's interpolated directly into HTML without escaping. If metric names ever come from user data, this could be a security issue.

Recommendation: Use textContent or escape HTML:
```
.html(`
 Iteration ${d.iteration} 
 : ${d.metrics[metric].toFixed(4)}
`)
// Then: tooltip.select('.metric-name').textContent = metric;
```
Tab switching logic duplication (scripts/static/js/mainUI.js:31-56)
Very similar pattern to existing tab code - could be refactored to reduce duplication

🤔 Questions & Concerns

Blocksworld Example

Evaluator complexity (examples/blocksworld/blocksworld_evaluator.py)
- 571 lines is quite large for an evaluator
- Contains substantial domain logic (state validation, problem generation)
- Should some of this be extracted to a separate module?
Timeout configuration (examples/blocksworld/config.yaml:65)
```
evaluator:
  timeout: 1000  # Comment says "5s × 95 problems = 475s max"
```
Why 1000s if max expected is 475s? Is this intentional buffer or should it be tuned?

Papers Scripts

Massive PR scope - 5 different paper downloading scripts totaling ~2800 lines
- download_papers.py, download_papers_v2.py, download_papers_openreview.py, etc.
- Are all of these needed? Seems like exploratory code
- Should these be in a separate PR or cleaned up before merge?
API credentials in code (papers/download_papers.py:36-37)
```
SEMANTIC_SCHOLAR_API_KEY = os.getenv("SEMANTIC_SCHOLAR_API_KEY")  # Optional
EMAIL = os.getenv("EMAIL", "openevolvetesting.worrier295@passmail.net")
```
Default email is hardcoded - is this intentional for testing or should it require explicit configuration?

Planning Documents

Large uncommitted design docs
- MULTI_BLOCK_IMPLEMENTATION_PLAN.md (2018 lines)
- SYSTEM_PROMPT_EVOLUTION_PLAN.md (1309 lines)
These are excellent planning documents but:
- Should they be in /docs directory instead of repo root?
- Are they meant as temporary planning or permanent documentation?
- Consider adding a README section explaining their purpose

📋 Recommendations

High Priority

Fix dark mode initialization error in mainUI.js (Issue Global learnings 2 #1)
Fix memory leak in plot.js tooltips (Issue Add Claude Code GitHub Workflow #2)
Fix race condition in plot rendering (Issue Add plot tab #3)
Add broader error handling in visualizer.py (Issue Deep research system prompt #4)

Medium Priority

Improve metric type checking in plot.js (Issue Merged algorithmicsuperintelligence/openevolve#5)
Add JSDoc comments to plot.js functions
Consider extracting duplicate "no data" message rendering

Low Priority

Consider moving planning docs to /docs
Add documentation about paper scripts purpose
Evaluate if all 5 paper download scripts are necessary

Before Merge

Add PR description explaining the changes
Consider splitting papers scripts into separate PR (they're 75% of the additions)
Run linting: make lint or python -m black openevolve examples tests scripts

📊 Test Coverage

Not tested (should add before merge):

Plot tab with empty iteration_metrics
Plot tab with missing metrics
Metric dropdown changes while plot is visible
Window resize during plot interaction
Multiple checkpoint loads

Summary

The plot tab feature is well-implemented and adds valuable visualization capability. The main concerns are:

3 critical bugs that will cause runtime errors (dark mode init, memory leak, race condition)
Large PR scope - mixing feature implementation with exploratory code (paper scripts)
Missing tests for the new visualization features

I recommend fixing the critical bugs, cleaning up the paper scripts (or moving to separate PR), and adding basic tests before merging.

Estimated effort to fix critical issues: 1-2 hours

Job: https://github.com/dtdannen/openevolve/actions/runs/
Branch: add-plot-tab

claude · 2025-10-08T19:29:43Z

Pull Request Review: Add Plot Tab

Overview

This PR adds significant new functionality including visualization features, a new blocksworld example, documentation improvements, and paper download utilities. Overall, the changes are well-structured but there are several areas that need attention before merging.

🟢 Strengths

1. Excellent Documentation Updates (CLAUDE.md)

Comprehensive improvements with clear command examples
Better organized architecture descriptions
Added library API examples showing the public interface
Improved troubleshooting section with actionable guidance
Good use of code examples throughout

2. Well-Designed Plot Visualization Feature

Clean separation of concerns in plot.js (326 lines)
Good use of D3.js for interactive plotting
Proper zoom behavior preservation
Responsive design with window resize handling
Helpful tooltips and grid lines for better UX
Graceful error handling for missing data

3. Solid Blocksworld Example Implementation

Clean domain modeling with immutable State dataclass
Comprehensive evaluator with multiple test strategies
Good separation between simulator, planner, and evaluator
Thorough validation of state validity (no cycles, proper structure)
Supports both hardcoded and random problem generation
Early termination on failures for efficient cascade evaluation

🟡 Issues & Concerns

Critical: Paper Download Scripts May Not Belong in Core Repository

The papers/ directory (5 Python scripts, 627+ lines) appears to be research tooling rather than core OpenEvolve functionality:

Concerns:

Scope creep: Paper downloading is tangential to an evolutionary coding framework
API keys required: Requires SEMANTIC_SCHOLAR_API_KEY and email in .env (security consideration)
Large data footprint: Comments mention "~5GB disk space for PDFs"
Multiple external dependencies: requests, dotenv, tqdm, tenacity
Rate limiting complexity: Different rate limits for 4 different APIs

Recommendation:

Consider moving to a separate research/ or tools/ subdirectory
Add clear documentation about why this is included
OR move to a separate repository and reference it in docs
At minimum: Add a dedicated README in papers/ explaining its purpose and that it's optional

If keeping in repo:

Add papers/README.md explaining this is optional research tooling
Ensure papers/requirements.txt is properly documented in setup docs
Add to .gitignore properly (already has papers/data and papers/.env ✓)

Major: Planning Documents in Root Directory

Two large planning documents added to root:

MULTI_BLOCK_IMPLEMENTATION_PLAN.md (2018 lines)
SYSTEM_PROMPT_EVOLUTION_PLAN.md (1309 lines)

Issues:

Not implemented yet: These are planning docs for future work, not current functionality
Root clutter: Root directory should contain only essential docs
Misleading: Users might think these features exist

Recommendation:

Move to docs/design/ or docs/planning/ subdirectory
Add clear headers stating these are future work proposals
OR remove from this PR and add them when the features are actually implemented

Minor: Visualizer Changes Mixed with Feature Addition

File: scripts/visualizer.py

Changes:

Added iteration metrics loading (lines 29-86) ✓
Modified data endpoint to include metrics ✓
Overall well-implemented

Minor concern: The load_iteration_metrics_from_checkpoints() function has some edge case handling that could be cleaner:

# Line 68: Prefer current_iteration but fallback to iteration
iteration = best_info.get("current_iteration", best_info.get("iteration", 0))

Suggestion: Document why this fallback exists (checkpoint format changes over time?).

Code Quality Issues

1. Blocksworld Evaluator - Threading Safety

File: examples/blocksworld/blocksworld_evaluator.py

Lines 308-312, 462-474:

# Threading timeout implementation
thread = threading.Thread(target=solve_wrapper)
thread.daemon = True
thread.start()
thread.join(timeout=timeout_seconds)

Issues:

Daemon threads may not clean up properly
No mechanism to actually stop the thread after timeout (thread keeps running)
Using nonlocal result in nested function could cause race conditions

Recommendation:

Use multiprocessing with proper process termination instead of threading
OR use concurrent.futures.ThreadPoolExecutor with proper cancellation
OR document that threads will continue running until natural completion

2. Hardcoded Values

File: examples/blocksworld/blocksworld_evaluator.py

Lines 507-508:

TOTAL_PROBLEMS = 95  # Hardcoded constant for problems 6-100
progress_score = problems_solved / TOTAL_PROBLEMS

Issue: If problem_sizes parameter changes, this hardcoded value becomes incorrect.

Fix:

# Calculate total from range
TOTAL_PROBLEMS = len(range(6, 101))  # = 95
# OR better: pass as parameter

3. Potential State Validation Bug

File: examples/blocksworld/blocksworld_evaluator.py

Lines 148-149:

assert is_valid_state(initial, blocks), "Generated invalid initial state"
assert is_valid_state(goal, blocks), "Generated invalid goal state"

Issue: Using assert for runtime validation. Asserts are disabled with python -O.

Fix:

if not is_valid_state(initial, blocks):
    raise ValueError("Generated invalid initial state")
if not is_valid_state(goal, blocks):
    raise ValueError("Generated invalid goal state")

4. Plot Tab Addition - Missing Error Boundary

File: scripts/static/js/plot.js

Line 76-99: Good error handling for missing metrics ✓

Suggestion: Add error boundary for D3.js initialization failures (e.g., if D3 CDN fails to load).

5. Index.html - CDN Dependency

File: scripts/templates/index.html

Line 6:

<script src="https://d3js.org/d3.v7.min.js"></script>

Concern: CDN dependency means visualizer won't work offline.

Recommendation:

Document this requirement in visualizer docs
OR vendor D3.js locally for offline support
Consider adding integrity hash for security: integrity="sha384-..."

🔵 Best Practices & Suggestions

1. Testing Coverage

No tests added for new functionality. Consider adding:

Unit tests for load_iteration_metrics_from_checkpoints()
Integration test for plot tab rendering
Tests for blocksworld problem generation and validation

2. Configuration Files

File: examples/blocksworld/config.yaml and blocksworld_config.yaml

Two config files exist - this seems redundant. Document the difference or remove one.

3. Code Comments

The plot.js code is well-structured but lacks JSDoc comments for exported functions. Consider adding:

/**
 * Renders the evolution progress plot with iteration metrics
 * @param {Array} data - Array of {iteration, metrics} objects
 */
export function renderPlot(data) { ... }

4. Blocksworld Planner Evolution Block

File: examples/blocksworld/blocksworld_planner.py

Lines 95-137: The evolve block implements random search.

Suggestion for evolution: This is a great starting point, but consider adding comments inside the evolve block guiding the LLM:

# EVOLVE-BLOCK-START
# TODO: Improve this planner! Ideas:
# - Add heuristic guidance (goal distance)
# - Implement BFS/A* search
# - Add state caching to avoid loops
def plan_blocks(...):
    ...
# EVOLVE-BLOCK-END

🔒 Security Considerations

Paper Download Scripts - API Keys: .env file with API keys added to .gitignore ✓
CDN Integrity: D3.js loaded from CDN without integrity hash (see above)
Email Exposure: Hardcoded email in download_papers.py line 37 - consider this is public code

📊 Performance Considerations

Iteration Metrics Loading: load_iteration_metrics_from_checkpoints() reads all sibling checkpoints on every data request. Consider:
- Caching this data
- Only reloading when filesystem changes detected
Plot Rendering: Re-renders entire plot on metric change - this is fine for typical dataset sizes but could be optimized for large evolution runs

✅ Recommendations Summary

Before Merging:

Decision required: Keep or relocate paper download scripts?
Move planning documents to docs/ subdirectory (or remove from PR)
Fix threading safety in blocksworld evaluator (use multiprocessing)
Replace asserts with proper runtime validation
Resolve duplicate config files in blocksworld example
Document the optional vs. required nature of papers/ directory

Nice to Have:

Add tests for new visualization features
Add JSDoc comments to plot.js
Add CDN integrity hashes
Add caching for iteration metrics loading
Add error boundary for D3.js initialization

📝 Final Verdict

Conditional Approval - The core visualization features and blocksworld example are well-implemented, but the PR mixes several concerns:

✅ Plot tab feature (good)
✅ Blocksworld example (good, minor fixes needed)
✅ Documentation improvements (excellent)
⚠️ Paper download scripts (questionable fit)
⚠️ Unimplemented planning docs (should be in docs/ or separate PR)

Suggested Actions:

Address threading safety and assert issues in blocksworld
Make a clear decision on papers/ directory location
Move planning documents to appropriate location
Add brief release notes or update main README mentioning new plot tab feature

Great work overall! The plot visualization is a valuable addition to the project. 🚀

Bug fixes: - Fix dark mode initialization error with null checks (mainUI.js) - Fix memory leak by removing old tooltip before creating new (plot.js) - Fix race condition with proper ES6 import of renderPlot (main.js) - Add IOError handling for file operations (visualizer.py) - Fix inconsistent metric type validation (plot.js) New feature: - Add plot_metrics.py: standalone matplotlib plotting script - Support grouped runs with mean line and min/max shaded range - Use ':' separator to define groups of checkpoint directories - Configurable output, metrics, and plot styles 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

claude · 2025-10-09T19:27:25Z

Pull Request Review: Add Plot Tab

Summary

This PR adds significant new functionality including a plotting tab for the visualizer, a blocksworld example, planning documents, and paper download scripts. The changes span 24 files with 8,525 additions.

Positive Aspects ✅

1. Well-Structured Visualization Code

scripts/plot_metrics.py is professionally written with clear documentation, proper error handling, statistical aggregation, and good matplotlib practices

2. Clean JavaScript Implementation

scripts/static/js/plot.js demonstrates solid D3.js skills with proper module pattern, responsive design, zoom/pan functionality, and interactive tooltips

3. Comprehensive Documentation

CLAUDE.md improvements are excellent with clear testing instructions, CLI entry points, expanded architecture explanations, and troubleshooting section

4. Code Quality

Consistent formatting (Black with line-length 100), type hints, good naming conventions, proper dataclass usage

Issues and Concerns ⚠️

1. Security Concerns 🔴 HIGH PRIORITY

papers/download_papers.py Line 37: Email address exposed in code (openevolvetesting.worrier295@passmail.net). Move to environment variable only - do not hardcode email as fallback. Add exponential backoff for 429 responses. PDF downloads: No validation of downloaded file content.

2. Missing Test Coverage 🟡 MEDIUM PRIORITY

No tests found for scripts/plot_metrics.py statistical aggregation logic, examples/blocksworld/blocksworld_evaluator.py critical evaluation logic, or scripts/visualizer.py new load_iteration_metrics_from_checkpoints function.

3. Performance Issues 🟡 MEDIUM PRIORITY

scripts/plot_metrics.py Line 216: Subplot layout calculation is incorrect. scripts/visualizer.py Line 29-86: Synchronous checkpoint file reading could be slow for large experiments (1000+ checkpoints).

4. Code Quality Issues 🟢 LOW PRIORITY

examples/blocksworld/blocksworld_evaluator.py Line 20-55: is_valid_state function has O(n²) worst case. scripts/plot_metrics.py Line 210: Bare except clause should specify exception type.

Specific Recommendations

Must Fix Before Merge:

Remove hardcoded email from papers/download_papers.py:37
Add .env.example showing required environment variables
Add tests for blocksworld_evaluator.is_valid_state() and plot_metrics.compute_grouped_statistics()

Should Fix Before Merge:

Fix subplot layout calculation in plot_metrics.py:216
Add PDF content validation in paper download scripts
Fix bare except in plot_metrics.py:210

Questions for Author

Planning Documents: The MULTI_BLOCK_IMPLEMENTATION_PLAN.md is 2018 lines. Is this intended for this PR or future work?
Papers Directory: Are the paper download scripts core to OpenEvolve functionality or research tooling?
Blocksworld Example: What is the target audience? Educational or performance benchmarking?
Testing Strategy: Would you prefer to add tests in a follow-up PR or as part of this one?

Overall Assessment

Recommendation: Approve with changes required

This PR adds valuable functionality but has critical security issues (hardcoded email) and missing test coverage. The core plotting and visualization code is high quality. Once the security concern is addressed and basic tests are added, this will be a solid addition.

Estimated effort to address: ~2-4 hours

Code Review Checklist

✅ Code follows project conventions
✅ Documentation updated
⚠️ Tests added (MISSING)
⚠️ Security concerns addressed (hardcoded email)
✅ Performance considerations reasonable
✅ Error handling present
⚠️ Edge cases handled (some areas need work)
✅ Backwards compatibility maintained

Great work overall! The visualization features are well-implemented. Please address the security issue and consider adding tests before merging.

dtdannen and others added 3 commits October 6, 2025 15:21

first draft blocksworld and paper scripts

d6e56a3

gemini-2.5-pro does way better; planning for what features to conside…

93b4fee

…r adding

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add plot tab#3

Add plot tab#3
dtdannen wants to merge 4 commits into
mainfrom
add-plot-tab

dtdannen commented Oct 8, 2025

Uh oh!

dtdannen commented Oct 8, 2025

Uh oh!

claude Bot commented Oct 8, 2025 •

edited

Loading

Uh oh!

claude Bot commented Oct 8, 2025

Uh oh!

claude Bot commented Oct 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dtdannen commented Oct 8, 2025

Uh oh!

dtdannen commented Oct 8, 2025

Uh oh!

claude Bot commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for Plot Tab PR

✅ Strengths

🐛 Issues to Fix

Critical Issues

Major Issues

Minor Issues

🤔 Questions & Concerns

Blocksworld Example

Papers Scripts

Planning Documents

📋 Recommendations

High Priority

Medium Priority

Low Priority

Before Merge

📊 Test Coverage

Summary

Uh oh!

claude Bot commented Oct 8, 2025

Pull Request Review: Add Plot Tab

Overview

🟢 Strengths

1. Excellent Documentation Updates (CLAUDE.md)

2. Well-Designed Plot Visualization Feature

3. Solid Blocksworld Example Implementation

🟡 Issues & Concerns

Critical: Paper Download Scripts May Not Belong in Core Repository

Major: Planning Documents in Root Directory

Minor: Visualizer Changes Mixed with Feature Addition

Code Quality Issues

1. Blocksworld Evaluator - Threading Safety

2. Hardcoded Values

3. Potential State Validation Bug

4. Plot Tab Addition - Missing Error Boundary

5. Index.html - CDN Dependency

🔵 Best Practices & Suggestions

1. Testing Coverage

2. Configuration Files

3. Code Comments

4. Blocksworld Planner Evolution Block

🔒 Security Considerations

📊 Performance Considerations

✅ Recommendations Summary

Before Merging:

Nice to Have:

📝 Final Verdict

Uh oh!

claude Bot commented Oct 9, 2025

Pull Request Review: Add Plot Tab

Summary

Positive Aspects ✅

1. Well-Structured Visualization Code

2. Clean JavaScript Implementation

3. Comprehensive Documentation

4. Code Quality

Issues and Concerns ⚠️

1. Security Concerns 🔴 HIGH PRIORITY

2. Missing Test Coverage 🟡 MEDIUM PRIORITY

3. Performance Issues 🟡 MEDIUM PRIORITY

4. Code Quality Issues 🟢 LOW PRIORITY

Specific Recommendations

Must Fix Before Merge:

Should Fix Before Merge:

Questions for Author

Overall Assessment

Code Review Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

claude Bot commented Oct 8, 2025 •

edited

Loading