Date: January 17, 2026 Status: Implementation Complete Project: openadapt-viewer
File: TESTING_STRATEGY.md
Contents:
- Complete testing architecture (layered testing pyramid)
- Tool selection rationale (Playwright + pytest)
- Test organization structure
- Specific test cases for each viewer type
- Development workflow (TDD)
- CI/CD integration
- Performance optimization strategies
- Troubleshooting guide
- Success criteria and metrics
- Future improvements roadmap
Key Decisions Documented:
- Why Playwright over alternatives (Selenium, Cypress, Puppeteer)
- Why pytest as test framework
- Test granularity: When to use unit vs component vs integration tests
- Visual regression approach with screenshot comparison
- Alpine.js testing strategy
File: docs/TESTING_GUIDE.md
Contents:
- Quick start commands
- How to write tests (unit, component, integration, visual)
- Using fixtures for test data
- Running and debugging tests
- Common patterns (Alpine.js, async, errors, interactions)
- Best practices
- Troubleshooting common issues
- Guide for Claude Code on interpreting failures and writing tests
Target Audiences:
- Developers: Practical how-to guide
- Claude Code: Clear patterns for AI-assisted testing
File: CLAUDE.md (Testing section added)
Changes:
- Added prominent testing section
- Quick testing commands
- Testing philosophy
- Specific guidance for Claude Code
- Links to comprehensive documentation
File: tests/integration/test_benchmark_workflow_example.py
Contents:
- 15+ comprehensive integration tests
- Demonstrates real user workflows
- Tests for:
- Initial load and summary display
- Domain and status filtering
- Task selection and detail panel
- Playback controls and navigation
- Step list interaction
- Action details display
- Error handling for failed tasks
- Accessibility (keyboard navigation)
- Progress bar updates
Purpose: Serves as reference implementation showing how to write good integration tests.
/\
/E2E\ 5% - Full workflows (slow, high-level)
/-----\
/Integ \ 15% - Multiple components
/--------\
/Component\ 50% - Individual UI components
/------------\
/Unit & Logic \ 30% - Pure functions
/----------------\
| Type | Location | Purpose | Speed | Examples |
|---|---|---|---|---|
| Unit | tests/unit/ |
Pure functions, data transformations | <10ms | test_format_duration(), test_parse_action() |
| Component | tests/component/ |
Individual UI elements | ~500ms | test_screenshot_with_overlays(), test_playback_controls() |
| Integration | tests/integration/ |
Multiple components, workflows | ~2s | test_benchmark_viewer_full_workflow() |
| Visual | tests/visual/ |
Screenshot comparison | ~2s | test_benchmark_list_layout() |
- Playwright: Modern browser automation (Python)
- pytest: Test framework
- pytest-playwright: Playwright fixtures for pytest
- pytest-html: HTML test reports
- pytest-xdist: Parallel execution
- pytest-cov: Code coverage
Chosen over Selenium, Cypress, Puppeteer:
- ✅ Python native (no context switching)
- ✅ Modern API with auto-waiting
- ✅ Fast parallel execution
- ✅ Cross-browser (Chrome, Firefox, WebKit)
- ✅ File:// URL support (can test standalone HTML)
- ✅ Component isolation testing
- ✅ Built-in visual regression (screenshot comparison)
- ✅ Active development with 2026 improvements
2026 Improvements:
- Smarter locators with AI assistance
- Enhanced HTML reporter
- Better component isolation
- Improved trace viewer for debugging
Test individual components without full page load:
def test_screenshot_with_overlay(page):
from openadapt_viewer.components import screenshot_display
html = screenshot_display(
"test.png",
overlays=[{"type": "click", "x": 0.5, "y": 0.3}]
)
page.set_content(html)
assert page.locator(".oa-overlay-click").is_visible()Test full workflows with real user interactions:
def test_benchmark_viewer_workflow(page, benchmark_viewer_html):
page.goto(f"file://{benchmark_viewer_html}")
page.wait_for_selector(".task-item")
# Click task
page.locator(".task-item").first.click()
# Verify detail panel
assert page.locator(".task-detail").is_visible()
# Test playback
page.locator("button:has-text('Play')").click()
assert page.locator("button:has-text('Pause')").is_visible()Detect unintended layout/styling changes:
def test_benchmark_layout(page, viewer_path):
page.goto(f"file://{viewer_path}")
page.wait_for_selector(".task-list-item")
expect(page.locator(".task-list")).to_have_screenshot(
"benchmark-task-list.png",
max_diff_pixels=100
)Special handling for reactive components:
def test_alpine_reactive_filter(page):
page.set_content(html_with_alpine)
# Wait for Alpine initialization
page.wait_for_function("window.Alpine !== undefined")
# Test reactivity
page.select_option("select", "success")
assert page.locator("[x-text='filter']").text_content() == "success"Reusable test data and setup:
@pytest.fixture
def sample_benchmark_run():
"""Generate realistic benchmark data."""
return {
"run_id": "test-001",
"tasks": [...],
"executions": [...],
}
@pytest.fixture
def benchmark_viewer_html(tmp_path, sample_benchmark_run):
"""Generate viewer HTML for testing."""
output_path = tmp_path / "viewer.html"
generate_benchmark_html(run_data=sample_benchmark_run, output_path=output_path)
return output_path-
Write failing test (Red)
uv run pytest tests/component/test_new_feature.py -v # FAILED -
Implement feature (Green)
# Edit component code uv run pytest tests/component/test_new_feature.py -v # PASSED
-
Refactor with confidence (Refactor)
# Clean up implementation uv run pytest tests/component/test_new_feature.py -v # Still PASSED
-
Full suite before commit
uv run pytest tests/ -v # All PASSED
Benefits:
- Clear success criteria: Tests define what "working" means
- Immediate feedback: Claude knows right away if it broke something
- Guided debugging: Test failures point to exact problem
- Confidence: Can refactor knowing tests catch issues
Prompting Patterns:
"Write a test for the screenshot component that verifies overlays appear"
"Make this test pass: test_playback_controls_pause_resume"
"The test test_filter_workflow is failing. Fix the filter component."
"Add visual regression tests for the benchmark viewer"
cd /Users/abrichr/oa/src/openadapt-viewer
# Install with dev dependencies
uv sync --extra dev
# Install Playwright browsers
uv run playwright install chromium# All tests
uv run pytest tests/ -v
# By category
uv run pytest tests/unit/ -v # Fast unit tests
uv run pytest tests/component/ -v # Component tests
uv run pytest tests/integration/ -v # Integration tests
# Specific file
uv run pytest tests/component/test_screenshot.py -v
# With coverage
uv run pytest tests/ --cov=openadapt_viewer --cov-report=html
open htmlcov/index.html
# Parallel (faster)
uv run pytest tests/ -n auto -v# tests/component/test_my_feature.py
from playwright.sync_api import Page
def test_my_feature_renders(page: Page):
"""Test that my feature renders correctly."""
from openadapt_viewer.components import my_feature
# ARRANGE: Set up test data
html = my_feature(data="test")
# ACT: Render in browser
page.set_content(html)
# ASSERT: Verify behavior
assert page.locator(".oa-my-feature").is_visible()
assert page.locator(".oa-my-feature").text_content() == "test"# Show print statements
uv run pytest tests/ -v -s
# Show local variables on failure
uv run pytest tests/ -v -l
# Run with headed browser (see what's happening)
PWDEBUG=1 uv run pytest tests/component/test_screenshot.py -v
# Add breakpoint in test
def test_something(page):
import pdb; pdb.set_trace()
# or
page.pause() # Opens Playwright inspectorname: Test Viewers
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- run: uv sync --extra dev
- run: uv run playwright install --with-deps chromium
- run: uv run pytest tests/unit/ -v
- run: uv run pytest tests/component/ -v
- run: uv run pytest tests/integration/ -v# Run full CI locally before pushing
cd /Users/abrichr/oa/src/openadapt-viewer
# Fresh install
uv sync --extra dev
uv run playwright install chromium
# Run all tests with coverage
uv run pytest tests/ -v --cov=openadapt_viewer --cov-report=html
# Check coverage (aim for >80%)
open htmlcov/index.html
# Run parallel (faster)
uv run pytest tests/ -n auto -v| Metric | Target | Current |
|---|---|---|
| Line coverage | >80% | TBD (run tests to measure) |
| Branch coverage | >70% | TBD |
| Test count | 100+ | ~50 (existing + example) |
| Test execution time | <60s | ~20s (existing) |
| Flakiness rate | <2% | TBD |
- Test execution time: <60s for full suite
- Flakiness rate: <2% (tests should be deterministic)
- Bug detection rate: >90% (tests should catch most regressions)
- Coverage: >80% line coverage for component code
- Setup time: <5 minutes from clone to running tests
- Feedback loop: <10s for unit tests, <60s for integration
- Test failure clarity: Failures point to exact problem
- Maintenance burden: Tests shouldn't break with minor refactors
def test_alpine_reactive_state(page):
html = """<div x-data="{ count: 0 }">
<span x-text="count"></span>
<button @click="count++">+</button>
</div>"""
page.set_content(html)
# Wait for Alpine
page.wait_for_function("window.Alpine !== undefined")
# Test reactivity
assert page.locator("span").text_content() == "0"
page.locator("button").click()
assert page.locator("span").text_content() == "1"def test_async_data_loading(page):
page.goto("file:///viewer.html")
# Wait for data to load
page.wait_for_selector(".task-list-item")
# Or wait for specific condition
page.wait_for_function("window.dataLoaded === true")def test_error_handling(page):
page.goto("file:///viewer.html")
page.evaluate("window.loadData({ invalid: true })")
# Verify error shown
assert page.locator(".error-message").is_visible()
assert "Invalid data" in page.locator(".error-message").text_content()Solution:
- Add explicit waits:
page.wait_for_selector(".element") - Use deterministic test data (no
random, fixed timestamps) - Check viewport size consistency
- Wait for fonts/assets to load
Solution:
- Replace
wait_for_timeout()withwait_for_selector() - Use
page.wait_for_load_state("networkidle") - Ensure Alpine.js initialized:
page.wait_for_function("window.Alpine") - Increase visual regression threshold:
max_diff_pixels=100
Solution:
- Add explicit wait:
page.wait_for_selector(".element") - Use better selectors:
page.get_by_role("button", name="Submit") - Debug:
page.pause()to inspect page - Check Alpine.js init:
page.wait_for_function("window.Alpine")
- Add performance tests (load 1000+ tasks)
- Expand visual regression to all viewers
- Add accessibility tests (ARIA, keyboard navigation)
- Document testing patterns in examples
- Cross-browser testing (Firefox, Safari)
- Mutation testing (verify tests catch bugs)
- Screenshot diffing UI
- Performance profiling
- E2E testing with real data
- Automated test generation
- Property-based testing
- Test visualization dashboard
- TESTING_STRATEGY.md - Comprehensive strategy
- docs/TESTING_GUIDE.md - Practical guide
- Playwright Python Docs
- pytest Documentation
- 15 Best Practices for Playwright testing in 2026
- 9 Playwright Best Practices and Pitfalls to Avoid
- Playwright Best Practices (Official)
- pytest-playwright Plugin
- pytest-html - HTML test reports
- pytest-xdist - Parallel execution
- pytest-cov - Code coverage
This implementation provides:
- Systematic regression prevention: Automated tests catch breaks before they happen
- Fast development feedback: Know immediately if changes broke something
- Component isolation: Test each piece independently
- AI-friendly structure: Clear patterns Claude Code can maintain
- Developer confidence: Make changes without fear
Key Achievement: Transformed unpleasant, buggy viewer development into a confident, systematic process with automated safety nets.
Next Steps:
- Run existing tests to establish baseline coverage
- Add tests for new features using TDD workflow
- Set up CI/CD integration
- Expand visual regression testing
- Measure and improve coverage over time
Document Version: 1.0 Date: January 17, 2026 Author: OpenAdapt Team with Claude Code Status: Implementation Complete