UI Testing Implementation Summary

Date: January 17, 2026 Status: Implementation Complete Project: openadapt-viewer

What Was Delivered

1. Comprehensive Testing Strategy Document

File: TESTING_STRATEGY.md

Contents:

Complete testing architecture (layered testing pyramid)
Tool selection rationale (Playwright + pytest)
Test organization structure
Specific test cases for each viewer type
Development workflow (TDD)
CI/CD integration
Performance optimization strategies
Troubleshooting guide
Success criteria and metrics
Future improvements roadmap

Key Decisions Documented:

Why Playwright over alternatives (Selenium, Cypress, Puppeteer)
Why pytest as test framework
Test granularity: When to use unit vs component vs integration tests
Visual regression approach with screenshot comparison
Alpine.js testing strategy

2. Practical Testing Guide

File: docs/TESTING_GUIDE.md

Contents:

Quick start commands
How to write tests (unit, component, integration, visual)
Using fixtures for test data
Running and debugging tests
Common patterns (Alpine.js, async, errors, interactions)
Best practices
Troubleshooting common issues
Guide for Claude Code on interpreting failures and writing tests

Target Audiences:

Developers: Practical how-to guide
Claude Code: Clear patterns for AI-assisted testing

3. Updated CLAUDE.md

File: CLAUDE.md (Testing section added)

Changes:

Added prominent testing section
Quick testing commands
Testing philosophy
Specific guidance for Claude Code
Links to comprehensive documentation

4. Example Integration Test Suite

File: tests/integration/test_benchmark_workflow_example.py

Contents:

15+ comprehensive integration tests
Demonstrates real user workflows
Tests for:
- Initial load and summary display
- Domain and status filtering
- Task selection and detail panel
- Playback controls and navigation
- Step list interaction
- Action details display
- Error handling for failed tasks
- Accessibility (keyboard navigation)
- Progress bar updates

Purpose: Serves as reference implementation showing how to write good integration tests.

Testing Architecture Overview

Layered Testing Pyramid

          /\
         /E2E\         5% - Full workflows (slow, high-level)
        /-----\
       /Integ \        15% - Multiple components
      /--------\
     /Component\       50% - Individual UI components
    /------------\
   /Unit & Logic \     30% - Pure functions
  /----------------\

Test Types

Type	Location	Purpose	Speed	Examples
Unit	`tests/unit/`	Pure functions, data transformations	<10ms	`test_format_duration()`, `test_parse_action()`
Component	`tests/component/`	Individual UI elements	~500ms	`test_screenshot_with_overlays()`, `test_playback_controls()`
Integration	`tests/integration/`	Multiple components, workflows	~2s	`test_benchmark_viewer_full_workflow()`
Visual	`tests/visual/`	Screenshot comparison	~2s	`test_benchmark_list_layout()`

Tool Stack

Playwright: Modern browser automation (Python)
pytest: Test framework
pytest-playwright: Playwright fixtures for pytest
pytest-html: HTML test reports
pytest-xdist: Parallel execution
pytest-cov: Code coverage

Why Playwright?

Chosen over Selenium, Cypress, Puppeteer:

✅ Python native (no context switching)
✅ Modern API with auto-waiting
✅ Fast parallel execution
✅ Cross-browser (Chrome, Firefox, WebKit)
✅ File:// URL support (can test standalone HTML)
✅ Component isolation testing
✅ Built-in visual regression (screenshot comparison)
✅ Active development with 2026 improvements

2026 Improvements:

Smarter locators with AI assistance
Enhanced HTML reporter
Better component isolation
Improved trace viewer for debugging

Key Features

1. Component Isolation Testing

Test individual components without full page load:

def test_screenshot_with_overlay(page):
    from openadapt_viewer.components import screenshot_display

    html = screenshot_display(
        "test.png",
        overlays=[{"type": "click", "x": 0.5, "y": 0.3}]
    )
    page.set_content(html)

    assert page.locator(".oa-overlay-click").is_visible()

2. Integration Testing

Test full workflows with real user interactions:

def test_benchmark_viewer_workflow(page, benchmark_viewer_html):
    page.goto(f"file://{benchmark_viewer_html}")
    page.wait_for_selector(".task-item")

    # Click task
    page.locator(".task-item").first.click()

    # Verify detail panel
    assert page.locator(".task-detail").is_visible()

    # Test playback
    page.locator("button:has-text('Play')").click()
    assert page.locator("button:has-text('Pause')").is_visible()

3. Visual Regression

Detect unintended layout/styling changes:

def test_benchmark_layout(page, viewer_path):
    page.goto(f"file://{viewer_path}")
    page.wait_for_selector(".task-list-item")

    expect(page.locator(".task-list")).to_have_screenshot(
        "benchmark-task-list.png",
        max_diff_pixels=100
    )

4. Alpine.js Testing

Special handling for reactive components:

def test_alpine_reactive_filter(page):
    page.set_content(html_with_alpine)

    # Wait for Alpine initialization
    page.wait_for_function("window.Alpine !== undefined")

    # Test reactivity
    page.select_option("select", "success")
    assert page.locator("[x-text='filter']").text_content() == "success"

5. Fixture System

Reusable test data and setup:

@pytest.fixture
def sample_benchmark_run():
    """Generate realistic benchmark data."""
    return {
        "run_id": "test-001",
        "tasks": [...],
        "executions": [...],
    }

@pytest.fixture
def benchmark_viewer_html(tmp_path, sample_benchmark_run):
    """Generate viewer HTML for testing."""
    output_path = tmp_path / "viewer.html"
    generate_benchmark_html(run_data=sample_benchmark_run, output_path=output_path)
    return output_path

Development Workflow

TDD Workflow (Recommended)

Write failing test (Red)

uv run pytest tests/component/test_new_feature.py -v
# FAILED

Implement feature (Green)

# Edit component code
uv run pytest tests/component/test_new_feature.py -v
# PASSED

Refactor with confidence (Refactor)

# Clean up implementation
uv run pytest tests/component/test_new_feature.py -v
# Still PASSED

Full suite before commit
```
uv run pytest tests/ -v
# All PASSED
```

Integration with Claude Code

Benefits:

Clear success criteria: Tests define what "working" means
Immediate feedback: Claude knows right away if it broke something
Guided debugging: Test failures point to exact problem
Confidence: Can refactor knowing tests catch issues

Prompting Patterns:

"Write a test for the screenshot component that verifies overlays appear"

"Make this test pass: test_playback_controls_pause_resume"

"The test test_filter_workflow is failing. Fix the filter component."

"Add visual regression tests for the benchmark viewer"

Quick Start Guide

Setup (One-time)

cd /Users/abrichr/oa/src/openadapt-viewer

# Install with dev dependencies
uv sync --extra dev

# Install Playwright browsers
uv run playwright install chromium

Running Tests

# All tests
uv run pytest tests/ -v

# By category
uv run pytest tests/unit/ -v           # Fast unit tests
uv run pytest tests/component/ -v      # Component tests
uv run pytest tests/integration/ -v    # Integration tests

# Specific file
uv run pytest tests/component/test_screenshot.py -v

# With coverage
uv run pytest tests/ --cov=openadapt_viewer --cov-report=html
open htmlcov/index.html

# Parallel (faster)
uv run pytest tests/ -n auto -v

Writing a New Test

# tests/component/test_my_feature.py
from playwright.sync_api import Page

def test_my_feature_renders(page: Page):
    """Test that my feature renders correctly."""
    from openadapt_viewer.components import my_feature

    # ARRANGE: Set up test data
    html = my_feature(data="test")

    # ACT: Render in browser
    page.set_content(html)

    # ASSERT: Verify behavior
    assert page.locator(".oa-my-feature").is_visible()
    assert page.locator(".oa-my-feature").text_content() == "test"

Debugging Tests

# Show print statements
uv run pytest tests/ -v -s

# Show local variables on failure
uv run pytest tests/ -v -l

# Run with headed browser (see what's happening)
PWDEBUG=1 uv run pytest tests/component/test_screenshot.py -v

# Add breakpoint in test
def test_something(page):
    import pdb; pdb.set_trace()
    # or
    page.pause()  # Opens Playwright inspector

CI/CD Integration

GitHub Actions Workflow

name: Test Viewers
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: uv sync --extra dev
      - run: uv run playwright install --with-deps chromium
      - run: uv run pytest tests/unit/ -v
      - run: uv run pytest tests/component/ -v
      - run: uv run pytest tests/integration/ -v

Local CI Simulation

# Run full CI locally before pushing
cd /Users/abrichr/oa/src/openadapt-viewer

# Fresh install
uv sync --extra dev
uv run playwright install chromium

# Run all tests with coverage
uv run pytest tests/ -v --cov=openadapt_viewer --cov-report=html

# Check coverage (aim for >80%)
open htmlcov/index.html

# Run parallel (faster)
uv run pytest tests/ -n auto -v

Test Coverage Goals

Metric	Target	Current
Line coverage	>80%	TBD (run tests to measure)
Branch coverage	>70%	TBD
Test count	100+	~50 (existing + example)
Test execution time	<60s	~20s (existing)
Flakiness rate	<2%	TBD

Success Metrics

Quality Metrics

Test execution time: <60s for full suite
Flakiness rate: <2% (tests should be deterministic)
Bug detection rate: >90% (tests should catch most regressions)
Coverage: >80% line coverage for component code

Developer Experience Metrics

Setup time: <5 minutes from clone to running tests
Feedback loop: <10s for unit tests, <60s for integration
Test failure clarity: Failures point to exact problem
Maintenance burden: Tests shouldn't break with minor refactors

Common Patterns

Testing Alpine.js Components

def test_alpine_reactive_state(page):
    html = """<div x-data="{ count: 0 }">
        <span x-text="count"></span>
        <button @click="count++">+</button>
    </div>"""
    page.set_content(html)

    # Wait for Alpine
    page.wait_for_function("window.Alpine !== undefined")

    # Test reactivity
    assert page.locator("span").text_content() == "0"
    page.locator("button").click()
    assert page.locator("span").text_content() == "1"

Testing Async Operations

def test_async_data_loading(page):
    page.goto("file:///viewer.html")

    # Wait for data to load
    page.wait_for_selector(".task-list-item")

    # Or wait for specific condition
    page.wait_for_function("window.dataLoaded === true")

Testing Error States

def test_error_handling(page):
    page.goto("file:///viewer.html")
    page.evaluate("window.loadData({ invalid: true })")

    # Verify error shown
    assert page.locator(".error-message").is_visible()
    assert "Invalid data" in page.locator(".error-message").text_content()

Troubleshooting

Test Fails in CI but Passes Locally

Solution:

Add explicit waits: page.wait_for_selector(".element")
Use deterministic test data (no random, fixed timestamps)
Check viewport size consistency
Wait for fonts/assets to load

Flaky Tests

Solution:

Replace wait_for_timeout() with wait_for_selector()
Use page.wait_for_load_state("networkidle")
Ensure Alpine.js initialized: page.wait_for_function("window.Alpine")
Increase visual regression threshold: max_diff_pixels=100

Playwright Can't Find Element

Solution:

Add explicit wait: page.wait_for_selector(".element")
Use better selectors: page.get_by_role("button", name="Submit")
Debug: page.pause() to inspect page
Check Alpine.js init: page.wait_for_function("window.Alpine")

Future Improvements

Short-term (Q1 2026)

Add performance tests (load 1000+ tasks)
Expand visual regression to all viewers
Add accessibility tests (ARIA, keyboard navigation)
Document testing patterns in examples

Medium-term (Q2 2026)

Cross-browser testing (Firefox, Safari)
Mutation testing (verify tests catch bugs)
Screenshot diffing UI
Performance profiling

Long-term (Q3+ 2026)

E2E testing with real data
Automated test generation
Property-based testing
Test visualization dashboard

References

Official Documentation

TESTING_STRATEGY.md - Comprehensive strategy
docs/TESTING_GUIDE.md - Practical guide
Playwright Python Docs
pytest Documentation

Best Practices

Tools

pytest-playwright Plugin
pytest-html - HTML test reports
pytest-xdist - Parallel execution
pytest-cov - Code coverage

Conclusion

This implementation provides:

Systematic regression prevention: Automated tests catch breaks before they happen
Fast development feedback: Know immediately if changes broke something
Component isolation: Test each piece independently
AI-friendly structure: Clear patterns Claude Code can maintain
Developer confidence: Make changes without fear

Key Achievement: Transformed unpleasant, buggy viewer development into a confident, systematic process with automated safety nets.

Next Steps:

Run existing tests to establish baseline coverage
Add tests for new features using TDD workflow
Set up CI/CD integration
Expand visual regression testing
Measure and improve coverage over time

Document Version: 1.0 Date: January 17, 2026 Author: OpenAdapt Team with Claude Code Status: Implementation Complete

FilesExpand file tree

TESTING_IMPLEMENTATION_SUMMARY.md

Latest commit

History

TESTING_IMPLEMENTATION_SUMMARY.md

File metadata and controls

UI Testing Implementation Summary

What Was Delivered

1. Comprehensive Testing Strategy Document

2. Practical Testing Guide

3. Updated CLAUDE.md

4. Example Integration Test Suite

Testing Architecture Overview

Layered Testing Pyramid

Test Types

Tool Stack

Why Playwright?

Key Features

1. Component Isolation Testing

2. Integration Testing

3. Visual Regression

4. Alpine.js Testing

5. Fixture System

Development Workflow

TDD Workflow (Recommended)

Integration with Claude Code

Quick Start Guide

Setup (One-time)

Running Tests

Writing a New Test

Debugging Tests

CI/CD Integration

GitHub Actions Workflow

Local CI Simulation

Test Coverage Goals

Success Metrics

Quality Metrics

Developer Experience Metrics

Common Patterns

Testing Alpine.js Components

Testing Async Operations

Testing Error States

Troubleshooting

Test Fails in CI but Passes Locally

Flaky Tests

Playwright Can't Find Element

Future Improvements

Short-term (Q1 2026)

Medium-term (Q2 2026)

Long-term (Q3+ 2026)

References

Official Documentation

Best Practices

Tools

Conclusion