DEFAULT_TO_REAL_DATA Policy

Effective Date: January 17, 2026
Status: MANDATORY
Scope: All OpenAdapt viewer components

Policy Statement

CRITICAL: All OpenAdapt viewers MUST default to using REAL data from actual recordings.

Fake/sample/synthetic data is ONLY permitted for:

Unit tests (clearly marked)
Explicit demo mode (with clear warnings)
Documentation examples (clearly labeled as synthetic)

Problem Solved

Before this policy:

test_benchmark_refactored.html contained fake sample data
Users couldn't see real system behavior
Demos were unconvincing and misleading
No way to verify actual ML segmentation quality

After this policy:

All viewers default to the real nightshift recording
Screenshots show actual macOS System Settings
Actions reflect real user behavior
Episode boundaries come from ML segmentation
Duration matches actual recording (6.7 seconds)

Implementation

Default Capture Location

/Users/abrichr/oa/src/openadapt-capture/turn-off-nightshift/
├── capture.db          # 1,561 real events
├── episodes.json       # 2 ML-segmented episodes
├── screenshots/        # 22 real PNG files
├── video.mp4           # Real screen recording
└── audio.flac          # Real audio

Code Changes

1. Real Data Loader (`real_data_loader.py`)

NEW: Loads real capture data from SQLite + episodes.json

from openadapt_viewer.viewers.benchmark.real_data_loader import load_real_capture_data

# Default to nightshift recording
run = load_real_capture_data()

# Or specify different recording
run = load_real_capture_data("/path/to/other/recording")

Features:

Reads from capture.db (SQLite)
Loads episodes from episodes.json
Converts to BenchmarkRun format
Uses real screenshots from recordings
Preserves actual timestamps and durations
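
As a rough illustration of the loading steps listed above, the sketch below reads events from a SQLite file and episodes from a JSON file in a capture directory. The `events` table, its `type` column, and the top-level `episodes` key are assumptions made for this example; the actual capture.db schema and episodes.json layout may differ, and this is not the code in real_data_loader.py.

```python
import json
import sqlite3
from collections import Counter
from pathlib import Path


def summarize_capture(capture_dir):
    """Return (event counts by type, episode list) for a capture directory.

    Assumes a hypothetical schema: an `events` table with a `type` column,
    and an episodes.json file with a top-level "episodes" list.
    """
    capture_dir = Path(capture_dir)
    db_path = capture_dir / "capture.db"
    episodes_path = capture_dir / "episodes.json"

    # Fail loudly instead of silently substituting sample data.
    for required in (db_path, episodes_path):
        if not required.exists():
            raise FileNotFoundError(f"Missing capture file: {required}")

    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT type FROM events").fetchall()
    finally:
        conn.close()
    counts = Counter(row[0] for row in rows)

    with open(episodes_path, encoding="utf-8") as f:
        episodes = json.load(f)["episodes"]
    return counts, episodes
```

Raising `FileNotFoundError` for a missing file matches the exception the generator already catches when it falls back to the benchmark data format.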

2. Generator Updates (`generator.py`)

def generate_benchmark_html(
    data_path: Optional[Path | str] = None,
    output_path: Path | str = "benchmark_viewer.html",
    standalone: bool = False,
    run_data: Optional[BenchmarkRun] = None,
    use_real_data: bool = True,  # DEFAULT: True
) -> str:
    """Generate benchmark viewer.

    POLICY: ALWAYS defaults to real data from nightshift recording.
    Set use_real_data=False ONLY for unit tests with sample data.
    """
    if run_data is not None:
        run = run_data
    elif data_path is not None:
        # Try to load as capture directory first
        try:
            run = load_real_capture_data(data_path)
        except (FileNotFoundError, ValueError, KeyError):
            # Fall back to benchmark data format
            run = load_benchmark_data(data_path)
    elif use_real_data:
        # DEFAULT: Use real data
        run = load_real_capture_data()
    else:
        # ONLY for unit tests
        run = create_sample_data()
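
The precedence in the branching above can be restated as a small pure function. This is only a sketch for reasoning about the order of checks; `choose_data_source` and its string labels are hypothetical and not part of generator.py.

```python
def choose_data_source(run_data=None, data_path=None, use_real_data=True):
    """Return a label for the data source the generator's branching would pick."""
    if run_data is not None:
        # An explicit BenchmarkRun always wins.
        return "explicit run_data"
    if data_path is not None:
        # A path is tried as a capture directory, then as benchmark data.
        return "data_path"
    # use_real_data is consulted only when nothing else was supplied.
    return "default real capture" if use_real_data else "sample data"
```

One consequence worth noting: `use_real_data` has no effect when `run_data` or `data_path` is given, so passing `use_real_data=False` alongside `run_data` documents intent rather than changing behavior.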

3. CLI Updates (`cli.py`)

# Default: nightshift recording
uv run openadapt-viewer benchmark --output viewer.html

# Specific recording
uv run openadapt-viewer benchmark --data /path/to/recording --output viewer.html

Output:

Generating benchmark viewer with REAL nightshift recording data...
Generated: viewer.html
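
For readers unfamiliar with how such a command line might be wired, here is a minimal argparse sketch using the `benchmark` subcommand and the `--data` and `--output` flags shown above. The function name `build_parser`, the help strings, and the default output filename are assumptions for illustration; the actual cli.py may be structured differently.

```python
import argparse


def build_parser():
    """Build a hypothetical parser mirroring the documented command shape."""
    parser = argparse.ArgumentParser(prog="openadapt-viewer")
    subparsers = parser.add_subparsers(dest="command", required=True)
    benchmark = subparsers.add_parser("benchmark", help="Generate the benchmark viewer")
    # Omitting --data leaves it as None, which selects the default real recording.
    benchmark.add_argument("--data", default=None, help="Path to a capture directory")
    benchmark.add_argument("--output", default="benchmark_viewer.html", help="Output HTML file")
    return parser
```

Leaving `--data` unset rather than defaulting it to a sample file keeps the real recording as the path of least resistance.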

4. Sample Data Warnings (`data.py`)

def create_sample_data(num_tasks: int = 10) -> BenchmarkRun:
    """Create sample benchmark data for testing/demo purposes.

    WARNING: This generates FAKE/SYNTHETIC data.
    POLICY: ONLY use this for unit tests, clearly marked.
    For all other purposes, use load_real_capture_data() from real_data_loader.

    ...
    """

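A docstring warning is easy to miss at runtime. One possible complement, sketched below under the assumption that the project would accept a custom warning category, is to emit a warning whenever synthetic data is created. `SyntheticDataWarning` and `create_sample_tasks` are hypothetical names, and the function returns plain dictionaries rather than a BenchmarkRun.

```python
import warnings


class SyntheticDataWarning(UserWarning):
    """Hypothetical warning category for fake/sample data."""


def create_sample_tasks(num_tasks=10):
    """Stand-in for a sample-data factory; returns dicts, not a BenchmarkRun."""
    warnings.warn(
        "Generating SYNTHETIC data; use real capture data outside unit tests.",
        SyntheticDataWarning,
        stacklevel=2,  # Point the warning at the caller, not this function.
    )
    return [
        {"task_id": f"task_{i:03d}", "synthetic": True}
        for i in range(1, num_tasks + 1)
    ]
```

A test suite could then filter this category explicitly, while any production code path that triggers it stays visible.
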
Verification

Automated Verification

Run verification to ensure real data is being used:

python3 -c "
with open('test_benchmark_refactored.html') as f:
    html = f.read()

checks = {
    'Title contains Real Capture': 'Real Capture: Turn Off Night Shift Demo' in html,
    'Model is human_demonstration': 'human_demonstration' in html,
    'Has episode_001': 'episode_001' in html,
    'Has episode_002': 'episode_002' in html,
    'Has Navigate to System Settings': 'Navigate to System Settings' in html,
    'Has Disable Night Shift': 'Disable Night Shift' in html,
    'Has real screenshot paths': 'capture_31807990_step_' in html,
    'Has turn-off-nightshift': 'turn-off-nightshift' in html,
    'Total tasks is 2': 'Total Tasks' in html and '>2</div>' in html,
    'No sample data': 'sample_run' not in html,
    'No synthetic data': 'synthetic' not in html.lower(),
}

for check, result in checks.items():
    status = '✓ PASS' if result else '✗ FAIL'
    print(f'{status}: {check}')

all_passed = all(checks.values())
print('Overall:', 'ALL CHECKS PASSED' if all_passed else 'SOME CHECKS FAILED')
"

Manual Verification

Check Title: "Real Capture: Turn Off Night Shift Demo"
Check Model: "human_demonstration"
Check Tasks: 2 episodes (not 10 fake tasks)
Check Episodes:
- episode_001: "Navigate to System Settings"
- episode_002: "Disable Night Shift"
Check Screenshots: Real paths like capture_31807990_step_0.png
Check Duration: ~6.7 seconds (not random)
Check Metadata:
- recording_id: "turn-off-nightshift"
- source: "real_capture"
- platform: "darwin"
- llm_model: "gpt-4o"
- episode_count: 2

Real Data Contents

Episode 1: Navigate to System Settings

Duration: 3.5 seconds

Steps:

Click System Settings icon in dock
Wait for Settings window to open
Click on Displays in sidebar

Screenshots: 3 key frames
Boundary Confidence: 0.92
Coherence Score: 0.88

Episode 2: Disable Night Shift

Duration: 3.2 seconds

Steps:

Scroll down in Displays settings
Click on Night Shift option
Toggle Night Shift switch to off position

Screenshots: 3 key frames
Boundary Confidence: 0.95
Coherence Score: 0.91

Recording Metadata

Recording ID: turn-off-nightshift
Platform: macOS (darwin)
Screen Size: 1920x1080
Total Duration: 6.7 seconds
Total Events: 1,561
- screen.frame: 457
- mouse.move: 1,046
- mouse.down: 13
- mouse.up: 13
- key.down: 16
- key.up: 16
Screenshots: 22 PNG files
ML Segmentation: gpt-4o
Processing: 2026-01-17T12:00:00
Coverage: 100%
Average Confidence: 93.5%
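
The figures above are internally consistent, which is one quick sanity check that they come from a single recording. The snippet below re-derives the totals from the per-type and per-episode numbers in this document. It assumes that "Average Confidence" is the mean of the two episode boundary confidences; the document does not state how that value is computed.

```python
import math

# Per-type event counts from the Recording Metadata list.
event_counts = {
    "screen.frame": 457,
    "mouse.move": 1046,
    "mouse.down": 13,
    "mouse.up": 13,
    "key.down": 16,
    "key.up": 16,
}
# Per-episode values from the two episode descriptions.
episode_durations = [3.5, 3.2]
boundary_confidences = [0.92, 0.95]

total_events = sum(event_counts.values())
total_duration = sum(episode_durations)
# Assumption: the reported average is the mean boundary confidence.
average_confidence = sum(boundary_confidences) / len(boundary_confidences)

assert total_events == 1561
assert math.isclose(total_duration, 6.7)
assert math.isclose(average_confidence, 0.935)
```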

Benefits

1. Authenticity

Viewers show real user behavior
Screenshots are actual macOS UI
Actions match real interactions
Timestamps are accurate

2. ML Validation

Episode boundaries from ML segmentation
Confidence scores visible
Coherence scores tracked
Can verify segmentation quality

3. Convincing Demos

Real "Turn Off Night Shift" task
Actual macOS System Settings
Genuine user workflow
Professional presentation

4. Testing

Verify ML pipeline works end-to-end
Test viewer with real data shapes
Validate screenshot paths
Check timestamp handling

5. Documentation

Examples use real recordings
Screenshots show actual UI
Behavior matches reality
Credible use cases

Migration Guide

For New Viewers

from openadapt_viewer.viewers.benchmark import generate_benchmark_html

# ✓ CORRECT: Use real data by default
generate_benchmark_html(output_path="viewer.html")

# ✓ CORRECT: Specify recording
generate_benchmark_html(
    data_path="/path/to/recording",
    output_path="viewer.html"
)

# ✗ INCORRECT: Don't use sample data for production
generate_benchmark_html(
    output_path="viewer.html",
    use_real_data=False  # Only for unit tests!
)

For Existing Viewers

Replace sample data calls:

# OLD
from openadapt_viewer.viewers.benchmark.data import create_sample_data
run = create_sample_data()

# NEW
from openadapt_viewer.viewers.benchmark.real_data_loader import load_real_capture_data
run = load_real_capture_data()

The CLI command itself is unchanged; only its default output differs:

# OLD
uv run openadapt-viewer benchmark --output viewer.html
# (generated fake data)

# NEW
uv run openadapt-viewer benchmark --output viewer.html
# (generates real nightshift data)

Update tests:

# Unit tests can still use sample data
def test_viewer_with_sample_data():
    run = create_sample_data(num_tasks=5)  # OK for tests
    html = generate_benchmark_html(run_data=run, use_real_data=False)
    assert "task_001" in html

For Documentation

Label synthetic examples:

# Example with Synthetic Data (for illustration only)

Note: This example uses synthetic data for simplicity.
In production, always use real capture data.

Prefer real examples:

# Example with Real Data

This example uses the nightshift recording from openadapt-capture.

Enforcement

Code Reviews

All PRs must verify:

No sample data in production code paths
Real data used by default
Sample data clearly marked for tests
Documentation uses real examples

CI/CD

Automated checks:

# Check no sample_run in production HTML
grep -q "sample_run" viewer.html && exit 1

# Check real data markers present
grep -q "real_capture" viewer.html || exit 1
grep -q "human_demonstration" viewer.html || exit 1
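
The same marker checks can be expressed in Python, which makes it easier to report every failure at once rather than stopping at the first. This is a sketch: `find_policy_violations` is a hypothetical helper, and the marker strings are the ones used in the grep commands above.

```python
FORBIDDEN_MARKERS = ("sample_run",)
REQUIRED_MARKERS = ("real_capture", "human_demonstration")


def find_policy_violations(html):
    """Return human-readable violations; an empty list means all checks pass."""
    violations = [
        f"forbidden marker present: {marker}"
        for marker in FORBIDDEN_MARKERS
        if marker in html
    ]
    violations += [
        f"required marker missing: {marker}"
        for marker in REQUIRED_MARKERS
        if marker not in html
    ]
    return violations
```

A CI step could read viewer.html, call this function, print the list, and exit non-zero when it is not empty.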

Testing

Unit tests MUST specify use_real_data=False explicitly:

def test_with_sample_data():
    # Explicit opt-in to fake data
    run = create_sample_data(num_tasks=3)
    html = generate_benchmark_html(run_data=run, use_real_data=False)
    assert len(run.tasks) == 3
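
It may also be worth guarding the default itself, so that a later refactor cannot quietly flip it. The sketch below inspects a function signature for that purpose. `generate_benchmark_html_stub` is a stand-in defined here so the example runs on its own; a real test would import the actual generator instead.

```python
import inspect


def generate_benchmark_html_stub(output_path="benchmark_viewer.html", use_real_data=True):
    """Stand-in that only mirrors the documented default of use_real_data."""
    return str(output_path)


def test_use_real_data_defaults_to_true():
    # Read the declared default without calling the function.
    params = inspect.signature(generate_benchmark_html_stub).parameters
    assert params["use_real_data"].default is True
```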

Future Work

Additional Real Recordings

Add more real recordings for different tasks:

Browser automation tasks
File management operations
System configuration changes
Application workflows

Catalog Integration

from openadapt_viewer.catalog import get_catalog

# Discover all available recordings
catalog = get_catalog()
recordings = catalog.get_all_recordings()

# Load specific recording by name
run = load_real_capture_data(catalog.get_recording("turn-off-nightshift").path)
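
Since the catalog is future work, its API is not fixed. As one possible starting point, the sketch below discovers recordings by scanning a root directory for subdirectories that contain a capture.db file. `discover_recordings` is a hypothetical helper, and treating capture.db as the marker of a recording is an assumption based on the directory layout shown earlier.

```python
from pathlib import Path


def discover_recordings(root):
    """Map recording name to directory for each subdirectory holding a capture.db."""
    root = Path(root)
    return {
        child.name: child
        for child in sorted(root.iterdir())
        if child.is_dir() and (child / "capture.db").is_file()
    }
```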

Recording Selection UI

Add dropdown in viewer to switch between recordings:

<select id="recording-selector">
  <option value="turn-off-nightshift" selected>Turn Off Night Shift</option>
  <option value="other-recording">Other Recording</option>
</select>
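
If the dropdown is generated on the Python side, the options could be built from a list of recordings rather than hard-coded. The sketch below shows one way to do that with escaping applied to both values and labels. `render_recording_selector` and the (value, label) pair format are assumptions for illustration.

```python
from html import escape


def render_recording_selector(recordings, selected):
    """Build the <select> markup from (value, label) pairs, escaping both."""
    lines = ['<select id="recording-selector">']
    for value, label in recordings:
        # Mark only the currently selected recording.
        attr = " selected" if value == selected else ""
        lines.append(
            f'  <option value="{escape(value, quote=True)}"{attr}>{escape(label)}</option>'
        )
    lines.append("</select>")
    return "\n".join(lines)
```

Called with the two recordings from the snippet above and `selected="turn-off-nightshift"`, it reproduces that markup.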

References

Real Data Loader: src/openadapt_viewer/viewers/benchmark/real_data_loader.py
Generator: src/openadapt_viewer/viewers/benchmark/generator.py
Sample Data (deprecated for production): src/openadapt_viewer/viewers/benchmark/data.py
CLI: src/openadapt_viewer/cli.py
Nightshift Recording: /Users/abrichr/oa/src/openadapt-capture/turn-off-nightshift/
Episodes: /Users/abrichr/oa/src/openadapt-capture/turn-off-nightshift/episodes.json

Questions & Support

For questions about this policy:

Check this document first
Review the real_data_loader.py code
Examine the nightshift recording structure
Test with uv run openadapt-viewer benchmark

Summary

OLD BEHAVIOR (UNACCEPTABLE):

uv run openadapt-viewer benchmark --output viewer.html
# Generated fake sample data with random tasks

NEW BEHAVIOR (CORRECT):

uv run openadapt-viewer benchmark --output viewer.html
# Generates real nightshift recording with 2 episodes, 6.7s duration, 22 screenshots

POLICY: ALWAYS use real data by default. Sample data ONLY for unit tests, clearly marked.
