Effective Date: January 17, 2026
Status: MANDATORY
Scope: All OpenAdapt viewer components
CRITICAL: All OpenAdapt viewers MUST default to using REAL data from actual recordings.
Fake/sample/synthetic data is ONLY permitted for:
- Unit tests (clearly marked)
- Explicit demo mode (with clear warnings)
- Documentation examples (clearly labeled as synthetic)
Before this policy:
- test_benchmark_refactored.html contained fake sample data
- Users couldn't see real system behavior
- Demos were unconvincing and misleading
- No way to verify actual ML segmentation quality
After this policy:
- All viewers default to real nightshift recording
- Screenshots show actual macOS System Settings
- Actions reflect real user behavior
- Episode boundaries from ML segmentation
- Duration matches actual recording (6.7 seconds)
/Users/abrichr/oa/src/openadapt-capture/turn-off-nightshift/
├── capture.db # 1,561 real events
├── episodes.json # 2 ML-segmented episodes
├── screenshots/ # 22 real PNG files
├── video.mp4 # Real screen recording
└── audio.flac # Real audio
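A viewer can only default to this recording if the directory is complete. The following is a minimal sketch of a layout check. The helper name `missing_recording_entries` is hypothetical, and treating all five entries in the tree above as required is an assumption; the real loader may treat video.mp4 or audio.flac as optional.

```python
from pathlib import Path

# Entries shown in the directory tree above. Treating every one of them as
# required is an assumption made for this sketch.
EXPECTED_FILES = ("capture.db", "episodes.json", "video.mp4", "audio.flac")
EXPECTED_DIRS = ("screenshots",)


def missing_recording_entries(recording_dir):
    """Return the names of expected entries that are absent from recording_dir."""
    root = Path(recording_dir)
    # Regular files first, in the order listed above.
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # Directories are reported with a trailing slash so they are easy to spot.
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty return value means the directory matches the layout shown above; anything else names the entries that should be restored before the recording is used as a default.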
NEW: Loads real capture data from SQLite + episodes.json
from openadapt_viewer.viewers.benchmark.real_data_loader import load_real_capture_data
# Default to nightshift recording
run = load_real_capture_data()
# Or specify different recording
run = load_real_capture_data("/path/to/other/recording")

Features:
- Reads from capture.db (SQLite)
- Loads episodes from episodes.json
- Converts to BenchmarkRun format
- Uses real screenshots from recordings
- Preserves actual timestamps and durations
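To make the features above concrete, here is a rough sketch of reading a capture directory with only the standard library. It is not the actual real_data_loader.py implementation: the function name `summarize_capture`, the `events` table, and its `type` column are assumptions about the capture.db schema, and the real loader additionally converts the result to BenchmarkRun.

```python
import json
import sqlite3
from pathlib import Path


def summarize_capture(capture_dir, events_table="events", type_column="type"):
    """Summarize a capture directory (hypothetical helper, assumed schema)."""
    root = Path(capture_dir)
    db_path = root / "capture.db"
    episodes_path = root / "episodes.json"

    # Fail early with FileNotFoundError so a caller can fall back to another format.
    for required in (db_path, episodes_path):
        if not required.is_file():
            raise FileNotFoundError(str(required))

    # Count events per type. The table and column names are assumptions.
    conn = sqlite3.connect(str(db_path))
    try:
        query = 'SELECT "{col}", COUNT(*) FROM "{tbl}" GROUP BY "{col}"'.format(
            col=type_column, tbl=events_table
        )
        event_counts = dict(conn.execute(query).fetchall())
    finally:
        conn.close()

    # episodes.json is returned as parsed; its internal structure is not assumed.
    episodes = json.loads(episodes_path.read_text(encoding="utf-8"))

    # Collect real screenshot file names, sorted for a stable order.
    screenshots = sorted(p.name for p in (root / "screenshots").glob("*.png"))

    return {
        "event_counts": event_counts,
        "episodes": episodes,
        "screenshots": screenshots,
    }
```

Raising FileNotFoundError for a missing file is a deliberate choice in this sketch: it is one of the exceptions a caller would naturally catch before trying a different data format.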
def generate_benchmark_html(
    data_path: Optional[Path | str] = None,
    output_path: Path | str = "benchmark_viewer.html",
    standalone: bool = False,
    run_data: Optional[BenchmarkRun] = None,
    use_real_data: bool = True,  # DEFAULT: True
) -> str:
    """Generate benchmark viewer.

    POLICY: ALWAYS defaults to real data from nightshift recording.
    Set use_real_data=False ONLY for unit tests with sample data.
    """
    if run_data is not None:
        run = run_data
    elif data_path is not None:
        # Try to load as capture directory first
        try:
            run = load_real_capture_data(data_path)
        except (FileNotFoundError, ValueError, KeyError):
            # Fall back to benchmark data format
            run = load_benchmark_data(data_path)
    else:
        # DEFAULT: Use real data
        if use_real_data:
            run = load_real_capture_data()
        else:
            # ONLY for unit tests
            run = create_sample_data()

# Default: nightshift recording
uv run openadapt-viewer benchmark --output viewer.html
# Specific recording
uv run openadapt-viewer benchmark --data /path/to/recording --output viewer.html

Output:
Generating benchmark viewer with REAL nightshift recording data...
Generated: viewer.html
def create_sample_data(num_tasks: int = 10) -> BenchmarkRun:
    """Create sample benchmark data for testing/demo purposes.

    WARNING: This generates FAKE/SYNTHETIC data.
    POLICY: ONLY use this for unit tests, clearly marked.
    For all other purposes, use load_real_capture_data() from real_data_loader.
    ...
    """

Run verification to ensure real data is being used:
python3 -c "
import re
with open('test_benchmark_refactored.html') as f:
    html = f.read()
checks = {
    'Title contains Real Capture': 'Real Capture: Turn Off Night Shift Demo' in html,
    'Model is human_demonstration': 'human_demonstration' in html,
    'Has episode_001': 'episode_001' in html,
    'Has episode_002': 'episode_002' in html,
    'Has Navigate to System Settings': 'Navigate to System Settings' in html,
    'Has Disable Night Shift': 'Disable Night Shift' in html,
    'Has real screenshot paths': 'capture_31807990_step_' in html,
    'Has turn-off-nightshift': 'turn-off-nightshift' in html,
    'Total tasks is 2': 'Total Tasks' in html and '>2</div>' in html,
    'No sample data': 'sample_run' not in html,
    'No synthetic data': 'synthetic' not in html.lower(),
}
for check, result in checks.items():
    status = '✓ PASS' if result else '✗ FAIL'
    print(f'{status}: {check}')
all_passed = all(checks.values())
print('Overall:', 'ALL CHECKS PASSED' if all_passed else 'SOME CHECKS FAILED')
"

- Check Title: "Real Capture: Turn Off Night Shift Demo"
- Check Model: "human_demonstration"
- Check Tasks: 2 episodes (not 10 fake tasks)
- Check Episodes:
  - episode_001: "Navigate to System Settings"
  - episode_002: "Disable Night Shift"
- Check Screenshots: Real paths like capture_31807990_step_0.png
- Check Duration: ~6.7 seconds (not random)
- Check Metadata:
  - recording_id: "turn-off-nightshift"
  - source: "real_capture"
  - platform: "darwin"
  - llm_model: "gpt-4o"
  - episode_count: 2
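The checklist above can also be expressed as a reusable function that reports what is wrong instead of printing pass/fail lines. This is a sketch only: the function name `find_real_data_violations` is hypothetical, and the marker strings are the ones quoted in this document, so they apply to the nightshift recording specifically.

```python
# Markers taken from the checklist above; they are specific to the
# turn-off-nightshift recording and would differ for other recordings.
REQUIRED_MARKERS = (
    "human_demonstration",
    "episode_001",
    "episode_002",
    "turn-off-nightshift",
    "real_capture",
)
FORBIDDEN_MARKERS = ("sample_run",)


def find_real_data_violations(html):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    for marker in REQUIRED_MARKERS:
        if marker not in html:
            problems.append("missing real-data marker: " + marker)
    for marker in FORBIDDEN_MARKERS:
        if marker in html:
            problems.append("found sample-data marker: " + marker)
    # Case-insensitive, matching the 'No synthetic data' check in the script above.
    if "synthetic" in html.lower():
        problems.append("found the word 'synthetic'")
    return problems
```

Returning a list rather than a boolean makes the function usable both in a unit test (assert the list is empty) and in a CI step that prints each problem before failing.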
episode_001: "Navigate to System Settings"
Duration: 3.5 seconds
Steps:
- Click System Settings icon in dock
- Wait for Settings window to open
- Click on Displays in sidebar
Screenshots: 3 key frames
Boundary Confidence: 0.92
Coherence Score: 0.88
episode_002: "Disable Night Shift"
Duration: 3.2 seconds
Steps:
- Scroll down in Displays settings
- Click on Night Shift option
- Toggle Night Shift switch to off position
Screenshots: 3 key frames
Boundary Confidence: 0.95
Coherence Score: 0.91
- Recording ID: turn-off-nightshift
- Platform: macOS (darwin)
- Screen Size: 1920x1080
- Total Duration: 6.7 seconds
- Total Events: 1,561
  - screen.frame: 457
  - mouse.move: 1,046
  - mouse.down: 13
  - mouse.up: 13
  - key.down: 16
  - key.up: 16
- Screenshots: 22 PNG files
- ML Segmentation: gpt-4o
- Processing: 2026-01-17T12:00:00
- Coverage: 100%
- Average Confidence: 93.5%
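The figures reported for this recording are internally consistent, which is one quick way to tell real metadata from randomly generated values. The short check below uses only numbers quoted in this document; the variable names are illustrative.

```python
import math

# Per-type event counts as listed in this document.
event_counts = {
    "screen.frame": 457,
    "mouse.move": 1046,
    "mouse.down": 13,
    "mouse.up": 13,
    "key.down": 16,
    "key.up": 16,
}
total_events = sum(event_counts.values())
assert total_events == 1561  # matches "Total Events: 1,561"

# Episode durations in seconds (episode_001, episode_002).
episode_durations = [3.5, 3.2]
total_duration = sum(episode_durations)
assert math.isclose(total_duration, 6.7)  # matches "Total Duration: 6.7 seconds"

# Boundary confidences for the two episodes.
boundary_confidences = [0.92, 0.95]
average_confidence = sum(boundary_confidences) / len(boundary_confidences)
assert math.isclose(average_confidence, 0.935)  # matches "Average Confidence: 93.5%"
```

The float comparisons use `math.isclose` rather than `==` because sums of decimal fractions are not guaranteed to be exact in binary floating point.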
- Viewers show real user behavior
- Screenshots are actual macOS UI
- Actions match real interactions
- Timestamps are accurate
- Episode boundaries from ML segmentation
- Confidence scores visible
- Coherence scores tracked
- Can verify segmentation quality
- Real "Turn Off Night Shift" task
- Actual macOS System Settings
- Genuine user workflow
- Professional presentation
- Verify ML pipeline works end-to-end
- Test viewer with real data shapes
- Validate screenshot paths
- Check timestamp handling
- Examples use real recordings
- Screenshots show actual UI
- Behavior matches reality
- Credible use cases
from openadapt_viewer.viewers.benchmark import generate_benchmark_html
# ✓ CORRECT: Use real data by default
generate_benchmark_html(output_path="viewer.html")
# ✓ CORRECT: Specify recording
generate_benchmark_html(
    data_path="/path/to/recording",
    output_path="viewer.html",
)
# ✗ INCORRECT: Don't use sample data for production
generate_benchmark_html(
    output_path="viewer.html",
    use_real_data=False,  # Only for unit tests!
)

- Replace sample data calls:
  # OLD
  from openadapt_viewer.viewers.benchmark.data import create_sample_data
  run = create_sample_data()
  # NEW
  from openadapt_viewer.viewers.benchmark.real_data_loader import load_real_capture_data
  run = load_real_capture_data()
- Update CLI commands:
  # OLD
  uv run openadapt-viewer benchmark --output viewer.html  # (generated fake data)
  # NEW
  uv run openadapt-viewer benchmark --output viewer.html  # (generates real nightshift data)
- Update tests:
  # Unit tests can still use sample data
  def test_viewer_with_sample_data():
      run = create_sample_data(num_tasks=5)  # OK for tests
      html = generate_benchmark_html(run_data=run, use_real_data=False)
      assert "task_001" in html
- Label synthetic examples:
  # Example with Synthetic Data (for illustration only)
  Note: This example uses synthetic data for simplicity. In production, always use real capture data.
- Prefer real examples:
  # Example with Real Data
  This example uses the nightshift recording from openadapt-capture.
All PRs must verify:
- No sample data in production code paths
- Real data used by default
- Sample data clearly marked for tests
- Documentation uses real examples
Automated checks:
# Check no sample_run in production HTML
grep -q "sample_run" viewer.html && exit 1
# Check real data markers present
grep -q "real_capture" viewer.html || exit 1
grep -q "human_demonstration" viewer.html || exit 1

Unit tests MUST specify use_real_data=False explicitly:
def test_with_sample_data():
    # Explicit opt-in to fake data
    run = create_sample_data(num_tasks=3)
    html = generate_benchmark_html(run_data=run, use_real_data=False)
    assert len(run.tasks) == 3

Add more real recordings for different tasks:
- Browser automation tasks
- File management operations
- System configuration changes
- Application workflows
from openadapt_viewer.catalog import get_catalog
from openadapt_viewer.viewers.benchmark.real_data_loader import load_real_capture_data

# Discover all available recordings
catalog = get_catalog()
recordings = catalog.get_all_recordings()
# Load specific recording by name
run = load_real_capture_data(catalog.get_recording("turn-off-nightshift").path)

Add dropdown in viewer to switch between recordings:
<select id="recording-selector">
<option value="turn-off-nightshift" selected>Turn Off Night Shift</option>
<option value="other-recording">Other Recording</option>
</select>

- Real Data Loader: src/openadapt_viewer/viewers/benchmark/real_data_loader.py
- Generator: src/openadapt_viewer/viewers/benchmark/generator.py
- Sample Data (deprecated for production): src/openadapt_viewer/viewers/benchmark/data.py
- CLI: src/openadapt_viewer/cli.py
- Nightshift Recording: /Users/abrichr/oa/src/openadapt-capture/turn-off-nightshift/
- Episodes: /Users/abrichr/oa/src/openadapt-capture/turn-off-nightshift/episodes.json
For questions about this policy:
- Check this document first
- Review the real_data_loader.py code
- Examine the nightshift recording structure
- Test with uv run openadapt-viewer benchmark
OLD BEHAVIOR (UNACCEPTABLE):
uv run openadapt-viewer benchmark --output viewer.html
# Generated fake sample data with random tasks

NEW BEHAVIOR (CORRECT):
uv run openadapt-viewer benchmark --output viewer.html
# Generates real nightshift recording with 2 episodes, 6.7s duration, 22 screenshots

POLICY: ALWAYS use real data by default. Sample data ONLY for unit tests, clearly marked.