feat(SR-173): Visual Debug Report -- HTML+SVG overlay per step (closes #178) #184
Merged
This was referenced May 13, 2026
Implements the per-step Visual Debug Report described in issue #178.

What this PR delivers
=====================

- survey/observability/visual_debug.py (NEW)
  Self-contained HTML+SVG renderer with embedded JPEG screenshot and
  inline SVG overlay (target bbox + click crosshair). Non-blocking
  dispatcher backed by a bounded ThreadPoolExecutor (drop-on-overflow,
  NEVER blocks the LangGraph hot path). Deterministic sampling via
  blake2b(step_id). Atomic writes via tmp + os.replace.

- survey/runner_policy.py (NEW)
  Central, immutable, env-driven RunnerPolicy. Per-environment presets
  (prod = 10 % sampling + 100 % on failure; staging/dev = 100 %).

- survey/observability/__init__.py
  Re-exports the new public API.

- survey/safe_executor.py (PATCHED)
  Optional visual_debug_dispatcher + visual_debug_frame_builder kwargs.
  Failure-isolated hook -- executor stability never depends on the debug
  pipeline being healthy.

- tests/test_visual_debug.py (NEW)
  12 tests, 12/12 green on Py 3.13. One test per coordinate-misalignment
  bug class (iframe-offset, DPR-mismatch, scroll-stale, z-index-overlay)
  plus invariants (determinism, atomicity, backpressure, end-to-end).

- scripts/build_daily_visual_report.py (NEW)
  Daily aggregator: builds index.html with OK/FAIL filters; optional
  Vercel-Blob upload via $BLOB_READ_WRITE_TOKEN.

- survey-cli/AGENTS.md (UPDATED)
  Adds the SR-173 brain-file section (file map, design decisions with
  rationale, public API, NEVER rules, test matrix, operations, roadmap
  hooks for SR-167 / SR-168).

State-of-the-art deviations from the briefing (documented inline)
=================================================================

1. ThreadPoolExecutor instead of asyncio.create_task.
   safe_executor.SurveyFlowExecutor is synchronous (its docstring states:
   "synchronous websocket ... matches LangGraph node execution"). There is
   no running event loop. The non-blocking invariant from the briefing is
   preserved -- and in fact strengthened: BoundedSemaphore guarantees hard
   drop-on-overflow, whereas asyncio tasks can queue indefinitely.

2. runner_policy.py is created NEW. The briefing said "edit" but the file
   did not exist on main.

3. Protocol-based shims for VerificationResult (SR-167 / #173) and
   AttestationResult (SR-168 / #174) -- those PRs aren't on main yet.
   When they merge, a 1-line import swap completes integration.

4. Point/Box/ElementRef are introduced in visual_debug.py, not in
   snapshot.py (YAGNI -- single caller today; promote later if a second
   caller emerges).

Tests
=====

cd survey-cli && pytest tests/test_visual_debug.py -v

12 passed in 0.49s on Python 3.13.

Closes #178 (SR-173).
Related: #172 (SR-172 reliability tracker), #173 (SR-167), #174 (SR-168).
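As an illustration of deviation 1 above, the drop-on-overflow behaviour can be sketched with a ThreadPoolExecutor guarded by a BoundedSemaphore. This is a minimal sketch, not the code in visual_debug.py: the class name `DropOnOverflowDispatcher`, the `dropped` counter and the default sizes are assumptions made for the example; only the primitives and the `max_queue` cap are named in this PR.

```python
import logging
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

logger = logging.getLogger(__name__)


class DropOnOverflowDispatcher:
    """Hypothetical sketch: bounded submission that never blocks the caller."""

    def __init__(self, max_workers: int = 2, max_queue: int = 8) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # One permit per task that may be queued or running at the same time.
        self._slots = threading.BoundedSemaphore(max_queue)
        # Plain counter; assumes submit() is called from a single thread.
        self.dropped = 0

    def submit(self, task: Callable[[], object]) -> bool:
        """Return True if the task was accepted, False if it was dropped."""
        # blocking=False returns immediately instead of waiting for a permit.
        if not self._slots.acquire(blocking=False):
            self.dropped += 1
            return False

        def run() -> None:
            try:
                task()
            except Exception:
                # Failure isolation: a broken debug task must not propagate.
                logger.exception("debug task failed")
            finally:
                self._slots.release()

        try:
            self._pool.submit(run)
        except RuntimeError:
            # The pool was already shut down; give the permit back and drop.
            self._slots.release()
            self.dropped += 1
            return False
        return True

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

The caller only ever pays for a non-blocking semaphore acquire and a queue put, so a saturated or broken debug pipeline costs the hot path a dropped report rather than latency.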
The Protocol-Shim TODOs and AGENTS.md brain-section referenced #173/#174
as the placeholder issue numbers for SR-167/SR-168, but the canonical
numbers are:

- SR-167 -> issue #167 (Post-Action Verifier Node)
- SR-168 -> issue #168 (Triple-Channel Attestation)

#173 is actually SR-159 Path Doctrine.

Fix all references in:

- survey/observability/visual_debug.py (Protocol docstrings + TODOs)
- survey/runner_policy.py (related-issues block)
- survey-cli/AGENTS.md (SR-173 brain-section)

No runtime change. Tests still 12/12 green.
f7e9408 to f29bf43
SR-173 -- Visual Debug Report (HTML + SVG-Overlay per Step)
Closes #178.
TL;DR
Per-step HTML+SVG overlay report on top of the action screenshot. It makes the four coordinate-misalignment bug classes visible in 5 seconds instead of 15 minutes:

- iframe-offset (... frame_offset translation).
- DPR-mismatch
- scroll-stale (... scrollY=300, click 200 ms later at scrollY=400).
- z-index-overlay

Each bug class has its own unit test (tests/test_visual_debug.py). 12 / 12 green on Python 3.13 (0.49 s).

Files
- survey-cli/survey/observability/visual_debug.py
- survey-cli/survey/runner_policy.py: RunnerPolicy (STEALTH_ENV, VISUAL_DEBUG_*)
- survey-cli/survey/observability/__init__.py
- survey-cli/survey/safe_executor.py
- survey-cli/tests/test_visual_debug.py
- scripts/build_daily_visual_report.py
- survey-cli/AGENTS.md

State-of-the-art decisions (justified deviations from the briefing)
1. ThreadPoolExecutor, NOT asyncio.create_task. safe_executor.SurveyFlowExecutor is sync (module docstring: "synchronous websocket ... matches LangGraph node execution"). There is no running event loop. A bounded ThreadPoolExecutor + BoundedSemaphore is the correct primitive: non-blocking submit, drop-on-overflow (NEVER block), atexit-clean. The non-blocking guarantee from #178 is not just met -- it is stricter (asyncio tasks can queue without bound when the loop is saturated; our semaphore caps hard at max_queue).
2. runner_policy.py is NEW, not an edit. The file did not exist on main.
3. Protocol-based shims for VerificationResult (SR-167 / #167) and AttestationResult (SR-168 / #168). These PRs are not in main yet. A runtime_checkable Protocol with an identical field shape lets the code compile and be tested today; once the dependencies merge, it is a 1-line import swap, not a runtime change.
4. Point/Box/ElementRef live in visual_debug.py, not in snapshot.py. YAGNI -- there is currently a single caller. The promote TODO is documented inline.
5. blake2b sampling, not random.random(). Deterministic per step_id -- retries of the same step get the same sampling decision; no double counting in dashboards.

Performance / Cost
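Sampling is what keeps the number of rendered reports bounded in prod (10 % of steps plus every failure, per the RunnerPolicy presets). A deterministic decision keyed on step_id could be sketched as follows. The function name `should_sample`, its parameters and the 8-byte digest size are assumptions made for this example; only blake2b(step_id) and the sampling rates are named in this PR.

```python
import hashlib


def should_sample(step_id: str, rate: float, failed: bool = False) -> bool:
    """Hypothetical sketch: same step_id and rate always give the same answer."""
    if failed:
        return True  # failures are always reported
    if rate >= 1.0:
        return True
    if rate <= 0.0:
        return False
    # Hash the step id to 8 bytes and map it onto [0, 1).
    digest = hashlib.blake2b(step_id.encode("utf-8"), digest_size=8).digest()
    bucket = int.from_bytes(digest, "big") / 2**64
    return bucket < rate
```

Because the decision depends only on the hash of step_id, a retried step lands in the same bucket every time, unlike a call to random.random().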
Operations
Test run
Compliance
- ... AGENTS.md convention).
- ... .md files created -- everything is inline in the sources plus the brain section in survey-cli/AGENTS.md.
- ... print() debug statements in the hot path (logger-only).
- Atomic writes via <final>.<uuid>.tmp + os.replace.
- ... slots=True) -- thread-safe by construction, no locking.

Roadmap hooks (after merge)
- VerificationResultLike → VerificationResult (1-line swap, marked with an inline TODO).
- ... AttestationResultLike.