
feat(SR-173): Visual Debug Report -- HTML+SVG overlay per step (closes #178)#184

Merged
Delqhi merged 2 commits into main from feat/sr-173-visual-debug on May 13, 2026


@Delqhi commented on May 13, 2026

SR-173 -- Visual Debug Report (HTML + SVG overlay per step)

Closes #178.

TL;DR

A per-step HTML report with an SVG overlay on top of the action screenshot. It makes the four classes of coordinate-misalignment bugs visible in 5 seconds instead of 15 minutes:

  1. iframe offset -- the AX tree returns iframe-local coordinates; the click needs page coordinates (frame_offset translation).
  2. DPR mismatch -- the screenshot is in physical px, the AX tree in CSS px; the SVG rect must be scaled by the DPR.
  3. Stale scroll -- the snapshot is taken at scrollY=300, the click fires 200 ms later at scrollY=400.
  4. z-index overlay -- a modal eats the click; the AX tree doesn't notice, but a human sees it immediately.

Each bug class has its own unit test (tests/test_visual_debug.py). 12/12 green on Python 3.13 (0.49 s).

Files

| File | Status | Purpose |
|---|---|---|
| survey-cli/survey/observability/visual_debug.py | NEW | Core module: renderer + dispatcher + geometry primitives + protocol shims for SR-167/168 |
| survey-cli/survey/runner_policy.py | NEW | Central, immutable, env-driven RunnerPolicy (STEALTH_ENV, VISUAL_DEBUG_*) |
| survey-cli/survey/observability/__init__.py | patched | Re-exports of the public API |
| survey-cli/survey/safe_executor.py | patched | Optional, failure-isolated hook after each action |
| survey-cli/tests/test_visual_debug.py | NEW | 12 tests: one per bug class + determinism + atomicity + backpressure + E2E |
| scripts/build_daily_visual_report.py | NEW | Daily aggregator + optional Vercel Blob upload |
| survey-cli/AGENTS.md | patched | SR-173 brain-file section (file map, design rationale, NEVER rules, public API, test matrix, operations, roadmap hooks) |

State-of-the-art decisions (justified deviations from the briefing)

  1. ThreadPoolExecutor, NOT asyncio.create_task. safe_executor.SurveyFlowExecutor is synchronous (module docstring: "synchronous websocket ... matches LangGraph node execution"), so there is no running event loop. A bounded ThreadPoolExecutor plus a BoundedSemaphore is the right primitive: non-blocking submit, drop-on-overflow (NEVER block), clean shutdown via atexit. The non-blocking guarantee from #178 is not only met but tightened: asyncio tasks can queue without bound on a saturated loop, whereas our semaphore caps hard at max_queue.
  2. runner_policy.py is NEW, not an edit. The file did not exist on main.
  3. Protocol shims for VerificationResult (SR-167 / #167) and AttestationResult (SR-168 / #168), neither of which is on main yet. A runtime_checkable Protocol with an identical field shape lets us compile and test today; once the dependencies merge, switching over is a one-line import swap, not a runtime change.
  4. Point / Box / ElementRef live in visual_debug.py, not in snapshot.py. YAGNI: there is currently a single caller. The promote TODO is documented inline.
  5. blake2b sampling, not random.random(). The decision is deterministic per step_id, so retries of the same step get the same sampling decision and dashboards don't double-count.
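Decision 1 can be sketched as follows: a bounded ThreadPoolExecutor guarded by a BoundedSemaphore, so `submit` never blocks and excess work is dropped instead of queued. `VisualDebugDispatcher`, `max_workers`, `max_queue`, and the `dropped` counter are illustrative assumptions, not the module's actual API.

```python
import atexit
import logging
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

logger = logging.getLogger(__name__)


class VisualDebugDispatcher:
    """Non-blocking, drop-on-overflow background dispatcher (hypothetical sketch)."""

    def __init__(self, max_workers: int = 2, max_queue: int = 32) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix="visual-debug")
        # Caps running + queued jobs; ThreadPoolExecutor's own work queue is unbounded.
        self._slots = threading.BoundedSemaphore(max_queue)
        self.dropped = 0  # not synchronized; adequate for a single submitting thread
        atexit.register(self._pool.shutdown, wait=True)

    def submit(self, job: Callable[[], object]) -> bool:
        """Return True if the job was queued, False if it was dropped. Never blocks."""
        if not self._slots.acquire(blocking=False):
            self.dropped += 1
            return False
        try:
            self._pool.submit(self._run, job)
        except RuntimeError:  # pool already shut down
            self._slots.release()
            self.dropped += 1
            return False
        return True

    def _run(self, job: Callable[[], object]) -> None:
        try:
            job()
        except Exception:  # failure-isolated: never propagate into the executor
            logger.exception("visual debug job failed")
        finally:
            self._slots.release()
```

The semaphore slot is released only after the job finishes, so `max_queue` bounds everything in flight, and a crashing render is logged rather than surfacing in the LangGraph hot path.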

Performance / cost

  • ~35 KB per file (JPEG@70 + SVG + JSON).
  • Prod: 10 % sampling + 100 % on verifier failure.
  • Expected: ~1,500 renders/day × ~35 KB ≈ 52 MB/day → ~$0.45/month on Vercel Blob.
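The prod rule (10 % baseline, 100 % on verifier failure) combined with the blake2b sampling from decision 5 could look like this. The function name `should_render`, the 8-byte digest, and the mapping onto [0, 1) are assumptions; the PR only states that sampling is blake2b-based and deterministic per step_id.

```python
import hashlib


def should_render(step_id: str, sample_rate: float, verifier_failed: bool) -> bool:
    """Deterministic per-step sampling decision (hypothetical sketch)."""
    if verifier_failed:
        return True  # every verifier failure is rendered
    if sample_rate <= 0.0:
        return False
    if sample_rate >= 1.0:
        return True
    # blake2b is stable across processes, unlike hash(), which is salted per run.
    digest = hashlib.blake2b(step_id.encode("utf-8"), digest_size=8).digest()
    bucket = int.from_bytes(digest, "big") / 2**64  # uniform in [0, 1)
    return bucket < sample_rate
```

Because the decision depends only on step_id, a retried step lands in the same bucket and is either rendered every time or never (unless it fails verification), which is what keeps the dashboards free of double counts.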

Operations

# Build the daily index:
python scripts/build_daily_visual_report.py --date 2026-05-13

# With upload (prints the index URL to stdout):
BLOB_READ_WRITE_TOKEN=... \
    python scripts/build_daily_visual_report.py --date 2026-05-13 --upload
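A minimal sketch of the aggregation step behind the daily index, assuming (hypothetically) that each render leaves a `<step_id>.json` sidecar with `step_id`, `ok`, and `html` keys in a per-day directory. The real file layout and index format may differ, and the Vercel Blob upload is omitted.

```python
import html
import json
from pathlib import Path


def build_index(day_dir: Path) -> Path:
    """Write day_dir/index.html listing every step with All/OK/FAIL filter buttons."""
    rows = []
    for meta_path in sorted(day_dir.glob("*.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        status = "ok" if meta.get("ok") else "fail"
        rows.append(
            f'<li class="{status}"><a href="{html.escape(meta["html"], quote=True)}">'
            f"{html.escape(meta['step_id'])}</a> [{status.upper()}]</li>"
        )
    page = (
        "<!doctype html><meta charset='utf-8'><title>Visual debug index</title>"
        # CSS-only filtering: the body class hides the non-matching rows.
        "<style>body.only-fail li.ok,body.only-ok li.fail{display:none}</style>"
        "<button onclick=\"document.body.className=''\">All</button>"
        "<button onclick=\"document.body.className='only-ok'\">OK</button>"
        "<button onclick=\"document.body.className='only-fail'\">FAIL</button>"
        f"<ul>{''.join(rows)}</ul>"
    )
    out = day_dir / "index.html"
    out.write_text(page, encoding="utf-8")
    return out
```

Step IDs and links are HTML-escaped because they originate from runtime data rather than from the report generator itself.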

Test run

cd survey-cli && pytest tests/test_visual_debug.py -v
============================== 12 passed in 0.49s ==============================

Compliance

  • No BANNED methods referenced.
  • The BANNED list is documented as a header block in every new source file (AGENTS.md convention).
  • No additional .md files created -- everything lives inline in the sources plus the brain section in survey-cli/AGENTS.md.
  • No print() debug statements in the hot path (logger only).
  • Atomic writes via <final>.<uuid>.tmp + os.replace.
  • Frozen dataclasses (slots=True) -- immutable, hence thread-safe by construction; no locking needed.
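The atomic-write rule can be sketched with the standard library alone: write to `<final>.<uuid>.tmp` in the same directory, fsync, then `os.replace`, which is an atomic rename on POSIX when source and target are on the same filesystem. The helper name `atomic_write_bytes` is an assumption, not necessarily what visual_debug.py uses.

```python
import os
import uuid
from pathlib import Path


def atomic_write_bytes(final: Path, data: bytes) -> None:
    """Readers see either the old file or the complete new one, never a partial write."""
    tmp = final.with_name(f"{final.name}.{uuid.uuid4().hex}.tmp")
    try:
        with open(tmp, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp, final)  # atomic rename; overwrites an existing target
    except BaseException:
        tmp.unlink(missing_ok=True)  # never leave stray .tmp files behind
        raise
```

The per-write UUID keeps concurrent writers from colliding on the same temp name, and keeping the temp file next to the target avoids a cross-filesystem rename, which would not be atomic.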

Roadmap hooks (after merge)

Delqhi added 2 commits May 13, 2026 11:22
Implements the per-step Visual Debug Report described in issue #178.

What this PR delivers
=====================
- survey/observability/visual_debug.py  (NEW)
    Self-contained HTML+SVG renderer with embedded JPEG screenshot and
    inline SVG overlay (target bbox + click crosshair). Non-blocking
    dispatcher backed by a bounded ThreadPoolExecutor (drop-on-overflow,
    NEVER blocks the LangGraph hot path). Deterministic sampling via
    blake2b(step_id). Atomic writes via tmp + os.replace.
- survey/runner_policy.py  (NEW)
    Central, immutable, env-driven RunnerPolicy. Per-environment presets
    (prod = 10 % sampling + 100 % on failure; staging/dev = 100 %).
- survey/observability/__init__.py  (re-exports the new public API)
- survey/safe_executor.py  (PATCHED)
    Optional visual_debug_dispatcher + visual_debug_frame_builder kwargs.
    Failure-isolated hook -- executor stability never depends on debug
    pipeline being healthy.
- tests/test_visual_debug.py  (NEW)
    12 tests, 12/12 green on Py 3.13. One test per coordinate-misalignment
    bug class (iframe-offset, DPR-mismatch, scroll-stale, z-index-overlay)
    plus invariants (determinism, atomicity, backpressure, end-to-end).
- scripts/build_daily_visual_report.py  (NEW)
    Daily aggregator: builds index.html with OK/FAIL filters; optional
    Vercel-Blob upload via $BLOB_READ_WRITE_TOKEN.
- survey-cli/AGENTS.md  (UPDATED)
    Adds the SR-173 brain-file section (file map, design decisions
    with rationale, public API, NEVER rules, test matrix, operations,
    roadmap hooks for SR-167 / SR-168).
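The renderer in the first bullet can be sketched as a single self-contained HTML string: the JPEG is embedded as a base64 data URI, and an inline SVG whose viewBox matches the screenshot size draws the target bbox and a click crosshair on top of it. `render_step_html` and its parameters are hypothetical; coordinates are assumed to already be in screenshot pixels.

```python
import base64
import html


def render_step_html(
    jpeg: bytes,
    width: int,
    height: int,
    bbox: tuple[float, float, float, float],
    click: tuple[float, float],
    title: str,
) -> str:
    """Return a self-contained HTML page: screenshot plus SVG overlay (hypothetical)."""
    src = "data:image/jpeg;base64," + base64.b64encode(jpeg).decode("ascii")
    x, y, w, h = bbox
    cx, cy = click
    arm = 12  # crosshair arm length in screenshot px
    return (
        "<!doctype html><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title>"
        "<div style='position:relative;display:inline-block'>"
        f"<img src='{src}' width='{width}' height='{height}' style='display:block'>"
        # The viewBox equals the image size, so SVG user units map 1:1 to image px.
        f"<svg viewBox='0 0 {width} {height}' width='{width}' height='{height}' "
        "style='position:absolute;left:0;top:0;pointer-events:none'>"
        f"<rect x='{x}' y='{y}' width='{w}' height='{h}' fill='none' stroke='lime' stroke-width='2'/>"
        f"<line x1='{cx - arm}' y1='{cy}' x2='{cx + arm}' y2='{cy}' stroke='red' stroke-width='2'/>"
        f"<line x1='{cx}' y1='{cy - arm}' x2='{cx}' y2='{cy + arm}' stroke='red' stroke-width='2'/>"
        "</svg></div>"
    )
```

Embedding the image means a single file can be opened or uploaded without any companion assets; if the crosshair falls outside the rect, the misalignment is visible at a glance.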

State-of-the-art deviations from the briefing (documented inline)
=================================================================
1. ThreadPoolExecutor instead of asyncio.create_task.
   safe_executor.SurveyFlowExecutor is synchronous (its docstring states:
   "synchronous websocket ... matches LangGraph node execution"). There
   is no running event loop. The non-blocking invariant from the briefing
   is preserved -- and in fact strengthened: BoundedSemaphore guarantees
   hard drop-on-overflow, whereas asyncio tasks can queue indefinitely.
2. runner_policy.py is created NEW. The briefing said "edit" but the file
   did not exist on main.
3. Protocol-based shims for VerificationResult (SR-167 / #173) and
   AttestationResult (SR-168 / #174) -- those PRs aren't on main yet.
   When they merge, a 1-line import swap completes integration.
4. Point/Box/ElementRef are introduced in visual_debug.py, not in
   snapshot.py (YAGNI -- single caller today; promote later if a second
   caller emerges).

Tests
=====
    cd survey-cli && pytest tests/test_visual_debug.py -v
    12 passed in 0.49s on Python 3.13.

Closes #178 (SR-173).
Related: #172 (SR-172 reliability tracker), #173 (SR-167), #174 (SR-168).
The Protocol-Shim TODOs and AGENTS.md brain-section referenced #173/#174 as
the placeholder issue numbers for SR-167/SR-168, but the canonical numbers
are:

  - SR-167 -> issue #167  (Post-Action Verifier Node)
  - SR-168 -> issue #168  (Triple-Channel Attestation)

#173 is actually SR-159 Path Doctrine. Fix all references in:
  - survey/observability/visual_debug.py  (Protocol docstrings + TODOs)
  - survey/runner_policy.py               (related-issues block)
  - survey-cli/AGENTS.md                  (SR-173 brain-section)

No runtime change. Tests still 12/12 green.
Delqhi force-pushed the feat/sr-173-visual-debug branch from f7e9408 to f29bf43 on May 13, 2026 11:22
Delqhi merged commit 0029f76 into main on May 13, 2026
7 of 11 checks passed
Delqhi added a commit that referenced this pull request May 13, 2026
…merge)

The merge of PR #184 (Visual Debug) accidentally overwrote the
NetworkTuning dataclass and get_network_tuning function that were
added in PR #185 (SR-174 Network Gate).

This restores those exports so network_gate.py can import them.

CEO-Session 2026-05-13