Skip to content

Commit 8e2e630

Browse files
QA pass: supersede findings.md + dashboard log-scan defense for P0-2 + CLI spec (#366)
* chore(docs): supersede findings.md with punch-list.md The 2026-05-14 QA session was run in parallel from two worktrees. Each produced its own QA artifact: - findings.md (224 lines, 9 items) — from this worktree - punch-list.md (502 lines, 25+ items) — from a sibling worktree punch-list.md is strictly more comprehensive — every finding in findings.md has a counterpart in punch-list.md, but punch-list.md catches several critical defects findings.md missed (most notably P0-2: failed runs incorrectly show as completed with green chips). Delete findings.md so the punch-list is the unambiguous source of truth. Future QA cycles add to punch-list.md (or supersede with a new dated punch-list under docs/specs/ops-dashboard-qa-YYYY-MM-DD/). Companion artifacts live in docs/specs/ops-dashboard-polish/. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(ops): defense-in-depth log scan for exit-0 runs that emitted failure signals Tackles P0-2 of the 2026-05-14 ops-dashboard QA punch list. Two pieces of work in one PR — they're explicit companions: ## 1. Dashboard side (visible fix) When the SSE stream's "done" event arrives with status= completed and exit_code=0, scan the accumulated log buffer for unambiguous failure-leak signals: - Python traceback header ('Traceback (most recent call last):') - Workflow voice-layer 'What Went Wrong' / 'This one didn't go as planned' block - Line-anchored Python exception class ('XError:' / 'XException:') If any matches, downgrade the chip from chip-ok (green) to chip-warn (yellow), update the status label to 'completed with errors (<signal>)', and add a data-tooltip pointing users at the punch-list reference. Conservative pattern matching only — no false positives on docstrings or success-state output that happens to discuss errors. Buffer is bounded at 256 KB (enough for typical runs, wraps for very long-running workflows). ## 2. CLI side (spec) Companion spec at docs/specs/workflow-failure-exit-propagation/ documenting the upstream fix: 'attune workflow run' should exit non-zero when the workflow's WorkflowResult.success is False or when an uncaught exception is swallowed at the CLI boundary. Phase 1 (requirements) + decisions matrix drafted. Design / tasks deferred to execution session. Pre-committed exit-code contract: - 0 : success - 1 : planned failure (WorkflowResult.success is False) - 2 : unplanned failure (uncaught exception) - 3 : CLI-level error (workflow not found, bad path) Once the CLI fix lands, the dashboard log-scan defense becomes redundant — decisions.md says keep for one release cycle then remove. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(ops): P1-2 spec detail renders markdown + P1-5 Recent strip excludes current run Two of the P1 items from the 2026-05-14 QA punch list. ## P1-2 — Spec detail page renders markdown spec_detail_page now renders each phase's content via markdown-it-py (CommonMark mode, html=False) and passes the rendered HTML to the template under `rendered` instead of the raw `contents` dict. Template wraps the HTML in <div class="spec-phase-body markdown-body"> with `|safe`. Security: `html=False` means raw <script> tags are escaped to text rather than evaluated. Spec content is repo-author- controlled but the property holds anyway. Fallback path: if markdown-it raises, render an empty body rather than 500ing. CSS: new .markdown-body selector with rules for h1-h4, p, ul, ol, code, pre, table, blockquote, hr, a — minimal styling that matches the dashboard's existing visual language. ## P1-5 — Recent strip excludes current run runner.js's setupRecentRuns now reads an optional data-exclude-run-id attribute on its container. When present, the matching run is filtered out before rendering. run_view.html sets the attribute to {{ run.id }} so the strip shows OTHER recent runs of the same workflow rather than including the run the user is already looking at. Tested: P1-2 — /specs/ignored-tests now renders <h1>, <h2>, <code>, etc. instead of raw '# heading' text. ## Status of remaining P1 items P1-1 — resolved by #358 (already merged). P1-3 — already fixed on main (.scope-custom:not([hidden])). P1-4 — needs a spec; will draft separately per Patrick's confirmation that Memory and Sessions pages are planned- but-unimplemented (interpretation A from the punch list). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent b48c006 commit 8e2e630

9 files changed

Lines changed: 378 additions & 241 deletions

File tree

docs/specs/ops-dashboard-qa-2026-05-14/findings.md

Lines changed: 0 additions & 224 deletions
This file was deleted.
Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
# Spec: Workflow Failure Exit Propagation — Decisions
2+
3+
> Pre-committed decisions. Updates to this file after implementation
4+
> begins require a follow-up PR with rationale.
5+
6+
---
7+
8+
## Decision matrix
9+
10+
| Decision | Choice | Rationale |
11+
|---|---|---|
12+
| Exit code on planned failure (`WorkflowResult.success is False`) | `1` | Conventional "non-zero means failure"; distinguishes from CLI errors. |
13+
| Exit code on unplanned failure (uncaught exception) | `2` | Lets shell scripts branch — `2` is "process crashed", `1` is "process said no". Conventional in many CLIs (e.g. grep). |
14+
| Exit code on CLI-level error (workflow not found, bad path) | `3` | Reserved for argument/config errors, separable from workflow execution problems. |
15+
| Backwards-compat for shell scripts relying on the previous (buggy) exit-0 on failure | **Not preserved.** | The exit-0-on-failure state is the bug, not a contract. Preserving it would perpetuate the original problem from the dashboard's P0-2 finding. |
16+
| Dashboard log-scan workaround (PR landed alongside spec draft) | Keep for one release cycle after this spec lands; mark for removal in the release after. | Defense in depth while CLI fix rolls out; safe to retire once telemetry confirms exit codes are correct. |
17+
| Workflow voice-layer "What Went Wrong" block emission | Continues to coexist with exit-1 / exit-2. | Block is informational for users; exit code is the contract for scripts. They're independent. |
18+
| Test coverage requirement | Table-driven test in `tests/unit/cli/test_workflow_exit_codes.py` covering all four exit codes. | Mirrors test-quality-program rubric's "tests must name the branch they cover." |
19+
20+
---
21+
22+
## Open questions (resolve during design phase)
23+
24+
1. **`_run_workflow_with_exit_code()` helper location.** Could live
25+
in `cli_minimal.py` or in a new `cli_commands/_exit_codes.py`.
26+
The latter is cleaner long-term; the former is one less file
27+
to wire up. Decide during design.
28+
29+
2. **Logging interaction.** When exit code is `2`, the traceback is
30+
already printed to stderr by Python's default exception handler.
31+
Do we also want a structured `--json` mode for CI consumers?
32+
Defer to design.
33+
34+
3. **Migration message.** Should the CHANGELOG entry include a
35+
one-line shell snippet for shop scripts that need to adapt?
36+
E.g. `if attune workflow run X || [ $? -eq 1 ]; then …`. Decide
37+
when CHANGELOG is drafted.
38+
39+
---
40+
41+
## Decision-change log
42+
43+
> Append entries here when a decision above is revised. Reference
44+
> the PR that revised it.
45+
46+
- 2026-05-14 — Initial decisions captured during spec draft.
47+
Triggered by ops-dashboard QA punch list item P0-2; companion
48+
defense-in-depth landed in dashboard log-scan PR (see
49+
`src/attune/ops/static/js/run_view.js` `detectLogErrorLeak`).
Lines changed: 113 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,113 @@
1+
# Spec: Workflow Failure Exit Propagation
2+
3+
> Make `attune workflow run` exit with a non-zero status when the
4+
> underlying workflow's `WorkflowResult.success` is False (or when
5+
> an uncaught exception inside the SDK adapter is swallowed at the
6+
> CLI boundary).
7+
8+
---
9+
10+
## Phase 1: Requirements
11+
12+
**Status**: draft
13+
14+
### Problem statement
15+
16+
The CLI dispatcher (`attune workflow run <name>`) currently returns
17+
exit code `0` on workflows that internally failed. Evidence:
18+
19+
1. The ops dashboard's QA punch list (2026-05-14, P0-2) caught this:
20+
`/runs/f8ed53713add/view` rendered with a **green "completed (exit 0)"**
21+
chip despite the log containing a `claude_agent_sdk` Exception
22+
plus the workflow-emitted "What Went Wrong" voice-layer block.
23+
2. Patrick hit the same shape this session running
24+
`perf-audit` from the dashboard — the workflow's SDK call
25+
raised, the CLI swallowed it, the dashboard reported success.
26+
3. The Specs page QA findings (B1, B2 — fixed) and the dashboard
27+
log-scan workaround (PR #N — temporary) all exist *because*
28+
the CLI side doesn't propagate failure.
29+
30+
The dashboard log-scan is a defense-in-depth bandage: it scans for
31+
`Traceback`, `What Went Wrong`, and `<Class>Error:` tokens to flag
32+
chip-warn instead of chip-ok. Real fix is at the source — `attune
33+
workflow run` should exit non-zero when the work failed.
34+
35+
### Scope
36+
37+
**In scope:**
38+
39+
- Audit every code path in `src/attune/cli_minimal.py` (and any other
40+
CLI entry point that exposes `workflow run`) that calls a workflow's
41+
`execute()` and discards the `WorkflowResult.success` field when
42+
setting the process exit code.
43+
- Define the exit-code contract:
44+
- `0` — workflow ran AND `WorkflowResult.success is True`
45+
- `1` — workflow ran AND `WorkflowResult.success is False` (planned
46+
failure)
47+
- `2` — workflow ran AND raised an uncaught exception (unplanned
48+
failure) — distinguish from `1` so scripts can branch
49+
- `3` — CLI-level error (workflow not found, bad path, etc.)
50+
- Update the workflow-result-formatting voice-layer block emission
51+
so the "What Went Wrong" section *requires* a non-zero exit (it's
52+
currently emitted on failed workflows even when the CLI exits 0,
53+
which is the broken state).
54+
- Ensure dashboard-side: when the new contract lands, the ops
55+
dashboard log-scan can be downgraded from P0-priority defense to
56+
a P3 nice-to-have (or removed if it becomes redundant).
57+
- Add a test for each exit-code path (table-driven test in
58+
`tests/unit/cli/test_workflow_exit_codes.py`).
59+
60+
**Out of scope:**
61+
62+
- Restructuring how workflows report failure internally
63+
(`WorkflowResult.success`, error message format, etc.) — that's
64+
upstream.
65+
- Changing dashboard-side rendering — the existing chip color
66+
scheme is fine once exit codes are correct.
67+
- Migration path for tools that may currently rely on the buggy
68+
exit-0 behavior — explicit non-goal; this fix is correctness
69+
before backwards-compat.
70+
71+
### Acceptance criteria
72+
73+
1. `attune workflow run security-audit --path /nonexistent/path`
74+
exits **3** (CLI-level error).
75+
2. A workflow that internally returns `WorkflowResult(success=False,
76+
...)` exits **1**.
77+
3. A workflow whose `execute()` raises an uncaught exception exits
78+
**2** with the traceback printed to stderr.
79+
4. Existing passing workflows exit **0** with no behavior change.
80+
5. The ops dashboard's `/runs/<id>/view` log-scan-fallback fires on
81+
ZERO runs after the fix lands (because no exit-0 run carries
82+
failure signals anymore — defense in depth becomes redundant).
83+
6. The CHANGELOG documents the new exit codes as a behavior change.
84+
85+
### Non-goals / explicitly deferred
86+
87+
- **Auto-retry on exit code 2.** The SDK has its own retry logic;
88+
CLI shouldn't second-guess it.
89+
- **Splitting WorkflowResult.success into multiple failure
90+
categories.** Out of scope; this spec accepts the boolean field
91+
as-is.
92+
93+
### Decision matrix (pre-committed)
94+
95+
Per the `feedback_color_and_hover_affordances` / "decision matrices
96+
survive contact with data" lesson, these are decided up front:
97+
98+
| Decision | Choice |
99+
|---|---|
100+
| Exit code on planned failure | `1` |
101+
| Exit code on unplanned failure (uncaught exception) | `2` |
102+
| Exit code on CLI-level error | `3` |
103+
| Backwards-compat for shell scripts relying on exit-0 on failure | **Not preserved.** Failure must propagate. CHANGELOG notes the change. |
104+
| Dashboard log-scan workaround | Keep for one release after this lands; remove in the release after. |
105+
| Workflow voice-layer "What Went Wrong" block | Required to coexist with exit-1 (and exit-2). Block is informational; exit code is the contract. |
106+
107+
### Risks
108+
109+
| Risk | Severity | Mitigation |
110+
|---|---|---|
111+
| CI scripts using `attune workflow run` in shell pipelines may break (`set -e` would now propagate failure) | Low–Medium | This is the CORRECT behavior. CHANGELOG note + a deprecation cycle wasn't justified — the buggy state IS the bug. |
112+
| Differentiating exit-1 (planned) vs exit-2 (unplanned) requires the CLI to catch and classify all SDK exceptions cleanly | Medium | Build a single `_run_workflow_with_exit_code()` helper that wraps the call; centralize the classification logic. |
113+
| Dashboard's existing log-scan fallback could mask the issue post-fix (false-positive on legit error-message-discussing workflows) | Low | Plan to remove the log-scan in a follow-up release once the new exit codes are observed in telemetry. |

0 commit comments

Comments
 (0)