From 2b7bcd82f87c21f3063af252d9c7cc5ba326a800 Mon Sep 17 00:00:00 2001
From: Erik LaBianca
Date: Sun, 31 May 2026 10:04:05 -0400
Subject: [PATCH] test: fix two pre-existing `just test` failures

1. helix-e3c7d0e4 (Epic: per-runtime integration tests) was missing a
   `<context-digest>` block in its description. The bead is still open,
   but all 5 child runtime beads (INT-CC/INT-CX/INT-CP/INT-GN/INT-DD)
   are closed. Add a digest summarizing the epic's governing context
   (principles, concerns, practices, governing artifacts) so
   `tests/validate-context-digests.sh` passes.

2. tests/validate-demo-fixtures.sh checked SHA-256 coherence between
   the REVIEW_PROMPT heredoc in docs/demos/helix-concerns/demo.sh and
   an agent-dictionary/.json fixture. Commit 9e7f734d ("demos(concerns):
   add drift fixture + README; drop legacy CLI artifacts") deleted
   demo.sh and the agent-dictionary entirely in favor of a recorded
   session.jsonl. The test has been failing on main since that commit,
   and its premise (prompt-hash fixture coherence) no longer applies.
   Remove the test + justfile wiring.

`just test` is now fully green.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .ddx/beads.jsonl                |  2 +-
 justfile                        |  6 +--
 tests/validate-demo-fixtures.sh | 89 ---------------------------------
 3 files changed, 2 insertions(+), 95 deletions(-)
 delete mode 100644 tests/validate-demo-fixtures.sh

diff --git a/.ddx/beads.jsonl b/.ddx/beads.jsonl
index 06d258bb..b587240d 100644
--- a/.ddx/beads.jsonl
+++ b/.ddx/beads.jsonl
@@ -591,7 +591,7 @@
 {"acceptance":"`.claude-plugin/plugin.json` `description` field does not contain the substring 'supervisory autopilot' and matches the canonical wording from FEAT-013 [ADAPT-01]; `skills/helix/SKILL.md` frontmatter `description:` is consistent with plugin.json.","created_at":"2026-05-16T04:11:52.994564533Z","description":"\u003ccontext-digest\u003e\n\u003cprinciples\u003eDesign for Change · Design for Simplicity · Validate Your Work · Make Intent Explicit · Prefer Reversible Decisions\u003c/principles\u003e\n\u003cconcerns\u003eagentskills-spec | multi-runtime-distribution | genie-no-chat-api | docker-screencast-harness\u003c/concerns\u003e\n\u003cpractices\u003eagentskills.io spec: parent dir name equals SKILL.md `name:` · DDx and Claude Code share `.claude-plugin/plugin.json` · Genie expects skill bundle at /Workspace/.assistant/skills/\u003cname\u003e/ with no wrapper subdir · Static install checks free; functional checks gated by TEST_FUNCTIONAL=1 · vhs `.tape` files for terminal screencast reproducibility\u003c/practices\u003e\n\u003cgoverning\u003eFEAT-013 ADAPT-01 · .claude-plugin/plugin.json · skills/helix/SKILL.md\u003c/governing\u003e\n\u003c/context-digest\u003e\n\nThe plugin manifest description still reads 'HELIX development control system — supervisory autopilot for AI-assisted software delivery'. This predates the collapse to content + one skill. Replace with: 'HELIX methodology, artifact catalog, and routing skill for AI-assisted development.'\n\nVerify `skills/helix/SKILL.md` frontmatter `description:` already matches; if not, align it.
This description string is propagated to multiple adapter surfaces (Claude Code marketplace, Copilot instructions, Genie SKILL.md).","events_attachment":"helix-e362ecf1/events.jsonl","id":"helix-e362ecf1","issue_type":"task","labels":["helix","activity:build","kind:implementation","area:packaging"],"parent":"helix-91f51ff8","priority":0,"schema_version":1,"spec-id":"FEAT-013","status":"closed","title":"Fix .claude-plugin/plugin.json description (drop 'supervisory autopilot' wording)","updated_at":"2026-05-16T04:22:44.915692162Z"} {"acceptance":"workflows/EXECUTION.md explicitly states that actionable review findings are filed with review-finding plus scope-appropriate area:* labels, and git diff --check passes.","claimed-at":"2026-04-10T23:10:51Z","claimed-machine":"eitri","claimed-pid":"1258214","created_at":"2026-04-10T20:01:04.38975514Z","description":"Review finding from fresh-eyes review.\nFile: workflows/EXECUTION.md:367\nCategory: drift\nSeverity: medium\nDescription: The execution contract still summarizes post-implementation review findings as tracker issues with only the review-finding label, but the fresh-eyes review action now requires each actionable finding to carry at least one scope-appropriate area:* label. 
This leaves workflows/EXECUTION.md inconsistent with the governing review workflow and with the reviewed bead acceptance text that says the workflow documentation should reflect the requirement deterministically.\nSuggested fix: Update the helix run and Fresh-Eyes Review sections in workflows/EXECUTION.md so they explicitly state that review-filed findings include review-finding plus scope-appropriate area:* labels derived from the reviewed bead or scope.","id":"helix-e393791b","issue_type":"task","labels":["helix","activity:build","review-finding","area:workflow","area:docs"],"notes":"\u003cmeasure-results\u003e\n \u003ctimestamp\u003e2026-04-10T23:17:39Z\u003c/timestamp\u003e\n \u003cstatus\u003ePASS\u003c/status\u003e\n \u003cacceptance\u003e\n \u003ccriterion name='execution contract documents review-finding area labels' status='pass' evidence='Updated workflows/EXECUTION.md in both the helix run summary and Fresh-Eyes Review section to require review-finding plus scope-appropriate area:* labels derived from the reviewed bead or scope.'/\u003e\n \u003ccriterion name='diff hygiene' status='pass' evidence='git diff --check'/\u003e\n \u003c/acceptance\u003e\n \u003cgates\u003e\n \u003cgate concern='workflow-docs' command='git diff --check' status='pass'/\u003e\n \u003cgate concern='workflow-contract' command='bash tests/helix-cli.sh' status='pass'/\u003e\n \u003c/gates\u003e\n \u003cratchets/\u003e\n\u003c/measure-results\u003e","owner":"erik","priority":0,"schema_version":1,"spec-id":"workflows/EXECUTION.md","status":"closed","title":"drift: document area labels for review findings in execution contract","updated_at":"2026-04-10T23:18:08.451478757Z"} {"acceptance":"docs/README.md no longer states that shared workflow references live under `docs/resources/`; the file points readers to `workflows/references/` for authoritative HELIX workflow references or clearly distinguishes `docs/resources/` as non-authoritative background material; git diff --check 
passes","claimed-at":"2026-04-11T04:45:50Z","claimed-machine":"eitri","claimed-pid":"2885056","created_at":"2026-04-10T23:08:01.453453279Z","description":"\u003ccontext-digest\u003e\n\u003cprinciples\u003eDesign for Change · Design for Simplicity · Validate Your Work · Make Intent Explicit · Prefer Reversible Decisions\u003c/principles\u003e\n\u003cconcerns\u003ehugo-hextra | demo-asciinema\u003c/concerns\u003e\n\u003cpractices\u003eDefine the site's target audience and what content they need · Theme version: Hextra v0.12.1 — pinned in website/go.mod · Identify which workflows need demo reels — prioritize the \"first 5 minutes\" experience · Current demos: helix-quickstart (full lifecycle), helix-concerns (drift detection), helix-evolve (requirement threading), helix-experiment (metric-driven optimization)\u003c/practices\u003e\n\u003cgoverning\u003edocs/README.md — The documentation structure mirrors the HELIX workflow phases, creating a logical progression from problem definition through deployment. This organization ensures that:\u003c/governing\u003e\n\u003c/context-digest\u003e\n\nReview finding from fresh-eyes review.\nFile: docs/README.md:73\nCategory: correctness\nSeverity: medium\nDescription: The commit rewrites the legacy-path guidance, but the new sentence now says shared workflow references live under `docs/resources/`. In this repository the authoritative HELIX workflow references are under `workflows/references/`, while `docs/resources/` contains background/reference reading. 
A reader following the updated README is still pointed at the wrong source-of-truth location for workflow reference material.\nSuggested fix: Update the sentence to point shared workflow references at `workflows/references/`, and describe `docs/resources/` as supporting background material if that distinction is intended.","id":"helix-e3988823","issue_type":"task","labels":["helix","activity:build","review-finding","area:docs","area:workflow"],"notes":"\u003cmeasure-results\u003e\n \u003ctimestamp\u003e2026-04-11T04:49:00Z\u003c/timestamp\u003e\n \u003cstatus\u003ePASS\u003c/status\u003e\n \u003cacceptance\u003e\n \u003ccriterion name='workflow-reference-location-corrected' status='pass' evidence='docs/README.md now points authoritative shared workflow references at workflows/references/ and distinguishes docs/resources/ as supporting background material.'/\u003e\n \u003ccriterion name='diff-hygiene' status='pass' evidence='git diff --check passed after the docs/README.md update.'/\u003e\n \u003c/acceptance\u003e\n \u003cgates\u003e\n \u003cgate concern='workflow-docs' command='git diff --check' status='pass'/\u003e\n \u003c/gates\u003e\n \u003cratchets/\u003e\n\u003c/measure-results\u003e","owner":"erik","priority":0,"schema_version":1,"spec-id":"docs/README.md","status":"closed","title":"correctness: point docs README workflow references at workflows/references","updated_at":"2026-04-11T04:48:14.146945762Z"} -{"acceptance":"All 5 children closed. Each runtime has:\n- tests/workflows/\u003cruntime\u003e/run-scenarios.sh exits 0 with runtime credentials present (creds mounted, not a specific API-key env var).\n- An L2 activation assertion backed by a negative control: running the same scenario with the skill uninstalled exits NONZERO. 
No child gates solely on a free-text keyword grep.\n- Chat-capable runtimes: tests/workflows/\u003cruntime\u003e/evals/routing.jsonl present and exercised.\n- tests/workflows/\u003cruntime\u003e/recordings/ with a committed recording (gif/cast/webm).\n- CI: PR CI runs L1 (verify-skill-layout) only; tag-push CI runs the functional harness with creds.\n- docs/install/\u003cruntime\u003e.md links the recording as a screencast.","created_at":"2026-05-21T13:29:01.485963679Z","description":"Operationalize \"HELIX installs easily, works as a skill, and the basic workflow works\" as five per-runtime integration tests, each producing a screencast for the install docs and release notes.\n\n**Test contract (v2 — supersedes prose-grep).** A test must assert that the HELIX skill *actually activated*, using the strongest signal the runtime exposes — never a free-text keyword grep of the model's prose. Rationale: a model will mention \"frame\"/\"align\"/\"review\" without the skill ever loading. Verified 2026-05-22 that `claude -p \"/helix frame ...\"` answered by improvising with subagents while the helix Skill tool never fired — a prose grep would have falsely PASSED.\n\nLayers every child implements:\n- L1 Packaging (static, PR CI): tests/install/shared/verify-skill-layout.sh (already exists).\n- L2 Activation (functional, creds-gated): prove the skill engaged via a structured/behavioral signal — claude-code/codex parse the structured event stream (`--output-format stream-json` / codex equivalent) for a tool-use event naming the `helix` skill; ddx uses `ddx work --once --json` routing decision + execution-evidence artifacts; copilot/genie use a structured exit/DOM signal where the surface exposes one. Every L2 assertion MUST be backed by a negative control: the same scenario with the skill uninstalled must FAIL. A test that still passes without the skill installed is invalid — that is the prose-grep failure mode. 
No bare keyword grep as the sole gate.\n- L3 Routing evals (deterministic): chat-capable runtimes ship tests/workflows/\u003cruntime\u003e/evals/routing.jsonl (phrase -\u003e expected mode), mirroring .claude/skills/ddx/evals/routing.jsonl.\n- Evidence (not a gate): a committed recording (vhs gif / Playwright webm). TUI/PTY realism belongs here, not in the assertion. Do NOT drive the runtime via tmux/PTY screen-scraping for the pass/fail gate.\n\n**Auth.** Tests use the operator's real runtime credentials mounted into the container (claude/codex OAuth token files, GITHUB_TOKEN, Genie DBAUTH cookie) — NOT a hard dependency on a specific API-key env var. PR CI runs L1 only; tag-push functional CI runs L2+L3 with creds present.\n\n**Scope per child** (narrow for v1; full TP-014 coverage is FEAT-014's separate scope):\n1. Install verification — runtime CLI starts, skill is discoverable (L1)\n2. Skill activation — proven via L2 structured signal + negative control\n3. One end-to-end workflow scenario — TP-014 SCN-01 (sparse intent -\u003e vision + PRD)\n4. Recording captured and committed\n\n**Out of scope** (defer to FEAT-014): other TP-014 scenarios; cross-runtime parity benchmarking; new skill modes.\n\n**Why these 5 children exist** (vs reusing the 25 phantom-closed FEAT-014 children): those 25 closed under prior worker runs whose commits were in a local divergence no longer on origin/main; the artifacts they claimed don't exist on disk. Re-doing the work cleanly here is simpler than reopening 25 stale beads.","id":"helix-e3c7d0e4","issue_type":"epic","labels":["helix","phase:test","activity:test","kind:test","area:testing","area:packaging"],"priority":1,"schema_version":1,"spec-id":"helix.installer-integration-tests","status":"open","title":"Epic: per-runtime integration tests (install + skill + one workflow) with screencast capture","updated_at":"2026-05-22T20:33:15.903893635Z"} +{"acceptance":"All 5 children closed. 
Each runtime has:\n- tests/workflows/\u003cruntime\u003e/run-scenarios.sh exits 0 with runtime credentials present (creds mounted, not a specific API-key env var).\n- An L2 activation assertion backed by a negative control: running the same scenario with the skill uninstalled exits NONZERO. No child gates solely on a free-text keyword grep.\n- Chat-capable runtimes: tests/workflows/\u003cruntime\u003e/evals/routing.jsonl present and exercised.\n- tests/workflows/\u003cruntime\u003e/recordings/ with a committed recording (gif/cast/webm).\n- CI: PR CI runs L1 (verify-skill-layout) only; tag-push CI runs the functional harness with creds.\n- docs/install/\u003cruntime\u003e.md links the recording as a screencast.","created_at":"2026-05-21T13:29:01.485963679Z","description":"\u003ccontext-digest\u003e\n\u003cprinciples\u003eValidate Your Work · Runtime Neutrality · Design for Simplicity\u003c/principles\u003e\n\u003cconcerns\u003einstaller-coverage | screencast-evidence | runtime-parity | activation-signal-rigor\u003c/concerns\u003e\n\u003cpractices\u003eL1 packaging static check + L2 structured-activation signal + L3 routing evals; negative control mandatory (skill-uninstalled run must FAIL); evidence recording (vhs gif / Playwright webm) is not the gate; never bare keyword grep as the sole assertion\u003c/practices\u003e\n\u003cgoverning\u003edocs/helix/03-test/test-plans/TP-014-helix-workflow-coverage.md · docs/install/\u003cruntime\u003e.md · child beads INT-CC/INT-CX/INT-CP/INT-GN/INT-DD\u003c/governing\u003e\n\u003c/context-digest\u003e\n\nOperationalize \"HELIX installs easily, works as a skill, and the basic workflow works\" as five per-runtime integration tests, each producing a screencast for the install docs and release notes.\n\n**Test contract (v2 — supersedes prose-grep).** A test must assert that the HELIX skill *actually activated*, using the strongest signal the runtime exposes — never a free-text keyword grep of the model's prose. 
Rationale: a model will mention \"frame\"/\"align\"/\"review\" without the skill ever loading. Verified 2026-05-22 that `claude -p \"/helix frame ...\"` answered by improvising with subagents while the helix Skill tool never fired — a prose grep would have falsely PASSED.\n\nLayers every child implements:\n- L1 Packaging (static, PR CI): tests/install/shared/verify-skill-layout.sh (already exists).\n- L2 Activation (functional, creds-gated): prove the skill engaged via a structured/behavioral signal — claude-code/codex parse the structured event stream (`--output-format stream-json` / codex equivalent) for a tool-use event naming the `helix` skill; ddx uses `ddx work --once --json` routing decision + execution-evidence artifacts; copilot/genie use a structured exit/DOM signal where the surface exposes one. Every L2 assertion MUST be backed by a negative control: the same scenario with the skill uninstalled must FAIL. A test that still passes without the skill installed is invalid — that is the prose-grep failure mode. No bare keyword grep as the sole gate.\n- L3 Routing evals (deterministic): chat-capable runtimes ship tests/workflows/\u003cruntime\u003e/evals/routing.jsonl (phrase -\u003e expected mode), mirroring .claude/skills/ddx/evals/routing.jsonl.\n- Evidence (not a gate): a committed recording (vhs gif / Playwright webm). TUI/PTY realism belongs here, not in the assertion. Do NOT drive the runtime via tmux/PTY screen-scraping for the pass/fail gate.\n\n**Auth.** Tests use the operator's real runtime credentials mounted into the container (claude/codex OAuth token files, GITHUB_TOKEN, Genie DBAUTH cookie) — NOT a hard dependency on a specific API-key env var. PR CI runs L1 only; tag-push functional CI runs L2+L3 with creds present.\n\n**Scope per child** (narrow for v1; full TP-014 coverage is FEAT-014's separate scope):\n1. Install verification — runtime CLI starts, skill is discoverable (L1)\n2. 
Skill activation — proven via L2 structured signal + negative control\n3. One end-to-end workflow scenario — TP-014 SCN-01 (sparse intent -\u003e vision + PRD)\n4. Recording captured and committed\n\n**Out of scope** (defer to FEAT-014): other TP-014 scenarios; cross-runtime parity benchmarking; new skill modes.\n\n**Why these 5 children exist** (vs reusing the 25 phantom-closed FEAT-014 children): those 25 closed under prior worker runs whose commits were in a local divergence no longer on origin/main; the artifacts they claimed don't exist on disk. Re-doing the work cleanly here is simpler than reopening 25 stale beads.","id":"helix-e3c7d0e4","issue_type":"epic","labels":["helix","phase:test","activity:test","kind:test","area:testing","area:packaging"],"priority":1,"schema_version":1,"spec-id":"helix.installer-integration-tests","status":"open","title":"Epic: per-runtime integration tests (install + skill + one workflow) with screencast capture","updated_at":"2026-05-22T20:33:15.903893635Z"} {"acceptance":"Homepage hero displays the generated document-spine helix image in light and dark modes; current text and details link remain unchanged; worked example graph remains separate and unchanged.","claimed-at":"2026-05-12T14:50:03Z","claimed-machine":"eitri","claimed-pid":"2688323","created_at":"2026-05-12T14:49:32.623353723Z","description":"Replace the current homepage hero loop graphic with the generated document-spine helix image variants, using light/dark responsive treatment while preserving the worked example graph.","execute-loop-heartbeat-at":"2026-05-12T14:50:03.325721327Z","id":"helix-e412b1ca","issue_type":"task","labels":["helix","activity:build","website"],"owner":"erik","priority":1,"schema_version":1,"spec-id":"HOME-HERO-IMAGE","status":"closed","title":"Use document-spine hero images on homepage","updated_at":"2026-05-12T14:51:06.831224196Z"} {"acceptance":"Current HELIX docs and live repo state have been reviewed for 'security-monitoring'; a decision 
is recorded to restore, supersede, or retire it; if restored, the minimum prompt/template bar for 'security-monitoring' is explicit enough to guide reintroduction; if retired or superseded, the canonical replacement or stale references are identified","claimed-at":"2026-04-10T03:33:12Z","claimed-machine":"eitri","claimed-pid":"234196","closing_commit_sha":"485faf1fff86e65291432115740980749bd0c835","created_at":"2026-04-09T21:13:58.726126122Z","description":"\u003ccontext-digest\u003e\n\u003cprinciples\u003eMake Intent Explicit · Design for Change · Validate Your Work\u003c/principles\u003e\n\u003cconcerns\u003eworkflow | docs\u003c/concerns\u003e\n\u003cpractices\u003eCompare deleted artifact intent to current HELIX contract and live repo state before restoring anything · Prefer one canonical artifact per real responsibility · Restore only artifacts whose prompt/template bar is clear and justified\u003c/practices\u003e\n\u003cgoverning\u003eworkflows/conventions.md\u003c/governing\u003e\n\u003c/context-digest\u003e\n\nReview the deleted artifact type 'security-monitoring'. Determine whether its original intent still exists in current HELIX, whether that intent is already covered by another canonical artifact or tracker primitive, and whether 'security-monitoring' should be restored, superseded, or retired. 
If restoration is warranted, define the minimum acceptable prompt and template bar for 'security-monitoring' so the artifact is not reintroduced as another thin stub.\n\nIntent to validate: Security-focused monitoring setup and alerting guidance for deployed systems.","id":"helix-e44d5282","issue_type":"task","labels":["helix","activity:design","kind:docs","area:artifacts"],"notes":"\u003cmeasure-results\u003e\n \u003ctimestamp\u003e2026-04-10T03:49:35Z\u003c/timestamp\u003e\n \u003cstatus\u003ePASS\u003c/status\u003e\n \u003cdecision\u003eSUPERSEDE\u003c/decision\u003e\n \u003cacceptance\u003e\n \u003ccriterion name='security-monitoring reviewed against live HELIX contract' status='pass' evidence='Compared workflows/conventions.md, workflows/phases/05-deploy/GATE.yaml, workflows/phases/05-deploy/README.md, docs/helix/05-deploy/monitoring-setup.md, and the deleted stub history before deciding.'/\u003e\n \u003ccriterion name='canonical replacement identified' status='pass' evidence='Recorded monitoring-setup as the single deploy-phase artifact that now owns security alerts, audit signals, escalation paths, and compliance-relevant monitoring.'/\u003e\n \u003ccriterion name='stale references removed' status='pass' evidence='Removed security-monitoring from workflows/conventions.md and workflows/phases/05-deploy/GATE.yaml, and documented the supersession decision in docs/helix/05-deploy/monitoring-setup.md.'/\u003e\n \u003c/acceptance\u003e\n \u003cgates\u003e\n \u003cgate concern='workflow-docs' command='rg -n \"security-monitoring\" workflows docs .agents skills' status='pass'/\u003e\n \u003cgate concern='workflow-docs' command='git diff --check' status='pass'/\u003e\n \u003cgate concern='workflow-docs' command='bash tests/validate-skills.sh' status='pass'/\u003e\n \u003cgate concern='workflow-docs' command='lefthook run pre-commit' status='pass'/\u003e\n 
\u003c/gates\u003e\n\u003c/measure-results\u003e","owner":"erik","parent":"helix-fef22846","priority":0,"schema_version":1,"spec-id":"workflows/conventions.md","status":"closed","title":"Review deleted artifact type: security-monitoring","updated_at":"2026-04-10T03:52:04.610973203Z"} {"acceptance":"All review passes complete; findings filed as beads with scope-appropriate area labels; AGENTS.md updated if needed","claimed-at":"2026-04-11T03:03:47Z","claimed-machine":"eitri","claimed-pid":"2440859","created_at":"2026-04-11T03:03:41.325930979Z","description":"\u003ccontext-digest\u003e\n\u003cprinciples\u003eDesign for Change · Design for Simplicity · Validate Your Work · Make Intent Explicit · Prefer Reversible Decisions\u003c/principles\u003e\n\u003cconcerns\u003ehugo-hextra | demo-asciinema\u003c/concerns\u003e\n\u003cpractices\u003eDefine the site's target audience and what content they need · Theme version: Hextra v0.12.1 — pinned in website/go.mod · Identify which workflows need demo reels — prioritize the \"first 5 minutes\" experience · Current demos: helix-quickstart (full lifecycle), helix-concerns (drift detection), helix-evolve (requirement threading), helix-experiment (metric-driven optimization)\u003c/practices\u003e\n\u003cadrs\u003eADR-001 HELIX Supervisory Control Model — Supervision must remain live to concurrent local operator activity. A running helix-run session must treat tracker and governing-artifact changes as new control input at safe boundaries. It may not assume that the selected issue is still valid at claim time or close time just because it was valid earlier in the loop. Why: HELIX must preserve bounded execution, authority order, tracker-first work management, and direct interactive operation. It must also reduce orchestration burden by autonomously selecting the least-powerful sufficient next action when authority is available. 
· ADR-002 HELIX Tracker Write Safety Model — The tracker will not pretend to provide arbitrary multi-writer transactional semantics. Instead, it will define the supported local execution model, require explicit detection or prevention of silent lost updates, and make malformed tracker state a surfaced failure rather than something the rest of HELIX must guess around. Why: HELIX needs a conservative local tracker that is safe enough for agent-driven issue refinement and concurrent local supervision. The tracker must surface malformed state explicitly, define what concurrency/conflict guarantees are supported, and make metadata mutation available through first-class commands instead of direct file edits.\u003c/adrs\u003e\n\u003cgoverning\u003e84e05d418c8004d736606d54c3f93d800c79f09b\u003c/governing\u003e\n\u003c/context-digest\u003e\n\nFresh-eyes review of implementation commit 84e05d418c8004d736606d54c3f93d800c79f09b for bead helix-9efef95b. Review target: commit:84e05d418c8004d736606d54c3f93d800c79f09b","id":"helix-e52f390a","issue_type":"task","labels":["helix","kind:review","kind:planning","action:review","area:cli","area:testing","area:workflow"],"notes":"\u003cmeasure-results\u003e\n \u003ctimestamp\u003e2026-04-11T03:08:40Z\u003c/timestamp\u003e\n \u003cstatus\u003eISSUES_FOUND\u003c/status\u003e\n \u003cfindings total=\"2\" filed=\"2\" critical=\"0\" high=\"1\" medium=\"1\" low=\"0\"/\u003e\n \u003cagents-md-updated\u003eNO\u003c/agents-md-updated\u003e\n \u003clearnings-filed\u003e0\u003c/learnings-filed\u003e\n\u003c/measure-results\u003e","owner":"erik","priority":0,"schema_version":1,"spec-id":"84e05d418c8004d736606d54c3f93d800c79f09b","status":"closed","title":"review: commit 84e05d4 helix-9efef95b","updated_at":"2026-04-11T03:08:46.325393205Z"}
diff --git a/justfile b/justfile
index e93b4a0f..39ee5dfb 100644
--- a/justfile
+++ b/justfile
@@ -1,7 +1,7 @@
 # HELIX development tasks

 # Run all tests
-test: test-deploy-artifacts test-state-rules test-skills test-context-digests test-demo-fixtures
+test: test-deploy-artifacts test-state-rules test-skills test-context-digests

 # Serve the HELIX microsite at the canonical local review URL.
 website-serve:
@@ -23,10 +23,6 @@ test-skills:
 test-context-digests:
     bash tests/validate-context-digests.sh

-# Validate demo prompt fixtures
-test-demo-fixtures:
-    bash tests/validate-demo-fixtures.sh
-
 # Run all tests and check for stale references
 check: test lint

diff --git a/tests/validate-demo-fixtures.sh b/tests/validate-demo-fixtures.sh
deleted file mode 100644
index ba7ed5cd..00000000
--- a/tests/validate-demo-fixtures.sh
+++ /dev/null
@@ -1,89 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-repo_root="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
-demo_script="$repo_root/docs/demos/helix-concerns/demo.sh"
-fixture_dir="$repo_root/docs/demos/helix-concerns/agent-dictionary"
-
-fail() {
-  printf 'demo fixture validation failed: %s\n' "$*" >&2
-  exit 1
-}
-
-assert_fixture_matches_demo_prompt() {
-  local script_path="$1"
-  local agent_dictionary_dir="$2"
-  local expected_command="$3"
-  local stale_command="$4"
-
-  python3 - "$script_path" "$agent_dictionary_dir" "$expected_command" "$stale_command" <<'PYEOF'
-import hashlib
-import json
-from pathlib import Path
-import sys
-
-script_path = Path(sys.argv[1])
-fixture_dir = Path(sys.argv[2])
-expected_command = sys.argv[3]
-stale_command = sys.argv[4]
-
-start_marker = "agent_run <<'REVIEW_PROMPT'"
-end_marker = "REVIEW_PROMPT"
-
-script_lines = script_path.read_text(encoding="utf-8").splitlines()
-capturing = False
-found_end = False
-prompt_lines = []
-for line in script_lines:
-    if capturing:
-        if line.strip() == end_marker:
-            found_end = True
-            break
-        prompt_lines.append(line)
-    elif line.strip() == start_marker:
-        capturing = True
-
-if not capturing or not found_end or not prompt_lines:
-    raise SystemExit("failed to extract REVIEW_PROMPT heredoc from helix-concerns demo")
-
-expected_prompt = "\n".join(prompt_lines)
-prompt_line_set = {line.strip() for line in prompt_lines}
-expected_hash = hashlib.sha256(expected_prompt.encode("utf-8")).hexdigest()[:16]
-fixture_path = fixture_dir / f"{expected_hash}.json"
-
-if not fixture_path.is_file():
-    raise SystemExit(
-        f"missing fixture for extracted review prompt hash: {fixture_path.name}"
-    )
-
-with fixture_path.open(encoding="utf-8") as fh:
-    payload = json.load(fh)
-
-prompt = payload.get("prompt")
-prompt_hash = payload.get("prompt_hash")
-prompt_len = payload.get("prompt_len")
-
-if prompt != expected_prompt:
-    raise SystemExit("fixture prompt text does not match the demo REVIEW_PROMPT body")
-if prompt_hash != expected_hash:
-    raise SystemExit("fixture prompt_hash does not match the demo REVIEW_PROMPT hash")
-if prompt_len != len(expected_prompt):
-    raise SystemExit("fixture prompt_len does not match the demo REVIEW_PROMPT length")
-if hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16] != expected_hash:
-    raise SystemExit("fixture prompt hash is not the SHA-256 truncation of the prompt text")
-if expected_command not in prompt_line_set:
-    raise SystemExit("REVIEW_PROMPT body does not include the labeled review-finding command")
-if stale_command in prompt_line_set:
-    raise SystemExit("REVIEW_PROMPT body still includes the stale unlabeled review-finding command")
-PYEOF
-}
-
-expected_command='ddx bead create "drift: " --type task --labels helix,activity:build,review-finding,area:testing'
-stale_command='ddx bead create "drift: " --type task --labels helix,activity:build,review-finding'
-
-[[ ! -e "$repo_root/docs/demos/helix-concerns/agent-dictionary/e049bf7ab8d7b559.json" ]] \
-  || fail "stale helix-concerns replay fixture should be removed after prompt hash changes"
-
-assert_fixture_matches_demo_prompt "$demo_script" "$fixture_dir" "$expected_command" "$stale_command"
-
-printf 'validated demo fixtures\n'