
Commit d8b7a65

8bitAlex, claude, Mr. Meeseeks, and Copilot authored
feat: raid doctor runs verify entries with self-heal (closes #42) (#87)
* feat: raid doctor runs verify entries with self-heal (closes #42)

  Every verify: entry on the active profile and per-repo raid.yaml files now runs as part of `raid doctor` and produces a finding:

  - first-try pass → ok
  - failure → onFail → pass → warn (remediated by onFail)
  - failure that can't be healed → error

  RunVerify now returns (VerifyOutcome, error) so doctor can distinguish a clean pass from a successful self-heal — the latter is a warning so the user knows something silently fixed itself. The remediated state maps to the doctor warn severity; failures don't short-circuit subsequent entries, so one run reports the full picture.

  The cmd/doctor JSON shape is unchanged — verify findings flow through the existing Finding type.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot review on doctor verify

  - doctor: load per-repo raid.yaml verify entries before running them. Previously the doctor path never invoked buildRepo/ExtractRepo on the repo, so verify blocks defined only in raid.yaml were silently skipped (BuildSingleRepoProfile keeps only name/path/branch; ExtractProfile only sees what's in the wrapping profile).
  - verify: fix VerifyOutcome doc comment — describe the real tri-state enum rather than implying failed is a separate "fourth state".
  - docs: clarify that verify tasks/onFail can be any task type (HTTP, Git, Template, Prompt/Confirm, SetVar, …), not just shell. Updates doctor.mdx warning, whats-new.mdx release note, and the embedded verifyArray schema description (which still said verify entries were inert / doctor integration was future work).
  - tests: replace the pre-populated repo.Verify test with one that exercises the load-from-raid.yaml path, and add a merge case that covers both profile-level and raid.yaml verify entries.

  Co-Authored-By: Copilot <copilot@github.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Mr. Meeseeks <alex.salerno+meeseeks@me.com>
Co-authored-by: Copilot <copilot@github.com>
1 parent a13ba7f commit d8b7a65

11 files changed

Lines changed: 416 additions & 43 deletions


schemas/raid-defs.schema.json

Lines changed: 1 addition & 1 deletion
@@ -636,7 +636,7 @@
     },
     "verifyArray": {
       "type": "array",
-      "description": "Declarative precondition checks. Each entry runs `tasks:` to assert a dependency or environmental precondition; if any task exits non-zero and `onFail:` is provided, raid runs the remediation once and re-runs `tasks:` exactly once. Verify entries are inert at the CLI today — `raid doctor` (#42) surfaces them as findings. Keep checks small and fast: each entry is run on every doctor invocation.",
+      "description": "Declarative precondition checks. Each entry runs `tasks:` to assert a dependency or environmental precondition; if any task exits non-zero and `onFail:` is provided, raid runs the remediation once and re-runs `tasks:` exactly once. `raid doctor` runs every verify entry on the active profile and per-repo `raid.yaml` files and surfaces each as a finding (ok / warn / error). Keep checks small and fast: each entry is run on every doctor invocation.",
       "items": {
         "type": "object",
         "properties": {

site/docs/references/schema.mdx

Lines changed: 1 addition & 1 deletion
@@ -211,7 +211,7 @@ install:

 Declarative precondition checks. Each entry runs `tasks:` to assert that a dependency or environmental requirement is in place. An optional `onFail:` remediation gets exactly one chance to fix things — if remediation succeeds, raid re-runs `tasks:` once; if that pass succeeds the verify is reported as remediated, otherwise it fails.

-Verify entries are accepted on both profiles and per-repo `raid.yaml` files. They share execution context with `install:` — the active environment, raid vars, and task options all apply. The schema accepts verify entries today; a future release will surface them through `raid doctor` (and, longer-term, a dedicated `raid verify` command).
+Verify entries are accepted on both profiles and per-repo `raid.yaml` files. They share execution context with `install:` — the active environment, raid vars, and task options all apply. `raid doctor` runs every verify entry and surfaces each as a finding: a first-try pass is an `ok` finding, a successful self-heal is a `warn` (the precondition holds now, but didn't before — worth knowing), and a failure is an `error` carrying the underlying task error.

 ```yaml
 verify:

site/docs/usage/doctor.mdx

Lines changed: 17 additions & 0 deletions
@@ -27,6 +27,23 @@ Checks include:
 - Referenced task groups exist
 - Environment names are unique
 - Custom command names don't shadow built-in commands
+- Every [`verify:`](../references/schema#verify) entry on the profile and per-repo `raid.yaml` files
+
+## Verify entries
+
+If your profile or any repo's `raid.yaml` defines a [`verify:`](../references/schema#verify) block, doctor runs every entry and surfaces each as its own finding:
+
+| Outcome | Severity | What it means |
+|---|---|---|
+| First-try pass | `ok` | The verify's `tasks:` exited cleanly on the first try. |
+| Remediated | `warn` | `tasks:` failed, the optional `onFail:` block ran, and the re-run of `tasks:` then passed. The precondition holds *now* — but it didn't before, so doctor warns you to investigate why. |
+| Failed | `error` | Either no `onFail:` was defined, `onFail:` itself failed, or the retry of `tasks:` failed again. The underlying task error is included in the finding's message. |
+
+Doctor runs verify tasks with the same execution context as `install:` — the active environment, raid vars, and task options all apply. A failure on one verify entry does not prevent subsequent entries from running, so you see the full health picture in a single pass.
+
+:::warning Verify tasks execute real work
+Doctor invokes the actual `tasks:` and `onFail:` blocks you've defined — that's any task type raid supports (shell commands, HTTP, Git, Template, Prompt/Confirm, SetVar, …), not just shell. Non-shell tasks can still make network calls, mutate files, or prompt for input. Keep verify checks small, fast, and side-effect-light (a `node --version` probe, a `test -f`, a `Wait` against a local port). Heavy bootstrap work belongs in `install:`.
+:::

 ## When to run it
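The outcome-to-severity table added to `doctor.mdx` in this hunk, together with the "no short-circuit" rule, could be sketched like this. The `outcome` type, its constant names, and the `report` helper are hypothetical names chosen for the illustration; only the `ok` / `warn` / `error` mapping is taken from the documentation.

```go
package main

import "fmt"

// outcome is a hypothetical stand-in for the three verify outcomes.
type outcome int

const (
	passed outcome = iota
	remediated
	failed
)

// severityFor mirrors the documented table: first-try pass is ok,
// a successful self-heal is warn, anything else is error.
func severityFor(o outcome) string {
	switch o {
	case passed:
		return "ok"
	case remediated:
		return "warn"
	default:
		return "error"
	}
}

// report maps every outcome to a severity. It never returns early on a
// failure, so later entries are still reported.
func report(outcomes []outcome) []string {
	severities := make([]string, 0, len(outcomes))
	for _, o := range outcomes {
		severities = append(severities, severityFor(o))
	}
	return severities
}

func main() {
	fmt.Println(report([]outcome{failed, remediated, passed}))
}
```

Treating any unrecognized outcome as `error` is a conservative choice made for this sketch; the real implementation may handle that case differently.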

site/docs/whats-new.mdx

Lines changed: 3 additions & 1 deletion
@@ -11,7 +11,9 @@ User-visible changes per release, latest first. For full commit history see the

 ## 0.14.0 — upcoming

-**Declarative `verify:` blocks.** Profiles and per-repo `raid.yaml`s now accept a top-level `verify:` list. Each entry runs `tasks:` to assert a precondition (a tool is installed, a port is reachable, a credentials file exists), and an optional `onFail:` remediation gets exactly one chance to fix things — if it succeeds, raid re-runs `tasks:` once and the verify is reported as remediated; otherwise it surfaces as a structured `VERIFY_FAILED` error. Verify entries share execution context with `install:` tasks (active env + raid vars + task options). `raid doctor` integration to surface verify entries as health-report findings follows in [#42](https://github.com/8bitAlex/raid/issues/42). See [Schema → Verify](/docs/references/schema#verify). Closes [#38](https://github.com/8bitAlex/raid/issues/38).
+**`raid doctor` runs `verify:` entries with self-heal.** Every `verify:` entry on the active profile and per-repo `raid.yaml` files now runs as part of `raid doctor` and produces its own finding: `ok` for a first-try pass, `warn` when the optional `onFail:` block recovered a failing precondition (the verify holds *now*, but didn't before — worth knowing), and `error` when the precondition can't be made to hold. Failures don't short-circuit subsequent entries, so a single `raid doctor` run reports the full picture. Doctor invokes the actual `tasks:` and `onFail:` blocks — any task type raid supports (shell, HTTP, Git, Template, Prompt/Confirm, SetVar, …), not just shell. Keep verify checks small and fast, since they'll run every time you (or CI, or an agent) check raid's health. See [Doctor → Verify entries](/docs/usage/doctor#verify-entries). Closes [#42](https://github.com/8bitAlex/raid/issues/42).
+
+**Declarative `verify:` blocks.** Profiles and per-repo `raid.yaml`s now accept a top-level `verify:` list. Each entry runs `tasks:` to assert a precondition (a tool is installed, a port is reachable, a credentials file exists), and an optional `onFail:` remediation gets exactly one chance to fix things — if it succeeds, raid re-runs `tasks:` once and the verify is reported as remediated; otherwise it surfaces as a structured `VERIFY_FAILED` error. Verify entries share execution context with `install:` tasks (active env + raid vars + task options). `raid doctor` integration is now wired (see entry above). See [Schema → Verify](/docs/references/schema#verify). Closes [#38](https://github.com/8bitAlex/raid/issues/38).

 **Shared `options:` block on every task type.** A new `options:` block on the base task definition composes uniformly across every task type (and on user-defined commands) so cross-cutting fields don't have to be re-declared per type. The initial field, `showExeTime: bool`, prints a dim line to stderr after a task or command completes with the elapsed time: `task-name complete in 1.2s`. Omitting `options` (or any field within it) leaves current behavior unchanged, so the addition is fully backwards compatible. Additional fields (`quiet`, `timeout`, …) will ship additively. Closes [#54](https://github.com/8bitAlex/raid/issues/54).

src/internal/lib/doctor.go

Lines changed: 76 additions & 3 deletions
@@ -88,6 +88,8 @@ func checkProfile() []Finding {
 			})
 		}
 		findings = append(findings, Finding{Severity: SeverityOK, Check: "profile schema", Message: "valid (single-repo)"})
+		// In single-repo mode, verify entries live on the synthesized
+		// repo, not the wrapper profile — checkRepo picks them up.
 		for _, repo := range fullProfile.Repositories {
 			findings = append(findings, checkRepo(repo)...)
 		}
@@ -113,6 +115,8 @@ func checkProfile() []Finding {
 		})
 	}

+	findings = append(findings, checkVerify("verify", fullProfile.Verify)...)
+
 	if len(fullProfile.Repositories) == 0 {
 		return append(findings, Finding{
 			Severity: SeverityWarn,
@@ -177,12 +181,81 @@ func checkRepo(repo Repo) []Finding {
 			Message:    err.Error(),
 			Suggestion: fmt.Sprintf("fix %s to match the repo schema", raidFile),
 		})
-	} else {
+		return findings
+	}
+	findings = append(findings, Finding{
+		Severity: SeverityOK,
+		Check:    fmt.Sprintf("repo/%s raid.yaml", repo.Name),
+		Message:  "valid",
+	})
+
+	// Merge verify entries from the per-repo raid.yaml. The profile-level
+	// Repo only carries what's in the wrapping profile (or, for
+	// BuildSingleRepoProfile, just name/path/branch), so without this
+	// merge per-repo verify blocks would be silently skipped.
+	repoConfig, err := ExtractRepo(repo.Path)
+	if err != nil {
 		findings = append(findings, Finding{
-			Severity: SeverityOK,
+			Severity: SeverityError,
 			Check:    fmt.Sprintf("repo/%s raid.yaml", repo.Name),
-			Message:  "valid",
+			Message:  err.Error(),
 		})
+		return findings
+	}
+	repo.Verify = append(repo.Verify, repoConfig.Verify...)
+
+	findings = append(findings, checkVerify(fmt.Sprintf("repo/%s verify", repo.Name), repo.Verify)...)
+	return findings
+}
+
+// checkVerify runs each verify entry and converts the outcome into a
+// finding. label is the check-name prefix ("verify" for profile-level,
+// "repo/<name> verify" for repo-level). The entry's Name is appended
+// so each finding has a unique, human-readable label.
+//
+// Outcomes map to severities:
+//   - VerifyOutcomeOK         → SeverityOK
+//   - VerifyOutcomeRemediated → SeverityWarn (the verify holds now, but
+//     it didn't on the first try — worth surfacing so the user knows
+//     something silently fixed itself)
+//   - VerifyOutcomeFailed     → SeverityError
+//
+// Failures don't short-circuit subsequent entries — doctor reports every
+// verify so the user sees the full picture in one pass.
+func checkVerify(label string, entries []Verify) []Finding {
+	var findings []Finding
+	for _, v := range entries {
+		if v.IsZero() {
+			continue
+		}
+		check := fmt.Sprintf("%s/%s", label, v.Name)
+		outcome, err := RunVerify(v)
+		switch outcome {
+		case VerifyOutcomeOK:
+			findings = append(findings, Finding{
+				Severity: SeverityOK,
+				Check:    check,
+				Message:  "passed",
+			})
+		case VerifyOutcomeRemediated:
+			findings = append(findings, Finding{
+				Severity:   SeverityWarn,
+				Check:      check,
+				Message:    "remediated by onFail",
+				Suggestion: "investigate why the precondition wasn't already in place",
+			})
+		case VerifyOutcomeFailed:
+			msg := "failed"
+			if err != nil {
+				msg = err.Error()
+			}
+			findings = append(findings, Finding{
+				Severity:   SeverityError,
+				Check:      check,
+				Message:    msg,
+				Suggestion: "fix the underlying dependency or update the verify block to match reality",
+			})
+		}
 	}
 	return findings
 }
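The check names that `checkRepo` and `checkVerify` build in this hunk, and the profile-then-`raid.yaml` merge order, can be illustrated in isolation. The helper names below (`repoVerifyLabel`, `checkName`, `mergeNames`) are hypothetical and operate on plain strings rather than the real `Verify` type; the two format strings are the ones used in the hunk.

```go
package main

import "fmt"

// repoVerifyLabel mirrors the "repo/%s verify" label built in checkRepo.
func repoVerifyLabel(repoName string) string {
	return fmt.Sprintf("repo/%s verify", repoName)
}

// checkName mirrors the "%s/%s" check name built in checkVerify.
func checkName(label, entryName string) string {
	return fmt.Sprintf("%s/%s", label, entryName)
}

// mergeNames (hypothetical) keeps profile-level entries first and
// appends raid.yaml entries after them. It copies into a fresh slice so
// the caller's backing array is never modified.
func mergeNames(fromProfile, fromFile []string) []string {
	merged := make([]string, 0, len(fromProfile)+len(fromFile))
	merged = append(merged, fromProfile...)
	return append(merged, fromFile...)
}

func main() {
	label := repoVerifyLabel("r")
	for _, name := range mergeNames([]string{"from-profile"}, []string{"from-file"}) {
		fmt.Println(checkName(label, name))
	}
}
```

Run as written, this prints `repo/r verify/from-profile` followed by `repo/r verify/from-file`, the same check names the merge test in `doctor_test.go` looks for.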

src/internal/lib/doctor_test.go

Lines changed: 228 additions & 0 deletions
@@ -330,6 +330,234 @@ func TestCheckProfile_singleRepoInvalid(t *testing.T) {
 	}
 }

+// --- checkVerify ---
+
+func TestCheckVerify_passingEntryProducesOKFinding(t *testing.T) {
+	entries := []Verify{
+		{
+			Name:  "echo-works",
+			Tasks: []Task{{Type: Shell, Cmd: "exit 0"}},
+		},
+	}
+	findings := checkVerify("verify", entries)
+	if len(findings) != 1 {
+		t.Fatalf("checkVerify: got %d findings, want 1", len(findings))
+	}
+	f := findings[0]
+	if f.Severity != SeverityOK {
+		t.Errorf("severity = %v, want SeverityOK", f.Severity)
+	}
+	if f.Check != "verify/echo-works" {
+		t.Errorf("check = %q, want %q", f.Check, "verify/echo-works")
+	}
+}
+
+func TestCheckVerify_remediatedEntryProducesWarnFinding(t *testing.T) {
+	marker := filepath.Join(t.TempDir(), "fix")
+	entries := []Verify{
+		{
+			Name:   "needs-fix",
+			Tasks:  []Task{{Type: Shell, Cmd: "test -f " + marker}},
+			OnFail: []Task{{Type: Shell, Cmd: "touch " + marker}},
+		},
+	}
+	findings := checkVerify("verify", entries)
+	if len(findings) != 1 {
+		t.Fatalf("checkVerify: got %d findings, want 1", len(findings))
+	}
+	f := findings[0]
+	if f.Severity != SeverityWarn {
+		t.Errorf("severity = %v, want SeverityWarn for remediated outcome", f.Severity)
+	}
+	if f.Suggestion == "" {
+		t.Error("remediated finding should carry a suggestion")
+	}
+}
+
+func TestCheckVerify_failedEntryProducesErrorFinding(t *testing.T) {
+	entries := []Verify{
+		{
+			Name:  "broken",
+			Tasks: []Task{{Type: Shell, Cmd: "exit 1"}},
+		},
+	}
+	findings := checkVerify("verify", entries)
+	if len(findings) != 1 {
+		t.Fatalf("checkVerify: got %d findings, want 1", len(findings))
+	}
+	f := findings[0]
+	if f.Severity != SeverityError {
+		t.Errorf("severity = %v, want SeverityError", f.Severity)
+	}
+	if f.Suggestion == "" {
+		t.Error("failed finding should carry a suggestion")
+	}
+}
+
+func TestCheckVerify_skipsZeroEntries(t *testing.T) {
+	// Zero-value verify (from a stray YAML list item) should be ignored
+	// rather than producing a "passed" finding for an empty check.
+	findings := checkVerify("verify", []Verify{{}, {Name: "real", Tasks: []Task{{Type: Shell, Cmd: "exit 0"}}}})
+	if len(findings) != 1 {
+		t.Fatalf("checkVerify: got %d findings, want 1 (zero entry skipped)", len(findings))
+	}
+	if findings[0].Check != "verify/real" {
+		t.Errorf("check = %q, want %q", findings[0].Check, "verify/real")
+	}
+}
+
+func TestCheckVerify_failureDoesNotShortCircuitSubsequent(t *testing.T) {
+	// A failing verify must not prevent later entries from running —
+	// doctor needs to surface every finding in a single pass.
+	entries := []Verify{
+		{Name: "first-fails", Tasks: []Task{{Type: Shell, Cmd: "exit 1"}}},
+		{Name: "second-passes", Tasks: []Task{{Type: Shell, Cmd: "exit 0"}}},
+	}
+	findings := checkVerify("verify", entries)
+	if len(findings) != 2 {
+		t.Fatalf("checkVerify: got %d findings, want 2", len(findings))
+	}
+	if findings[0].Severity != SeverityError {
+		t.Errorf("first finding severity = %v, want SeverityError", findings[0].Severity)
+	}
+	if findings[1].Severity != SeverityOK {
+		t.Errorf("second finding severity = %v, want SeverityOK", findings[1].Severity)
+	}
+}
+
+func TestCheckVerify_emptyEntriesIsNoOp(t *testing.T) {
+	findings := checkVerify("verify", nil)
+	if len(findings) != 0 {
+		t.Errorf("expected 0 findings, got %d", len(findings))
+	}
+}
+
+// --- checkProfile with verify entries ---
+
+func TestCheckProfile_runsProfileLevelVerify(t *testing.T) {
+	setupTestConfig(t)
+
+	dir := t.TempDir()
+	path := filepath.Join(dir, "profile.yaml")
+	// Profile-level verify with one passing and one failing entry.
+	body := `name: vp
+verify:
+  - name: ok
+    tasks:
+      - type: Shell
+        cmd: exit 0
+  - name: broken
+    tasks:
+      - type: Shell
+        cmd: exit 1
+`
+	if err := os.WriteFile(path, []byte(body), 0644); err != nil {
+		t.Fatal(err)
+	}
+	if err := AddProfile(Profile{Name: "vp", Path: path}); err != nil {
+		t.Fatal(err)
+	}
+	if err := SetProfile("vp"); err != nil {
+		t.Fatal(err)
+	}
+
+	findings := checkProfile()
+	var sawOK, sawError bool
+	for _, f := range findings {
+		if f.Check == "verify/ok" && f.Severity == SeverityOK {
+			sawOK = true
+		}
+		if f.Check == "verify/broken" && f.Severity == SeverityError {
+			sawError = true
+		}
+	}
+	if !sawOK {
+		t.Error("expected 'verify/ok' OK finding")
+	}
+	if !sawError {
+		t.Error("expected 'verify/broken' error finding")
+	}
+}
+
+// TestCheckRepo_loadsVerifyFromRaidYaml covers the doctor path's
+// responsibility for loading verify entries from the per-repo raid.yaml
+// itself. The Repo passed in has an empty Verify — production code paths
+// like BuildSingleRepoProfile only carry name/path/branch — so doctor
+// must read raid.yaml to surface its verify findings.
+func TestCheckRepo_loadsVerifyFromRaidYaml(t *testing.T) {
+	dir := t.TempDir()
+	os.MkdirAll(filepath.Join(dir, ".git"), 0755)
+	repoYaml := `name: r
+branch: main
+verify:
+  - name: hello
+    tasks:
+      - type: Shell
+        cmd: exit 0
+`
+	if err := os.WriteFile(filepath.Join(dir, RaidConfigFileName), []byte(repoYaml), 0644); err != nil {
+		t.Fatal(err)
+	}
+
+	repo := Repo{
+		Name: "r",
+		Path: dir,
+		URL:  "http://example.com/r.git",
+	}
+	findings := checkRepo(repo)
+	var sawVerify bool
+	for _, f := range findings {
+		if f.Check == "repo/r verify/hello" && f.Severity == SeverityOK {
+			sawVerify = true
+		}
+	}
+	if !sawVerify {
+		t.Errorf("expected 'repo/r verify/hello' OK finding from raid.yaml, got %+v", findings)
+	}
+}
+
+// TestCheckRepo_mergesProfileAndRaidYamlVerify covers the case where the
+// profile-level Repo entry has its own verify and the per-repo raid.yaml
+// has additional verify entries — both should surface as findings.
+func TestCheckRepo_mergesProfileAndRaidYamlVerify(t *testing.T) {
+	dir := t.TempDir()
+	os.MkdirAll(filepath.Join(dir, ".git"), 0755)
+	repoYaml := `name: r
+branch: main
+verify:
+  - name: from-file
+    tasks:
+      - type: Shell
+        cmd: exit 0
+`
+	if err := os.WriteFile(filepath.Join(dir, RaidConfigFileName), []byte(repoYaml), 0644); err != nil {
+		t.Fatal(err)
+	}
+
+	repo := Repo{
+		Name:   "r",
+		Path:   dir,
+		URL:    "http://example.com/r.git",
+		Verify: []Verify{{Name: "from-profile", Tasks: []Task{{Type: Shell, Cmd: "exit 0"}}}},
+	}
+	findings := checkRepo(repo)
+	var sawProfile, sawFile bool
+	for _, f := range findings {
+		if f.Check == "repo/r verify/from-profile" && f.Severity == SeverityOK {
+			sawProfile = true
+		}
+		if f.Check == "repo/r verify/from-file" && f.Severity == SeverityOK {
+			sawFile = true
+		}
+	}
+	if !sawProfile {
+		t.Errorf("expected 'repo/r verify/from-profile' OK finding, got %+v", findings)
+	}
+	if !sawFile {
+		t.Errorf("expected 'repo/r verify/from-file' OK finding, got %+v", findings)
+	}
+}
+
 // severitySet returns a set of all severities present in findings.
 func severitySet(findings []Finding) map[Severity]bool {
 	out := make(map[Severity]bool, len(findings))
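Several of the tests in this hunk repeat the same "loop over findings and set a `saw…` flag" pattern. One possible way to factor that out is a small lookup helper, sketched below. `hasFinding` is a hypothetical name, and the `Severity` and `Finding` definitions here are simplified stand-ins so the example compiles on its own; the project's real types may differ (for instance, `Severity` may not be a string).

```go
package main

import "fmt"

// Severity and Finding are simplified stand-ins for the project's types.
type Severity string

const (
	SeverityOK    Severity = "ok"
	SeverityWarn  Severity = "warn"
	SeverityError Severity = "error"
)

type Finding struct {
	Severity Severity
	Check    string
}

// hasFinding (hypothetical) reports whether findings contains an entry
// with exactly this check name and severity.
func hasFinding(findings []Finding, check string, sev Severity) bool {
	for _, f := range findings {
		if f.Check == check && f.Severity == sev {
			return true
		}
	}
	return false
}

func main() {
	findings := []Finding{
		{Severity: SeverityOK, Check: "repo/r verify/from-profile"},
		{Severity: SeverityOK, Check: "repo/r verify/from-file"},
	}
	for _, want := range []string{"repo/r verify/from-profile", "repo/r verify/from-file"} {
		fmt.Println(want, hasFinding(findings, want, SeverityOK))
	}
}
```

In a test, each `saw…` loop would then collapse to a single `if !hasFinding(findings, "verify/ok", SeverityOK) { t.Error(...) }` check.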
