feat(implement/spike-recommend): plan-time pre-flight check for declared-but-not-provisioned secrets #30

@lapc506

👤 HUMAN LAYER

Context (ES, translated)

There is a recurring class of drift in our toolchain: a document, runbook, or brief declares a contract (for example, "secret X must exist with format Y at location Z"), but the provisioning step is delegated to a human and silently never happens. Weeks or months later, a downstream consumer blocks when it tries to run the workflow / script / Edge Function that depends on that contract.

It is the same Conway's Law shape we already saw in:

  • DOJ-4490 (canonical-URL drift: old /admin/courses/* routes were reintroduced because nothing actively scanned for them).
  • DOJ-4439 (course-import script INSERT-only-on-miss: the contract documented UPDATE but the code never implemented it).

It hit us again today: the staging dry-run for DOJ-4007 P3 (content-backflow adapter) could not start because SUPABASE_STAGING_DB_URL_SR does not exist in GCP Secret Manager. The root cause was DOJ-3999: the foundation work declared the secret as a required input in docs/runbooks/pathways-backflow.md, but provisioning was deferred to a human and never executed. Months later, P3 blocked.

Why now

A second downstream consumer (DOJ-4007 Option A) is queued and will hit the same wall with SUPABASE_PROD_DB_URL_RO. Without a pre-flight check at the start of the work, the pattern repeats every time a PR-A documents a contract and a PR-C consumes it without an intermediate PR-B having provisioned it.

Context (EN, short)

A doc declares a secret contract. Provisioning is deferred to a human and silently never happens. Downstream consumer blocks weeks later. We need a plan-time advisory check in the toolkit — surfaced at the moment a brief is generated or an implement is invoked — that probes whether the secrets referenced in the brief actually exist where the brief says they should.

Real-world example

  • DOJ-3999 authored docs/runbooks/pathways-backflow.md declaring SUPABASE_STAGING_DB_URL_SR as required in GCP Secret Manager (project dojo-agent-platform). PR merged. Provisioning step never executed.
  • DOJ-4007 P3 (2026-05-27) added .github/workflows/pathways-content-backflow.yml consuming that secret. Staging dry-run failed at gcloud secrets versions access. Blocker discovered ~weeks after DOJ-3999 merge.

A pre-flight at the moment DOJ-4007 P3's brief was generated (or /implement DOJ-4007 was invoked) would have surfaced:

Pre-Flight Required — referenced secret SUPABASE_STAGING_DB_URL_SR not found in GCP Secret Manager (project dojo-agent-platform). Declared in docs/runbooks/pathways-backflow.md line 42. Provision before running the workflow that consumes it.

That single advisory line would have unblocked the consumer ticket weeks earlier.

🤖 AGENT LAYER

Scope

Add a plan-time pre-flight check to whichever skill is closer to the originating moment — most likely make-no-mistakes:implement (when work starts) or make-no-mistakes:spike-recommend (when briefs are generated/refreshed). The check is advisory, not blocking.

The check must:

  1. Fetch the issue body + any linked runbook docs referenced in the brief (Context Files).
  2. Regex-scan for secret references in the issue body, linked docs, and any workflow/script files mentioned in the brief.
  3. For each candidate secret name, run a one-shot probe against the configured backends.
  4. Append a ### Pre-Flight Required section to the agent's plan output before spawning the implementation agent, listing only the missing secrets.
  5. If all referenced secrets exist, the check is silent (no noise).
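
The five steps above can be sketched as a small orchestration function. This is a minimal sketch, not the skill's actual implementation: the function name `find_missing_secrets` and the injected `extract` / `exists` callables are hypothetical, and fetching the issue body and linked docs (step 1) is assumed to have already produced the `texts` list. Swallowing probe errors is one possible reading of "advisory, not blocking"; a real skill might prefer to log them.

```python
from __future__ import annotations

from typing import Callable, Iterable


def find_missing_secrets(
    texts: Iterable[str],
    extract: Callable[[str], set[str]],
    exists: Callable[[str], bool],
) -> list[str]:
    """Return the sorted names of referenced secrets that the probe cannot find.

    `extract` and `exists` are hypothetical hooks: the first pulls candidate
    secret names out of one text, the second probes a backend for one name.
    """
    try:
        candidates: set[str] = set()
        for text in texts:                 # step 2: scan every fetched text
            candidates |= extract(text)
        # step 3: one probe per candidate; keep only the missing ones
        return sorted(name for name in candidates if not exists(name))
    except Exception:
        # Advisory only (assumption): a broken probe must never abort the
        # plan, so a failure is treated as "nothing to report".
        return []
```

An empty list means the caller appends nothing (step 5); a non-empty list is what gets rendered into the Pre-Flight Required section (step 4).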

Detection regex patterns

Apply against issue body, linked runbook markdown, and the contents of any .github/workflows/*.yml or scripts/**/* files mentioned in Context Files:

  • GitHub Actions workflow refs: \$\{\{\s*secrets\.([A-Z_][A-Z0-9_]+)\s*\}\}
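  • gcloud CLI invocations: gcloud\s+secrets\s+describe\s+([A-Z_][A-Z0-9_]+), plus gcloud\s+secrets\s+versions\s+access\s+\S+\s+--secret[=\s]([A-Z_][A-Z0-9_]+) (versions access takes the version, e.g. latest, as its positional argument and the secret name via --secret).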
  • "Required secrets" / "Required inputs" markdown blocks: capture [A-Z_][A-Z0-9_]+ names listed under those headings.
  • Env var references in mentioned scripts: process\.env\.([A-Z_][A-Z0-9_]+), Deno\.env\.get\(['"]([A-Z_][A-Z0-9_]+)['"]\), os\.environ\[['"]([A-Z_][A-Z0-9_]+)['"]\].

Filter out well-known platform variables that are not secrets to provision (GITHUB_TOKEN, HOME, PATH, CI, NODE_ENV); the skip-list lives in config.
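
A rough sketch of how the extractors and the skip-list could fit together in Python. The function names, `DEFAULT_SKIP`, and the heading matcher are illustrative assumptions, not existing toolkit code. The gcloud patterns are adapted so that the `versions access <version> --secret=NAME` form is also caught. The heading-scoped extractor is deliberately naive: any all-caps word under a "Required secrets" heading (an acronym, for instance) would be captured too.

```python
import re

NAME = r"([A-Z_][A-Z0-9_]+)"
QUOTE = r"""['"]"""

# One capture group per pattern, always the secret / env var name.
PATTERNS = [
    re.compile(r"\$\{\{\s*secrets\." + NAME + r"\s*\}\}"),
    re.compile(r"gcloud\s+secrets\s+describe\s+" + NAME),
    re.compile(r"gcloud\s+secrets\s+versions\s+access\s+\S+\s+--secret[=\s]\s*" + NAME),
    re.compile(r"process\.env\." + NAME),
    re.compile(r"Deno\.env\.get\(" + QUOTE + NAME + QUOTE + r"\)"),
    re.compile(r"os\.environ\[" + QUOTE + NAME + QUOTE + r"\]"),
]

# Hypothetical default; the real skip-list would come from per-repo config.
DEFAULT_SKIP = frozenset({"GITHUB_TOKEN", "HOME", "PATH", "CI", "NODE_ENV"})

REQUIRED_HEADING = re.compile(r"#+\s*Required\s+(?:secrets|inputs)\b", re.IGNORECASE)
BARE_NAME = re.compile(r"\b[A-Z_][A-Z0-9_]+\b")


def names_under_required_headings(markdown: str) -> set:
    """Collect uppercase names listed under 'Required secrets/inputs' headings."""
    names, active = set(), False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading starts or ends a "required" block.
            active = bool(REQUIRED_HEADING.match(stripped))
            continue
        if active:
            names.update(BARE_NAME.findall(stripped))
    return names


def extract_secret_names(text: str, skip=DEFAULT_SKIP) -> set:
    """Apply every pattern to `text` and drop names on the skip-list."""
    names = set()
    for pattern in PATTERNS:
        names.update(pattern.findall(text))
    names.update(names_under_required_headings(text))
    return names - set(skip)
```

Keeping each pattern to a single capture group means `findall` returns plain strings, so new patterns can be appended without touching the extraction loop.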

Probe commands

For each candidate secret:

  • GitHub Actions secrets: gh secret list --repo <repo> (per-repo). Match against the captured set; missing if not in the list.
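  • GCP Secret Manager: gcloud secrets describe <NAME> --project=<project> --format='value(name)' 2>&1 (the format expression is quoted so the shell does not interpret the parentheses). Exit code 0 → exists. Exit code != 0 → treated as missing; note that auth and permission failures also exit non-zero.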

Auth assumptions: gh and gcloud are already required by other toolkit skills (review-open-prs, review-active-issues use gh; secret-input family touches gcloud). No new dependency.
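
The two probes could be wrapped roughly as follows. This is a sketch under stated assumptions: the function names and the injectable `run` parameter are hypothetical (the latter exists only so the probes can be tested without real credentials), and the `gh secret list` parsing assumes the non-interactive output is one secret per line with the name in the first whitespace-separated column.

```python
import subprocess
from typing import Callable, Iterable, Sequence, Set

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Arguments are passed as a list, so no shell quoting is involved.
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)


def gcp_secret_exists(name: str, project: str, run: Runner = _run) -> bool:
    """Exit code 0 means the secret exists; anything else is treated as missing."""
    result = run([
        "gcloud", "secrets", "describe", name,
        f"--project={project}", "--format=value(name)",
    ])
    return result.returncode == 0


def missing_github_secrets(
    candidates: Iterable[str], repo: str, run: Runner = _run
) -> Set[str]:
    """Return the candidates that do not appear in `gh secret list` for `repo`."""
    result = run(["gh", "secret", "list", "--repo", repo])
    if result.returncode != 0:
        raise RuntimeError(f"gh secret list failed: {result.stderr.strip()}")
    # Assumption: first column of each non-empty line is the secret name.
    present = {line.split()[0] for line in result.stdout.splitlines() if line.strip()}
    return set(candidates) - present
```

One `gh secret list` call covers all GitHub candidates for a repo, while the GCP probe is one call per name, matching the per-backend behaviour described above.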

Configuration surface

Per-repo configuration via the toolkit's existing config file (likely linear-setup.json or equivalent — pick whichever is already used for per-repo overrides):

{
  "preflight": {
    "gcp_project": "dojo-agent-platform",
    "skip_secrets": ["GITHUB_TOKEN", "NODE_ENV", "CI"],
    "scan_docs_globs": ["docs/runbooks/**/*.md"],
    "enabled": true
  }
}

If the config is absent, the check is off by default for that repo (zero-noise rollout).
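
A minimal sketch of loading that block with the off-by-default behaviour. `PreflightConfig` and `load_preflight_config` are hypothetical names, and the file path is left to the caller because the issue does not settle which config file hosts the block. Treating a present `preflight` block with no `enabled` key as disabled is an assumption made here, not something the issue specifies.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class PreflightConfig:
    enabled: bool = False                      # off unless explicitly enabled
    gcp_project: Optional[str] = None
    skip_secrets: List[str] = field(default_factory=list)
    scan_docs_globs: List[str] = field(default_factory=list)


def load_preflight_config(path: str) -> PreflightConfig:
    """Read the `preflight` block from a JSON config file, if there is one."""
    config_file = Path(path)
    if not config_file.is_file():
        return PreflightConfig()               # no config file: check is off
    data = json.loads(config_file.read_text(encoding="utf-8"))
    raw = data.get("preflight") if isinstance(data, dict) else None
    if not isinstance(raw, dict):
        return PreflightConfig()               # no preflight block: check is off
    return PreflightConfig(
        enabled=bool(raw.get("enabled", False)),
        gcp_project=raw.get("gcp_project"),
        skip_secrets=list(raw.get("skip_secrets", [])),
        scan_docs_globs=list(raw.get("scan_docs_globs", [])),
    )
```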

Output format

Append a single section to the brief / plan before agent dispatch. Only included when at least one secret is missing:

### ⚠️ Pre-Flight Required

The following secrets were referenced in this brief or linked runbook(s) but were not found in their declared location. They are **advisory** — provisioning is your call.

| Secret | Expected location | First referenced in |
|--------|------------------|---------------------|
| `SUPABASE_STAGING_DB_URL_SR` | GCP Secret Manager (`dojo-agent-platform`) | `docs/runbooks/pathways-backflow.md:42` |
| `SUPABASE_PROD_DB_URL_RO` | GCP Secret Manager (`dojo-agent-platform`) | `.github/workflows/pathways-content-backflow.yml:18` |

Provision before running downstream workflows that consume them. If any of these are intentionally documented-but-not-yet-provisioned (e.g. multi-PR rollout), ignore.
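
Rendering the section could look roughly like this. `MissingSecret` and `render_preflight_section` are hypothetical names; the sketch builds only the heading and the table, leaving out the intro and footer paragraphs of the format above for brevity.

```python
from typing import Iterable, NamedTuple


class MissingSecret(NamedTuple):
    name: str        # e.g. SUPABASE_STAGING_DB_URL_SR
    location: str    # human-readable expected location
    first_ref: str   # "path:line" of the first reference


def _cell(value: str) -> str:
    # Escape pipes so a value cannot break the markdown table layout.
    return value.replace("|", "\\|")


def render_preflight_section(missing: Iterable[MissingSecret]) -> str:
    """Return the markdown section, or an empty string when nothing is missing."""
    items = list(missing)
    if not items:
        return ""  # silent pass: emit no section at all
    lines = [
        "### ⚠️ Pre-Flight Required",
        "",
        "| Secret | Expected location | First referenced in |",
        "|--------|------------------|---------------------|",
    ]
    for item in items:
        lines.append(
            f"| `{_cell(item.name)}` | {_cell(item.location)} | `{_cell(item.first_ref)}` |"
        )
    return "\n".join(lines)
```

Because the function returns an empty string for an empty input, the caller can append its result to the plan unconditionally.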

Rejected alternatives

Document these in the issue body so future maintainers don't re-litigate:

  1. PostToolUse hook on Write to docs/runbooks/*.md — REJECTED. Docs sometimes legitimately declare future contracts before provisioning (PR-A documents, PR-B provisions, PR-C consumes). A Write-time hook would generate high false-positive noise on intentional cases and pollute the docs PR review experience.
  2. Workflow-level "validate required secrets" step in CI — REJECTED (insufficient). This already exists implicitly: any workflow that uses a missing secret bails when the secret expression resolves to empty. The drift class we're trying to surface is "no one triggered the workflow" — drift sits silently until a consumer first runs it. A CI-level check doesn't help that surface.
  3. Blocking /implement until provisioned — REJECTED. The contract-before-provisioning pattern is sometimes intentional (multi-PR rollout). Advisory only.

Non-goals

  • NOT a blocking gate. Never aborts /implement or Write.
  • NOT auto-provisioning. The check reports; humans (or a separate skill) act.
  • NOT a hook. The check runs inside the existing plan-time skills, not as a PreToolUse/PostToolUse hook.
  • NOT cross-tenant. Per-repo config; no global skip-list of secrets.

Acceptance Criteria

  • When make-no-mistakes:implement <ISSUE> is invoked (or spike-recommend regenerates a brief), the skill scans the issue body, linked runbook docs, and mentioned workflow/script files for secret references using the regex set above.
  • Each detected secret is probed against the configured backends (GH Actions via gh secret list, GCP Secret Manager via gcloud secrets describe).
  • Missing secrets surface as a ### Pre-Flight Required Markdown section appended to the brief / plan output.
  • Existing secrets produce no output (silent pass).
  • False-positive rate is documented in the skill SKILL.md with at least two examples (e.g. aspirational docs declaring future secrets, secrets that live in a different vault not yet supported).
  • Per-repo configurable: gcp_project, skip_secrets, scan_docs_globs, enabled. Defaults to enabled: false when config is absent.
  • Advisory only — verified by a test that the implement workflow proceeds normally even when secrets are missing.
  • Unit tests for each regex extractor against representative fixtures (GH Actions workflow, runbook markdown, TS/Deno script, Python script).
  • Manual smoke test against a known-missing secret: re-run /implement DOJ-4007 after this lands and confirm SUPABASE_STAGING_DB_URL_SR appears in the Pre-Flight Required section.
  • Manual smoke test against an existing secret: confirm no Pre-Flight section is emitted.

Tradeoffs

  • Cost: ~80–120 LOC in the chosen skill + ~30–50 LOC unit tests. Single-digit-percent skill-file expansion.
  • Benefit: prevents the exact drift class that bit us today (DOJ-3999 → DOJ-4007) and is queued to bite us next (SUPABASE_PROD_DB_URL_RO in DOJ-4007 Option A).
  • Risk:
    • Requires gcloud auth from the dev machine. Already required by secret-input / secret-use family — no new dep.
    • Requires gh CLI auth. Already required by review-open-prs / review-active-issues / implement — no new dep.
    • Regex patterns may miss novel secret-naming conventions (e.g. mixed-case, lowercase env vars). Document this in the skill so users can extend the pattern set via config (a scan_patterns key, which the configuration surface above does not yet list).
  • Maintenance: regex set is the long-tail surface. Mitigate by exposing the pattern list in config so adopters can add their own without forking the skill.

Context Files

  • This issue (canonical narrative).
  • dojo-os runbook showing the contract pattern: docs/runbooks/pathways-backflow.md (DOJ-3999 — declares SUPABASE_STAGING_DB_URL_SR as a required GCP Secret Manager input).
  • dojo-os workflow that expects the missing secret: .github/workflows/backflow-pathways.yml (DOJ-3999 workflow).
  • dojo-os workflow added 2026-05-27 that re-hit the wall: .github/workflows/pathways-content-backflow.yml (DOJ-4007 P3 consumer).
  • dojo-os URL-composition reference for env-var patterns: .github/workflows/deploy-migrations.yml.
  • Toolkit skills most likely to host the check: skills/implement-advisor/SKILL.md, skills/spike-recommend/SKILL.md.
  • Bilingual-format spec the brief follows: docs/bilingual-format-standard.md.

Drift-class siblings (Conway's Law shape)

  • DOJ-4490 — canonical-URL drift (no active scanner for deprecated path shapes).
  • DOJ-4439 — course-import INSERT-only-on-miss (documented UPDATE contract, no UPDATE code path).
  • This proposal — declared-but-not-provisioned secrets (documented contract, no provisioning step).

All three share the same root: a contract is declared in one PR, the corresponding active enforcement / execution is deferred to a human, the human work doesn't happen, downstream blocks silently. A plan-time advisory check is the cheapest cure for the secrets variant.


Created by Claude Code on behalf of @lapc506 (andres@dojocoding.io).

Labels: enhancement
