fix: stabilize release validation

egavrin · devagent · egavrin · commit 6ae67015b597 · 2026-04-21T13:28:38.000+03:00
- Keep review-stage execution non-mutating and direct

- Improve live validation diagnostics and TUI command checks

- Document credential-store seeding for provider coverage

Co-Authored-By: devagent <devagent@egavrin>
diff --git a/.agents/skills/validate-user-surface/SKILL.md b/.agents/skills/validate-user-surface/SKILL.md
@@ -18,6 +18,7 @@ Treat the repository like a release candidate. Prefer live execution against bui
 
 - Run live checks for user-facing behavior. Do not count unit tests or code reading as release validation.
 - Use isolated temp homes and temp repos. Do not reuse the operator's real `~/.config/devagent`.
+- Environment variables are not the only credential source. The live harness can seed non-expired credentials from the local DevAgent `CredentialStore` into isolated homes; run `bun run validate:live:provider-smoke` before marking providers blocked just because API-key env vars are unset.
 - Prefer the publish bundle for install and packaging checks. Validate the developer CLI separately only when comparing dev-versus-publish behavior.
 - Treat missing provider credentials or missing external dependencies as validation gaps, not silent skips.
 - Do not publish to npm unless the user explicitly asks.
@@ -39,7 +40,7 @@ bun run test:live-validation
 bun run validate:live:full
 ```
 
-2. Create isolated homes and disposable workspaces for each install, auth, TUI, and query-flow pass.
+2. Create isolated homes and disposable workspaces for each install, auth, TUI, and query-flow pass. When provider credentials exist in the local DevAgent credential store, copy only the required non-expired credentials into those isolated homes rather than running against the operator's real HOME.
 3. Use `cd dist && npm pack` to create a publishable tarball, then validate install and launch paths from that artifact.
 4. Exercise documented install and launch paths live: tarball install, `npx`, `bunx`, bundled bootstrap, and linked local CLI when helpful.
 5. Cover the provider matrix from `references/release-matrix.md`. Prefer every documented provider. If full coverage is impossible, call out each unvalidated provider explicitly.
@@ -49,13 +50,18 @@ bun run validate:live:full
 
 ## Mandatory Surfaces
 
-- Packaging and install: `bun run build:publish`, `bun run test:bundle-smoke`, `npm pack` from `dist/`, tarball install, uninstall and reinstall, Node 20 bootstrap, and upgrade behavior.
+- Packaging and install: `bun run build:publish`, `bun run test:bundle-smoke`, `npm pack` from `dist/`, tarball install, uninstall and reinstall, Node 20 bootstrap help, installed-runtime session startup, and upgrade behavior.
 - Docs and metadata: README install snippets, quick start, provider list, command list, environment variables, `WORKFLOW.md` claims, copied `dist/README.md`, and generated `dist/package.json`.
 - CLI basics: `devagent help`, `version`, `doctor`, `configure`, `config get/set/path`, `completions`, `sessions`, `--resume`, `--continue`, `--provider`, `--model`, and `-f`.
 - Auth: `devagent auth login/status/logout` for API-key providers and device-code providers in isolated homes.
 - Query execution: interactive TUI, single-shot query execution, quiet and non-TTY behavior, `devagent review`, and `devagent execute`.
 - Provider coverage: Anthropic, OpenAI, Devagent API, DeepSeek, OpenRouter, Ollama, ChatGPT, and GitHub Copilot when credentials or local services are available.
 
+## Credential And Bootstrap Notes
+
+- Use `bun run validate:live:provider-smoke` to discover locally stored credentials and local services; it seeds isolated homes from `CredentialStore` and reports per-provider pass/block status.
+- For raw bundle checks, `node dist/bootstrap.js --help` is valid. Validate `sessions` from a staged or installed publish runtime, because raw `dist/` does not include installed native dependencies such as `better-sqlite3`.
+
 ## Reporting
 
 - Summarize by surface: packaging, install, docs, CLI, TUI, auth, review, execute, and provider matrix.
diff --git a/.agents/skills/validate-user-surface/references/release-matrix.md b/.agents/skills/validate-user-surface/references/release-matrix.md
@@ -23,6 +23,7 @@ Use this file when planning coverage or writing the final report.
 - Validate on real Node 20+ because the publish bootstrap targets Node, not Bun's Node shim.
 - Validate Bun-backed developer flows when the README or local contributor workflow depends on Bun.
 - Use isolated `HOME`, `XDG_CONFIG_HOME`, and `XDG_CACHE_HOME` for every install and auth pass.
+- Do not infer provider credentials only from environment variables. The live validation harness can copy non-expired credentials from the local DevAgent `CredentialStore` into isolated homes; run provider smoke before marking credential-backed providers blocked.
 - Keep one clean temp repo for install and help checks and separate temp repos for query and mutation scenarios.
 
 ## Packaging And Install Matrix
@@ -32,7 +33,8 @@ Use this file when planning coverage or writing the final report.
 - Run `cd dist && npm pack`.
 - Install the tarball into a temp prefix and verify `devagent help`, `devagent version`, and `devagent doctor`.
 - Remove that install and repeat to catch stale-file issues.
-- Validate `node dist/bootstrap.js --help` and `node dist/bootstrap.js sessions`.
+- Validate `node dist/bootstrap.js --help` directly from raw `dist/`.
+- Validate `sessions` from the staged or installed publish runtime, because raw `dist/` intentionally lacks installed native dependencies such as `better-sqlite3`.
 - Validate `npx` and `bunx` invocation paths against a prerelease tag when one exists.
 - If no prerelease tag exists, validate the closest local equivalent and mark registry-backed `npx` or `bunx` as still pending.
 - Compare `dist/package.json` and copied `dist/README.md` against the root contract.
@@ -133,6 +135,7 @@ For each provider:
 - Use `bun run scripts/live-validation.ts --list-scenarios` to inventory current scenarios.
 - Run the full suite before adding bespoke manual checks.
 - Inspect `summary.json` and `summary.md` from the generated output directory.
+- Run `bun run validate:live:provider-smoke` before declaring provider coverage blocked; it verifies stored local credentials and Ollama service availability from isolated homes.
 - Use `bun run validate:live:execute-deep` when you need one ordered `execute` packet with prereqs, canonical staged flow, continuity checks, remainder coverage, and per-scenario review notes.
 - Use `bun run validate:live:execute-deep --only canonical|continuity|remainder --skip-prereqs` for focused local reruns after a broader packet establishes the baseline.
 - Use `bun run validate:live:execute-chain` when you need one disposable-worktree run that carries real stage artifacts forward into `implement`, `review`, and `repair`.
diff --git a/packages/cli/src/auth.ts b/packages/cli/src/auth.ts
@@ -81,7 +81,7 @@ Manage stored provider credentials for DevAgent.`;
 // ─── Entry Point ────────────────────────────────────────────
 
 export async function runAuthCommand(subcommand: string, args: ReadonlyArray<string> = []): Promise<void> {
-  if (subcommand === "--help" || subcommand === "-h") {
+  if (subcommand === "--help" || subcommand === "-h" || args.includes("--help") || args.includes("-h")) {
     process.stdout.write(renderAuthHelpText() + "\n");
     return;
   }
diff --git a/packages/cli/src/main.test.ts b/packages/cli/src/main.test.ts
@@ -946,6 +946,40 @@ describe("review command validation", () => {
   });
 });
 
+describe("help command validation", () => {
+  it("prints execute help with exit code 0", () => {
+    execFileSync("bun", ["run", "build"], {
+      cwd: cliPackageDir,
+      stdio: "pipe",
+    });
+
+    const output = execFileSync("bun", ["dist/index.js", "execute", "--help"], {
+      cwd: cliPackageDir,
+      encoding: "utf-8",
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+
+    expect(output).toContain("Usage: devagent execute --request <file> --artifact-dir <dir>");
+  });
+
+  it("prints auth login help without entering the provider picker", () => {
+    execFileSync("bun", ["run", "build"], {
+      cwd: cliPackageDir,
+      stdio: "pipe",
+    });
+
+    const output = execFileSync("bun", ["dist/index.js", "auth", "login", "--help"], {
+      cwd: cliPackageDir,
+      encoding: "utf-8",
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+
+    expect(output).toContain("Usage:");
+    expect(output).toContain("devagent auth login");
+    expect(output).not.toContain("Select provider:");
+  });
+});
+
 describe("checkForUpdates", () => {
   afterEach(() => {
     delete process.env["DEVAGENT_DISABLE_UPDATE_CHECK"];
diff --git a/packages/cli/src/main.ts b/packages/cli/src/main.ts
@@ -1550,6 +1550,10 @@ export async function main(): Promise<void> {
       loadTaskExecutionRequest,
       parseExecuteArgs,
     } = await import("@devagent/executor");
+    if (process.argv.includes("--help") || process.argv.includes("-h")) {
+      process.stdout.write("Usage: devagent execute --request <file> --artifact-dir <dir>\n");
+      return;
+    }
     const executeArgs = parseExecuteArgs(process.argv);
     if (!executeArgs) {
       process.exit(1);
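
The help handling added to `runAuthCommand` and to the `execute` branch of `main` follows the same pattern: look for `--help` or `-h` anywhere in the arguments before doing any real work. A minimal sketch of that check, assuming two hypothetical helpers (`hasHelpFlag` and `authWantsHelp` are illustration names, not functions in the repository):

```typescript
// Hypothetical helper (not in the commit): true when a help flag
// appears anywhere in the given argument list.
function hasHelpFlag(args: ReadonlyArray<string>): boolean {
  return args.includes("--help") || args.includes("-h");
}

// Mirrors the updated condition in runAuthCommand: the subcommand itself
// or any trailing argument may carry the help flag.
function authWantsHelp(subcommand: string, args: ReadonlyArray<string> = []): boolean {
  return hasHelpFlag([subcommand, ...args]);
}

// `devagent auth login --help` should print help instead of opening the provider picker.
console.log(authWantsHelp("login", ["--help"]));
// `devagent auth login` with no flag proceeds to the normal flow.
console.log(authWantsHelp("login", []));
```

One consequence of scanning every argument is that a literal `-h` passed as an option value would also trigger help; the sketch keeps that behavior because the diff does the same.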
diff --git a/packages/cli/src/tui/PromptInput.test.ts b/packages/cli/src/tui/PromptInput.test.ts
@@ -6,9 +6,11 @@ import { describe, expect, it } from "vitest";
 
 import { TUI_HELP_MESSAGE } from "./App.js";
 import {
+  buildPromptSubmitValue,
   buildPromptRows,
   getCompletions,
   shouldInsertPromptNewline,
+  shouldSubmitPromptInput,
   SLASH_COMMANDS,
 } from "./PromptInput.js";
 import { cycleApprovalMode, resolvePromptTabAction } from "./shared.js";
@@ -59,6 +61,21 @@ describe("PromptInput layout helpers", () => {
     expect(shouldInsertPromptNewline({ shift: true })).toBe(false);
   });
 
+  it("treats plain terminal newline bytes as submit input", () => {
+    expect(shouldSubmitPromptInput("", { return: true })).toBe(true);
+    expect(shouldSubmitPromptInput("\r", {})).toBe(true);
+    expect(shouldSubmitPromptInput("\n", {})).toBe(true);
+    expect(shouldSubmitPromptInput("/help\n", {})).toBe(true);
+    expect(shouldSubmitPromptInput("x", {})).toBe(false);
+  });
+
+  it("builds submit values from paste chunks that include Enter", () => {
+    expect(buildPromptSubmitValue("", { return: true }, "/help", 5)).toBe("/help");
+    expect(buildPromptSubmitValue("/help\n", {}, "", 0)).toBe("/help");
+    expect(buildPromptSubmitValue("x\rignored", {}, "ab", 1)).toBe("axb");
+    expect(buildPromptSubmitValue("x", {}, "ab", 1)).toBeNull();
+  });
+
   it("wraps long prompt text into explicit continuation rows", () => {
     const rows = buildPromptRows("abcdefghij", 5, "placeholder", 5);
 
diff --git a/packages/cli/src/tui/PromptInput.tsx b/packages/cli/src/tui/PromptInput.tsx
@@ -74,6 +74,29 @@ export function shouldInsertPromptNewline(key: PromptInputKey): boolean {
   return Boolean(key.return && (key.shift || key.meta || key.super || key.hyper));
 }
 
+export function shouldSubmitPromptInput(input: string, key: PromptInputKey): boolean {
+  return Boolean(key.return || /[\r\n]/.test(input));
+}
+
+export function buildPromptSubmitValue(
+  input: string,
+  key: PromptInputKey,
+  value: string,
+  cursorPos: number,
+): string | null {
+  if (key.return) {
+    return value;
+  }
+
+  const submitIndex = input.search(/[\r\n]/);
+  if (submitIndex === -1) {
+    return null;
+  }
+
+  const submittedInput = input.slice(0, submitIndex);
+  return value.slice(0, cursorPos) + submittedInput + value.slice(cursorPos);
+}
+
 function splitGraphemes(line: string): Array<{ readonly text: string; readonly start: number; readonly end: number }> {
   return Array.from(
     GRAPHEME_SEGMENTER.segment(line),
@@ -228,9 +251,11 @@ export function PromptInput({
       return;
     }
 
-    // Submit on plain Enter
-    if (key.return) {
-      const trimmed = value.trim();
+    // Submit on plain Enter. Some ptys deliver Enter as a literal newline
+    // instead of Ink's normalized key.return, so handle both forms.
+    if (shouldSubmitPromptInput(input, key)) {
+      const submitValue = buildPromptSubmitValue(input, key, value, cursorPos) ?? value;
+      const trimmed = submitValue.trim();
       if (trimmed) {
         onSubmit(trimmed);
         setValue("");
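
The submit path above can be exercised without Ink. The sketch below re-implements the two helpers from this diff against a reduced key shape; `KeyLike` is an assumed stand-in for `PromptInputKey`, which has more fields in the real component.

```typescript
// Assumed minimal key shape; the real PromptInputKey carries more modifiers.
interface KeyLike {
  readonly return?: boolean;
}

// Submit when Ink reports Enter, or when the raw chunk contains CR or LF.
function shouldSubmitPromptInput(input: string, key: KeyLike): boolean {
  return Boolean(key.return || /[\r\n]/.test(input));
}

// Build the value to submit. Text before the first newline in the chunk is
// inserted at the cursor; anything after that newline is dropped.
function buildPromptSubmitValue(
  input: string,
  key: KeyLike,
  value: string,
  cursorPos: number,
): string | null {
  if (key.return) {
    return value;
  }
  const submitIndex = input.search(/[\r\n]/);
  if (submitIndex === -1) {
    return null;
  }
  const submittedInput = input.slice(0, submitIndex);
  return value.slice(0, cursorPos) + submittedInput + value.slice(cursorPos);
}

// A pasted "/help\n" into an empty prompt submits "/help".
console.log(buildPromptSubmitValue("/help\n", {}, "", 0));
// A chunk typed at cursor position 1 of "ab" is spliced in before submitting.
console.log(buildPromptSubmitValue("x\rignored", {}, "ab", 1));
```

Note that text following the first newline in a chunk is discarded rather than queued, which matches the `"x\rignored"` case in `PromptInput.test.ts`.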
diff --git a/packages/executor/src/index.test.ts b/packages/executor/src/index.test.ts
@@ -518,6 +518,10 @@ describe("skills", () => {
     ];
 
     const reviewQuery = buildTaskQuery(reviewRequest);
+    expect(reviewQuery).toContain("Workspace is review-only for this stage. No file changes are allowed.");
+    expect(reviewQuery).toContain("Do not use update_plan for this stage.");
+    expect(reviewQuery).toContain("return the final review artifact directly");
+    expect(reviewQuery).toContain("No defects found.");
     expect(reviewQuery.indexOf("Approved issue spec artifact:")).toBeLessThan(reviewQuery.indexOf("Implementation summary artifact:"));
     expect(reviewQuery.indexOf("Implementation summary artifact:")).toBeLessThan(reviewQuery.indexOf("Issue unit details:"));
     expect(reviewQuery.indexOf("Issue unit details:")).toBeLessThan(reviewQuery.indexOf("Focus files:"));
diff --git a/packages/executor/src/index.ts b/packages/executor/src/index.ts
@@ -787,7 +787,9 @@ export function buildTaskQuery(
       );
       break;
     case "review":
-      sections.push("Review the current workspace changes and produce a report with either `No defects found.` or one section per defect using the format `Severity: <low|medium|high|critical>` plus a concrete fix recommendation.");
+      sections.push("Workspace is review-only for this stage. No file changes are allowed.");
+      sections.push("Do not use update_plan for this stage. Inspect the current workspace changes as needed, then return the final review artifact directly.");
+      sections.push("Produce a direct review report with either exactly `No defects found.` or one section per defect using the format `Severity: <low|medium|high|critical>` plus a concrete fix recommendation.");
       break;
     case "repair":
       sections.push("Apply repairs for the current issue, address the review findings, and summarize fixes applied plus remaining concerns.");
diff --git a/scripts/live-validation/execute-chain.test.ts b/scripts/live-validation/execute-chain.test.ts
@@ -1,10 +1,14 @@
 import { describe, expect, it } from "bun:test";
 import type { IssueSpecDoc } from "@devagent-sdk/types";
+import { mkdtemp, mkdir, writeFile } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
 import {
   EXECUTE_CHAIN_STAGES,
   buildExecuteChainRequest,
   extractIssueUnitFromIssueSpec,
 } from "./execute-chain-lib";
+import { buildStageFailureMessage as buildStageFailureMessageFromRun } from "./execute-chain";
 
 describe("execute chain helpers", () => {
   it("defines the full staged chain in the expected order", () => {
@@ -119,4 +123,49 @@ describe("execute chain helpers", () => {
       linkedArtifactVersionIds: ["B3"],
     });
   });
+
+  it("summarizes result.json errors and final session summary for failed stages", async () => {
+    const dir = await mkdtemp(join(tmpdir(), "devagent-chain-failure-"));
+    const artifactDir = join(dir, "artifacts");
+    await mkdir(artifactDir, { recursive: true });
+    await writeFile(join(artifactDir, "result.json"), JSON.stringify({
+      protocolVersion: "0.1",
+      taskId: "chain-review",
+      status: "failed",
+      artifacts: [],
+      error: {
+        code: "EXECUTION_FAILED",
+        message: "Task loop exhausted the iteration limit",
+      },
+      outcome: "no_progress",
+      outcomeReason: "iteration_limit",
+      metrics: {
+        startedAt: "2026-04-21T00:00:00.000Z",
+        finishedAt: "2026-04-21T00:00:01.000Z",
+        durationMs: 1000,
+      },
+      session: {
+        kind: "devagent-headless-v1",
+        payload: {
+          version: 1,
+          messages: [
+            { role: "assistant", content: "No defects found." },
+          ],
+        },
+      },
+    }));
+
+    const message = await buildStageFailureMessageFromRun({
+      exitCode: 1,
+      stdout: "",
+      stderr: "",
+      timedOut: false,
+      durationMs: 1000,
+    }, artifactDir);
+
+    expect(message).toContain("Result: failed");
+    expect(message).toContain("Error: EXECUTION_FAILED: Task loop exhausted the iteration limit");
+    expect(message).toContain("Outcome reason: iteration_limit");
+    expect(message).toContain("Final assistant summary: No defects found.");
+  });
 });
diff --git a/scripts/live-validation/execute-chain.ts b/scripts/live-validation/execute-chain.ts
@@ -6,7 +6,7 @@ import { mkdir, mkdtemp, readFile, stat, writeFile } from "node:fs/promises";
 import { tmpdir } from "node:os";
 import { dirname, join } from "node:path";
 import { fileURLToPath } from "node:url";
-import type { BreakdownDoc, IssueSpecDoc, TaskExecutionRequest } from "@devagent-sdk/types";
+import type { BreakdownDoc, IssueSpecDoc, TaskExecutionRequest, TaskExecutionResult } from "@devagent-sdk/types";
 import {
   captureGitOutputs,
   createIsolationWorkspaceWithTimeout,
@@ -224,6 +224,57 @@ async function collectArtifactEntries(
   return entries;
 }
 
+function extractFinalAssistantSummary(result: TaskExecutionResult): string | undefined {
+  const messages = result.session?.payload?.messages;
+  if (!Array.isArray(messages)) {
+    return undefined;
+  }
+
+  for (let index = messages.length - 1; index >= 0; index--) {
+    const message = messages[index] as { role?: string; content?: unknown };
+    if (message.role === "assistant" && typeof message.content === "string" && message.content.trim().length > 0) {
+      return message.content.trim();
+    }
+  }
+  return undefined;
+}
+
+export async function buildStageFailureMessage(
+  commandResult: CommandResult,
+  artifactDir: string,
+): Promise<string | undefined> {
+  const resultPath = join(artifactDir, "result.json");
+  const details: string[] = [];
+
+  if (existsSync(resultPath)) {
+    try {
+      const result = JSON.parse(await readFile(resultPath, "utf-8")) as TaskExecutionResult;
+      details.push(`Result: ${result.status}`);
+      if (result.error) {
+        details.push(`Error: ${result.error.code}: ${result.error.message}`);
+      }
+      if (result.outcomeReason) {
+        details.push(`Outcome reason: ${result.outcomeReason}`);
+      }
+      const finalSummary = extractFinalAssistantSummary(result);
+      if (finalSummary) {
+        details.push(`Final assistant summary: ${finalSummary}`);
+      }
+    } catch (error) {
+      details.push(`Failed to parse ${resultPath}: ${error instanceof Error ? error.message : String(error)}`);
+    }
+  }
+
+  const commandOutput = commandResult.stderr.trim() || commandResult.stdout.trim();
+  if (commandOutput) {
+    details.push(commandOutput);
+  }
+  if (commandResult.exitCode !== 0 && details.length === 0) {
+    details.push(`Stage exited with ${commandResult.exitCode}.`);
+  }
+  return details.length > 0 ? details.join("\n") : undefined;
+}
+
 async function updatePriorArtifacts(
   stage: ExecuteChainStage,
   artifactDir: string,
@@ -398,7 +449,7 @@ async function main(): Promise<void> {
         && artifactFiles.every((artifact) => artifact.exists)
         && workspacePassed;
       const failureMessage = commandResult.exitCode !== 0
-        ? commandResult.stderr.trim() || commandResult.stdout.trim() || `Stage exited with ${commandResult.exitCode}.`
+        ? await buildStageFailureMessage(commandResult, artifactDir)
         : !artifactFiles.every((artifact) => artifact.exists)
           ? "Missing expected stage artifacts."
           : !workspacePassed
@@ -474,7 +525,9 @@ async function main(): Promise<void> {
   }
 }
 
-main().catch((error) => {
-  process.stderr.write(`${error instanceof Error ? error.message : String(error)}\n`);
-  process.exit(1);
-});
+if (import.meta.main) {
+  main().catch((error) => {
+    process.stderr.write(`${error instanceof Error ? error.message : String(error)}\n`);
+    process.exit(1);
+  });
+}
diff --git a/scripts/live-validation/tui-validator.test.ts b/scripts/live-validation/tui-validator.test.ts
diff --git a/scripts/live-validation/tui-validator.ts b/scripts/live-validation/tui-validator.ts
diff --git a/scripts/smoke-publish-bundle.ts b/scripts/smoke-publish-bundle.ts
