Commit 6ae6701

egavrin and devagent committed
fix: stabilize release validation

- Keep review-stage execution non-mutating and direct
- Improve live validation diagnostics and TUI command checks
- Document credential-store seeding for provider coverage

Co-Authored-By: devagent <devagent@egavrin>
1 parent 2087054 commit 6ae6701

14 files changed

Lines changed: 290 additions & 36 deletions

.agents/skills/validate-user-surface/SKILL.md

Lines changed: 8 additions & 2 deletions
@@ -18,6 +18,7 @@ Treat the repository like a release candidate. Prefer live execution against bui

 - Run live checks for user-facing behavior. Do not count unit tests or code reading as release validation.
 - Use isolated temp homes and temp repos. Do not reuse the operator's real `~/.config/devagent`.
+- Environment variables are not the only credential source. The live harness can seed non-expired credentials from the local DevAgent `CredentialStore` into isolated homes; run `bun run validate:live:provider-smoke` before marking providers blocked just because API-key env vars are unset.
 - Prefer the publish bundle for install and packaging checks. Validate the developer CLI separately only when comparing dev-versus-publish behavior.
 - Treat missing provider credentials or missing external dependencies as validation gaps, not silent skips.
 - Do not publish to npm unless the user explicitly asks.
@@ -39,7 +40,7 @@ bun run test:live-validation
 bun run validate:live:full
 ```

-2. Create isolated homes and disposable workspaces for each install, auth, TUI, and query-flow pass.
+2. Create isolated homes and disposable workspaces for each install, auth, TUI, and query-flow pass. When provider credentials exist in the local DevAgent credential store, copy only the required non-expired credentials into those isolated homes rather than running against the operator's real HOME.
 3. Use `cd dist && npm pack` to create a publishable tarball, then validate install and launch paths from that artifact.
 4. Exercise documented install and launch paths live: tarball install, `npx`, `bunx`, bundled bootstrap, and linked local CLI when helpful.
 5. Cover the provider matrix from `references/release-matrix.md`. Prefer every documented provider. If full coverage is impossible, call out each unvalidated provider explicitly.
@@ -49,13 +50,18 @@ bun run validate:live:full

 ## Mandatory Surfaces

-- Packaging and install: `bun run build:publish`, `bun run test:bundle-smoke`, `npm pack` from `dist/`, tarball install, uninstall and reinstall, Node 20 bootstrap, and upgrade behavior.
+- Packaging and install: `bun run build:publish`, `bun run test:bundle-smoke`, `npm pack` from `dist/`, tarball install, uninstall and reinstall, Node 20 bootstrap help, installed-runtime session startup, and upgrade behavior.
 - Docs and metadata: README install snippets, quick start, provider list, command list, environment variables, `WORKFLOW.md` claims, copied `dist/README.md`, and generated `dist/package.json`.
 - CLI basics: `devagent help`, `version`, `doctor`, `configure`, `config get/set/path`, `completions`, `sessions`, `--resume`, `--continue`, `--provider`, `--model`, and `-f`.
 - Auth: `devagent auth login/status/logout` for API-key providers and device-code providers in isolated homes.
 - Query execution: interactive TUI, single-shot query execution, quiet and non-TTY behavior, `devagent review`, and `devagent execute`.
 - Provider coverage: Anthropic, OpenAI, Devagent API, DeepSeek, OpenRouter, Ollama, ChatGPT, and GitHub Copilot when credentials or local services are available.

+## Credential And Bootstrap Notes
+
+- Use `bun run validate:live:provider-smoke` to discover locally stored credentials and local services; it seeds isolated homes from `CredentialStore` and reports per-provider pass/block status.
+- For raw bundle checks, `node dist/bootstrap.js --help` is valid. Validate `sessions` from a staged or installed publish runtime, because raw `dist/` does not include installed native dependencies such as `better-sqlite3`.
+
 ## Reporting

 - Summarize by surface: packaging, install, docs, CLI, TUI, auth, review, execute, and provider matrix.
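The seeding rule added to this skill (copy only the required, non-expired credentials, then report what is still missing) can be illustrated with a small sketch. The `StoredCredential` shape and both function names below are assumptions made for illustration; the real `CredentialStore` record format and harness code are not part of this diff.

```typescript
// Hypothetical record shape; the real CredentialStore format is not shown in this commit.
interface StoredCredential {
  readonly provider: string;
  readonly secret: string;
  /** Epoch milliseconds; undefined means the credential does not expire. */
  readonly expiresAt?: number;
}

/**
 * Pick the credentials worth copying into an isolated home:
 * only providers the pass needs, and only entries that have not expired.
 */
function selectSeedableCredentials(
  stored: ReadonlyArray<StoredCredential>,
  requiredProviders: ReadonlyArray<string>,
  nowMs: number,
): StoredCredential[] {
  const required = new Set(requiredProviders);
  return stored.filter(
    (cred) =>
      required.has(cred.provider) &&
      (cred.expiresAt === undefined || cred.expiresAt > nowMs),
  );
}

/** Providers that stay uncovered after seeding, to be reported as validation gaps. */
function findBlockedProviders(
  seeded: ReadonlyArray<StoredCredential>,
  requiredProviders: ReadonlyArray<string>,
): string[] {
  const covered = new Set(seeded.map((cred) => cred.provider));
  return requiredProviders.filter((provider) => !covered.has(provider));
}
```

Keeping selection and gap reporting as separate pure functions makes it easy to list each unvalidated provider explicitly instead of skipping it silently.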

.agents/skills/validate-user-surface/references/release-matrix.md

Lines changed: 4 additions & 1 deletion
@@ -23,6 +23,7 @@ Use this file when planning coverage or writing the final report.
 - Validate on real Node 20+ because the publish bootstrap targets Node, not Bun's Node shim.
 - Validate Bun-backed developer flows when the README or local contributor workflow depends on Bun.
 - Use isolated `HOME`, `XDG_CONFIG_HOME`, and `XDG_CACHE_HOME` for every install and auth pass.
+- Do not infer provider credentials only from environment variables. The live validation harness can copy non-expired credentials from the local DevAgent `CredentialStore` into isolated homes; run provider smoke before marking credential-backed providers blocked.
 - Keep one clean temp repo for install and help checks and separate temp repos for query and mutation scenarios.

 ## Packaging And Install Matrix
@@ -32,7 +33,8 @@ Use this file when planning coverage or writing the final report.
 - Run `cd dist && npm pack`.
 - Install the tarball into a temp prefix and verify `devagent help`, `devagent version`, and `devagent doctor`.
 - Remove that install and repeat to catch stale-file issues.
-- Validate `node dist/bootstrap.js --help` and `node dist/bootstrap.js sessions`.
+- Validate `node dist/bootstrap.js --help` directly from raw `dist/`.
+- Validate `sessions` from the staged or installed publish runtime, because raw `dist/` intentionally lacks installed native dependencies such as `better-sqlite3`.
 - Validate `npx` and `bunx` invocation paths against a prerelease tag when one exists.
 - If no prerelease tag exists, validate the closest local equivalent and mark registry-backed `npx` or `bunx` as still pending.
 - Compare `dist/package.json` and copied `dist/README.md` against the root contract.
@@ -133,6 +135,7 @@ For each provider:
 - Use `bun run scripts/live-validation.ts --list-scenarios` to inventory current scenarios.
 - Run the full suite before adding bespoke manual checks.
 - Inspect `summary.json` and `summary.md` from the generated output directory.
+- Run `bun run validate:live:provider-smoke` before declaring provider coverage blocked; it verifies stored local credentials and Ollama service availability from isolated homes.
 - Use `bun run validate:live:execute-deep` when you need one ordered `execute` packet with prereqs, canonical staged flow, continuity checks, remainder coverage, and per-scenario review notes.
 - Use `bun run validate:live:execute-deep --only canonical|continuity|remainder --skip-prereqs` for focused local reruns after a broader packet establishes the baseline.
 - Use `bun run validate:live:execute-chain` when you need one disposable-worktree run that carries real stage artifacts forward into `implement`, `review`, and `repair`.
packages/cli/src/auth.ts

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ Manage stored provider credentials for DevAgent.`;
 // ─── Entry Point ────────────────────────────────────────────

 export async function runAuthCommand(subcommand: string, args: ReadonlyArray<string> = []): Promise<void> {
-  if (subcommand === "--help" || subcommand === "-h") {
+  if (subcommand === "--help" || subcommand === "-h" || args.includes("--help") || args.includes("-h")) {
     process.stdout.write(renderAuthHelpText() + "\n");
     return;
   }
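The widened condition above means a help flag is honored whether it appears as the subcommand or among the trailing arguments, which is what lets `devagent auth login --help` print usage instead of opening the provider picker. A minimal standalone sketch of that check follows; `wantsAuthHelp` and `HELP_FLAGS` are illustrative names and do not exist in `auth.ts`.

```typescript
const HELP_FLAGS: ReadonlyArray<string> = ["--help", "-h"];

/**
 * Mirrors the condition in runAuthCommand: help is requested when the flag is
 * the subcommand itself or appears anywhere in the remaining arguments.
 */
function wantsAuthHelp(subcommand: string, args: ReadonlyArray<string> = []): boolean {
  return HELP_FLAGS.includes(subcommand) || args.some((arg) => HELP_FLAGS.includes(arg));
}
```

For example, `wantsAuthHelp("login", ["--help"])` is true, while `wantsAuthHelp("login", [])` is false and would fall through to the normal login flow.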

packages/cli/src/main.test.ts

Lines changed: 34 additions & 0 deletions
@@ -946,6 +946,40 @@ describe("review command validation", () => {
   });
 });

+describe("help command validation", () => {
+  it("prints execute help with exit code 0", () => {
+    execFileSync("bun", ["run", "build"], {
+      cwd: cliPackageDir,
+      stdio: "pipe",
+    });
+
+    const output = execFileSync("bun", ["dist/index.js", "execute", "--help"], {
+      cwd: cliPackageDir,
+      encoding: "utf-8",
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+
+    expect(output).toContain("Usage: devagent execute --request <file> --artifact-dir <dir>");
+  });
+
+  it("prints auth login help without entering the provider picker", () => {
+    execFileSync("bun", ["run", "build"], {
+      cwd: cliPackageDir,
+      stdio: "pipe",
+    });
+
+    const output = execFileSync("bun", ["dist/index.js", "auth", "login", "--help"], {
+      cwd: cliPackageDir,
+      encoding: "utf-8",
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+
+    expect(output).toContain("Usage:");
+    expect(output).toContain("devagent auth login");
+    expect(output).not.toContain("Select provider:");
+  });
+});
+
 describe("checkForUpdates", () => {
   afterEach(() => {
     delete process.env["DEVAGENT_DISABLE_UPDATE_CHECK"];

packages/cli/src/main.ts

Lines changed: 4 additions & 0 deletions
@@ -1550,6 +1550,10 @@ export async function main(): Promise<void> {
       loadTaskExecutionRequest,
       parseExecuteArgs,
     } = await import("@devagent/executor");
+    if (process.argv.includes("--help") || process.argv.includes("-h")) {
+      process.stdout.write("Usage: devagent execute --request <file> --artifact-dir <dir>\n");
+      return;
+    }
     const executeArgs = parseExecuteArgs(process.argv);
     if (!executeArgs) {
       process.exit(1);

packages/cli/src/tui/PromptInput.test.ts

Lines changed: 17 additions & 0 deletions
@@ -6,9 +6,11 @@ import { describe, expect, it } from "vitest";

 import { TUI_HELP_MESSAGE } from "./App.js";
 import {
+  buildPromptSubmitValue,
   buildPromptRows,
   getCompletions,
   shouldInsertPromptNewline,
+  shouldSubmitPromptInput,
   SLASH_COMMANDS,
 } from "./PromptInput.js";
 import { cycleApprovalMode, resolvePromptTabAction } from "./shared.js";
@@ -59,6 +61,21 @@ describe("PromptInput layout helpers", () => {
     expect(shouldInsertPromptNewline({ shift: true })).toBe(false);
   });

+  it("treats plain terminal newline bytes as submit input", () => {
+    expect(shouldSubmitPromptInput("", { return: true })).toBe(true);
+    expect(shouldSubmitPromptInput("\r", {})).toBe(true);
+    expect(shouldSubmitPromptInput("\n", {})).toBe(true);
+    expect(shouldSubmitPromptInput("/help\n", {})).toBe(true);
+    expect(shouldSubmitPromptInput("x", {})).toBe(false);
+  });
+
+  it("builds submit values from paste chunks that include Enter", () => {
+    expect(buildPromptSubmitValue("", { return: true }, "/help", 5)).toBe("/help");
+    expect(buildPromptSubmitValue("/help\n", {}, "", 0)).toBe("/help");
+    expect(buildPromptSubmitValue("x\rignored", {}, "ab", 1)).toBe("axb");
+    expect(buildPromptSubmitValue("x", {}, "ab", 1)).toBeNull();
+  });
+
   it("wraps long prompt text into explicit continuation rows", () => {
     const rows = buildPromptRows("abcdefghij", 5, "placeholder", 5);

packages/cli/src/tui/PromptInput.tsx

Lines changed: 28 additions & 3 deletions
@@ -74,6 +74,29 @@ export function shouldInsertPromptNewline(key: PromptInputKey): boolean {
   return Boolean(key.return && (key.shift || key.meta || key.super || key.hyper));
 }

+export function shouldSubmitPromptInput(input: string, key: PromptInputKey): boolean {
+  return Boolean(key.return || /[\r\n]/.test(input));
+}
+
+export function buildPromptSubmitValue(
+  input: string,
+  key: PromptInputKey,
+  value: string,
+  cursorPos: number,
+): string | null {
+  if (key.return) {
+    return value;
+  }
+
+  const submitIndex = input.search(/[\r\n]/);
+  if (submitIndex === -1) {
+    return null;
+  }
+
+  const submittedInput = input.slice(0, submitIndex);
+  return value.slice(0, cursorPos) + submittedInput + value.slice(cursorPos);
+}
+
 function splitGraphemes(line: string): Array<{ readonly text: string; readonly start: number; readonly end: number }> {
   return Array.from(
     GRAPHEME_SEGMENTER.segment(line),
@@ -228,9 +251,11 @@ export function PromptInput({
       return;
     }

-    // Submit on plain Enter
-    if (key.return) {
-      const trimmed = value.trim();
+    // Submit on plain Enter. Some ptys deliver Enter as a literal newline
+    // instead of Ink's normalized key.return, so handle both forms.
+    if (shouldSubmitPromptInput(input, key)) {
+      const submitValue = buildPromptSubmitValue(input, key, value, cursorPos) ?? value;
+      const trimmed = submitValue.trim();
       if (trimmed) {
         onSubmit(trimmed);
         setValue("");
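To see how the two helpers combine inside the input handler, the following standalone sketch repeats them outside the component and adds a small wrapper. The two helper bodies are copied from this diff; the one-field `PromptInputKey` stand-in and the `resolveSubmission` wrapper are assumptions made so the example runs without Ink.

```typescript
// Minimal stand-in for the key object; the real PromptInputKey has more fields.
interface PromptInputKey {
  readonly return?: boolean;
}

function shouldSubmitPromptInput(input: string, key: PromptInputKey): boolean {
  return Boolean(key.return || /[\r\n]/.test(input));
}

function buildPromptSubmitValue(
  input: string,
  key: PromptInputKey,
  value: string,
  cursorPos: number,
): string | null {
  if (key.return) {
    return value;
  }
  const submitIndex = input.search(/[\r\n]/);
  if (submitIndex === -1) {
    return null;
  }
  // Text before the first newline is spliced in at the cursor; the rest is dropped.
  const submittedInput = input.slice(0, submitIndex);
  return value.slice(0, cursorPos) + submittedInput + value.slice(cursorPos);
}

// Hypothetical wrapper mirroring the handler: returns the trimmed text to submit,
// or null when nothing should be submitted.
function resolveSubmission(
  input: string,
  key: PromptInputKey,
  value: string,
  cursorPos: number,
): string | null {
  if (!shouldSubmitPromptInput(input, key)) {
    return null;
  }
  const submitValue = buildPromptSubmitValue(input, key, value, cursorPos) ?? value;
  const trimmed = submitValue.trim();
  return trimmed ? trimmed : null;
}
```

With an empty buffer, a pasted chunk `"/help\n"` resolves to `"/help"`. With buffer `"ab"` and the cursor at 1, the chunk `"x\rignored"` resolves to `"axb"`, so anything after the first newline in a chunk is discarded rather than queued.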

packages/executor/src/index.test.ts

Lines changed: 4 additions & 0 deletions
@@ -518,6 +518,10 @@ describe("skills", () => {
     ];

     const reviewQuery = buildTaskQuery(reviewRequest);
+    expect(reviewQuery).toContain("Workspace is review-only for this stage. No file changes are allowed.");
+    expect(reviewQuery).toContain("Do not use update_plan for this stage.");
+    expect(reviewQuery).toContain("return the final review artifact directly");
+    expect(reviewQuery).toContain("No defects found.");
     expect(reviewQuery.indexOf("Approved issue spec artifact:")).toBeLessThan(reviewQuery.indexOf("Implementation summary artifact:"));
     expect(reviewQuery.indexOf("Implementation summary artifact:")).toBeLessThan(reviewQuery.indexOf("Issue unit details:"));
     expect(reviewQuery.indexOf("Issue unit details:")).toBeLessThan(reviewQuery.indexOf("Focus files:"));

packages/executor/src/index.ts

Lines changed: 3 additions & 1 deletion
@@ -787,7 +787,9 @@ export function buildTaskQuery(
       );
       break;
     case "review":
-      sections.push("Review the current workspace changes and produce a report with either `No defects found.` or one section per defect using the format `Severity: <low|medium|high|critical>` plus a concrete fix recommendation.");
+      sections.push("Workspace is review-only for this stage. No file changes are allowed.");
+      sections.push("Do not use update_plan for this stage. Inspect the current workspace changes as needed, then return the final review artifact directly.");
+      sections.push("Produce a direct review report with either exactly `No defects found.` or one section per defect using the format `Severity: <low|medium|high|critical>` plus a concrete fix recommendation.");
       break;
     case "repair":
       sections.push("Apply repairs for the current issue, address the review findings, and summarize fixes applied plus remaining concerns.");
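The review prompt now asks for one of two shapes: exactly `No defects found.`, or one section per defect carrying a `Severity:` line. A downstream consumer could check that contract along the lines below. This is only a sketch: `classifyReviewReport` and `ReviewVerdict` are hypothetical, and nothing in this diff shows the executor validating the report this way.

```typescript
const SEVERITIES = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

type ReviewVerdict =
  | { readonly kind: "clean" }
  | { readonly kind: "defects"; readonly severities: Severity[] }
  | { readonly kind: "malformed" };

/**
 * Classify a review artifact against the two shapes the prompt allows.
 * Assumes each defect section puts `Severity: <level>` on its own line.
 */
function classifyReviewReport(report: string): ReviewVerdict {
  const text = report.trim();
  if (text === "No defects found.") {
    return { kind: "clean" };
  }

  const severities: Severity[] = [];
  // One match per defect section; tolerate trailing spaces and CRLF line endings.
  const pattern = /^Severity:[ \t]*(low|medium|high|critical)[ \t]*\r?$/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    severities.push(match[1] as Severity);
  }

  return severities.length > 0 ? { kind: "defects", severities } : { kind: "malformed" };
}
```

Treating everything else as `malformed` keeps a chatty or partially formatted answer from being mistaken for a clean review.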

scripts/live-validation/execute-chain.test.ts

Lines changed: 49 additions & 0 deletions
@@ -1,10 +1,14 @@
 import { describe, expect, it } from "bun:test";
 import type { IssueSpecDoc } from "@devagent-sdk/types";
+import { mkdtemp, mkdir, writeFile } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
 import {
   EXECUTE_CHAIN_STAGES,
   buildExecuteChainRequest,
   extractIssueUnitFromIssueSpec,
 } from "./execute-chain-lib";
+import { buildStageFailureMessage as buildStageFailureMessageFromRun } from "./execute-chain";

 describe("execute chain helpers", () => {
   it("defines the full staged chain in the expected order", () => {
@@ -119,4 +123,49 @@ describe("execute chain helpers", () => {
       linkedArtifactVersionIds: ["B3"],
     });
   });
+
+  it("summarizes result.json errors and final session summary for failed stages", async () => {
+    const dir = await mkdtemp(join(tmpdir(), "devagent-chain-failure-"));
+    const artifactDir = join(dir, "artifacts");
+    await mkdir(artifactDir, { recursive: true });
+    await writeFile(join(artifactDir, "result.json"), JSON.stringify({
+      protocolVersion: "0.1",
+      taskId: "chain-review",
+      status: "failed",
+      artifacts: [],
+      error: {
+        code: "EXECUTION_FAILED",
+        message: "Task loop exhausted the iteration limit",
+      },
+      outcome: "no_progress",
+      outcomeReason: "iteration_limit",
+      metrics: {
+        startedAt: "2026-04-21T00:00:00.000Z",
+        finishedAt: "2026-04-21T00:00:01.000Z",
+        durationMs: 1000,
+      },
+      session: {
+        kind: "devagent-headless-v1",
+        payload: {
+          version: 1,
+          messages: [
+            { role: "assistant", content: "No defects found." },
+          ],
+        },
+      },
+    }));
+
+    const message = await buildStageFailureMessageFromRun({
+      exitCode: 1,
+      stdout: "",
+      stderr: "",
+      timedOut: false,
+      durationMs: 1000,
+    }, artifactDir);
+
+    expect(message).toContain("Result: failed");
+    expect(message).toContain("Error: EXECUTION_FAILED: Task loop exhausted the iteration limit");
+    expect(message).toContain("Outcome reason: iteration_limit");
+    expect(message).toContain("Final assistant summary: No defects found.");
+  });
 });
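The implementation of `buildStageFailureMessage` is not included in the portion of the commit shown here, so the following is only a guess at the kind of summarizer that would satisfy the assertions in the test above. `StageResult` covers just the fields the fixture writes, and `summarizeStageFailure` is a hypothetical pure function that works on already-parsed JSON rather than reading `result.json` from an artifact directory.

```typescript
// Hypothetical subset of result.json, limited to fields the test fixture writes.
interface StageResult {
  readonly status: string;
  readonly error?: { readonly code: string; readonly message: string };
  readonly outcomeReason?: string;
  readonly session?: {
    readonly payload?: {
      readonly messages?: ReadonlyArray<{ readonly role: string; readonly content: string }>;
    };
  };
}

/** Render one line per available diagnostic, in a stable order. */
function summarizeStageFailure(result: StageResult): string {
  const lines: string[] = [`Result: ${result.status}`];

  if (result.error) {
    lines.push(`Error: ${result.error.code}: ${result.error.message}`);
  }
  if (result.outcomeReason) {
    lines.push(`Outcome reason: ${result.outcomeReason}`);
  }

  // Walk backwards to find the last assistant message, if any.
  const messages = result.session?.payload?.messages ?? [];
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message && message.role === "assistant") {
      lines.push(`Final assistant summary: ${message.content}`);
      break;
    }
  }

  return lines.join("\n");
}
```

The fixture in the test pairs a `failed` status with a final assistant message of `No defects found.`, so surfacing both the error and the last assistant message shows when a stage hit the iteration limit even though it had already produced its answer.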
