From b562b3b4e5924b517310c67952fd42335819cb57 Mon Sep 17 00:00:00 2001
From: tend-agent <270458913+tend-agent@users.noreply.github.com>
Date: Thu, 23 Apr 2026 19:00:33 +0000
Subject: [PATCH 1/3] running-in-ci: require hand-test for external-tool behavioral claims

Closes #326

Co-Authored-By: Claude
---
 .../skills/running-in-ci/SKILL.md | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md b/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
index ee8f5c5f..e7e155ed 100644
--- a/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
+++ b/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
@@ -498,6 +498,41 @@ If you can't find source evidence for a specific detail, say so ("I'm not sure o
 syntax") rather than guessing. An honest gap is fixable; a confident hallucination gets
 copy-pasted.
 
+### External-tool behavioral claims need a hand-test
+
+When a review or triage turns on how an external CLI, API, or system behaves — and that tool
+is not installed in CI and not exercised by automated tests — the "run the command yourself"
+fallback above doesn't apply. Upstream docs for fast-moving tools lag or describe flags that
+were removed or renamed, and reading the tool's source isn't an option when it lives in a
+separate project. Do not post a confident claim or commit code/doc changes that depend on
+the behavior.
+
+1. Name the gap. Quote the question back and state that the behavior needs hand-testing on
+   a machine with the tool installed.
+2. If you have partial evidence (upstream docs, a commit, a linked issue), cite it and
+   hedge: "According to X's docs, Y — but I haven't verified on a real install. Could
+   someone with X installed confirm before I push a change?"
+3. Don't ship a doc or code change whose correctness depends on the claim until a human
+   confirms. In triage, the same rule applies: if a repro uses a tool not in CI, flag the
+   repro as unverified rather than asserting a fix.
+
+
+
+
+Bad: Review asked whether `cmux list-workspaces` had structured output. Read a mintlify
+page describing `--json` → rewrote the recipe to `cmux list-workspaces --json | jq ...` →
+committed. The installed cmux had no `--json` flag; every reader hit a broken recipe.
+
+
+
+
+Good: Same question. Read the docs → replied: "The docs describe `--json`, but cmux isn't
+installed in CI so I can't verify against your version. Could you confirm
+`cmux list-workspaces --help` shows `--json` before I push the change?" Waited.
+
+
+
+
 ### Rewriting is authoring
 
 Cross-posting, summarizing, or paraphrasing is not copying — any new content you add requires the

From 643717079e12f7d1ad9ab1488cf07d7166963ca1 Mon Sep 17 00:00:00 2001
From: tend-agent <270458913+tend-agent@users.noreply.github.com>
Date: Fri, 24 Apr 2026 00:23:15 +0000
Subject: [PATCH 2/3] running-in-ci: lead external-tool verification with run/clone, not hand-test

Review feedback on #327 pointed out the section framed external-tool
claims as "defer to a human" when the bot can actually verify by running
the tool or cloning its repo. Rewrite the subsection so it opens with
the two verification paths (install-and-run, then clone-and-grep) and
only falls back to asking a human when both fail. The "don't make
overconfident claims" guidance already lives in the preceding
subsection.
---
 .../skills/running-in-ci/SKILL.md | 42 +++++++++----------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md b/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
index e7e155ed..fcf3c723 100644
--- a/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
+++ b/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
@@ -498,23 +498,23 @@ If you can't find source evidence for a specific detail, say so ("I'm not sure o
 syntax") rather than guessing. An honest gap is fixable; a confident hallucination gets
 copy-pasted.
 
-### External-tool behavioral claims need a hand-test
-
-When a review or triage turns on how an external CLI, API, or system behaves — and that tool
-is not installed in CI and not exercised by automated tests — the "run the command yourself"
-fallback above doesn't apply. Upstream docs for fast-moving tools lag or describe flags that
-were removed or renamed, and reading the tool's source isn't an option when it lives in a
-separate project. Do not post a confident claim or commit code/doc changes that depend on
-the behavior.
-
-1. Name the gap. Quote the question back and state that the behavior needs hand-testing on
-   a machine with the tool installed.
-2. If you have partial evidence (upstream docs, a commit, a linked issue), cite it and
-   hedge: "According to X's docs, Y — but I haven't verified on a real install. Could
-   someone with X installed confirm before I push a change?"
-3. Don't ship a doc or code change whose correctness depends on the claim until a human
-   confirms. In triage, the same rule applies: if a repro uses a tool not in CI, flag the
-   repro as unverified rather than asserting a fix.
+### Verifying external-tool behavior
+
+When a claim turns on how an external CLI, API, or system behaves, verify by running the
+code. Upstream docs for fast-moving tools lag or describe flags that were removed or
+renamed — don't treat them as proof on their own.
+
+Two paths, in order of preference:
+
+1. **Run the tool.** If it's installable in this environment, install it and invoke the
+   specific command or flag in question. Link the output in your reply.
+2. **Read the source.** Tend can clone any public repo. `gh repo clone /`
+   then grep for the flag or behavior. Source doesn't lag itself, and a flag that isn't
+   defined in the parser doesn't exist.
+
+If both paths fail (GUI-only tool, private repo, environment-specific behavior), cite
+what you found, name the remaining gap, and ask a human with the tool installed to
+confirm before shipping a dependent change.
 
 
 
@@ -524,11 +524,11 @@ page describing `--json` → rewrote the recipe to `cmux list-workspaces --json
 committed. The installed cmux had no `--json` flag; every reader hit a broken recipe.
 
 
 
-
+
-Good: Same question. Read the docs → replied: "The docs describe `--json`, but cmux isn't
-installed in CI so I can't verify against your version. Could you confirm
-`cmux list-workspaces --help` shows `--json` before I push the change?" Waited.
+Good: Same question. Cloned cmux's source repo → grepped the CLI parser for
+`list-workspaces` → saw no `--json` flag defined → replied with the source link and
+proposed an alternative that matched the actual CLI surface.
 
 
 

From f71f267ab762d78f9ef977d718b4040aec2b0e82 Mon Sep 17 00:00:00 2001
From: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Date: Fri, 24 Apr 2026 00:16:07 -0700
Subject: [PATCH 3/3] Apply suggestion from @max-sixty

---
 plugins/tend-ci-runner/skills/running-in-ci/SKILL.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md b/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
index fcf3c723..de519fc5 100644
--- a/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
+++ b/plugins/tend-ci-runner/skills/running-in-ci/SKILL.md
@@ -501,8 +501,7 @@ copy-pasted.
 ### Verifying external-tool behavior
 
 When a claim turns on how an external CLI, API, or system behaves, verify by running the
-code. Upstream docs for fast-moving tools lag or describe flags that were removed or
-renamed — don't treat them as proof on their own.
+code.
 
 Two paths, in order of preference:
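
Reviewer note on the series: the "read the source" path that patch 2 introduces could be sketched roughly as below. This is only an illustration under stated assumptions: `flag_in_source` is a hypothetical helper that is not part of tend or cmux, `OWNER/REPO` and `/tmp/tool-src` are placeholders, and a literal match anywhere in the tree is a first pass only, since the string may sit in a changelog or docs rather than in the CLI parser.

```shell
#!/usr/bin/env bash
# Sketch: check whether a flag string appears anywhere in a cloned source tree.
# flag_in_source is a hypothetical helper name, used here for illustration only.
flag_in_source() {
  local src_dir="$1" flag="$2"
  # -r recurses, -F matches the flag literally, -q suppresses output.
  # "--" ends option parsing so a pattern like "--json" is not read as an option.
  grep -rFq -- "$flag" "$src_dir"
}

# Usage (OWNER/REPO is a placeholder for the tool's public repository):
#   gh repo clone OWNER/REPO /tmp/tool-src
#   if flag_in_source /tmp/tool-src '--json'; then
#     echo "--json appears in source; read the parser definition before citing it"
#   else
#     echo "--json not found in source; treat the docs as unverified"
#   fi
```

A hit still needs a human-readable link to the parser definition in the reply; a miss is the stronger signal, matching the patch's point that a flag not defined in the parser does not exist.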