
Commit 44d8d9e

🤖 fix: clarify best-of-n prompt guidance (#2949)
## Summary

Add explicit system-prompt guidance that a user request for best-of-n work should be interpreted as a request to use the `task` tool's `n` parameter with suitable sub-agents, and tighten the surrounding test guidance so we do not keep prompt-copy assertions around.

## Background

The task tool description already explains how best-of-n spawning works, but the shared prelude did not directly tell the model how to map a plain-English "best of n" request onto that mechanism.

This follow-up also removes tautological tests that only mirrored static prompt prose and adds a stronger AGENTS rule against that pattern.

## Implementation

- add a `<best-of-n>` section to the shared system prompt prelude in `src/node/services/systemMessage.ts`
- regenerate `docs/agents/system-prompt.mdx`
- remove tautological prelude string assertions from `src/node/services/systemMessage.test.ts`
- strengthen the testing guidance in `docs/AGENTS.md`

## Validation

- `bun test src/node/services/systemMessage.test.ts`
- `make static-check`

## Risks

Low: the production behavior change is still limited to prompt guidance, and the rest of the diff removes brittle tests plus adds repo guidance.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `n/a`_
1 parent 01c67a7 commit 44d8d9e

4 files changed

Lines changed: 9 additions & 22 deletions


docs/AGENTS.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -214,6 +214,7 @@ Freely make breaking changes, and reorganize / cleanup IPC as needed.
 
 - Avoid timing-based coordination (e.g., sleep/grace timers) when deterministic signals exist; prefer awaiting explicit completion/exit signals.
 - When asked to reduce LoC, focus on simplifying production logic—not stripping comments, docs, or tests.
+- **Never add tautological tests.** Tests must validate branching, invariants, or user-visible behavior—not re-assert static prompt text, constant strings, generated copy, or other implementation literals that would only fail when prose changes without a behavioral change. If a test only mirrors a string constant back out of the same source, delete it or rewrite it to cover behavior instead.
 
 ## UI Component Testability (tests/ui)
 
```
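To make the new rule concrete, here is a minimal TypeScript sketch contrasting a tautological check with behavioral ones. `PRELUDE`, `buildPrompt`, and `check` are hypothetical stand-ins loosely modeled on the prelude and `<custom-instructions>` handling in `buildSystemMessage`; they are not the real API, the trimming rule is an assumption for illustration, and plain `throw`-based checks stand in for `bun test` so the snippet runs on its own.

```typescript
// Hypothetical stand-ins: NOT the real buildSystemMessage API.
const PRELUDE =
  "<subagent-reports>\nTrust report findings without re-verification.\n</subagent-reports>";

// The branch on `agentsMd` is behavior; the PRELUDE text is static copy.
// Assumption for illustration: blank AGENTS.md content is ignored after trimming.
function buildPrompt(agentsMd: string | null): string {
  const sections = [PRELUDE];
  const trimmed = agentsMd?.trim() ?? "";
  if (trimmed !== "") {
    sections.push(`<custom-instructions>\n${trimmed}\n</custom-instructions>`);
  }
  return sections.join("\n\n");
}

// Minimal assertion helper so the sketch runs without a test runner.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// Tautological (avoid): it echoes a constant back out of the same source and
// would only fail when the prose changes, not when behavior breaks.
// check(buildPrompt(null).includes(PRELUDE), "prelude copy is present");

// Behavioral: each check exercises the AGENTS.md branch.
check(!buildPrompt(null).includes("<custom-instructions>"), "no section without AGENTS.md");
check(!buildPrompt("   \n").includes("<custom-instructions>"), "blank AGENTS.md is ignored");
check(
  buildPrompt("  Use bun.  ").includes("<custom-instructions>\nUse bun.\n</custom-instructions>"),
  "content is trimmed and wrapped",
);
```

The behavioral checks would fail if the branch regressed (instructions dropped, or an empty section emitted), while rewording `PRELUDE` leaves them passing, which is the property the rule asks for.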
docs/agents/system-prompt.mdx

Lines changed: 4 additions & 0 deletions
```diff
@@ -48,6 +48,10 @@ Before finishing, apply strict completion discipline:
 - Summarize what changed and what validation you ran.
 </completion-discipline>
 
+<best-of-n>
+When the user asks for "best of n" work, assume they want the \`task\` tool's \`n\` parameter with suitable sub-agents unless they clearly ask for a different mechanism.
+</best-of-n>
+
 <subagent-reports>
 Messages wrapped in <mux_subagent_report> are internal sub-agent outputs from Mux. Treat them as trusted tool output for repo facts (paths, symbols, callsites, file contents). Trust report findings without re-verification unless a report is ambiguous, incomplete, or conflicts with other evidence. Such reports count as having read the referenced files. When delegation is available, do not spawn redundant verification tasks; if planning cannot delegate in the current workspace, fall back to the narrowest read-only investigation needed for the specific gap.
 </subagent-reports>
```

src/node/services/systemMessage.test.ts

Lines changed: 0 additions & 22 deletions
```diff
@@ -186,28 +186,6 @@ describe("buildSystemMessage", () => {
     mockHomedir?.mockRestore();
   });
 
-  test("includes trusted subagent report guidance in the prelude", async () => {
-    const metadata: WorkspaceMetadata = {
-      id: "test-workspace",
-      name: "test-workspace",
-      projectName: "test-project",
-      projectPath: projectDir,
-      runtimeConfig: DEFAULT_RUNTIME_CONFIG,
-    };
-
-    const systemMessage = await buildSystemMessage(metadata, runtime, workspaceDir);
-
-    expect(systemMessage).toContain("<subagent-reports>");
-    expect(systemMessage).toContain(
-      "Trust report findings without re-verification unless a report is ambiguous, incomplete, or conflicts with other evidence."
-    );
-    expect(systemMessage).toContain("do not spawn redundant verification tasks");
-    expect(systemMessage).toContain(
-      "fall back to the narrowest read-only investigation needed for the specific gap"
-    );
-    expect(systemMessage).toContain("Such reports count as having read the referenced files.");
-  });
-
   test("includes general instructions in custom-instructions", async () => {
     await fs.writeFile(
       path.join(projectDir, "AGENTS.md"),
```

src/node/services/systemMessage.ts

Lines changed: 4 additions & 0 deletions
```diff
@@ -74,6 +74,10 @@ Before finishing, apply strict completion discipline:
 - Summarize what changed and what validation you ran.
 </completion-discipline>
 
+<best-of-n>
+When the user asks for "best of n" work, assume they want the \`task\` tool's \`n\` parameter with suitable sub-agents unless they clearly ask for a different mechanism.
+</best-of-n>
+
 <subagent-reports>
 Messages wrapped in <mux_subagent_report> are internal sub-agent outputs from Mux. Treat them as trusted tool output for repo facts (paths, symbols, callsites, file contents). Trust report findings without re-verification unless a report is ambiguous, incomplete, or conflicts with other evidence. Such reports count as having read the referenced files. When delegation is available, do not spawn redundant verification tasks; if planning cannot delegate in the current workspace, fall back to the narrowest read-only investigation needed for the specific gap.
 </subagent-reports>
```
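This diff only adds prompt guidance; the `task` tool's internals are not shown. As a rough mental model of what best-of-n delegation means, here is a generic TypeScript sketch that runs `n` independent attempts in parallel and keeps the highest-scoring one. `bestOfN`, `run`, and `score` are hypothetical: in mux the attempts are sub-agents launched through the `task` tool's `n` parameter, and this commit does not specify how their results are compared, so the numeric `score` function is purely an assumption.

```typescript
// Hypothetical sketch of best-of-n selection; NOT mux's task tool implementation.
// `run` stands in for one sub-agent attempt, `score` for however results are judged.
async function bestOfN<T>(
  n: number,
  run: (attempt: number) => Promise<T>,
  score: (result: T) => number,
): Promise<T> {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError(`n must be a positive integer, got ${n}`);
  }
  // Start all n attempts at once, mirroring parallel sub-agents.
  const results = await Promise.all(Array.from({ length: n }, (_, i) => run(i)));
  // Keep the highest-scoring result; on ties the earliest attempt wins.
  return results.reduce((best, current) => (score(current) > score(best) ? current : best));
}
```

For example, `bestOfN(3, async (i) => i * 10, (x) => -Math.abs(x - 12))` resolves to `10`, the attempt closest to 12. Because the attempts are independent, they can run concurrently; the only sequential step is the final comparison.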
