Description
This is from a Claude Code perspective; I'm not sure how much this impacts other IDEs.
Two additions to the Party Mode flow in 6.3.0 combine to produce a regression in quality vs 6.2.2:
- launching personas in subagents by default
- preventing subagents from using tools
Subagents have no context other than the summary the orchestrator provides (prompted at <400 words), and without tool use they are not able to validate or extend the information the orchestrator passes to them.
I tested one code-review scenario across five different configurations (N=1 anecdotal in each), with findings in a consistent direction:
- When launched mid-context, Party Mode with subagents in intended use (no-tools-allowed default) produced output where accurate and hallucinated findings were mixed indistinguishably (a reader would have to manually verify every claim before any could be used), while consuming more tokens than --solo mode.
- Claude Opus 4.7 overrode the flow instructions in every fresh-session Party Mode subagent attempt in order to produce grounded output: first by allowing tool use, then by front-loading pre-computed findings into the Discussion Context summary.
In every observed subagent-mode execution, the orchestrator or subagents either broke the skill instructions or hallucinated to produce output. This is a strong indicator that the underlying instructions need revision for the subagent scenario.
--solo mode remains more effective, although I couldn't test quantitatively against the goal of separating agent personas (the given reason for using subagents).
Cross-round table, per-claim audit, and details in follow-up comment.
Steps to reproduce
- BMAD 6.3.0, standard agent manifest, Claude Code.
- Accumulate ~30–50 turns of project context with no prior source-reads of the file to be reviewed (the "mid-conversation" use case).
- Invoke /bmad-party-mode with no flags; ask a codebase-grounded review question against a ~1000-line file.
- Inspect the spawn prompts: each ends with "Do NOT use tools. Just respond with your perspective." (SKILL.md:82). All spawns return tool_uses: 0.
- Open the files the agents referenced. Claims will likely match the Discussion Context where accurate and diverge where the summary was lossy. In R1 on a ~1100-line installer, ~13 of ~38 distinct claims diverged.
- Fresh-session runs are covered by R4 in the follow-up: in both fresh-session subagent attempts we ran, Opus deviated from the workflow instructions, first to allow the agents tool use, and later to hand the persona subagents pre-identified code problems rather than discussion context.
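The tool_uses: 0 check in the steps above can be automated against the session transcript. A minimal sketch, assuming Claude Code's JSONL transcript layout (one JSON object per line, with messages carrying a content list of typed blocks); the field names here are assumptions, so verify them against your own transcript files before relying on the count:

```python
import json

def count_tool_uses(transcript_lines):
    """Count content blocks of type 'tool_use' across a JSONL transcript.

    Assumes each line is a JSON object whose 'message' field may hold a
    'content' list of typed blocks (an assumed layout; verify against
    your local transcript files). Malformed lines are skipped.
    """
    total = 0
    for line in transcript_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if isinstance(content, list):
            total += sum(
                1 for block in content
                if isinstance(block, dict) and block.get("type") == "tool_use"
            )
    return total

# Synthetic two-line transcript: one text block, one tool_use block
demo = [
    json.dumps({"message": {"content": [{"type": "text", "text": "hi"}]}}),
    json.dumps({"message": {"content": [{"type": "tool_use", "name": "Read"}]}}),
]
print(count_tool_uses(demo))  # 1
```

Run against each persona spawn's transcript, a result of 0 reproduces the observation above.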
Expected behavior
Party Mode on a codebase-grounded question should produce claims grounded in the source rather than limited by the orchestrator's summarization. I can imagine this being done either by giving subagents a verification path (tool access) or by operating in a mode where the source is in context (--solo).
Actual behavior
Default-mode output mixes accurate and fabricated claims, as well as accurate and inaccurate workflow steps, with no way for the reader to tell them apart without manual inspection.
Proposed fix
Primary ask (release valve): revert the default to --solo. This is an argument-parsing-level change; subagent spawning remains available via an explicit --subagents flag, so PR #2160's architectural intent is preserved as opt-in. This is a fallback to the last grounded code path while subagent mode is improved.
Secondary ask (when ready): repair subagent mode. The obvious step is to remove (or read-only-scope) the Do NOT use tools line, but a bare removal would increase token costs and introduce other potential issues. Tool-scoping policy options, cost/safety considerations, and Claude Code Agent-tool implementation constraints are discussed in the follow-up comment.
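As one illustration of read-only scoping, Claude Code's custom subagent files accept a tools allowlist in their frontmatter; a hypothetical sketch (the file name, agent name, and persona wording are illustrative, not BMAD source):

```
---
# .claude/agents/party-persona.md (hypothetical example)
name: party-persona
description: One persona participant in a Party Mode discussion
tools: Read, Grep, Glob   # read-only scope; no Write/Edit/Bash
---
Ground every claim in files you have actually read, and cite
file:line for each finding.
```

Whether BMAD's orchestrator can spawn such pre-scoped agents, rather than inline Task spawns, is one of the implementation constraints discussed in the follow-up comment.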
Primary and secondary can land independently.
Screenshots
N/A — text-only failure mode.
Which module is this for?
BMad Method (BMM) - Core Framework
BMad Version
6.3.0
Which AI IDE are you using?
Claude Code / Opus 4.7
Operating System
WSL2
Relevant log output
"Mid-context" below = session with ~30-50 turns of prior unrelated discussion before
the party-mode invocation (the skill's stated use case).
"Fresh" below = brand-new session, no prior turns before the party-mode invocation.
Every attempted run of subagent party mode with the skill as written:
R1 (mid-context, template-faithful): subagents fabricated ~13/~38 claims (~34%)
R4 attempt 1 (fresh, test 1): orchestrator deleted "Do NOT use tools" line; grounded
R4 attempt 2 (fresh, test 2): orchestrator exceeded 400-word summary + front-loaded
12 pre-computed findings into the Discussion Context;
grounded, but ~12 of ~17 findings were orchestrator pre-work
Non-subagent (solo) and experimentally-modified subagent rounds:
R2 (mid-context, "Do NOT use tools" line removed as test): grounded, 0 contradictions
R3 (fresh, --solo): grounded, 0 contradictions
R5 (mid-context, --solo, same session state as R1): grounded, ~15 file:line cites
Regression provenance: PR #2160 (merge ce9c664, 2026-03-29), +119 / -677 across bmad-party-mode
"Do NOT use tools" grep across party-mode source: 0 at v6.0.0, 0 at v6.2.2, 1 at v6.3.0 (SKILL.md:82)
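The provenance grep above can be repeated across release tags; a minimal sketch, assuming a local clone named BMAD-METHOD and those tag names (adjust both to your checkout):

```python
import subprocess
from pathlib import Path

def count_hits(grep_output: str) -> int:
    """Count non-empty match lines from `git grep` output."""
    return len([line for line in grep_output.splitlines() if line.strip()])

# Hypothetical clone location; adjust to where BMAD-METHOD is checked out.
REPO = Path("BMAD-METHOD")
if REPO.is_dir():
    for tag in ["v6.0.0", "v6.2.2", "v6.3.0"]:
        proc = subprocess.run(
            ["git", "-C", str(REPO), "grep", "-n", "Do NOT use tools", tag],
            capture_output=True, text=True,  # non-zero exit means no hits, not an error
        )
        print(tag, count_hits(proc.stdout))
```

Per the grep results above, this should report 0 hits at v6.0.0 and v6.2.2 and 1 hit at v6.3.0.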