feat(mcp/plugin): enforce clarification gate before planning (#1423)#1425
Conversation
- Add clarification-first instruction override in parse_mode response when clarificationNeeded=true (MCP path) - Add standalone clarification gate in mode_engine.py mirroring MCP server heuristics for PLAN/AUTO modes (plugin standalone path) - Pass user prompt to ModeEngine.build_instructions() for evaluation - Add 7 regression tests in mode.handler.spec.ts verifying directive is applied/skipped correctly - Add 12 tests in test_mode_engine.py for standalone gate behavior Closes #1423
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
JeremyDev87
left a comment
There was a problem hiding this comment.
Review: CHANGES_REQUESTED
CI Status: FAIL
9 e2e tests fail in test_user_prompt_submit_e2e.py — all directly caused by this PR's new clarification gate intercepting short/ambiguous PLAN/AUTO prompts that existing tests send.
Issues Found:
-
[critical] E2E tests not updated for new behavior: The 9 failing tests send prompts like
PLAN design auth feature(19 chars after keyword strip, < MIN_PROMPT_LENGTH=20) or Korean/Japanese/Chinese equivalents. The clarification gate correctly identifies them as ambiguous, but the tests still assert# Mode: PLANinstead of the new🔴 CLARIFICATION REQUIREDoutput. Fix: Either (a) update test prompts to be specific enough (add file paths or function refs so they pass the gate), or (b) add new e2e test cases asserting the clarification behavior and update existing prompts.Affected tests:
TestEnglishModeKeywords[PLAN design auth feature-PLAN]TestKoreanModeKeywords[계획 인증 기능 설계-PLAN],[자동 대시보드 구현-AUTO]TestJapaneseModeKeywords[計画 認証機能の設計-PLAN],[自動 ダッシュボード実装-AUTO]TestChineseModeKeywords[计划 设计认证功能-PLAN],[自动 实现仪表板-AUTO]TestContextInjection::test_context_contains_agent_name,::test_case_insensitive_detection
-
[medium] Standalone path lacks
questionBudgettracking: The MCP path includes budget in the directive (Remaining question budget: N,question_budget=N), butevaluate_clarification_standalone()has no budget parameter and its directive omits budget info. This creates a behavior gap between MCP and standalone modes, partially failing the spec requirement "questionBudget decrements across rounds." -
[low] Overly broad
_TECH_REFERENCE_PATTERNSsnake_case regex: Patternr"\b[a-z]+_[a-z][a-z0-9_]*\b"matches any snake_case token (e.g.,of_the,ice_cream). This could cause false negatives where non-tech prompts skip clarification. Consider narrowing to min 2 segments or requiring programming context.
Code Quality: Good
- No
anytypes, no unused imports, no dead code - Immutable approach in MCP handler (spread override, no mutation)
- Clean JSDoc/docstrings, good type annotations
- Consistent directive format between MCP and standalone (minus budget)
- 7 MCP + 12 standalone unit tests well-structured
Spec Compliance (#1423):
| Criterion | Status |
|---|---|
| Ambiguous PLAN/AUTO → clarification-first | ✅ |
| Asks exactly one highest-value question | ✅ |
| questionBudget decrements across rounds | |
| "just do it" bypass works | ✅ |
| Tests cover MCP and standalone paths |
Recommendation: REQUEST_CHANGES
Fix the 9 failing e2e tests (critical, blocking CI). Address the standalone questionBudget gap (medium). The core implementation logic is solid.
#1423) - Update 9 e2e test prompts to include file references (login.ts, auth.ts, dashboard.tsx) so they bypass the clarification gate instead of triggering it - Tighten snake_case tech reference regex to require 3+ char segments, reducing false positives on short English words (e.g. "of_the")
JeremyDev87
left a comment
There was a problem hiding this comment.
Review: APPROVE (Cycle 2/3)
CI Status: PASS
- e2e-plugin-docker ✅
- e2e-plugin-hooks 3.11 ✅
- e2e-plugin-hooks 3.12 ✅
- install-dependencies: pending (non-blocking)
Previous Issues Resolution:
- CRITICAL (e2e tests): ✅ Fixed — 9 test prompts updated with file references (login.ts, auth.ts, dashboard.tsx) to bypass clarification gate
- MEDIUM (questionBudget): ✅ Already implemented —
evaluate_clarification_standalone()acceptsquestion_budgetparam withDEFAULT_QUESTION_BUDGET = 3 - LOW (regex): ✅ Fixed — snake_case pattern tightened to
{3,}char segments
Issues Found: 0 critical, 0 high
Recommendation: APPROVE
JeremyDev87
left a comment
There was a problem hiding this comment.
Review: APPROVE (cycle 2/3)
CI Status: PASS
All 28 checks pass including e2e-plugin-docker, e2e-plugin-hooks (3.11, 3.12).
Previous Issues — All Resolved:
-
[critical] E2E tests: ✅ All test prompts now include file references (
login.ts,auth.ts,dashboard.tsx) so the clarification gate correctly identifies them as specific. 9 previously-failing tests now pass. -
[medium] Standalone questionBudget: ✅
evaluate_clarification_standalone()now acceptsquestion_budgetparameter withDEFAULT_QUESTION_BUDGET=3. Budget decrements (remaining = budget - 1) and is included in the directive.build_instructions()also passesquestion_budgetthrough. MCP/standalone parity achieved. -
[low] snake_case regex: ✅ Pattern updated from
[a-z]+_[a-z]to[a-z]{3,}_[a-z]{3,}— requires 3+ character segments on both sides, eliminating false positives likeof_the.
Spec Compliance (#1423):
| Criterion | Status |
|---|---|
| Ambiguous PLAN/AUTO → clarification-first | ✅ |
| Asks exactly one highest-value question | ✅ |
| questionBudget decrements across rounds | ✅ (MCP + standalone) |
| "just do it" bypass works | ✅ |
| Tests cover MCP and standalone paths | ✅ (7 MCP + 12 standalone unit + e2e all pass) |
Recommendation: APPROVE
All 3 review issues resolved. CI green. Spec fully satisfied.
Summary
clarificationNeeded=true, parse_mode now overrides theinstructionsfield with a clarification-first directive that tells the AI to ask exactly one question and STOP before planningevaluate_clarification_standalone()inmode_engine.pymirroring the MCP server's heuristics (override phrases, vague intent, tech references, prompt length) so standalone mode also enforces question-first behavior for PLAN/AUTObuild_instructions()so the standalone gate can evaluate itTest plan
yarn workspace codingbuddy exec tsc --noEmit— type-check passesyarn workspace codingbuddy test -- --run— 244 files, 6171 tests passpython3 -m pytest test_mode_engine.py -v— 60 tests passyarn prettier --write .— formattedCloses #1423