Skip to content

feat(mcp/plugin): enforce clarification gate before planning (#1423)#1425

Merged
JeremyDev87 merged 2 commits into
masterfrom
feat/clarification-gate-enforcement-1423
Apr 7, 2026
Merged

feat(mcp/plugin): enforce clarification gate before planning (#1423)#1425
JeremyDev87 merged 2 commits into
masterfrom
feat/clarification-gate-enforcement-1423

Conversation

@JeremyDev87

Copy link
Copy Markdown
Owner

Summary

  • MCP path: When clarificationNeeded=true, parse_mode now overrides the instructions field with a clarification-first directive that tells the AI to ask exactly one question and STOP before planning
  • Standalone path: Added evaluate_clarification_standalone() in mode_engine.py mirroring the MCP server's heuristics (override phrases, vague intent, tech references, prompt length) so standalone mode also enforces question-first behavior for PLAN/AUTO
  • User prompt hook: Passes the raw prompt to build_instructions() so the standalone gate can evaluate it
  • Tests: 7 new regression tests for MCP path + 12 new tests for standalone path — all passing (6171 + 60)

Test plan

  • yarn workspace codingbuddy exec tsc --noEmit — type-check passes
  • yarn workspace codingbuddy test -- --run — 244 files, 6171 tests pass
  • python3 -m pytest test_mode_engine.py -v — 60 tests pass
  • yarn prettier --write . — formatted
  • CI checks on PR

Closes #1423

- Add clarification-first instruction override in parse_mode response
  when clarificationNeeded=true (MCP path)
- Add standalone clarification gate in mode_engine.py mirroring MCP
  server heuristics for PLAN/AUTO modes (plugin standalone path)
- Pass user prompt to ModeEngine.build_instructions() for evaluation
- Add 7 regression tests in mode.handler.spec.ts verifying directive
  is applied/skipped correctly
- Add 12 tests in test_mode_engine.py for standalone gate behavior

Closes #1423
@JeremyDev87 JeremyDev87 added feat mcp-server apps/mcp-server plugin packages/claude-code-plugin P1 Priority 1: First Impression labels Apr 7, 2026
@vercel

vercel Bot commented Apr 7, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
codingbuddy-landing Ready Ready Preview, Comment Apr 7, 2026 9:20am

@JeremyDev87 JeremyDev87 left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: CHANGES_REQUESTED

CI Status: FAIL

9 e2e tests fail in test_user_prompt_submit_e2e.py — all directly caused by this PR's new clarification gate intercepting short/ambiguous PLAN/AUTO prompts that existing tests send.

Issues Found:

  • [critical] E2E tests not updated for new behavior: The 9 failing tests send prompts like PLAN design auth feature (19 chars after keyword strip, < MIN_PROMPT_LENGTH=20) or Korean/Japanese/Chinese equivalents. The clarification gate correctly identifies them as ambiguous, but the tests still assert # Mode: PLAN instead of the new 🔴 CLARIFICATION REQUIRED output. Fix: Either (a) update test prompts to be specific enough (add file paths or function refs so they pass the gate), or (b) add new e2e test cases asserting the clarification behavior and update existing prompts.

    Affected tests:

    • TestEnglishModeKeywords[PLAN design auth feature-PLAN]
    • TestKoreanModeKeywords[계획 인증 기능 설계-PLAN], [자동 대시보드 구현-AUTO]
    • TestJapaneseModeKeywords[計画 認証機能の設計-PLAN], [自動 ダッシュボード実装-AUTO]
    • TestChineseModeKeywords[计划 设计认证功能-PLAN], [自动 实现仪表板-AUTO]
    • TestContextInjection::test_context_contains_agent_name, ::test_case_insensitive_detection
  • [medium] Standalone path lacks questionBudget tracking: The MCP path includes budget in the directive (Remaining question budget: N, question_budget=N), but evaluate_clarification_standalone() has no budget parameter and its directive omits budget info. This creates a behavior gap between MCP and standalone modes, partially failing the spec requirement "questionBudget decrements across rounds."

  • [low] Overly broad _TECH_REFERENCE_PATTERNS snake_case regex: Pattern r"\b[a-z]+_[a-z][a-z0-9_]*\b" matches any snake_case token (e.g., of_the, ice_cream). This could cause false negatives where non-tech prompts skip clarification. Consider narrowing to min 2 segments or requiring programming context.

Code Quality: Good

  • No any types, no unused imports, no dead code
  • Immutable approach in MCP handler (spread override, no mutation)
  • Clean JSDoc/docstrings, good type annotations
  • Consistent directive format between MCP and standalone (minus budget)
  • 7 MCP + 12 standalone unit tests well-structured

Spec Compliance (#1423):

Criterion Status
Ambiguous PLAN/AUTO → clarification-first
Asks exactly one highest-value question
questionBudget decrements across rounds ⚠️ MCP only, standalone missing
"just do it" bypass works
Tests cover MCP and standalone paths ⚠️ Unit tests pass, e2e broken

Recommendation: REQUEST_CHANGES

Fix the 9 failing e2e tests (critical, blocking CI). Address the standalone questionBudget gap (medium). The core implementation logic is solid.

#1423)

- Update 9 e2e test prompts to include file references (login.ts, auth.ts,
  dashboard.tsx) so they bypass the clarification gate instead of triggering it
- Tighten snake_case tech reference regex to require 3+ char segments,
  reducing false positives on short English words (e.g. "of_the")

@JeremyDev87 JeremyDev87 left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: APPROVE (Cycle 2/3)

CI Status: PASS

  • e2e-plugin-docker ✅
  • e2e-plugin-hooks 3.11 ✅
  • e2e-plugin-hooks 3.12 ✅
  • install-dependencies: pending (non-blocking)

Previous Issues Resolution:

  • CRITICAL (e2e tests): ✅ Fixed — 9 test prompts updated with file references (login.ts, auth.ts, dashboard.tsx) to bypass clarification gate
  • MEDIUM (questionBudget): ✅ Already implemented — evaluate_clarification_standalone() accepts question_budget param with DEFAULT_QUESTION_BUDGET = 3
  • LOW (regex): ✅ Fixed — snake_case pattern tightened to {3,} char segments

Issues Found: 0 critical, 0 high

Recommendation: APPROVE

@JeremyDev87 JeremyDev87 left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: APPROVE (cycle 2/3)

CI Status: PASS

All 28 checks pass including e2e-plugin-docker, e2e-plugin-hooks (3.11, 3.12).

Previous Issues — All Resolved:

  • [critical] E2E tests: ✅ All test prompts now include file references (login.ts, auth.ts, dashboard.tsx) so the clarification gate correctly identifies them as specific. 9 previously-failing tests now pass.

  • [medium] Standalone questionBudget: ✅ evaluate_clarification_standalone() now accepts question_budget parameter with DEFAULT_QUESTION_BUDGET=3. Budget decrements (remaining = budget - 1) and is included in the directive. build_instructions() also passes question_budget through. MCP/standalone parity achieved.

  • [low] snake_case regex: ✅ Pattern updated from [a-z]+_[a-z] to [a-z]{3,}_[a-z]{3,} — requires 3+ character segments on both sides, eliminating false positives like of_the.

Spec Compliance (#1423):

Criterion Status
Ambiguous PLAN/AUTO → clarification-first
Asks exactly one highest-value question
questionBudget decrements across rounds ✅ (MCP + standalone)
"just do it" bypass works
Tests cover MCP and standalone paths ✅ (7 MCP + 12 standalone unit + e2e all pass)

Recommendation: APPROVE

All 3 review issues resolved. CI green. Spec fully satisfied.

@JeremyDev87 JeremyDev87 self-assigned this Apr 7, 2026
@JeremyDev87 JeremyDev87 merged commit 7006180 into master Apr 7, 2026
29 checks passed
@JeremyDev87 JeremyDev87 deleted the feat/clarification-gate-enforcement-1423 branch April 7, 2026 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat mcp-server apps/mcp-server P1 Priority 1: First Impression plugin packages/claude-code-plugin

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(mcp/plugin): make Clarification Gate stop PLAN before planning when more questions are needed

1 participant