Skip to content

merge-guard mint ergonomics: silent no-mint on format mismatch + multi-prompt approval (UX hardening) #1052

Description

@michael-wojcik

Summary

The merge-guard binding is sound and dual-independently verified (#1042 live-probe, convergent PASS). This issue is about the separate, real problem the verification surfaced in practice: the mint-side approval UX fails silently and is hard to get right. A correct, canonical approval template does exist (pact-plugin/commands/peer-review.md:437-442), but any deviation from it mints no token, gives no diagnostic, and just re-prompts — so a single merge can take several approval prompts and still not land, with nothing telling the approver why.

Scope guardrail: this is a MINT-SIDE ERGONOMICS issue. Do not weaken the set-equality binding (the #1042 fix) to resolve it. The goal is to make legitimate approval reliable and self-correcting, not to loosen what gets bound.

Evidence (live, this session)

While merging a trivial 6-line docs PR after closing #1042, the operator was prompted to approve three times before it landed:

  1. Decision prompt (merge vs. leave) — approved.
  2. Focused approval, command in the option label without backticks (Approve: gh pr merge <N> --squash) — silently minted nothingNO_TOKEN.
  3. Focused approval, command in the label with backticks (Approve \gh pr merge ``) — also failed in an earlier variant; the merge only succeeded once the command was correctly anchored.

Root of the fumble: the orchestrator improvised command-in-label instead of using the documented peer-review.md form (short "Yes, merge" label + command in the description). The guard's refuse message —

Merge/close/force-push/branch-delete requires user approval. Re-approve via AskUserQuestion with the literal command embedded in the selected option ...

— never mentions the canonical template, the backtick requirement, the no-extra-flags rule, or the decline-veto wordset, so it does not steer the approver back to the form that works. Even the orchestrator that had just certified this hook got it wrong twice.

Sister friction (same session, #1037 class): the is_dangerous_command self-block fired on benign read-only diagnostics three times — a security-engineer grep | … and an inline python -c that merely quoted gh pr merge, and a lead grep "gh pr merge" | head. Each was a non-destructive read mis-flagged as a compound destructive op. Same false-positive family.

Root cause — current mint contract (empirically reproduced against installed v4.4.44)

The canonical, documented form (peer-review.md:437-442) mints reliably: SINGLE-select, one question, a short affirmative label "Yes, merge", and the command in that option's description in backticks (Run `gh pr merge <N>` to merge this PR), with <N> the only number anywhere.

Deviations that silently produce NO-MINT (each reproduced on the real merge_guard_post):

  1. Command in the LABEL instead of the description. The mint resolves the selected option by byte-equality between answers[question] and the option label; a genuine click on a long command-as-label can return an answer that doesn't byte-round-trip (truncation/normalization) → no match → no mint. (Backticks are also required for op/target extraction.) The description-based template sidesteps this entirely.
  2. multiSelect: true → never mints (multiSelect is refused as a mint source, and the veto then scans every option).
  3. A decline/defer word in the SELECTED affirmative option's own label or description (the merge_guard: decline-veto false-positives on command-literal tokens (--no-* flags, branch names) #1049 wordset, whole-word, case-insensitive): cancel|abort|reject|decline|deny|stop|no|nope|never|skip|don't|later|wait|hold|pause|reviewing|review|postpone|defer|pending|once|after + phrases not yet / review first / hold off / for now. (An unselected "Cancel" sibling does not veto in single-select — confirmed.) Note --no-verify as an approval self-vetoes because the substring no matches — whole-word matching is insufficient there.

Why it matters

A security control this easy to get silently wrong trains operators to fight it, work around it, or disable it — which is worse than a slightly-weaker-but-usable gate. The binding being correct doesn't help if legitimate approval is unreliable and undiagnosable.

Proposed improvements (acceptance criteria)

  • Self-teaching refuse message. On a NO_TOKEN/mismatch denial, the message states (or links) the canonical peer-review.md approval template: single-select, short "Yes, merge" label, command in the description in backticks, <N> the only number.
  • Non-silent mint failure. When an AskUserQuestion approval was issued but did not mint, emit a diagnostic naming why (command in label not description / byte-mismatch / multiSelect / decline-veto hit / multiple (op,target) / no options) instead of nothing.
  • One prompt per merge. Collapse decision + mint into a single approval so a normal merge is one prompt, not a re-prompt loop.
  • Co-resolve the two sibling prose-substring false-positives (each has its own issue + proposed fix; same structure-over-prose root): merge_guard: decline-veto false-positives on command-literal tokens (--no-* flags, branch names) #1049 — the decline-veto vetoing a mint because a veto-word substring (no in --no-verify, branch names) appears in the approval command literal (scope to the selected option; move past substring to whole-structure matching). merge-guard: is_dangerous_command false-positive — gh issue comment body quoting a dangerous command is rejected (substring-scan brittleness) #1037is_dangerous_command's substring scan blocking a benign command line that merely quotes a dangerous literal — a gh issue comment / git commit body, or a grep for gh pr merge (classify by the leading operation verb instead of substring-scanning the whole line). Both point the same way as this issue: structure over prose.
  • (Optional) Copy-pasteable helper. Surface the canonical approval block (or an allowlisted exact-command path) so the approver isn't reverse-engineering the parser.

Context / provenance

Surfaced during the #1042 post-install live-probe. The probe verified the binding thoroughly via the real entry points, but both verifiers disclosed the genuine user-click mint seam could only be tested with a faithful synthetic frame (the real token is approver-session-local) — so the genuine-click path was never exercised end-to-end until this session, which is where the ergonomics gap surfaced.

Related: #1042 (binding — verified sound, closed), #1037 (is_dangerous_command substring-scan false-positive), #1049 (decline-veto false-positive on command-literal tokens), #1031 / #1032 (auth-token mint/read symmetry). Canonical approval template: pact-plugin/commands/peer-review.md:437-442.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions