Context
PACT's current teachback protocol requires dispatched specialists to send a teachback (a restatement of their understanding of the assigned task) before beginning implementation. In practice, the protocol is advisory, not blocking: the teammate sends the teachback and proceeds immediately, without waiting for the orchestrator to approve or correct.
This creates a well-documented failure mode: rote-ritual teachback. When there is no consequence for sending a shallow teachback, and no consequence for proceeding without orchestrator feedback, teammates treat the teachback as a compliance checkbox. The orchestrator is supposed to validate teachbacks against the dispatch prompt, but the validation window is short (the teammate is already working) and orchestrators routinely skip the validation entirely. The result: misunderstandings disguised as agreement go undetected until the TEST phase or peer review, by which point significant work has been wasted.
The existing handoff_gate hook demonstrates that mechanical enforcement of a gate is effective — teammates cannot bypass it by intent or accident. A similar mechanism for teachback would eliminate the discretion gap at its source.
Current state
What exists
pact-plugin/hooks/shared/variety_scorer.py — deterministic variety scoring module with canonical thresholds:
validate_dimension(value, name) — enforces int ∈ [1, 4]
score_variety(novelty, scope, uncertainty, risk) -> int — returns sum ∈ [4, 16]
route_workflow(score) -> str: maps a score to comPACT, orchestrate, plan-mode, or research-spike
LEARNING_II_MIN_MATCHES = 5 and LEARNING_II_MAX_BUMP = 1 — Learning II parameters baked in
Feature-level variety scoring — formalized in /PACT:orchestrate, /PACT:comPACT, /PACT:plan-mode command entry points. The feature task can carry metadata.variety_score (convention, not enforced).
Learning II calibration — secretary gathers calibration metrics during HANDOFF processing, records per-domain drift in pact-memory. Scoring accuracy improves over time via deutero-learning feedback loop.
handoff_gate hook — mechanical enforcement precedent. Teammates cannot escape HANDOFF requirements by ignoring them; the hook intercepts at the tool layer.
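For readers who have not opened the module, the documented contract of the two scoring functions can be sketched as below. This is a minimal re-implementation under stated assumptions, not the module's code: it assumes bools are rejected, that out-of-range values raise ValueError, and that wrong types raise TypeError. The real variety_scorer.py may use different error types and messages.

```python
def validate_dimension(value: int, name: str) -> int:
    """Reject anything that is not an int in [1, 4] (bools excluded, an assumption)."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    if not 1 <= value <= 4:
        raise ValueError(f"{name} must be in [1, 4], got {value}")
    return value


def score_variety(novelty: int, scope: int, uncertainty: int, risk: int) -> int:
    """Sum of four validated dimensions, so the result always lies in [4, 16]."""
    dims = {"novelty": novelty, "scope": scope, "uncertainty": uncertainty, "risk": risk}
    return sum(validate_dimension(value, name) for name, value in dims.items())


print(score_variety(2, 2, 1, 2))  # 7
```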
What's missing
Per-agent-task variety scoring — feature tasks can carry metadata.variety_score, but agent sub-tasks do not. There is no protocol requiring it, no tooling validating it, and no mechanism for the agent to read their own score and act on it.
Programmatic use of variety_scorer.py — the module is deterministic and ready, but orchestrators typically hand-compute variety scores and write the final number into task metadata without calling score_variety(). This means:
Per-dimension values (novelty, scope, uncertainty, risk) are not captured in metadata — only the sum
Learning II calibration records cannot surface per-dimension drift signals
The module's existence creates the appearance of enforcement without the reality
Teachback gate — no teachback_gate hook exists. The protocol relies on orchestrator discipline, which is exactly the discretion that produces the rote-ritual failure mode.
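The cost of recording only the sum is easy to quantify. Each of the four dimensions takes one of four values, so there are 4^4 = 256 dimension profiles but only 13 possible scores, and many profiles share a score. The enumeration below counts them: 20 different profiles all sum to 7, including (2, 2, 1, 2) and the maximum-risk profile (1, 1, 1, 4), which a sum-only record cannot tell apart.

```python
from collections import Counter
from itertools import product

# Every possible (novelty, scope, uncertainty, risk) profile, each dimension in [1, 4].
profiles = list(product(range(1, 5), repeat=4))

# How many distinct profiles collapse onto each summed score.
profiles_per_score = Counter(sum(p) for p in profiles)

# Profiles that score 7 while carrying the maximum risk value.
max_risk_sevens = [p for p in profiles if sum(p) == 7 and p[3] == 4]

print(len(profiles))           # 256
print(profiles_per_score[7])   # 20
print(profiles_per_score[10])  # 44
print(max_risk_sevens)         # [(1, 1, 1, 4)]
```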
Proposal: variety-tiered blocking teachback
Locked design decisions
| Decision | Choice | Rationale |
|---|---|---|
| Blocking threshold | variety score ≥ 7 (Medium+) | Matches the existing mandatory-auditor-dispatch threshold. Tasks serious enough for concurrent audit are serious enough for bilateral teachback verification. |
| Below threshold | variety score 4-6 (Low) | Teachback remains advisory. Preserves throughput for simple, well-understood tasks where wasted-work cost < stall cost. |
| Source of truth | variety_scorer.py: new teachback_mode_for_score() and gates_for_score() functions | Single source of truth for "what gates fire at what score." No orchestrator discretion. |
| Per-agent-task scoring | Required: each agent task must carry metadata.variety_score AND metadata.variety_dimensions at TaskCreate time | Gates read their own task's score, not the feature's. Decomposed sub-tasks can have different complexity than the parent feature. |
| Gate enforcement | New teachback_gate hook mirroring handoff_gate | Mechanical enforcement via tool interception, not orchestrator discipline. |
| State machine | teachback_pending → teachback_under_review → active OR teachback_correcting | Explicit states visible to both teammate and orchestrator via task metadata. |
| Timeout handling | If no orchestrator response within N minutes, emit algedonic ALERT (META-BLOCK: orchestrator unreachable) | Prevents indefinite stalls if the orchestrator is unresponsive. |
| Rollout mode | Ship behind advisory-warning mode for 1-2 sessions, then flip to hard-block | Calibration period allows detection of false positives before enforcement. |
| Orchestrator discipline | Orchestrators shell out to variety_scorer.score_variety() at dispatch time and capture dimensions in metadata, instead of hand-computing | Closes the discretion gap between "score exists" and "score is computed mechanically". |
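The state machine can be pinned down as an explicit transition map. This is a sketch: the TeachbackState enum and the "teachback_sent" event name are hypothetical, while teachback_approved and teachback_corrections are the orchestrator messages the proposal describes. Any (state, event) pair outside the map is rejected, which gives tests a single place to assert against.

```python
from enum import Enum


class TeachbackState(str, Enum):
    PENDING = "teachback_pending"
    UNDER_REVIEW = "teachback_under_review"
    CORRECTING = "teachback_correcting"
    ACTIVE = "active"


# (current state, event) -> next state
TRANSITIONS = {
    (TeachbackState.PENDING, "teachback_sent"): TeachbackState.UNDER_REVIEW,
    (TeachbackState.UNDER_REVIEW, "teachback_approved"): TeachbackState.ACTIVE,
    (TeachbackState.UNDER_REVIEW, "teachback_corrections"): TeachbackState.CORRECTING,
    (TeachbackState.CORRECTING, "teachback_sent"): TeachbackState.UNDER_REVIEW,
}


def next_state(state: TeachbackState, event: str) -> TeachbackState:
    """Apply one event; any pair missing from TRANSITIONS is an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition from {state.value} on {event!r}") from None


state = TeachbackState.PENDING
for event in ("teachback_sent", "teachback_corrections", "teachback_sent", "teachback_approved"):
    state = next_state(state, event)
print(state.value)  # active
```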
Implementation scope
1. pact-plugin/hooks/shared/variety_scorer.py (~40 new lines)
Add three functions:
```python
from typing import Any

# validate_score() and route_workflow() are assumed to be existing helpers in this module.
ORCHESTRATE_MIN = 7  # new constant mirroring COMPACT_MAX + 1


def teachback_mode_for_score(score: int) -> str:
    """Returns 'blocking' for score >= ORCHESTRATE_MIN, 'advisory' otherwise."""
    validate_score(score)
    return "blocking" if score >= ORCHESTRATE_MIN else "advisory"


def auditor_required_for_score(score: int) -> bool:
    """Returns True when mandatory auditor dispatch applies (score >= 7)."""
    validate_score(score)
    return score >= ORCHESTRATE_MIN


def gates_for_score(score: int) -> dict[str, Any]:
    """Canonical gate configuration for a given variety score."""
    validate_score(score)
    return {
        "teachback_mode": teachback_mode_for_score(score),
        "auditor_required": auditor_required_for_score(score),
        "workflow_route": route_workflow(score),
    }
```
Plus tests in pact-plugin/tests/test_variety_scorer.py covering the new functions + threshold boundaries (scores 6 and 7 specifically).
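The boundary tests can be sketched without the real module by using local stand-ins. validate_score() is called but never defined in this proposal, so the version below is a hypothetical stand-in that accepts only ints in [4, 16]; gates_for_score() is left out because route_workflow()'s thresholds are not spelled out here. The tests are plain asserts, so pytest will collect them, and the __main__ block runs them without pytest.

```python
ORCHESTRATE_MIN = 7


def validate_score(score: int) -> None:
    # Hypothetical stand-in: the real validator is not specified in this proposal.
    if isinstance(score, bool) or not isinstance(score, int) or not 4 <= score <= 16:
        raise ValueError(f"variety score must be an int in [4, 16], got {score!r}")


def teachback_mode_for_score(score: int) -> str:
    validate_score(score)
    return "blocking" if score >= ORCHESTRATE_MIN else "advisory"


def auditor_required_for_score(score: int) -> bool:
    validate_score(score)
    return score >= ORCHESTRATE_MIN


def test_score_6_is_last_advisory_score():
    assert teachback_mode_for_score(6) == "advisory"
    assert auditor_required_for_score(6) is False


def test_score_7_is_first_blocking_score():
    assert teachback_mode_for_score(7) == "blocking"
    assert auditor_required_for_score(7) is True


def test_mode_and_auditor_flag_agree_across_full_range():
    for score in range(4, 17):
        blocking = teachback_mode_for_score(score) == "blocking"
        assert blocking == auditor_required_for_score(score)


def test_out_of_range_scores_are_rejected():
    for bad in (3, 17, True):
        try:
            teachback_mode_for_score(bad)
        except ValueError:
            continue
        raise AssertionError(f"{bad!r} was accepted")


if __name__ == "__main__":
    for name, fn in list(globals().items()):
        if name.startswith("test_"):
            fn()
    print("all boundary tests passed")
```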
2. New pact-plugin/hooks/teachback_gate.py (~150 lines)
Mirror handoff_gate.py structure. Intercepts Edit, Write, and SendMessage tool calls from teammates whose task has metadata.gates.teachback_mode == "blocking" and state teachback_pending.
Allowed actions while in teachback_pending:
SendMessage with the teachback content (transitions state to teachback_under_review)
TaskGet / TaskList (read-only task inspection)
Read, Glob, Grep (read-only code inspection)
Blocked actions:
Edit, Write (no code changes until approved)
SendMessage of anything other than the teachback itself (no side channel)
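State transitions (recorded in task metadata):
teachback_pending → teachback_under_review (teammate sends teachback)
teachback_under_review → active (orchestrator sends structured teachback_approved message)
teachback_under_review → teachback_correcting (orchestrator sends teachback_corrections with specific items)
teachback_correcting → teachback_under_review (teammate sends revised teachback)
Timeout: teachback_pending ≥ N minutes without outbound SendMessage → algedonic ALERT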
Tests in pact-plugin/tests/test_teachback_gate.py covering each state transition, timeout, fail-open behavior on hook errors, and the Edit/Write/SendMessage interception matrix.
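The interception matrix and the timeout rule reduce to pure functions, which keeps the hook's I/O handling separate from the policy and makes the matrix easy to unit-test. This sketch uses hypothetical names (ToolCall, decide, decide_fail_open, timeout_exceeded) and does not model how the hook actually receives tool-call payloads. One assumption is flagged explicitly: the proposal names teachback_pending as the gated state, but "no code changes until approved" suggests Edit and Write should stay blocked until the task reaches active, so GATED_STATES also includes the review and correction states. Narrow it if that reading is wrong.

```python
from dataclasses import dataclass

# Tool and state names come from the proposal; everything else is illustrative.
INTERCEPTED_TOOLS = {"Edit", "Write", "SendMessage"}
# Assumption: keep blocking until "active" ("no code changes until approved").
GATED_STATES = {"teachback_pending", "teachback_under_review", "teachback_correcting"}


@dataclass
class ToolCall:
    tool_name: str
    teachback_mode: str         # metadata.gates.teachback_mode of the caller's task
    state: str                  # current teachback state from task metadata
    is_teachback: bool = False  # True when a SendMessage carries the teachback itself


def decide(call: ToolCall) -> str:
    """Return 'allow' or 'block' according to the interception matrix."""
    if call.teachback_mode != "blocking" or call.state not in GATED_STATES:
        return "allow"
    if call.tool_name not in INTERCEPTED_TOOLS:
        return "allow"  # Read, Glob, Grep, TaskGet, TaskList and other tools pass through
    if call.tool_name == "SendMessage":
        return "allow" if call.is_teachback else "block"  # no side channel
    return "block"  # Edit or Write


def decide_fail_open(call: ToolCall) -> str:
    """A hook error must never strand a teammate, so any exception allows the call."""
    try:
        return decide(call)
    except Exception:
        return "allow"


def timeout_exceeded(state: str, seconds_in_state: float, sent_outbound: bool,
                     timeout_seconds: float) -> bool:
    """True when an algedonic ALERT should fire for a silent teachback_pending task."""
    return (state == "teachback_pending" and not sent_outbound
            and seconds_in_state >= timeout_seconds)


print(decide(ToolCall("Edit", "blocking", "teachback_pending")))  # block
print(decide(ToolCall("SendMessage", "blocking", "teachback_pending", is_teachback=True)))  # allow
```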
3. TaskCreate schema validation (~60 lines)
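New pre-TaskCreate hook (or wrapper logic in the command files) that enforces the following metadata shape for agent tasks in CODE phase:
variety_score: int in [4, 16]
variety_dimensions: { novelty, scope, uncertainty, risk }, each an int in [1, 4]
gates: { teachback_mode: "blocking" | "advisory", auditor_required: bool, workflow_route: str }
Missing variety_score or variety_dimensions → TaskCreate rejects with a clear error message pointing at the gap. gates is auto-populated from variety_scorer.gates_for_score(variety_score) by the hook itself, so the orchestrator doesn't have to remember to compute it.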
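The validation step can be sketched as a pure function over the metadata dict, assuming the shape proposed for CODE-phase agent tasks: an integer variety_score in [4, 16], a variety_dimensions dict holding novelty, scope, uncertainty, and risk as integers in [1, 4], and a gates dict derived from the score. TaskSchemaError and with_gates() are hypothetical names, and gates_for_score is passed in as a parameter so the sketch runs without the real module. The check that variety_score equals the sum of its dimensions is an extra assumption, not something the proposal states.

```python
from typing import Any, Callable

DIMENSIONS = ("novelty", "scope", "uncertainty", "risk")


class TaskSchemaError(ValueError):
    """Hypothetical error raised when TaskCreate metadata is incomplete or invalid."""


def _is_int(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def validate_agent_task_metadata(metadata: dict[str, Any]) -> None:
    """Reject CODE-phase agent tasks lacking variety_score or variety_dimensions."""
    missing = [key for key in ("variety_score", "variety_dimensions") if key not in metadata]
    if missing:
        raise TaskSchemaError(f"agent task metadata is missing: {', '.join(missing)}")

    score = metadata["variety_score"]
    if not _is_int(score) or not 4 <= score <= 16:
        raise TaskSchemaError(f"variety_score must be an int in [4, 16], got {score!r}")

    dims = metadata["variety_dimensions"]
    if not isinstance(dims, dict) or set(dims) != set(DIMENSIONS):
        raise TaskSchemaError(f"variety_dimensions must have exactly the keys {DIMENSIONS}")
    for name in DIMENSIONS:
        if not _is_int(dims[name]) or not 1 <= dims[name] <= 4:
            raise TaskSchemaError(f"variety_dimensions.{name} must be an int in [1, 4]")

    # Assumption: the stored score must equal the sum the scorer would produce.
    if score != sum(dims[name] for name in DIMENSIONS):
        raise TaskSchemaError("variety_score does not equal the sum of variety_dimensions")


def with_gates(metadata: dict[str, Any],
               gates_for_score: Callable[[int], dict[str, Any]]) -> dict[str, Any]:
    """Validate, then return a copy whose gates are derived from the score."""
    validate_agent_task_metadata(metadata)
    return {**metadata, "gates": gates_for_score(metadata["variety_score"])}


def stub_gates(score: int) -> dict[str, Any]:
    # Stand-in for variety_scorer.gates_for_score(); only the teachback mode is modeled.
    return {"teachback_mode": "blocking" if score >= 7 else "advisory"}


meta = {"variety_score": 7,
        "variety_dimensions": {"novelty": 2, "scope": 2, "uncertainty": 1, "risk": 2}}
print(with_gates(meta, stub_gates)["gates"])  # {'teachback_mode': 'blocking'}
```

Because with_gates() always overwrites any gates key the caller supplied, gate configuration cannot be set independently of the score, in line with the rationale for rejecting the flag-based alternative.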
4. Agent-side state machine (pact-plugin/skills/PACT:pact-agent-teams/SKILL.md + pact-plugin/commands/teammate-bootstrap.md)
Extend the teammate bootstrap procedure:
After teammate reads task metadata, check metadata.gates.teachback_mode
If blocking: send teachback via SendMessage, then STOP (gate will prevent further action until teachback_approved arrives)
If advisory: send teachback via SendMessage, proceed immediately (current behavior, unchanged)
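The bootstrap extension is prose for an LLM teammate, but its control flow is small enough to state as a function, which could also back a simulation test. In this sketch, bootstrap(), the send_teachback callback, and the message dicts with a "type" key are hypothetical; falling back to "advisory" for tasks without gates metadata is an assumption that mirrors the current behavior.

```python
from typing import Callable, Iterable, List, Optional

SendTeachback = Callable[[Optional[List[str]]], None]


def bootstrap(metadata: dict, send_teachback: SendTeachback,
              inbox: Iterable[dict]) -> bool:
    """Return True once the teammate may start implementation, False if still gated."""
    mode = (metadata.get("gates") or {}).get("teachback_mode", "advisory")
    send_teachback(None)  # first teachback, no corrections to address yet
    if mode != "blocking":
        return True  # advisory: proceed immediately (current behavior)
    for message in inbox:  # blocking: STOP and wait for a structured reply
        kind = message.get("type")
        if kind == "teachback_approved":
            return True
        if kind == "teachback_corrections":
            send_teachback(message.get("items", []))  # revised teachback
    return False  # no approval yet: remain in the gated state


sent = []
inbox = [{"type": "teachback_corrections", "items": ["cascade has 6 branches, not 4"]},
         {"type": "teachback_approved"}]
print(bootstrap({"gates": {"teachback_mode": "blocking"}}, sent.append, inbox))  # True
print(sent)  # [None, ['cascade has 6 branches, not 4']]
```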
5. Orchestrator-side dispatch procedure updates
Update the dispatch sections in:
pact-plugin/commands/orchestrate.md
pact-plugin/commands/comPACT.md
pact-plugin/commands/rePACT.md
pact-plugin/commands/plan-mode.md
to include explicit variety scoring at dispatch time:
```
# Before each TaskCreate for an agent task, the orchestrator:
# 1. Shells out to variety_scorer.score_variety() with dimension assessments
# 2. Captures dimensions + score in metadata.variety_dimensions / variety_score
# 3. The pre-TaskCreate hook auto-populates metadata.gates
```
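A thin command-line wrapper is one way to make "shell out to score_variety()" a single step that emits both the score and the dimensions. This is a sketch with hypothetical names: variety_metadata() and main() are not part of the plugin, and the import path for variety_scorer is an assumption that depends on how pact-plugin/hooks/shared is made importable. The scorer is injectable so the sketch runs on its own.

```python
import argparse
import json
from typing import Callable, Optional, Sequence

# Signature of variety_scorer.score_variety(novelty, scope, uncertainty, risk) -> int
Scorer = Callable[[int, int, int, int], int]
DIMENSIONS = ("novelty", "scope", "uncertainty", "risk")


def variety_metadata(novelty: int, scope: int, uncertainty: int, risk: int,
                     scorer: Scorer) -> dict:
    """Score through the canonical scorer and keep every dimension, not just the sum."""
    dims = {"novelty": novelty, "scope": scope, "uncertainty": uncertainty, "risk": risk}
    return {"variety_score": scorer(novelty, scope, uncertainty, risk),
            "variety_dimensions": dims}


def main(argv: Optional[Sequence[str]] = None, scorer: Optional[Scorer] = None) -> str:
    """Parse the four dimension flags and return TaskCreate metadata as JSON."""
    parser = argparse.ArgumentParser(description="Emit variety metadata for TaskCreate.")
    for name in DIMENSIONS:
        parser.add_argument(f"--{name}", type=int, required=True)
    args = parser.parse_args(argv)
    if scorer is None:
        # Assumed import path; adjust to however hooks/shared is importable.
        from variety_scorer import score_variety as scorer
    return json.dumps(variety_metadata(args.novelty, args.scope,
                                       args.uncertainty, args.risk, scorer))
```

Wiring main() to a console entry point or a __main__ guard would let the orchestrator run it with --novelty 2 --scope 2 --uncertainty 1 --risk 2 and paste the resulting JSON into TaskCreate metadata, so the dimensions travel with the score.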
6. Documentation updates
pact-plugin/protocols/pact-variety.md — add a section on "Gate thresholds" documenting the canonical score-to-gate mapping and explaining the shift from feature-level to per-agent-task scoring
pact-plugin/protocols/pact-ct-teachback.md — update the teachback protocol to describe the blocking semantics at variety ≥ 7, the state machine, and the timeout/algedonic behavior
pact-plugin/protocols/pact-s1-autonomy.md — if this protocol references the advisory-teachback model, update to reflect the new gate-based enforcement
7. Rollout phases
Phase 1 — Advisory-warning mode (1-2 sessions)
teachback_gate.py runs but does NOT block — emits a warning when it would have blocked, logs to a "would have blocked" journal event
Calibration period: observe false-positive rate, confirm dimension scoring is accurate, catch edge cases before enforcement
Phase 2 — Hard-block mode (permanent)
Flip the feature flag so that teachback_gate.py enforces the gate
Advisory-warning mode remains available via config flag for local debugging
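The two rollout phases can share one code path: compute the gate decision as usual, then let a mode flag decide whether a "block" is enforced or downgraded to a journaled warning. The environment variable name PACT_TEACHBACK_GATE_MODE, the mode strings, and the journal event shape are hypothetical. Unknown flag values fall back to warning mode here, a choice that errs toward not stalling teammates, in the same spirit as fail-open behavior on hook errors.

```python
import os
from typing import Callable, Mapping, Optional

WARN = "advisory-warning"  # Phase 1: log would-have-blocked events, never block
ENFORCE = "hard-block"     # Phase 2: block for real


def gate_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the rollout mode from a hypothetical environment flag; default to WARN."""
    env = os.environ if env is None else env
    value = env.get("PACT_TEACHBACK_GATE_MODE", WARN)
    return value if value in (WARN, ENFORCE) else WARN


def apply_rollout(decision: str, mode: str,
                  log_event: Callable[[dict], None], detail: dict) -> str:
    """Turn a raw 'block' into a journaled warning unless hard-block mode is on."""
    if decision != "block":
        return decision
    if mode == ENFORCE:
        return "block"
    log_event({"event": "teachback_gate_would_have_blocked", **detail})
    return "allow"
```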
Acceptance criteria
variety_scorer.py exposes teachback_mode_for_score, auditor_required_for_score, gates_for_score with full test coverage including threshold boundaries (6 and 7)
teachback_gate.py mechanically enforces blocking state at variety ≥ 7, passes all state-transition unit tests, and has integration tests proving that Edit/Write/unrelated-SendMessage are blocked while the gate is active
TaskCreate schema validation rejects agent tasks in CODE phase that lack metadata.variety_score or metadata.variety_dimensions, with a clear error message
Teammate bootstrap procedure correctly enters teachback_pending state when metadata.gates.teachback_mode == "blocking" and does not exit until a structured teachback_approved message arrives
Algedonic ALERT fires if teachback_pending exceeds the configured timeout without outbound SendMessage
Orchestrator-side dispatch procedures in the 4 command files call variety_scorer.score_variety() programmatically, capturing dimensions in metadata
pact-variety.md, pact-ct-teachback.md, and any other affected protocols are updated to describe the new behavior
Phase 1 advisory-warning mode ships first, with instrumentation to log would-have-blocked events for calibration
Phase 2 flips enforcement only after 1-2 sessions of clean advisory-warning observations
Motivation (why this matters, not just "why do the work")
The rote-ritual failure mode is not hypothetical. Concrete evidence from recent PACT sessions:
Prose enumeration drift — In PR #390 ("feat(#366): reduce PACT agent spawn overhead — Phase 1 kernel elimination"), during the issue #399 ring buffer cycle ("Add bounded ring buffer log for session_init R3 malformed-stdin failures"), a backend-coder's HANDOFF described a classification cascade as having "4 branches" when the actual code had 6. The auditor re-used the count from the HANDOFF without re-reading the code. The orchestrator copied the count into the commit message. Three sequential observers converged on the same wrong number because each trusted the previous prose summary instead of reading the source. This was corrected after the commit landed, requiring a history rewrite. With blocking teachback on that dispatch, the misalignment would have surfaced when the coder's teachback listed "4 branches" and the orchestrator verified it against the spec, before any code was written.
Orchestrator discretion leakage — In the same session, the orchestrator (me) wrote metadata.variety_score = 7 into task metadata by hand, without calling variety_scorer.score_variety(). The dimension values (2, 2, 1, 2) were assessed mentally and the sum (7) was written directly. This is exactly the "using variety as a lens, not a mechanism" failure mode — the canonical module exists but is not consulted. The Learning II calibration record for this task carries the sum but not the dimensions, which reduces the fidelity of future pattern-matching against the hook_infrastructure domain.
Per-agent-task scoring gap — Feature-level variety scoring is formalized in the workflow commands, but agent sub-tasks inherit their parent's score implicitly. A high-variety feature decomposed into simple sub-tasks still gets the high-variety gates applied to every sub-task, and vice versa. Today's per-task metadata is a free-form dict with no schema enforcement, so variety_score is not load-bearing anywhere outside of calibration.
These are the precise gaps that Option B closes. The variety framework already has a deterministic scoring backbone (variety_scorer.py). The gate-enforcement precedent exists (handoff_gate.py). The Learning II calibration loop is already wired in. What's missing is the policy layer that consumes the backbone and enforces gates at dispatch time.
Not in scope for this issue
Changes to the variety framework itself. Score ranges, thresholds, and dimension semantics stay as documented in pact-variety.md. The Learning II +1 dimension bump mechanism is unchanged.
Flag-based alternative gate metadata. An earlier draft of this proposal considered adding metadata.teachback_mode: "blocking" | "advisory" as a per-task orchestrator-controlled flag, decoupled from variety score. This was rejected because it preserves the discretion gap that creates the rote-ritual failure mode. Variety score is the first-class signal; gate configuration is derived, not set independently.
Review-phase teachback. This issue covers dispatch-time teachback only. A similar analysis may apply to peer-review reviewer assignments, but that's a separate design question.
Backwards compatibility with the advisory model. Per the "no backward compat needed for PACT" convention (sessions are ephemeral, no durable state to migrate), the Phase 2 cutover is a hard flip, not a soft migration.
References
pact-plugin/hooks/shared/variety_scorer.py: existing deterministic scoring backbone
pact-plugin/hooks/handoff_gate.py: mechanical enforcement precedent
pact-plugin/protocols/pact-variety.md: variety framework documentation
pact-plugin/protocols/pact-ct-teachback.md: current (advisory) teachback protocol