Commit 7939190

feat(frameworks): add dimension combinations, best practices, and compliance evaluation
Each framework definition now supports three optional standard blocks:

- `dimension_combinations`: named synergies, prerequisites, and tensions between dimensions, each with guidance an agent can inject to steer collaboration
- `best_practices`: actionable per-dimension recommendations with antipatterns and observable signals for detecting when a practice is already followed
- `evaluation_criteria`: machine-checkable keyword-signal criteria with weights and pass thresholds, enabling scored compliance assessment

Engine:
- Add `evaluateCompliance(text, framework)`: scans collaboration text against a framework's evaluation_criteria and returns a 0-100 weighted score, per-criterion pass/fail, and matched keywords

MCP:
- Add `evaluate_compliance` tool: agents can call this mid-session or at the end of a session to check framework adherence and identify which dimensions need attention
- Update `get_framework_detail` description to surface the new blocks

4D Framework YAML (v1.0.0 -> v1.1.0):
- 4 dimension_combinations (del+des synergy, des+dis prerequisite, dis+dil synergy, del+dil tension)
- 8 best_practices (7 across the four dimensions plus 1 cycle-close practice)
- 4 evaluation_criteria (one per dimension, weights sum to 1.0)

Tests: 10 new test cases covering schema validation, combination/criterion constraints, and all evaluateCompliance edge cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
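The scoring behaviour described under "Engine" can be pictured with a minimal Python sketch. It is not the project's implementation, which this diff does not show. The names `Criterion`, `CriterionResult`, and `evaluate_compliance`, the case-insensitive substring matching, and the all-or-nothing weight per criterion are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    """One entry of a framework's evaluation_criteria block (hypothetical shape)."""
    id: str
    weight: float
    present: list[str]        # keyword signals to look for in the text
    pass_threshold: int = 1   # minimum number of distinct signals that must match


@dataclass
class CriterionResult:
    id: str
    passed: bool
    matched: list[str] = field(default_factory=list)


def evaluate_compliance(text: str, criteria: list[Criterion]) -> tuple[int, list[CriterionResult]]:
    """Return (score in 0-100, per-criterion results).

    Assumptions: a signal matches when it occurs as a case-insensitive
    substring, and a criterion earns its full weight only when it passes.
    """
    haystack = text.lower()
    total = sum(c.weight for c in criteria)
    earned = 0.0
    results = []
    for c in criteria:
        matched = [s for s in c.present if s.lower() in haystack]
        passed = len(matched) >= c.pass_threshold
        if passed:
            earned += c.weight
        results.append(CriterionResult(c.id, passed, matched))
    # Normalise by the total weight so the score stays in 0-100
    # even if the weights do not sum exactly to 1.0.
    score = round(100 * earned / total) if total > 0 else 0
    return score, results


# Two criteria with a subset of the signals from the 4D framework YAML.
criteria = [
    Criterion("eval-del-explicit", 0.25, ["automated", "augmented", "supervised"], 1),
    Criterion("eval-des-rich", 0.25, ["constraint", "must not", "scope", "goal"], 2),
]
score, results = evaluate_compliance("Augmented mode. Constraint: keep it short.", criteria)
# score == 50: the delegation criterion passes, the description criterion
# matches only one of the two signals it requires.
```

Substring matching is the simplest reading of "keyword signal". It also means a short signal such as `test` matches inside a longer word such as `latest`, so a real scorer might prefer word-boundary matching.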
1 parent 5166eb0 commit 7939190

10 files changed

Lines changed: 911 additions & 23 deletions

frameworks/4d-framework.yaml

Lines changed: 262 additions & 1 deletion
@@ -1,6 +1,6 @@
 id: 4d-framework
 name: AI Fluency 4D Framework
-version: 1.0.0
+version: 1.1.0
 contributor: Dakan & Feller
 description: >
   Four dimensions of good human-AI collaboration. Originally described
@@ -29,3 +29,264 @@ tags:
   - ai-fluency
   - 4d
 reference: https://fluently.ctrl6.com
+
+# ── How dimensions interact ────────────────────────────────────────────────────
+# Synergies, prerequisites, and tensions between dimensions.
+# Agents can use these to diagnose imbalances and steer the collaboration
+# back on track when a dimension is weak or misaligned.
+
+dimension_combinations:
+  - id: del-des-synergy
+    dimensions: [delegation, description]
+    type: synergy
+    label: Autonomy drives context depth
+    description: >
+      The more autonomy delegated to AI, the richer the description must be to
+      constrain the solution space. Low-delegation tasks tolerate loose framing;
+      high-delegation tasks require explicit constraints and success criteria to
+      prevent the AI from filling gaps with unsafe assumptions.
+    guidance: >
+      When delegation is set to "automated", verify that the description covers
+      scope boundaries, constraints, edge cases, and success criteria before
+      handing off. If the description is thin, reduce delegation to "augmented"
+      until the context is complete.
+
+  - id: des-dis-prerequisite
+    dimensions: [description, discernment]
+    type: prerequisite
+    label: Context enables evaluation
+    description: >
+      A well-formed description makes discernment possible: output can only be
+      evaluated against criteria that were stated upfront. Without a clear
+      description, discernment becomes arbitrary and inconsistent.
+    guidance: >
+      Embed explicit acceptance criteria or evaluation questions directly in the
+      description. If you cannot articulate what good output looks like before
+      the AI responds, the description is not complete enough to proceed.
+
+  - id: dis-dil-synergy
+    dimensions: [discernment, diligence]
+    type: synergy
+    label: Evaluation quality anchors accountability
+    description: >
+      Discernment feeds diligence directly: shallow evaluation criteria make
+      accountability meaningless because sign-off happens without genuine review,
+      creating the appearance of oversight without its substance.
+    guidance: >
+      Define your discernment checklist before committing to a diligence workflow.
+      Accountability assignments without evaluation criteria are ceremonial, not real.
+
+  - id: del-dil-tension
+    dimensions: [delegation, diligence]
+    type: tension
+    label: Autonomy vs. accountability
+    description: >
+      High AI autonomy reduces the human review surface, which naturally erodes
+      diligence unless explicit checkpoints are maintained. The more the AI acts
+      independently, the more deliberate the accountability structure must be.
+    guidance: >
+      When delegation is "automated", introduce mandatory diligence gates such as
+      output spot-checks, audit logs, or periodic human-in-the-loop reviews.
+      A purely automated pipeline with no diligence gate is a liability, not
+      an efficiency gain.
+
+# ── Actionable recommendations ────────────────────────────────────────────────
+# Best practices an agent can surface to steer a collaboration toward better
+# outcomes. Each includes a signal so the agent can detect whether the
+# practice is already being followed before injecting guidance.
+
+best_practices:
+  - id: bp-del-first
+    title: Establish autonomy level before framing the task
+    dimension: delegation
+    description: >
+      Decide whether the AI acts autonomously, augments human work, or operates
+      under close supervision before writing the task prompt. This choice shapes
+      every downstream decision about context depth, review rigor, and accountability.
+    antipattern: >
+      Jumping directly to task framing without agreeing on who makes the final
+      decision, leading to implicit and often misaligned autonomy expectations
+      that only surface when the output is already wrong.
+    signal: Opening prompt contains an explicit autonomy level or role assignment for the AI.
+
+  - id: bp-del-risk-match
+    title: Match delegation level to task risk
+    dimension: delegation
+    description: >
+      Reserve "automated" delegation for tasks where mistakes are cheap to catch
+      and correct. Use "augmented" when errors have downstream consequences, and
+      "supervised" when the domain is high-stakes or the AI's reliability is unknown.
+    antipattern: >
+      Treating all tasks as equally safe to automate because a previous task went
+      well. Risk calibration must be per-task, not per-tool.
+    signal: Delegation level is justified by a stated risk assessment or consequence estimate.
+
+  - id: bp-des-constraints
+    title: Front-load constraints and success criteria
+    dimension: description
+    description: >
+      State non-negotiable constraints, scope limits, and what "done" looks like
+      at the start of the description rather than correcting the AI after the fact.
+      Constraints discovered late cost more to fix than constraints stated early.
+    antipattern: >
+      Iterating through multiple AI responses to discover implicit constraints
+      that should have been stated upfront, wasting cycles and eroding confidence
+      in the AI's reliability.
+    signal: Task description includes at least one explicit constraint or acceptance criterion.
+
+  - id: bp-des-context
+    title: Include why, not just what
+    dimension: description
+    description: >
+      Provide the motivation and broader context behind the task, not just the
+      immediate ask. AI models use context to make better judgment calls at the
+      edges; without it they default to generic behavior.
+    antipattern: >
+      Writing task descriptions as bare commands without context, leading to
+      technically correct but practically useless outputs that ignore real intent.
+    signal: Description includes a stated purpose, audience, or motivating context.
+
+  - id: bp-dis-criteria
+    title: Define evaluation criteria before reviewing output
+    dimension: discernment
+    description: >
+      Agree on what good output looks like — and what would disqualify it —
+      before asking the AI to generate it. This prevents post-hoc rationalisation
+      of mediocre outputs and keeps evaluation consistent across reviewers.
+    antipattern: >
+      Adjusting the criteria to match what the AI produced rather than what was
+      actually needed, or accepting the first plausible response without any check.
+    signal: Collaboration sequence includes an explicit review or validation step with stated criteria.
+
+  - id: bp-dis-pushback
+    title: Define when to push back or iterate
+    dimension: discernment
+    description: >
+      Agree upfront on the conditions under which you will reject an AI response
+      and re-prompt rather than accept a partial result. Without a pushback policy,
+      reviewers default to accepting whatever the AI produces.
+    antipattern: >
+      Treating AI output as a starting point for human editing rather than as a
+      proposal to be evaluated, blurring the line between AI work and human work.
+    signal: Collaboration includes a defined condition for rejection or re-prompting.
+
+  - id: bp-dil-sign-off
+    title: Assign a named human accountable for the final output
+    dimension: diligence
+    description: >
+      Every AI-assisted task should end with a named human who takes explicit
+      responsibility for the output before it is used, shared, or deployed.
+    antipattern: >
+      Deploying or sharing AI output without a clear sign-off process, making
+      accountability diffuse across the team and errors harder to trace.
+    signal: Collaboration includes an explicit approval step with a named role or person.
+
+  - id: bp-cycle-close
+    title: Close every cycle with a brief reflection
+    description: >
+      After completing a collaboration cycle, briefly review whether all four
+      dimensions were genuinely addressed. Imbalances indicate where the next
+      cycle should invest effort.
+    antipattern: >
+      Treating each AI session as independent without reviewing overall collaboration
+      quality, leading to systematic blind spots that compound across projects.
+    signal: Cycle ends with a retrospective, quality review, or explicit dimension balance check.
+
+# ── Machine-checkable compliance criteria ─────────────────────────────────────
+# These criteria let the scorer and agents evaluate a collaboration text for
+# framework adherence using keyword signals.
+# Weights sum to 1.0. pass_threshold = minimum matching signals required.
+
+evaluation_criteria:
+  - id: eval-del-explicit
+    dimension: delegation
+    label: Autonomy level is explicit
+    description: >
+      The collaboration includes a clear statement of the AI's autonomy level
+      (automated, augmented, or supervised) or an equivalent role assignment.
+    weight: 0.25
+    signals:
+      present:
+        - automated
+        - augmented
+        - supervised
+        - decides
+        - approves
+        - oversight
+        - autonomous
+        - human-in-the-loop
+        - review before
+        - sign off before
+    pass_threshold: 1
+
+  - id: eval-des-rich
+    dimension: description
+    label: Task description includes constraints or success criteria
+    description: >
+      The task framing contains at least two explicit constraints, scope
+      boundaries, goals, or acceptance conditions — not just a bare command.
+    weight: 0.25
+    signals:
+      present:
+        - constraint
+        - requirement
+        - must not
+        - must be
+        - should not
+        - should be
+        - criteria
+        - success
+        - goal
+        - objective
+        - scope
+        - out of scope
+        - acceptable
+        - definition of done
+        - expected output
+    pass_threshold: 2
+
+  - id: eval-dis-review
+    dimension: discernment
+    label: Output review or validation step is present
+    description: >
+      The collaboration includes at least one step where AI output is evaluated
+      against defined criteria before being accepted or acted upon.
+    weight: 0.25
+    signals:
+      present:
+        - review
+        - verify
+        - validate
+        - check
+        - evaluate
+        - confirm
+        - test
+        - assess
+        - cross-check
+        - compare
+        - quality
+        - accuracy
+    pass_threshold: 1
+
+  - id: eval-dil-accountability
+    dimension: diligence
+    label: Accountability is assigned
+    description: >
+      The collaboration includes an explicit sign-off, ownership assignment, or
+      accountability statement for the final output.
+    weight: 0.25
+    signals:
+      present:
+        - approve
+        - sign off
+        - accountable
+        - responsible
+        - owner
+        - lead
+        - final review
+        - authorized
+        - reviewed by
+        - approved by
+        - sign-off
+        - takes ownership
+    pass_threshold: 1
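
The comment heading the `evaluation_criteria` block states two invariants: weights sum to 1.0, and `pass_threshold` is a minimum count of matching signals. A short Python sketch of how such invariants might be checked is given here. The function name `validate_criteria` and the rule that a threshold must lie between 1 and the number of listed signals are assumptions for illustration; the project's own schema tests are not shown in this file's diff. The numbers are taken from the YAML above.

```python
import math

# Weights and thresholds copied from the evaluation_criteria block;
# "signals" is the number of entries listed under signals.present.
CRITERIA = [
    {"id": "eval-del-explicit", "weight": 0.25, "signals": 10, "pass_threshold": 1},
    {"id": "eval-des-rich", "weight": 0.25, "signals": 15, "pass_threshold": 2},
    {"id": "eval-dis-review", "weight": 0.25, "signals": 12, "pass_threshold": 1},
    {"id": "eval-dil-accountability", "weight": 0.25, "signals": 12, "pass_threshold": 1},
]


def validate_criteria(criteria):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    total = sum(c["weight"] for c in criteria)
    # Compare with a tolerance because weights are floating-point values.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"weights sum to {total}, expected 1.0")
    for c in criteria:
        # Assumed rule: a threshold above the number of signals can never pass.
        if not 1 <= c["pass_threshold"] <= c["signals"]:
            problems.append(
                f"{c['id']}: pass_threshold {c['pass_threshold']} "
                f"outside 1..{c['signals']}"
            )
    return problems


problems = validate_criteria(CRITERIA)  # empty for the values in this commit
```

Running a check like this when a framework file is loaded would catch a criterion that can never pass before any collaboration text is scored.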