diff --git a/plugins/semantic-anchors/plugin.json b/plugins/semantic-anchors/plugin.json index 92d4f19..ac72ef1 100644 --- a/plugins/semantic-anchors/plugin.json +++ b/plugins/semantic-anchors/plugin.json @@ -1,6 +1,6 @@ { "name": "semantic-anchors", - "version": "0.2.0", + "version": "0.3.0", "description": "Semantic anchor skills for translating terminology and installing persistent anchor context into coding agents.", "author": { "name": "LLM Coding", diff --git a/plugins/semantic-anchors/skills/anchor-prior-test/SKILL.md b/plugins/semantic-anchors/skills/anchor-prior-test/SKILL.md new file mode 100644 index 0000000..146a4dc --- /dev/null +++ b/plugins/semantic-anchors/skills/anchor-prior-test/SKILL.md @@ -0,0 +1,47 @@ +--- +name: anchor-prior-test +description: Empirically test whether a candidate term qualifies as a semantic anchor by probing how densely it sits in LLM training data — across several model tiers in a clean room — before proposing it. Use when triaging an anchor proposal, deciding anchor vs contract, or vetting a rename. Produces a verdict against the four criteria (Precise, Rich, Consistent, Attributable), a tier rating, and a ready-to-paste proposal or rejection. +metadata: + author: LLM-Coding + version: "1.0" + source: https://github.com/LLM-Coding/Semantic-Anchors +license: MIT +--- + +# Anchor Prior Test + +Measure whether a term is a strong semantic anchor instead of guessing. This skill turns the manual litmus test in `CONTRIBUTING.adoc` ("ask the LLM what it associates") into a rigorous, multi-model, clean-room procedure with a structured verdict. + +## The principle + +A semantic anchor only delivers leverage if the term is already a **dense, pre-computed prior** in the model's training data. Naming it must reliably trigger the rich concept, the same way for everyone, across models you do not control. This skill measures that density empirically. The reasoning behind it is the article *An Anchor Delivers Only as Far as the Prior Reaches* (route `/training-data-vs-practice`). + +Two facts drive the whole method: + +- **Power tracks density, not merit.** A good, recent, niche method (e.g. "Use-Case 3.0") can be a *weak* anchor; a model will silently substitute the nearest concept it holds rather than admit the gap. Density is what you measure. +- **A weak prior is not a dead end.** It is a candidate for a **contract** (which supplies its own meaning in text) instead of an anchor. The verdict routes the term to the right home. + +## When to use + +- Triaging a `[Anchor Proposal]` issue before accepting it. +- Deciding **anchor vs contract** for new vocabulary. +- Vetting a **rename** — does the new name trigger the same concept the body describes? + +## Procedure + +1. **Frame the candidate.** Write down the exact string a user would type. List the precise/qualified form *and* any ambiguous bare form (e.g. "Morphological Box / Zwicky Box" vs bare "morphological analysis"). Note rival terms a model might confuse it with. +2. **Open a clean room.** Run a *fresh* `claude -p` process — **never a sub-agent** (sub-agents inherit this project's `CLAUDE.md` and memory and will give circular results). See `references/clean-room.md`. +3. **Run the probe battery.** Four probe types across at least two model tiers (weak + strong), with at least two runs of the decisive probe. See `references/probe-battery.md`. +4. **Score.** Map results to the four criteria, prior density, and a tier (★). See `references/scoring.md`. +5. **Emit.** A verdict plus either a ready-to-paste `propose-anchor.yml` activation-test section, or a `rejected-proposals.adoc` entry with the reason. + +## Critical gotchas — do not skip + +- **Sub-agents are contaminated.** They inherit the session's `CLAUDE.md` + memory snapshot. Use a fresh `claude -p` with `--setting-sources ""` from a neutral directory. A running session freezes that context at start-up, so moving files mid-session does *not* help. +- **Recall ≠ execution.** Test both "what is X?" (knowledge) *and* "use X to do Y" (the anchor as instruction). A model can fail to explain a method yet still apply its core move when the move has a dense name. +- **Name ambiguity.** Always probe the bare term too. If a strong model defaults to a different field (e.g. "morphological analysis" → linguistics), the catalog name must carry a disambiguator. +- **n=1 is not evidence.** Run the decisive probe (recognition and the action test) at least twice per model; dramatic single-run behaviour (hedging, mislabelling, substitution) must reproduce before you cite it. + +## Output + +A short verdict table (criteria × model), a tier (★★★ / ★★☆ / ★☆☆), a route (anchor / contract / reject), and the filled-in proposal or rejection text. See `references/worked-example.md` for a complete run (the Morphological Box). diff --git a/plugins/semantic-anchors/skills/anchor-prior-test/references/clean-room.md b/plugins/semantic-anchors/skills/anchor-prior-test/references/clean-room.md new file mode 100644 index 0000000..bd365d4 --- /dev/null +++ b/plugins/semantic-anchors/skills/anchor-prior-test/references/clean-room.md @@ -0,0 +1,36 @@ +# Clean Room + +The probes must run in a process that holds **none** of this project's context — otherwise the catalog's own `CLAUDE.md` (which already defines terms like "Cockburn's Fully Dressed format") makes the result circular. + +## The command + +```sh +mkdir -p /tmp/anchor-probe && cd /tmp/anchor-probe +claude -p "" --model \ + --strict-mcp-config --setting-sources "" +``` + +- `--setting-sources ""` drops user/project/local config — including the global `~/.claude/CLAUDE.md` **and** auto-memory. +- A **neutral working directory** (`/tmp/anchor-probe`, never a repo subdir) drops the project `CLAUDE.md`, which is discovered by walking up the tree. +- `--strict-mcp-config` disables MCP tools, so the model answers from training only. + +Verify once per session that it is clean: + +```sh +cd /tmp/anchor-probe +claude -p "Do you have any custom instructions loaded (CLAUDE.md, contracts, memory)? Name them or say 'none'." \ + --model haiku --strict-mcp-config --setting-sources "" +# Expect: "none" +``` + +## Why not sub-agents + +Sub-agents spawned inside the session inherit the parent's context, which is **frozen at session start-up**. A weak sub-agent will quote the project's own contracts back at you ("…as named in the Semantic Anchors CLAUDE.md contract"). Renaming files mid-session does not fix this, because the snapshot is already taken. Only a *freshly started* process reads the (now absent) files from disk. This is the whole reason the skill uses `claude -p` rather than the Agent tool. + +## Models + +Use the alias tiers `haiku` (weak), `sonnet` (mid), `opus` (strong). The weak tier is the most informative: a strong anchor survives even there. Older generations (Claude 3.x) are usually not reachable via the Code OAuth path — and you do not need them. If a concept is thin on the *latest* models, it is thinner on older ones; report the gap as a lower bound. + +## Cost and fallback + +Each probe is a full model call. A typical battery is 4 probe types × 3 tiers ≈ 10–14 calls; run them in the background. If only one model tier is available, the test still runs but states reduced confidence in the verdict — note this explicitly rather than implying a full sweep. diff --git a/plugins/semantic-anchors/skills/anchor-prior-test/references/probe-battery.md b/plugins/semantic-anchors/skills/anchor-prior-test/references/probe-battery.md new file mode 100644 index 0000000..0ba40e4 --- /dev/null +++ b/plugins/semantic-anchors/skills/anchor-prior-test/references/probe-battery.md @@ -0,0 +1,41 @@ +# Probe Battery + +Four probe types. Each ends with a `SELF:` line so results can be tabled without re-reading the full output. Run each across the weak and strong tiers at least; run the decisive probes (P1 recognition, P3 action) at least twice per tier. + +Prefix every prompt with: `Answer only from training knowledge; no tools; plain text.` + +## P1 — Recognition & richness (decisive) + +> What concepts do you associate with ``? List them. +> End with a line: `SELF: confidence=; richness=; reaching=; recognized=` + +Measures whether the prior exists and how rich it is. A strong anchor: `recognized=yes`, `richness=rich`, `reaching=no`, and a long list of interconnected concepts. + +## P2 — Name ambiguity + +> What is ``? +> End with a line: `SELF: dominant_domain=; mentioned_=` + +Run the *bare*, unqualified term. If the dominant domain is the wrong field (e.g. "morphological analysis" → linguistics), the catalog must use the qualified name. This sets the recommended anchor string. + +## P3 — Anchor action (decisive) + +> Using ``, . +> End with a line: `SELF: produced_=; _used=` + +Tests the term as an *instruction*, not as a quiz. This is what an anchor is actually for. A strong anchor makes the model produce the term's characteristic structure without being told what it is. Compare against a no-anchor baseline ("do the task, any format you like") to see the lift. + +## P4 — Attribution + +> Who created ``, and roughly when? +> End with a line: `SELF: creator=; confidence=` + +Confirms the *Attributable* criterion and surfaces misattribution (a common failure: the popular name differs from the inventor). + +## Optional — Substitution check + +For a suspected weak prior, ask for the term explicitly and watch whether the model **hedges**, **invents**, or **silently substitutes** a neighbouring concept. Silent substitution under a confident heading is the signature of an absent prior and a strong reason to reject (or route to a contract). + +## Tabling results + +Collect the `SELF:` lines into one matrix (rows = probe, columns = model). Quote the most telling raw sentence (a hedge, a misattribution, a rich association list) — verbatim quotes are the evidence the proposal needs. diff --git a/plugins/semantic-anchors/skills/anchor-prior-test/references/scoring.md b/plugins/semantic-anchors/skills/anchor-prior-test/references/scoring.md new file mode 100644 index 0000000..5fbd34b --- /dev/null +++ b/plugins/semantic-anchors/skills/anchor-prior-test/references/scoring.md @@ -0,0 +1,35 @@ +# Scoring + +Map the probe results to the catalog's four quality criteria, a prior-density judgement, and a tier. The criteria are defined in `CONTRIBUTING.adoc`. + +## The four criteria → which probe answers it + +| Criterion | Met when | Evidence from | +|---|---|---| +| **Precise** | the term names one bounded body of knowledge | P1 (focused, not scattered) + P2 (unambiguous, or qualified) | +| **Rich** | it activates many interconnected concepts | P1 (`richness=rich`, long list) | +| **Consistent** | different users/models get the same activation | P1 + P3 agree across tiers; P2 not split across fields | +| **Attributable** | it traces to a proponent / publication | P4 (`creator` named, `confidence=high`) | + +Add a fifth, decisive judgement: + +- **Prior density** — recognized + rich + *acts as an instruction* (P3) across **all** tiers, including the weak one. This is what separates an anchor from a merely real concept. If the weak tier fails recognition or action, the prior is thin. + +## Tier rating (from the proposal template) + +- **★★★ Self-standing** — the bare (or single recommended) name reliably triggers the behaviour on all tiers. P3 produces the structure without explanation. +- **★★☆ Needs qualification** — known, but requires a qualifier for reliable results (P2 shows ambiguity, or the weak tier needs the explicit form). Record the recommended qualified string. +- **★☆☆ Descriptive only** — names a concept but gives no actionable instruction; P3 does not produce a characteristic structure. **Reject** unless a concrete prompt pattern is demonstrated. + +## Route the term + +- **Strong prior + matches intended use → anchor.** Emit the `propose-anchor.yml` activation-test section, pre-filled with the matrix and verbatim quotes. +- **Weak prior but the team uses the vocabulary → contract.** A contract supplies meaning in text and does not depend on the prior. Note this in the verdict. +- **Weak prior and not used → reject.** Emit a `rejected-proposals.adoc` entry naming the failed criterion and the substitution observed. + +## Honesty rules + +- Cite only what reproduced (n ≥ 2 for decisive probes). Label single-run anecdotes as such. +- Separate **recall** (P1/P4) from **execution** (P3): a model may apply a move it cannot define. +- State the model set and that older generations were not tested; frame any gap as a lower bound, not a ceiling. +- If only one tier was available, lower the stated confidence and say so. diff --git a/plugins/semantic-anchors/skills/anchor-prior-test/references/worked-example.md b/plugins/semantic-anchors/skills/anchor-prior-test/references/worked-example.md new file mode 100644 index 0000000..034895d --- /dev/null +++ b/plugins/semantic-anchors/skills/anchor-prior-test/references/worked-example.md @@ -0,0 +1,30 @@ +# Worked Example — Morphological Box + +A complete run of the skill on the candidate term **Morphological Box** (Zwicky Box), 2026-06. Clean room: `claude -p --setting-sources "" --strict-mcp-config` from `/tmp`, on Claude Haiku 4.5, Sonnet 4.6, Opus 4.8. + +## Results + +| Probe | Haiku 4.5 | Sonnet 4.6 | Opus 4.8 | +|---|---|---|---| +| **P1 recognition / richness** | recognized, rich | recognized, rich | recognized, **rich** — Zwicky, parameters, values, Cartesian product, Cross-Consistency Assessment, General Morphological Analysis, ideation, scenario planning | +| **P3 action** ("lay out the design space for a mug") | parameter×value matrix ✓, CCA ✓ | matrix ✓, CCA ✓ | matrix ✓, CCA ✓ | +| **P4 attribution** | Fritz Zwicky, high | — | Fritz Zwicky (1940s–1969), high | +| **P2 bare "morphological analysis"** | innovation / systems eng. | — | **linguistics** | + +## Verdict + +| Criterion | Result | +|---|---| +| Precise | ✓ — *as* "Morphological Box / Zwicky Box". P2 shows the bare "morphological analysis" pulls a strong model to **linguistics**, so the name must carry "Box"/"Zwicky". | +| Rich | ✓✓ — dense (parameters × values, solution space, CCA, GMA). | +| Consistent | ✓ — with the qualified name, all tiers converge. | +| Attributable | ✓✓ — Fritz Zwicky, 1940s–1969. | +| Prior density | **Strong** — recognized + rich + acts as an instruction on *all* tiers, including Haiku. | + +**Tier: ★★★** (self-standing under the name "Morphological Box"; the bare "morphological analysis" form is ★★☆ — needs qualification). + +**Route: anchor.** Recommended name: *Morphological Box* (aka Zwicky Box). Natural placement: beside the **Pugh Matrix** in the decision flow — the box *generates* the option space, the Pugh Matrix *scores* it, the ADR *decides*. + +## Lesson captured + +The candidate was much stronger than a recent-but-niche term (contrast: "Use-Case 3.0", which all tiers doubted and substituted). The only risk was the **name**, caught by P2 — exactly the probe that exists to catch it. diff --git a/skill/anchor-prior-test/SKILL.md b/skill/anchor-prior-test/SKILL.md new file mode 100644 index 0000000..146a4dc --- /dev/null +++ b/skill/anchor-prior-test/SKILL.md @@ -0,0 +1,47 @@ +--- +name: anchor-prior-test +description: Empirically test whether a candidate term qualifies as a semantic anchor by probing how densely it sits in LLM training data — across several model tiers in a clean room — before proposing it. Use when triaging an anchor proposal, deciding anchor vs contract, or vetting a rename. Produces a verdict against the four criteria (Precise, Rich, Consistent, Attributable), a tier rating, and a ready-to-paste proposal or rejection. +metadata: + author: LLM-Coding + version: "1.0" + source: https://github.com/LLM-Coding/Semantic-Anchors +license: MIT +--- + +# Anchor Prior Test + +Measure whether a term is a strong semantic anchor instead of guessing. This skill turns the manual litmus test in `CONTRIBUTING.adoc` ("ask the LLM what it associates") into a rigorous, multi-model, clean-room procedure with a structured verdict. + +## The principle + +A semantic anchor only delivers leverage if the term is already a **dense, pre-computed prior** in the model's training data. Naming it must reliably trigger the rich concept, the same way for everyone, across models you do not control. This skill measures that density empirically. The reasoning behind it is the article *An Anchor Delivers Only as Far as the Prior Reaches* (route `/training-data-vs-practice`). + +Two facts drive the whole method: + +- **Power tracks density, not merit.** A good, recent, niche method (e.g. "Use-Case 3.0") can be a *weak* anchor; a model will silently substitute the nearest concept it holds rather than admit the gap. Density is what you measure. +- **A weak prior is not a dead end.** It is a candidate for a **contract** (which supplies its own meaning in text) instead of an anchor. The verdict routes the term to the right home. + +## When to use + +- Triaging a `[Anchor Proposal]` issue before accepting it. +- Deciding **anchor vs contract** for new vocabulary. +- Vetting a **rename** — does the new name trigger the same concept the body describes? + +## Procedure + +1. **Frame the candidate.** Write down the exact string a user would type. List the precise/qualified form *and* any ambiguous bare form (e.g. "Morphological Box / Zwicky Box" vs bare "morphological analysis"). Note rival terms a model might confuse it with. +2. **Open a clean room.** Run a *fresh* `claude -p` process — **never a sub-agent** (sub-agents inherit this project's `CLAUDE.md` and memory and will give circular results). See `references/clean-room.md`. +3. **Run the probe battery.** Four probe types across at least two model tiers (weak + strong), with at least two runs of the decisive probe. See `references/probe-battery.md`. +4. **Score.** Map results to the four criteria, prior density, and a tier (★). See `references/scoring.md`. +5. **Emit.** A verdict plus either a ready-to-paste `propose-anchor.yml` activation-test section, or a `rejected-proposals.adoc` entry with the reason. + +## Critical gotchas — do not skip + +- **Sub-agents are contaminated.** They inherit the session's `CLAUDE.md` + memory snapshot. Use a fresh `claude -p` with `--setting-sources ""` from a neutral directory. A running session freezes that context at start-up, so moving files mid-session does *not* help. +- **Recall ≠ execution.** Test both "what is X?" (knowledge) *and* "use X to do Y" (the anchor as instruction). A model can fail to explain a method yet still apply its core move when the move has a dense name. +- **Name ambiguity.** Always probe the bare term too. If a strong model defaults to a different field (e.g. "morphological analysis" → linguistics), the catalog name must carry a disambiguator. +- **n=1 is not evidence.** Run the decisive probe (recognition and the action test) at least twice per model; dramatic single-run behaviour (hedging, mislabelling, substitution) must reproduce before you cite it. + +## Output + +A short verdict table (criteria × model), a tier (★★★ / ★★☆ / ★☆☆), a route (anchor / contract / reject), and the filled-in proposal or rejection text. See `references/worked-example.md` for a complete run (the Morphological Box). diff --git a/skill/anchor-prior-test/references/clean-room.md b/skill/anchor-prior-test/references/clean-room.md new file mode 100644 index 0000000..bd365d4 --- /dev/null +++ b/skill/anchor-prior-test/references/clean-room.md @@ -0,0 +1,36 @@ +# Clean Room + +The probes must run in a process that holds **none** of this project's context — otherwise the catalog's own `CLAUDE.md` (which already defines terms like "Cockburn's Fully Dressed format") makes the result circular. + +## The command + +```sh +mkdir -p /tmp/anchor-probe && cd /tmp/anchor-probe +claude -p "" --model \ + --strict-mcp-config --setting-sources "" +``` + +- `--setting-sources ""` drops user/project/local config — including the global `~/.claude/CLAUDE.md` **and** auto-memory. +- A **neutral working directory** (`/tmp/anchor-probe`, never a repo subdir) drops the project `CLAUDE.md`, which is discovered by walking up the tree. +- `--strict-mcp-config` disables MCP tools, so the model answers from training only. + +Verify once per session that it is clean: + +```sh +cd /tmp/anchor-probe +claude -p "Do you have any custom instructions loaded (CLAUDE.md, contracts, memory)? Name them or say 'none'." \ + --model haiku --strict-mcp-config --setting-sources "" +# Expect: "none" +``` + +## Why not sub-agents + +Sub-agents spawned inside the session inherit the parent's context, which is **frozen at session start-up**. A weak sub-agent will quote the project's own contracts back at you ("…as named in the Semantic Anchors CLAUDE.md contract"). Renaming files mid-session does not fix this, because the snapshot is already taken. Only a *freshly started* process reads the (now absent) files from disk. This is the whole reason the skill uses `claude -p` rather than the Agent tool. + +## Models + +Use the alias tiers `haiku` (weak), `sonnet` (mid), `opus` (strong). The weak tier is the most informative: a strong anchor survives even there. Older generations (Claude 3.x) are usually not reachable via the Code OAuth path — and you do not need them. If a concept is thin on the *latest* models, it is thinner on older ones; report the gap as a lower bound. + +## Cost and fallback + +Each probe is a full model call. A typical battery is 4 probe types × 3 tiers ≈ 10–14 calls; run them in the background. If only one model tier is available, the test still runs but states reduced confidence in the verdict — note this explicitly rather than implying a full sweep. diff --git a/skill/anchor-prior-test/references/probe-battery.md b/skill/anchor-prior-test/references/probe-battery.md new file mode 100644 index 0000000..0ba40e4 --- /dev/null +++ b/skill/anchor-prior-test/references/probe-battery.md @@ -0,0 +1,41 @@ +# Probe Battery + +Four probe types. Each ends with a `SELF:` line so results can be tabled without re-reading the full output. Run each across the weak and strong tiers at least; run the decisive probes (P1 recognition, P3 action) at least twice per tier. + +Prefix every prompt with: `Answer only from training knowledge; no tools; plain text.` + +## P1 — Recognition & richness (decisive) + +> What concepts do you associate with ``? List them. +> End with a line: `SELF: confidence=; richness=; reaching=; recognized=` + +Measures whether the prior exists and how rich it is. A strong anchor: `recognized=yes`, `richness=rich`, `reaching=no`, and a long list of interconnected concepts. + +## P2 — Name ambiguity + +> What is ``? +> End with a line: `SELF: dominant_domain=; mentioned_=` + +Run the *bare*, unqualified term. If the dominant domain is the wrong field (e.g. "morphological analysis" → linguistics), the catalog must use the qualified name. This sets the recommended anchor string. + +## P3 — Anchor action (decisive) + +> Using ``, . +> End with a line: `SELF: produced_=; _used=` + +Tests the term as an *instruction*, not as a quiz. This is what an anchor is actually for. A strong anchor makes the model produce the term's characteristic structure without being told what it is. Compare against a no-anchor baseline ("do the task, any format you like") to see the lift. + +## P4 — Attribution + +> Who created ``, and roughly when? +> End with a line: `SELF: creator=; confidence=` + +Confirms the *Attributable* criterion and surfaces misattribution (a common failure: the popular name differs from the inventor). + +## Optional — Substitution check + +For a suspected weak prior, ask for the term explicitly and watch whether the model **hedges**, **invents**, or **silently substitutes** a neighbouring concept. Silent substitution under a confident heading is the signature of an absent prior and a strong reason to reject (or route to a contract). + +## Tabling results + +Collect the `SELF:` lines into one matrix (rows = probe, columns = model). Quote the most telling raw sentence (a hedge, a misattribution, a rich association list) — verbatim quotes are the evidence the proposal needs. diff --git a/skill/anchor-prior-test/references/scoring.md b/skill/anchor-prior-test/references/scoring.md new file mode 100644 index 0000000..5fbd34b --- /dev/null +++ b/skill/anchor-prior-test/references/scoring.md @@ -0,0 +1,35 @@ +# Scoring + +Map the probe results to the catalog's four quality criteria, a prior-density judgement, and a tier. The criteria are defined in `CONTRIBUTING.adoc`. + +## The four criteria → which probe answers it + +| Criterion | Met when | Evidence from | +|---|---|---| +| **Precise** | the term names one bounded body of knowledge | P1 (focused, not scattered) + P2 (unambiguous, or qualified) | +| **Rich** | it activates many interconnected concepts | P1 (`richness=rich`, long list) | +| **Consistent** | different users/models get the same activation | P1 + P3 agree across tiers; P2 not split across fields | +| **Attributable** | it traces to a proponent / publication | P4 (`creator` named, `confidence=high`) | + +Add a fifth, decisive judgement: + +- **Prior density** — recognized + rich + *acts as an instruction* (P3) across **all** tiers, including the weak one. This is what separates an anchor from a merely real concept. If the weak tier fails recognition or action, the prior is thin. + +## Tier rating (from the proposal template) + +- **★★★ Self-standing** — the bare (or single recommended) name reliably triggers the behaviour on all tiers. P3 produces the structure without explanation. +- **★★☆ Needs qualification** — known, but requires a qualifier for reliable results (P2 shows ambiguity, or the weak tier needs the explicit form). Record the recommended qualified string. +- **★☆☆ Descriptive only** — names a concept but gives no actionable instruction; P3 does not produce a characteristic structure. **Reject** unless a concrete prompt pattern is demonstrated. + +## Route the term + +- **Strong prior + matches intended use → anchor.** Emit the `propose-anchor.yml` activation-test section, pre-filled with the matrix and verbatim quotes. +- **Weak prior but the team uses the vocabulary → contract.** A contract supplies meaning in text and does not depend on the prior. Note this in the verdict. +- **Weak prior and not used → reject.** Emit a `rejected-proposals.adoc` entry naming the failed criterion and the substitution observed. + +## Honesty rules + +- Cite only what reproduced (n ≥ 2 for decisive probes). Label single-run anecdotes as such. +- Separate **recall** (P1/P4) from **execution** (P3): a model may apply a move it cannot define. +- State the model set and that older generations were not tested; frame any gap as a lower bound, not a ceiling. +- If only one tier was available, lower the stated confidence and say so. diff --git a/skill/anchor-prior-test/references/worked-example.md b/skill/anchor-prior-test/references/worked-example.md new file mode 100644 index 0000000..034895d --- /dev/null +++ b/skill/anchor-prior-test/references/worked-example.md @@ -0,0 +1,30 @@ +# Worked Example — Morphological Box + +A complete run of the skill on the candidate term **Morphological Box** (Zwicky Box), 2026-06. Clean room: `claude -p --setting-sources "" --strict-mcp-config` from `/tmp`, on Claude Haiku 4.5, Sonnet 4.6, Opus 4.8. + +## Results + +| Probe | Haiku 4.5 | Sonnet 4.6 | Opus 4.8 | +|---|---|---|---| +| **P1 recognition / richness** | recognized, rich | recognized, rich | recognized, **rich** — Zwicky, parameters, values, Cartesian product, Cross-Consistency Assessment, General Morphological Analysis, ideation, scenario planning | +| **P3 action** ("lay out the design space for a mug") | parameter×value matrix ✓, CCA ✓ | matrix ✓, CCA ✓ | matrix ✓, CCA ✓ | +| **P4 attribution** | Fritz Zwicky, high | — | Fritz Zwicky (1940s–1969), high | +| **P2 bare "morphological analysis"** | innovation / systems eng. | — | **linguistics** | + +## Verdict + +| Criterion | Result | +|---|---| +| Precise | ✓ — *as* "Morphological Box / Zwicky Box". P2 shows the bare "morphological analysis" pulls a strong model to **linguistics**, so the name must carry "Box"/"Zwicky". | +| Rich | ✓✓ — dense (parameters × values, solution space, CCA, GMA). | +| Consistent | ✓ — with the qualified name, all tiers converge. | +| Attributable | ✓✓ — Fritz Zwicky, 1940s–1969. | +| Prior density | **Strong** — recognized + rich + acts as an instruction on *all* tiers, including Haiku. | + +**Tier: ★★★** (self-standing under the name "Morphological Box"; the bare "morphological analysis" form is ★★☆ — needs qualification). + +**Route: anchor.** Recommended name: *Morphological Box* (aka Zwicky Box). Natural placement: beside the **Pugh Matrix** in the decision flow — the box *generates* the option space, the Pugh Matrix *scores* it, the ADR *decides*. + +## Lesson captured + +The candidate was much stronger than a recent-but-niche term (contrast: "Use-Case 3.0", which all tiers doubted and substituted). The only risk was the **name**, caught by P2 — exactly the probe that exists to catch it.