docs: add viability test to contribution guidelines by JensGrote · Pull Request #641 · LLM-Coding/Semantic-Anchors

JensGrote · 2026-06-25T09:49:26Z


docs: add viability test to contribution guidelines

Recognition ≠ activation. The current contribution workflow tests whether a model recognizes a proposed anchor term — but not whether naming it actually changes the output. The training-data-vs-practice article demonstrated empirically that these are different things: "Use-Case 3.0" is recognized by every model, yet naming it produces silent substitution or confabulation rather than reliable activation.

Changes

1. New section in CONTRIBUTING.adoc (EN + DE): "Viability Test — Does the Anchor Deliver?"

Positioned between "Testing Your Semantic Anchor" and "Developer Setup". Describes:

The Before/After test (same task without vs. with anchor — does structure change?)
Three failure modes to watch for (silent substitution, confabulation, no structural change)
Cross-model check (weak + strong model, mapped to ★★★/★★/★ tier system)
When to use a contract instead (passes quality criteria but fails viability → contract, not anchor)
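The Before/After comparison could be approximated mechanically as a first pass before a human reads both outputs. The sketch below is not part of this PR: `structure_signature` and `structure_changed` are hypothetical helper names, and the assumption that "structure" means counts of headings, list items, and table rows is mine.

```python
import re
from collections import Counter


def structure_signature(text: str) -> Counter:
    """Count coarse structural elements in a model output (assumed heuristic)."""
    sig = Counter()
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if re.match(r"^(#{1,6}\s|={1,6}\s)", stripped):
            sig["heading"] += 1  # Markdown or AsciiDoc heading
        elif re.match(r"^([-*•]\s|\d+[.)]\s)", stripped):
            sig["list_item"] += 1  # bullet or numbered item
        elif stripped.startswith("|"):
            sig["table_row"] += 1
        else:
            sig["paragraph_line"] += 1
    return sig


def structure_changed(before: str, after: str) -> bool:
    """True if heading, list, or table counts differ between the two outputs."""
    keys = ("heading", "list_item", "table_row")
    b, a = structure_signature(before), structure_signature(after)
    return any(b[k] != a[k] for k in keys)
```

A `False` result here would only hint at the "no structural change" failure mode. Silent substitution and confabulation still need a reviewer who knows the source material, since both can produce well-structured but wrong output.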

2. New optional field in issue template (propose-anchor.yml)

"Viability Test" textarea — encourages proposers to include a before/after comparison. Starts as optional; can be made mandatory once adopted.

3. Renamed rejection category (rejected-proposals.adoc)

"Not in training data" → "Insufficient training-data density" — expanded definition covering the three failure modes and pointing to the article for methodology. Updated the MIRRR UX Framework entry to use the new category name.

Rationale

The existing four quality criteria (Precise, Rich, Consistent, Attributable) are necessary but not sufficient. A term can pass all four yet still fail in practice because:

The prior is too thin (silent substitution on weaker models)
The model confabulates (invents plausible but wrong content)
The term is recognized but doesn't actually reshape output

The viability test makes the implicit ★★★/★★/★ tier distinction explicit and testable — providing clear criteria for reviewers and a methodology proposers can follow.
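One way a reviewer might record the outcome is sketched below. The three failure modes and the "passes quality criteria but fails viability → contract" rule come from this description. The names `Observation` and `recommend`, and the threshold that viability counts as failed only when every tested model shows a failure mode, are assumptions for illustration. The sketch does not assign ★ tiers, because the exact mapping is not spelled out here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    SILENT_SUBSTITUTION = "silent substitution"
    CONFABULATION = "confabulation"
    NO_STRUCTURAL_CHANGE = "no structural change"


@dataclass
class Observation:
    model: str  # hypothetical label, e.g. "weak" or "strong"
    failure: Optional[FailureMode] = None  # None means the anchor activated


def recommend(passes_quality_criteria: bool, observations: list[Observation]) -> str:
    """Suggest an outcome; the all-models-fail threshold is an assumption."""
    if not passes_quality_criteria:
        return "reject"
    if not observations:
        return "untested"
    if all(o.failure is not None for o in observations):
        return "contract"  # quality criteria met, but no model activates the term
    return "anchor"
```

For example, a term that triggers silent substitution on the weak model but activates on the strong one would come back as `"anchor"`, leaving the tier decision to the reviewer.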

Not included (deliberately)

No retroactive re-evaluation of existing anchors (the tier system already captures this implicitly)
No schema changes
The review agent workflow is not modified (future work)


Summary by CodeRabbit

New Features
- Added an optional Viability Test for new semantic anchors to check whether they actually change the output structurally.
Documentation
- The contribution guides were extended with before/after tests, failure modes, cross-model checks, and guidance on choosing between an anchor and a contract.
- The German documentation was updated and cleaned up accordingly.
- The list of rejected proposals was refined in wording and content.

Recognition ≠ activation. The existing activation test checks whether a model knows a term, but not whether naming it changes the output.

Add a Viability Test section (EN + DE) that checks:
- Before/After structural change
- Silent substitution risk
- Confabulation risk
- Cross-model portability

Also: rename 'Not in training data' rejection category to 'Insufficient training-data density' with expanded definition covering the three failure modes.

Refs: training-data-vs-practice article, LLM-Coding#585

coderabbitai · 2026-06-25T09:49:37Z

Walkthrough

The PR adds a Viability Test for semantic anchors to the issue template and the contribution documentation, updates the German and public docs accordingly, and renames a category in the rejected proposals.

Changes

Viability Test and documentation

Layer / File(s)	Summary
Template and English guide: `.github/ISSUE_TEMPLATE/propose-anchor.yml`, `CONTRIBUTING.adoc`	The issue form adds `viability-test`; the English guide adds the Before/After test, failure modes, cross-model check, and the distinction from contracts.
German docs and website: `docs/CONTRIBUTING.de.adoc`, `website/public/CONTRIBUTING.de.adoc`	The German contribution documents add the Viability Test, remove the mandatory note about the activation result, expand the section on the anchor file, and remove the issue-title convention.
Rejection category: `docs/rejected-proposals.adoc`	The rejected-proposals page renames a category to `Insufficient training-data density` and adjusts one table entry.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

[Process Proposal]: Add Viability Test to contribution guidelines — recognition ≠ activation #640: Covers the same Viability Test guidance as well as the issue-template change and the rename in docs/rejected-proposals.adoc.

Possibly related PRs

LLM-Coding/Semantic-Anchors#571: Changed the same issue-template and CONTRIBUTING sections around activation and test fields for semantic anchors.
LLM-Coding/Semantic-Anchors#586: Introduced the methodology page that the new Viability Test documentation refers to.
LLM-Coding/Semantic-Anchors#594: Added the cross-model and failure-mode methodology that feeds into the Viability Test guidance here.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: a Viability Test was added to the contribution guidelines.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/rejected-proposals.adoc`:
- Around line 12-13: The rejected-proposals category rename is only partially
applied, leaving mixed terminology between “Insufficient training-data density”
and the older label. Update the relevant rejected-proposals documentation and
any linked references in the same proposal stack so the category is named
consistently everywhere, using the existing anchor and section names in this
document to locate the affected entries.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 8fbd932c-68e8-4cac-bf96-838204d909f5

📥 Commits

Reviewing files that changed from the base of the PR and between 6b65efc and 48eaa62.

📒 Files selected for processing (5)

.github/ISSUE_TEMPLATE/propose-anchor.yml
CONTRIBUTING.adoc
docs/CONTRIBUTING.de.adoc
docs/rejected-proposals.adoc
website/public/CONTRIBUTING.de.adoc

coderabbitai · 2026-06-25T09:57:00Z

+Insufficient training-data density::
+The term is recognised (models can talk _about_ it) but does not reliably _activate_ the concept in outputs. Naming it does not structurally change the output compared to a generic prompt, or triggers silent substitution / confabulation on weaker models. Terms in this category may become viable anchors as training data grows — consider a link:#/contracts[contract] for now. See link:#/training-data-vs-practice[An Anchor Delivers Only as Far as the Prior Reaches] for the empirical methodology. Fails the viability test.


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Do not roll out the category rename in isolation.

“Insufficient training-data density” is already introduced here, but the provided context shows that docs/plans/2026-03-09-rejected-proposals-design.md still uses “Not in training data”. Please update the rename and all references in the same stack; otherwise the same category ends up with two names.

Also applies to: 45-45
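A quick way to verify the finding before and after fixing it is to search the repository for the old label. The following is a minimal sketch, roughly equivalent to a recursive grep. The function name `find_stale_label` and the set of file extensions are assumptions, not something this PR or CodeRabbit provides.

```python
from pathlib import Path

OLD_LABEL = "Not in training data"
# Assumed set of documentation-like file types worth scanning.
EXTENSIONS = {".adoc", ".md", ".yml", ".yaml"}


def find_stale_label(root: str, label: str = OLD_LABEL) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every remaining use of the old label."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if label in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running `find_stale_label(".")` from the repository root and getting an empty list would suggest the rename is consistent, at least in the scanned file types.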



JensGrote mentioned this pull request Jun 25, 2026

[Anchor Proposal]: Sender-Receiver Discrepancy #639

Open

2 tasks

JensGrote wants to merge 1 commit into LLM-Coding:main from JensGrote:docs/viability-test-contributing
