docs: add cross-model validation section by JensGrote · Pull Request #594 · LLM-Coding/Semantic-Anchors

JensGrote · 2026-06-10T08:29:49Z

Adds a "Cross-Model Validation" section to the training-data-vs-practice article, as requested in #586 (comment).

What's in this section

- Reproduction of the full A–E battery against GPT-5, GPT-5-mini and Gemini 2.5 Flash
- Confirms the anchor mechanism is model-family-independent (Cockburn fires universally)
- Documents three failure modes: transparent substitution, silent substitution, and confabulation (new)
- Introduces the "Anchor Viability Horizon" concept (stable / emerging / pre-anchor)
- Summary table with results across all three models

Placement

Inserted after "How Far the Prior Actually Reaches" and before "Why the Anchor Lags Real Practice".

Raw data

Test outputs are in anchor-activation-test-20260609/ — happy to include if useful.

cc @raifdmueller

Summary by CodeRabbit

Documentation
- New section on cross-model validation with extended tests across several AI models
- Documentation of failure modes and an Anchor Viability Horizon for assessing model independence
- Updated summary table with improved guidance for practical application

Reproduces the A–E anchor activation battery against GPT-5, GPT-5-mini and Gemini 2.5 Flash. Confirms the mechanism is model-family-independent and documents a third failure mode (confabulation) not visible in the Claude-only test. Addresses: LLM-Coding#586 (comment)

coderabbitai · 2026-06-10T08:30:03Z

Walkthrough

The change adds a new "Cross-Model Validation" section to the documentation. It shows that the Cockburn fully-dressed framing reliably triggers Semantic Anchors across multiple LLMs, and it introduces failure modes and a viability concept for assessing anchor stability.

Changes

Cross-Model Validation for Semantic Anchors

Layer / File(s)	Summary
Experiment introduction and methodology `docs/training-data-vs-practice.adoc`	Introduces the new Cross-Model Validation section and describes the experimental design: the A–E framing battery and the P1–P5 prior-mapping probes are replicated against GPT-5, GPT-5-mini and Gemini 2.5 Flash.
Universal Cockburn anchor firing `docs/training-data-vs-practice.adoc`	Core finding: the "fully-dressed use cases according to Cockburn" framing reliably triggers the full use-case apparatus in every tested model, regardless of vendor or size.
Failure modes and Viability Horizon `docs/training-data-vs-practice.adoc`	Detailed analysis of three failure modes for the "Use-Case 3.0" anchor (transparent substitution, silent substitution, confabulation). Introduces the Anchor Viability Horizon concept, which classifies anchor stability as stable / emerging / pre-anchor according to the data density in the training corpus.
Summary and implications `docs/training-data-vs-practice.adoc`	Table summarizing the framing A–E responses across the three models. Three implications: model-family independence, the need for the three-tier architecture to handle confabulation, and the need for periodic viability tests.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

LLM-Coding/Semantic-Anchors#591: Both PRs modify docs/training-data-vs-practice.adoc. #591 adds "Fable 5" to the prior-mapping matrix, while this PR runs a "Cross-Model Validation" across several models (GPT-5, Gemini) with a failure-mode analysis.
LLM-Coding/Semantic-Anchors#586: This PR directly extends the docs/training-data-vs-practice.adoc documentation added in #586, appending new "Cross-Model Validation" content (A–E comparisons, failure modes, Viability Horizon) that builds on the groundwork of #586.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: a new cross-model validation section is added to the documentation.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/training-data-vs-practice.adoc`:
- Line 112: The document text claims "five prior-mapping probes (P1–P5)" but the
probe definitions only include P1–P4; either change the text to "four
prior-mapping probes (P1–P4)" or add a missing P5 definition to the probe
battery so labels match. Locate the string "five prior-mapping probes (P1–P5)"
in the adoc and replace it with "four prior-mapping probes (P1–P4)" if you
choose the first option, or if you add P5, create a matching probe entry named
P5 in the probe-battery definitions (and update any related references or
indexes) so that the P1–P5 labeling is consistent across the docs.
- Around line 114-116: The entries for GPT-5, GPT-5-mini and Gemini 2.5 Flash
in the model list have incorrect dates; change the "May 2026" entries to the
actual release/GA dates (e.g. GPT-5 / GPT-5-mini → August 2025; Gemini 2.5
Flash → 2025 preview/GA) and add brief verifiable sources; search for the three
names (GPT-5, GPT-5-mini, Gemini 2.5 Flash) in
docs/training-data-vs-practice.adoc, update the date strings there, and add a
source note/footnote with a reliable link or reference text for each.
- Line 118: The doc references a non-existent Raw outputs directory
"anchor-activation-test-20260609/"; update docs/training-data-vs-practice.adoc
to either (A) add a clear note that Raw outputs come from the
"anchor-activation-test-20260609/" experiment and describe how to generate or
obtain them (e.g., reproduction steps, script name, or download location), or
(B) change the reference to the actual repository artifact pattern (e.g.,
"anchor-activation-test-*") and explain which variant to use; ensure the string
"anchor-activation-test-20260609/" in the file is replaced or augmented with the
generation/ acquisition instructions so readers can locate or recreate the Raw
outputs.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0556dfd3-6115-4f45-841c-e8263189ad4c

📥 Commits

Reviewing files that changed from the base of the PR and between c5d4735 and 7b4021c.

📒 Files selected for processing (1)

docs/training-data-vs-practice.adoc

coderabbitai · 2026-06-10T08:34:39Z

+
+=== Setup
+
+The same five framings (A–E) and five prior-mapping probes (P1–P5) were run against three non-Claude models, each in a clean session without system prompts or custom instructions:


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the probe types defined in the method documentation
rg -n "^## P[0-9]" plugins/semantic-anchors/skills/anchor-prior-test/references/probe-battery.md

Repository: LLM-Coding/Semantic-Anchors

Length of output: 200

Inconsistent probe types (P1–P5 vs. method definition)

docs/training-data-vs-practice.adoc (line 112) claims "five prior-mapping probes (P1–P5)", but plugins/semantic-anchors/skills/anchor-prior-test/references/probe-battery.md defines only P1–P4 (no P5). Either use "four prior-mapping probes (P1–P4)" or add P5 to the method documentation.
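A mismatch like this could also be caught mechanically before review. The following is a minimal shell sketch, not part of the repository: `defined_probes`, `claimed_max_probe` and `check_probe_labels` are hypothetical names. It assumes, as the `rg` check above does, that each probe in probe-battery.md is declared by a `## P<n>` heading, and that the article writes probe ranges as `P1–P<n>` with an en dash. It relies on `grep -o` (GNU and BSD grep) and on `local` (bash and dash).

```shell
# Hypothetical helpers (not in the repository) that compare the probe range
# claimed in the article with the probes the battery actually defines.
# Assumptions: probes are declared as "## P<n>" headings, and the article
# writes ranges as "P1–P<n>" with an en dash.

# Print the probe numbers defined as "## P<n>" headings, sorted and unique.
defined_probes() {
  grep -oE '^## P[0-9]+' "$1" | grep -oE '[0-9]+' | sort -n | uniq
}

# Print the highest <n> found in any "P1–P<n>" range in the article.
claimed_max_probe() {
  grep -oE 'P1–P[0-9]+' "$1" | grep -oE '[0-9]+$' | sort -n | tail -n 1
}

# List claimed probes that have no matching heading; return 1 if any are missing.
check_probe_labels() {
  local article="$1" battery="$2" max defined missing="" n=1
  max="$(claimed_max_probe "$article")"
  if [ -z "$max" ]; then
    echo "no P1–Pn range found in $article"
    return 0
  fi
  defined="$(defined_probes "$battery")"
  while [ "$n" -le "$max" ]; do
    if ! printf '%s\n' "$defined" | grep -qx "$n"; then
      missing="$missing P$n"
    fi
    n=$((n + 1))
  done
  if [ -n "$missing" ]; then
    echo "undefined probes:$missing"
    return 1
  fi
  echo "ok: P1–P$max all defined"
}
```

Run from the repository root as `check_probe_labels docs/training-data-vs-practice.adoc plugins/semantic-anchors/skills/anchor-prior-test/references/probe-battery.md`. If the battery defines only P1–P4, as the review reports, it should print `undefined probes: P5` and return non-zero, which makes it usable as a CI gate.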


coderabbitai · 2026-06-10T08:34:39Z

+* **GPT-5** (OpenAI, May 2026)
+* **GPT-5-mini** (OpenAI, May 2026)
+* **Gemini 2.5 Flash** (Google, May 2026)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Were there official releases of GPT-5 and GPT-5-mini from OpenAI in May 2026?

💡 Result:

In May 2026 there was no official release or announcement from OpenAI referring specifically to a model named "GPT-5" or "GPT-5-mini" as a new release. By then, these models had been part of the OpenAI model family since August 2025 [1][2][3]. In summary, the situation in May 2026 was as follows:

1. GPT-5 and GPT-5-mini: The base models of the GPT-5 series were officially launched on August 7, 2025 and made available to developers in the API and to users in ChatGPT [1][2][3]. "GPT-5-mini" was specified as a more cost-efficient and faster variant within this system [4][1][3].
2. Developments in May 2026: By May 2026, OpenAI's focus had already moved to more advanced models in the GPT-5 line. For example, GPT-5.5 was introduced on April 23, 2026 [5], and in early June 2026 the availability of models such as GPT-5.4 and GPT-5.5 on the AWS Bedrock platform was announced [6].
3. Background: The model names GPT-5 and GPT-5-mini therefore belong to the established portfolio, released months before May 2026 [1][2][7]. Given their existing availability, reports or search queries in May 2026 that suggest a release of these specific models should not be read as announcing a new release [1][2].

Citations:

1: https://openai.com/index/introducing-gpt-5-for-developers/

2: https://openai.com/index/introducing-gpt-5/

3: https://openai.com/index/gpt-5-system-card/

4: https://developers.openai.com/api/docs/models/gpt-5-mini

5: https://openai.com/de-DE/index/introducing-gpt-5-5/

6: https://www.ad-hoc-news.de/wissenschaft/openai-auf-aws-gpt-5-5-und-gpt-5-4-ab-sofort-auf-bedrock/69474301

7: https://nevercodealone.de/de/glossare/ki-tools-2026/chatgpt-openai-gpt-5-modelle-2026

🌐 Web query:

Was there an official release of Gemini 2.5 Flash from Google in May 2026?

💡 Result:

There was no official release of Gemini 2.5 Flash in May 2026, because this model had already been released in 2025 [1][2]. Gemini 2.5 Flash was first introduced as a preview version on April 17, 2025 [3] and released as a stable, generally available (GA) version for developers and enterprises on June 17, 2025 [4][2]. In May 2026, Google's official releases were already focused on the new Gemini 3.5 model generation; for example, Gemini 3.5 Flash was officially released on May 19, 2026 [5]. The official Gemini API documentation lists Gemini 2.5 Flash as an established model of the earlier 2.5 family, not as a new release for May 2026 [6].

Citations:

1: https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/gemini/2-5-flash

2: https://blog.google/products-and-platforms/products/gemini/gemini-2-5-model-family-expands/

3: https://blog.google/products-and-platforms/products/gemini/gemini-2-5-flash-preview/

4: https://developers.googleblog.com/gemini-2-5-thinking-model-updates/

5: https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/model-versions

6: https://ai.google.dev/gemini-api/docs/models

Correct or substantiate the release dates in the model list
In docs/training-data-vs-practice.adoc (lines 114–116), the "May 2026" dates for GPT-5/GPT-5-mini and "Gemini 2.5 Flash" do not look right: GPT-5 and GPT-5-mini were introduced in August 2025, and Gemini 2.5 Flash was already available in 2025 (preview/GA). The model names may well be correct, but the dates should be changed to the actual release/GA dates and, where possible, backed by sources.


coderabbitai · 2026-06-10T08:34:39Z

+* **GPT-5-mini** (OpenAI, May 2026)
+* **Gemini 2.5 Flash** (Google, May 2026)
+
+Raw outputs: `anchor-activation-test-20260609/`.


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether the referenced directory exists
fd -t d "anchor-activation-test-20260609"

Repository: LLM-Coding/Semantic-Anchors

Length of output: 53

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== Context in docs/training-data-vs-practice.adoc (lines 100-140) ==="
sed -n '100,140p' docs/training-data-vs-practice.adoc || true
echo
echo "=== Search for anchor-activation-test-20260609 in the repo ==="
rg -n "anchor-activation-test-20260609" -S . || true
echo
echo "=== Search for directory name variants (wildcard) ==="
fd -t d "*anchor-activation-test-20260609*" . || true

Repository: LLM-Coding/Semantic-Anchors

Length of output: 3177

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== Exact directory name: anchor-activation-test-20260609 ==="
find . -type d -name 'anchor-activation-test-20260609' -print
echo
echo "=== Directory name variants (glob) ==="
fd -t d --glob '*anchor-activation-test-20260609*' . || true
echo
echo "=== All directories with prefix anchor-activation-test-* (short) ==="
fd -t d --glob 'anchor-activation-test-*' . | head -n 50 || true

Repository: LLM-Coding/Semantic-Anchors

Length of output: 234

Check the raw-outputs directory reference (anchor-activation-test-20260609/ is missing)

docs/training-data-vs-practice.adoc (line 118) references anchor-activation-test-20260609/ as the directory for "Raw outputs", but neither it nor any name variant with the prefix anchor-activation-test-* exists in the repository. Either add the directory/artifacts or adjust the docs so it is clear where the raw outputs come from and how they can be generated or obtained.
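The same kind of dangling reference can be checked automatically. A minimal shell sketch, not part of the repository: `check_raw_output_refs` is a hypothetical helper, and it assumes that references follow the `anchor-activation-test-YYYYMMDD/` pattern and are meant relative to the repository root, which the article itself does not state.

```shell
# Hypothetical helper (not in the repository): report every
# anchor-activation-test-YYYYMMDD/ reference in an AsciiDoc file and
# whether it resolves to a directory under the given root.
# Assumption: references are relative to the repository root.
check_raw_output_refs() {
  local article="$1" root="$2" refs ref status=0
  # grep -o prints each match on its own line; sort -u drops repeats.
  refs="$(grep -oE 'anchor-activation-test-[0-9]{8}/' "$article" | sort -u)"
  for ref in $refs; do
    if [ -d "$root/$ref" ]; then
      echo "found: $ref"
    else
      echo "missing: $ref"
      status=1
    fi
  done
  # Non-zero if any reference is dangling, so it can gate a CI step.
  return "$status"
}
```

Run from the repository root as `check_raw_output_refs docs/training-data-vs-practice.adoc .`. Given the `find` and `fd` results above, it would report the 20260609 directory as missing until the artifacts are committed or the reference is reworded.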


rdmueller · 2026-06-11T05:09:23Z

Thanx!

coderabbitai Bot reviewed Jun 10, 2026


rdmueller merged commit 6c70221 into LLM-Coding:main Jun 11, 2026
7 checks passed


rdmueller merged 1 commit into LLM-Coding:main from JensGrote:docs/cross-model-validation
