docs: sync prompt defaults and tighten prompt variants

jqbit · jqbit · commit 41bdc61e83a8 · 2026-05-08T18:09:43.000+02:00
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -37,7 +37,10 @@ jobs:
           python3 -m json.tool data/visualizations/charts.json >/dev/null
 
       - name: Check Python syntax
-        run: python3 -m py_compile bench/dspy/*.py bench/check-md-links.py
+        run: python3 -m py_compile bench/dspy/*.py bench/check-md-links.py bench/check-doc-sync.py
 
       - name: Check Markdown links
         run: python3 bench/check-md-links.py
+
+      - name: Check doc/prompt sync
+        run: python3 bench/check-doc-sync.py
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -45,6 +45,7 @@ Please include:
 - before/after examples if possible
 - which agent/app you tested with
 - whether the prompt still stays concise
+- whether the defaults still hold: 1 sentence, target 3 words, default max 6 words, greet = 1 word
 
 You do **not** need to run the full benchmark for every small PR. Manual examples are fine.
 
@@ -68,8 +69,9 @@ node --check bench/make-charts.js
 python3 -m json.tool data/benchmarks-summary.json >/dev/null
 python3 -m json.tool data/benchmarks-matrix.json >/dev/null
 python3 -m json.tool data/visualizations/charts.json >/dev/null
-python3 -m py_compile bench/dspy/*.py bench/check-md-links.py
+python3 -m py_compile bench/dspy/*.py bench/check-md-links.py bench/check-doc-sync.py
 python3 bench/check-md-links.py
+python3 bench/check-doc-sync.py
 ```
 
 ## Issues
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # TLDR.md — Too Long Didn't Read
 
-**The tiny prompt that cuts your agent’s yap by ~80%.**
+**The tiny prompt that gets your agent to the point.**
 
 TLDR.md makes AI assistants answer directly — no filler, no fake enthusiasm, no “let me know if...” sludge.
 
@@ -10,6 +10,15 @@ It is literally just a tiny Markdown prompt. Copy it where your agent reads inst
 >
 > It **ONLY CHANGES** the **COMMUNICATION STYLE**.
 
+## Current defaults
+
+- default: 1 sentence
+- target: 3 words
+- 1 word when sufficient
+- default max: 6 words
+- longer only if asked
+- greet: 1 word
+
 ## Which file should I use?
 
 | File | Use this if... |
@@ -68,12 +77,12 @@ Current prompt sizes:
 
 | File | Bytes |
 |---|---:|
-| [`TLDR.md`](TLDR.md) | 1,607 |
-| [`TLDR.blunt.md`](TLDR.blunt.md) | 1,868 |
+| [`TLDR.md`](TLDR.md) | 1,165 |
+| [`TLDR.blunt.md`](TLDR.blunt.md) | 1,478 |
 
-Headline results:
+Headline historical results:
 
-Note: results below were measured on the previous shipped prompts. Re-run benchmarks after adopting the current drafts.
+Note: results below were measured on earlier shipped prompts. The current prompt files were later tightened to a 1-sentence / 3-word-default / 6-word-max profile and have not yet been rerun through the full bench.
 
 - **TLDR.md v0.13.1:** −82.1% total prose reduction, 100% average compliance (5 agents × 5 prompts).
 - **TLDR.md v0.14.3:** −80.0% single-turn prose reduction; −75.1% across 8-turn coding conversations; no significant decay.
@@ -86,12 +95,11 @@ See [`data/benchmarks.md`](data/benchmarks.md), [`data/dspy-cross-model-results.
 ## Example outputs
 
 ```text
-Cause: port already bound.
-Fix: kill process or change PORT.
+Port busy; free it.
 ```
 
 ```text
-Yes — use SQLite first. Switch when writes/concurrency hurt.
+Yes. Start SQLite.
 ```
 
 ```text
@@ -101,7 +109,7 @@ git reset --soft HEAD~1
 ## Share line
 
 ```text
-TLDR.md gets your agents to the point — cuts your agent’s yap by ~80%.
+TLDR.md gets agents to the point.
 ```
 
 ## Contributing
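The README's "Current defaults" list is easiest to reason about as a measurable cap. Below is a minimal sketch. It assumes words are whitespace-separated and that sentences can be found with a naive split on `.`, `!`, `?`. TLDR.md itself defines neither counting rule. `count_sentences`, `count_words`, and `within_default_cap` are hypothetical helpers, not part of the repo's bench:

```python
import re

# Hypothetical caps mirroring the README "Current defaults" list.
DEFAULT_MAX_SENTENCES = 1
DEFAULT_MAX_WORDS = 6


def count_sentences(reply: str) -> int:
    # Naive: split on runs of . ! ? followed by whitespace or end of text.
    parts = re.split(r"[.!?]+(?:\s+|$)", reply.strip())
    return len([p for p in parts if p])


def count_words(reply: str) -> int:
    # Whitespace tokens; punctuation stays attached ("busy;" is one word).
    return len(reply.split())


def within_default_cap(reply: str) -> bool:
    """True if a prose reply fits the default 1-sentence, 6-word cap."""
    return (
        count_sentences(reply) <= DEFAULT_MAX_SENTENCES
        and count_words(reply) <= DEFAULT_MAX_WORDS
    )
```

By this naive count, the old error example removed above ("Cause: port already bound. Fix: kill process or change PORT.") is two sentences and ten words. Its replacement, "Port busy; free it.", is one sentence and four words. Commands, code, and other artifacts fall outside the cap, since both prompts limit prose only.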
diff --git a/TLDR.blunt.md b/TLDR.blunt.md
@@ -1,45 +1,32 @@
-# TLDR — blunt ultra compression mode
-
+# TLDR - blunt ultra compression
 ## Prime directive
-Answer correctly, minimum tokens. Conclusion first. Default: exactly 1 sentence, target 3 words. If 3 words cannot preserve correctness, use up to 6 words. Exceed 6 words only if the user explicitly asks for more detail.
-
+Answer correctly, min tokens. Conclusion first. Default: 1 sentence, target 3 words; use 1 word when enough. If 3 can't preserve correctness, use up to 6. Exceed 6 only if user explicitly asks.
 ## Hard caps
-- Default: 1 sentence only.
+- Default: 1 sentence.
 - Default target: 3 words.
 - Default maximum: 6 words.
-- No preamble, filler, postscript, or recap.
-- Do not add a second sentence unless user explicitly asks for more.
-
+- No preamble, filler, postscript, recap.
+- No 2nd sentence unless user asks.
 ## Scope
-Prose only. Tools, code, logic, reasoning, and safety unchanged.
-
+Prose only. Tools, code, logic, reasoning, safety unchanged.
 ## Override
-If user says "anyway", "do it my way", "I'm overriding", "use mine", "let's just X", "yes, X", "do X anyway" — comply immediately. Keep response as short as possible unless the user also asked for more detail.
-
+If user says "anyway", "do it my way", "I'm overriding", "use mine", "let's just X", "yes X", "do X anyway" — comply. Stay short unless asked.
 ## Bluntness
-- Conclusion first.
-- Agreement is not goal.
-- Push back only when clearly warranted.
-- One pushback round maximum.
-- Direct, not rude.
-
-## Expansion rule
-Expand only on explicit user request: "explain", "why", "steps", "details", "longer", "elaborate", "show more", "give examples". Otherwise stay within the default cap.
-
+Conclusion first. Agreement not goal. Push back when warranted. One pushback max. Direct, not rude.
+## Expansion
+Expand only on explicit request: "explain", "why", "steps", "details", "longer", "examples". Else stay within cap.
 ## Shapes
-- Confirm → Yes. / No.
-- Opinion/should I → verdict first, ≤6 words if possible
-- Cmd ask → `cmd` only
+- Confirm → Yes./No.
+- Opinion/should I → verdict first, <=6 words
+- Cmd → `cmd` only
 - Regex/JSON/SQL → artifact only
-- Code ask → code only
+- Code → code only
 - Greet → 1 word
-- Error → 1 cause + 1 fix, ≤6 words total if possible
-- Flawed premise → correct it first, shortest possible wording
-- Lists/compare/how-to → compress aggressively unless user explicitly asks for full detail
-- Creative/longform → obey requested length/style
-
+- Error → 1 cause + 1 fix, <=6 words
+- Flawed premise → correct first, shortest
+- Lists/compare/how-to → compress unless full detail asked
+- Creative/longform → obey requested style/length
 ## Cut
-"Sure/Let me/I'll/Great/You're right/Excellent/I see/Good point", prompt restatement, filler, hedges, caveats, summaries, postscripts, validation, "let me know if".
-
+"Sure/Let me/I'll/Great/You're right/I see/Good point", restate, filler, hedges, caveats, summaries, PS, validation, "let me know if".
 ## Style
-Fragments OK. Drop articles. Never open with validation. Prefer answer-only output.
+Fragments OK. Drop articles. Never open with validation. Answer-only.
diff --git a/TLDR.md b/TLDR.md
@@ -1,41 +1,28 @@
-# TLDR communication mode — ultra compression
-
+# TLDR - ultra compression
 ## Prime directive
-Answer correctly with minimum tokens. Default: exactly 1 sentence, target 3 words. If 3 words cannot preserve correctness, use up to 6 words. Exceed 6 words only if the user explicitly asks for more detail, explanation, steps, or examples.
-
-## Hard caps (strict, always enforce)
-- Default: 1 sentence only.
+Answer correctly, min tokens. Default: 1 sentence, target 3 words. Use 1 word when sufficient. If 3 can't preserve correctness, use up to 6. Exceed 6 only if user explicitly asks.
+## Hard caps
+- Default: 1 sentence.
 - Default target: 3 words.
 - Default maximum: 6 words.
-- No preamble, filler, postscript, or wrap-up.
-- Do not add a second sentence unless user explicitly requests more.
-
+- No preamble, filler, postscript, recap.
+- No 2nd sentence unless user asks.
 ## Scope
-Prose only. Tools, code, logic, reasoning, and safety unchanged. Be correct first; compress wording, not intelligence.
-
-## Expansion rule
-Expand only on explicit user request for more: e.g. "explain", "why", "steps", "details", "longer", "elaborate", "show more", "give examples". Otherwise stay within the default cap.
-
+Prose only. Tools, code, logic, reasoning, safety unchanged.
+## Expansion
+Expand only on explicit request: "explain", "why", "steps", "details", "longer", "elaborate", "show more", "examples". Else stay within cap.
 ## Shapes
-- Cmd ask → `cmd` only
+- Confirm → Yes./No.
+- Cmd → `cmd` only
 - Regex/JSON/SQL → artifact only
-- Code ask → code only
-- Confirm → Yes. / No.
+- Code → code only
 - Greet → 1 word
-- Error → 1 cause + 1 fix, ≤6 words total if possible
-- Lists/compare/how-to → compress aggressively unless user explicitly asks for full detail
-- Creative/longform → obey requested length/style
-
+- Error → 1 cause + 1 fix, <=6 words
+- Lists/compare/how-to → compress unless full detail asked
+- Creative/longform → obey requested style/length
 ## Defaults
-- Shorter wins.
-- One sentence wins.
-- Three words preferred.
-- Six words maximum by default.
-- Ask only if blocked.
-- Examples only if requested.
-
+1 word if enough. Three words preferred. Shorter wins. Ask only if blocked.
 ## Cut
-"Sure/Let me/I'll", prompt restatement, filler, hedges, caveats, summaries, moralizing, enthusiasm, validation, "let me know if".
-
+"Sure/Let me/I'll", restate, filler, hedges, caveats, summaries, moralizing, enthusiasm, validation, "let me know if".
 ## Style
-Fragments OK. Drop articles. Omit needless words. Prefer answer-only output.
+Fragments OK. Drop articles. Answer-only.
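TLDR.md leaves it to the model to recognize an explicit request for more detail. You may want to tag bench probes or transcripts by whether the user asked for expansion. A rough keyword approximation might look like the following. `asks_for_expansion` is hypothetical, and whole-word, case-insensitive matching is an assumption: the prompt does not say how strictly the cues are matched.

```python
import re

# Trigger phrases copied from the TLDR.md "Expansion" section.
EXPANSION_TRIGGERS = (
    "explain", "why", "steps", "details", "longer",
    "elaborate", "show more", "examples",
)

# Whole-word / whole-phrase match, case-insensitive.
_TRIGGER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in EXPANSION_TRIGGERS) + r")\b",
    re.IGNORECASE,
)


def asks_for_expansion(user_message: str) -> bool:
    """True if the message contains one of the explicit expansion cues."""
    return _TRIGGER_RE.search(user_message) is not None
```

With whole-word matching, inflected forms such as "explained" do not count as cues. A model following the prompt may treat them differently, so this is only an approximation for labeling, not a model of agent behavior.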
diff --git a/bench/check-doc-sync.py b/bench/check-doc-sync.py
@@ -0,0 +1,73 @@
+#!/usr/bin/env python3
+from pathlib import Path
+import re
+import sys
+
+ROOT = Path(__file__).resolve().parents[1]
+README = ROOT / "README.md"
+AGENT_LOCATIONS = ROOT / "data" / "agent-locations.md"
+TLDR = ROOT / "TLDR.md"
+BLUNT = ROOT / "TLDR.blunt.md"
+
+
+def fail(msg: str) -> None:
+    print(f"FAIL: {msg}")
+    sys.exit(1)
+
+
+def expect_contains(text: str, needle: str, label: str) -> None:
+    if needle not in text:
+        fail(f"{label} missing: {needle}")
+
+
+readme = README.read_text(encoding="utf-8")
+agent_locations = AGENT_LOCATIONS.read_text(encoding="utf-8")
+tldr = TLDR.read_text(encoding="utf-8")
+blunt = BLUNT.read_text(encoding="utf-8")
+
+# Prompt invariants reflected in shipped prompt files.
+for name, text in [("TLDR.md", tldr), ("TLDR.blunt.md", blunt)]:
+    expect_contains(text, "target 3 words", name)
+    expect_contains(text, "maximum: 6 words", name)
+    expect_contains(text, "Greet → 1 word", name)
+
+# README byte-count table must match current prompt files.
+expected_tldr = f"| [`TLDR.md`](TLDR.md) | {TLDR.stat().st_size:,} |"
+expected_blunt = f"| [`TLDR.blunt.md`](TLDR.blunt.md) | {BLUNT.stat().st_size:,} |"
+expect_contains(readme, expected_tldr, "README byte table")
+expect_contains(readme, expected_blunt, "README byte table")
+
+# README must document the current default behavior.
+for needle in [
+    "- default: 1 sentence",
+    "- target: 3 words",
+    "- 1 word when sufficient",
+    "- default max: 6 words",
+    "- longer only if asked",
+    "- greet: 1 word",
+]:
+    expect_contains(readme, needle, "README current defaults")
+
+# Hermes docs must point to SOUL.md and use a merge-safe verification marker.
+hermes_row = next(
+    (
+        line
+        for line in agent_locations.splitlines()
+        if re.search(r"^\|\s*\d+\s*\|\s*hermes\b", line)
+    ),
+    None,
+)
+if hermes_row is None:
+    fail("Hermes row missing from data/agent-locations.md")
+if "~/.hermes/SOUL.md" not in hermes_row:
+    fail("Hermes row does not point to ~/.hermes/SOUL.md")
+if "MEMORY.md" in hermes_row:
+    fail("Hermes row still points to MEMORY.md")
+
+expect_contains(
+    agent_locations,
+    'grep -q "target 3 words" ~/.hermes/SOUL.md',
+    "Hermes verification command",
+)
+
+print("OK: docs and prompt metadata are in sync")
diff --git a/data/agent-locations.md b/data/agent-locations.md
@@ -96,8 +96,8 @@ for p in ~/.claude/CLAUDE.md ~/.gemini/GEMINI.md ~/.codex/AGENTS.md \
          ~/.factory/AGENTS.md ~/.pi/agent/AGENTS.md; do
   [ -f "$p" ] && grep -q "^# TLDR" "$p" && echo "✓ $p" || echo "✗ $p"
 done
-# Hermes
-grep -q "^# TLDR" ~/.hermes/SOUL.md 2>/dev/null && echo "✓ ~/.hermes/SOUL.md" || echo "✗ ~/.hermes/SOUL.md"
+# Hermes (variant-neutral marker; works even if TLDR is merged below an existing persona header)
+grep -q "target 3 words" ~/.hermes/SOUL.md 2>/dev/null && echo "✓ ~/.hermes/SOUL.md" || echo "✗ ~/.hermes/SOUL.md"
 ```
 
 You should see ✓ for each of the locations you actually installed to.
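Why switch the Hermes check from `^# TLDR` to `target 3 words`? `grep` applies `^` per line, so the old check still passes whenever the `# TLDR ...` heading survives intact on its own line. It only breaks when a merge alters or drops that heading. The sketch below assumes the merge demoted TLDR's H1 to an H2 under an existing persona heading. The sample `SOUL.md` content and the `grep_q` helper are hypothetical.

```python
import re

# Hypothetical SOUL.md after merging TLDR below an existing persona header;
# assumes the merge demoted TLDR's H1 heading to an H2.
merged_soul = """# Persona
Friendly but terse.

## TLDR - ultra compression
Answer correctly, min tokens. Default: 1 sentence, target 3 words.
"""


def grep_q(pattern: str, text: str) -> bool:
    # Emulates `grep -q PATTERN`: the regex is tried against each line separately.
    return any(re.search(pattern, line) for line in text.splitlines())


old_marker_found = grep_q(r"^# TLDR", merged_soul)         # "## TLDR" does not match
new_marker_found = grep_q(r"target 3 words", merged_soul)  # body text still matches
```

The new marker only works while both prompt variants keep the literal phrase `target 3 words`. `bench/check-doc-sync.py` asserts this for `TLDR.md` and `TLDR.blunt.md`.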
diff --git a/data/benchmarks.md b/data/benchmarks.md
@@ -1,5 +1,7 @@
 # TLDR.md benchmarks
 
+> **Historical note:** The benchmark results below were measured on earlier shipped prompt generations. The current `TLDR.md` and `TLDR.blunt.md` files were later tightened to a 1-sentence / 3-word-default / 6-word-max profile and have not yet been rerun through the full benchmark suite.
+
 ## v0.18.0 — DSPy round-2 + 5-agent cross-model validation (2026-05-01)
 
 **Headline (BLUNT variant):** DSPy-style instruction-evolution optimization over 73-72 train probes + cross-model validation across 5 coding-agent CLIs (claude / codex / cursor-agent / gemini / opencode) with **codex as independent judge** (different model family from generator → eliminates self-bias).
diff --git a/data/changelog.md b/data/changelog.md
@@ -4,6 +4,8 @@ All TLDR.md prompt versions, with the headline metric (total prose-token reducti
 
 The format is loosely based on [Keep a Changelog](https://keepachangelog.com/). Versions are TLDR.md prompt versions; benchmarks are the matching `v1.<N>` bench run.
 
+> **Historical note:** This changelog documents the benchmarked prompt versions through `v0.18.0`. The current prompt files were later tightened to a 1-sentence / 3-word-default / 6-word-max profile and should be treated as post-`v0.18.0` drafts until re-benchmarked.
+
 ## [0.18.0] — 2026-05-01
 
 **`TLDR.blunt.md` — DSPy round-2 + cross-model held-out validation across 5 agents.**
diff --git a/data/dspy-cross-model-results.md b/data/dspy-cross-model-results.md
@@ -2,6 +2,8 @@
 
 Last run: 2026-05-01. Test rig at `bench/dspy/`.
 
+> **Historical note:** These results describe the earlier benchmarked prompt generations (`TLDR.md v0.16.0`, `TLDR.blunt.md v0.18.0`). The current prompt files were later tightened to a 1-sentence / 3-word-default / 6-word-max profile and have not yet been rerun through this cross-model suite.
+
 ## Setup
 
 - **Generator agents (5):** claude (Sonnet via `claude -p`), codex (GPT-5 via `codex exec`), cursor-agent (sonnet via `cursor-agent --print`), gemini (gemini-cli), opencode (kimi-k2.6 via `opencode run`).
diff --git a/data/methodology.md b/data/methodology.md
@@ -1,6 +1,8 @@
 # TLDR.md — bench methodology
 
-## v0.18 — DSPy-style instruction evolution + cross-model held-out (current)
+> **Historical note:** This methodology describes the earlier benchmarked prompt generations (`v0.16.0` / `v0.18.0` era). The current prompt files were later tightened to a 1-sentence / 3-word-default / 6-word-max profile, so the metric definitions below are historical until the suite is rerun or revised.
+
+## v0.18 — DSPy-style instruction evolution + cross-model held-out (historical benchmark design)
 
 ### Goal
 
diff --git a/data/progression.md b/data/progression.md
@@ -2,6 +2,8 @@
 
 > The version-by-version story of how a single Markdown file went from `−33.9 %` prose-token reduction (v0.1) to a DSPy-optimized cross-model-validated v0.18.0 with a sibling anti-sycophancy variant (`TLDR.blunt.md`). Eighteen iterations across three eras: hand-crafted (v0.1–v0.13.1), empirical ablation (v0.14.3), and **DSPy + cross-model** (v0.15–v0.18). 8,000+ measured agent responses across the full journey.
 
+> **Historical note:** This document tracks the measured evolution through `v0.18.0`. The current prompt files were later tightened to a 1-sentence / 3-word-default / 6-word-max profile and should be treated as post-`v0.18.0` drafts until re-benchmarked.
+
 ## Three eras
 
 | era | versions | method | best metric reported |
diff --git a/data/research/repo-audit-2026-05-08.md b/data/research/repo-audit-2026-05-08.md