1.1.1: Treat skill names as illustrative, allow non-binding hints in spec §8

Jeehut · Jeehut · commit 45dc0cde633d · 2026-04-20T14:27:24.000+02:00
Hardcoded skill names (macos-peekaboo, macos-accessibility-ids, and
similar) across /tandemkit:init and the ApplePlatform strategy read as
requirements that every user had them installed. Skill names and
availability vary per project and setup — every reference is now
framed as "scan ~/.claude/skills/ for what's installed, here's a
common name as an example".

The Planner's blanket ban on skill references in specs is now a
mandate-vs-suggestion distinction. Mandates in the spec body remain
banned; §8 "Possible Directions &amp; Ideas" is explicitly allowed to
surface skills worth considering, files worth reading, and tactical
hints, as starting points the Generator can ignore. The Generator's
loop is updated to treat §8 suggestions as non-contractual.
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "tandemkit",
-  "version": "1.1.0",
+  "version": "1.1.1",
   "description": "Describe your goal, approve the spec, then step away — Claude and Codex loop together until it's right.",
   "author": {
     "name": "Cihat Gündüz",
diff --git a/commands/init.md b/commands/init.md
@@ -153,12 +153,12 @@ peekaboo daemon start
 peekaboo daemon status
 ```
 
-**Tell the user about the companion skills** (both are user-level and reusable across any macOS/iOS project):
+**Check for companion skills the user may have** at `~/.claude/skills/`. Names and availability vary — examples of skills that pair well with Peekaboo workflows:
 
-- `macos-peekaboo` — full Peekaboo usage catalog (see → act → verify loop, gotchas, troubleshooting).
-- `macos-accessibility-ids` — how to add `.accessibilityIdentifier(…)` / `setAccessibilityIdentifier(_:)` so `peekaboo see` returns fast AX trees instead of hanging at its 25-second cap.
+- A Peekaboo-usage skill (one common name is `macos-peekaboo`) — full Peekaboo usage catalog (see → act → verify loop, gotchas, troubleshooting).
+- An accessibility-identifiers skill (one common name is `macos-accessibility-ids`) — how to add `.accessibilityIdentifier(…)` / `setAccessibilityIdentifier(_:)` so `peekaboo see` returns fast AX trees instead of hanging at its 25-second cap.
 
-Both skills live at `~/.claude/skills/` if already installed. If they're missing, they're documented in `ApplePlatform.md` — you can create them later or skip for now; Peekaboo still works, you just won't have inline guidance.
+These names are illustrative — scan `~/.claude/skills/` and surface whatever exists. If nothing similar is installed, the CLI still works directly; inline guidance is just nice-to-have. The Evaluator strategy in `ApplePlatform.md` covers the technique fundamentals regardless of which skills are present.
 
 Ask the user to come back and say "done" when Peekaboo is verified so you can proceed.
 
@@ -367,7 +367,16 @@ Populate each with project-specific context. Key requirements:
 - **For macOS apps:** primary UI verification is **Peekaboo CLI** on the running app (screenshot, AX tree, click, type, menu navigation). `#Preview` + Xcode MCP `RenderPreview` is still the fastest path for isolated view rendering. Forbid `mcp__computer-use__*` on macOS (unreliable on Tahoe). Include Peekaboo command examples in the role file.
 - **For macOS apps with backends (ASC, DB, API):** verify backend side-effects via the app's CLI (e.g., `asc`, `psql`) — do not trust the UI alone.
 
-**Generator.md should reference existing project skills** that are relevant (e.g., "Load `swift-code-context` before writing Swift code", "Load `swiftui-code-context` for SwiftUI work", "Use `review-swift-changes` for validation"). **For macOS apps, reference `macos-peekaboo` (runtime UI automation) and `macos-accessibility-ids` (so new/touched SwiftUI/AppKit views are automatable by Peekaboo and XCUITest out of the box).**
+**Generator.md should reference existing project skills** that are relevant. The specific skills depend entirely on what exists in the project's `.claude/skills/` directory — scan that directory and name the ones that apply. Typical categories to look for (names and availability vary by project):
+
+- A style-guide skill for the project's primary language (e.g., something like `<language>-code-context` or `<language>-style-guide`)
+- Framework-specific skills (e.g., a UI-framework style guide if the project has one)
+- A testing-conventions skill if the project documents one
+- A final-review / linter skill if the project has one (e.g., something named like `review-<language>-changes`)
+
+For macOS apps specifically, the project may have skills for runtime UI automation (e.g., something like a Peekaboo CLI wrapper) and accessibility-identifier authoring (so new SwiftUI/AppKit views are automatable). Name whatever the project actually has — do not invent or assume skill names.
+
+Example phrasing in Generator.md (adjust names to match reality): "Load `<project-style-skill>` before writing <language> code" / "Use `<project-review-skill>` for validation at end of every round".
 
 **Generator.md AND Evaluator.md must both include a top-of-file reminder** pointing at the Signal Protocol in SKILL.md. Suggested exact wording (add verbatim at the top of each file, under the first heading):
 
diff --git a/skills/evaluator/strategies/ApplePlatform.md b/skills/evaluator/strategies/ApplePlatform.md
@@ -132,7 +132,7 @@ peekaboo see --app "MyApp" --window-id <id> --json  # re-capture to verify
 
 #### Accessibility identifiers are a hard prerequisite for `see`
 
-On SwiftUI apps without `.accessibilityIdentifier(…)` coverage, `peekaboo see` hangs at a 25-second hard cap walking unnamed descendants. **With identifiers the AX walk completes in ~1 second.** Adding identifiers is part of the Generator's responsibility when it touches interactive views — see the `macos-accessibility-ids` skill for naming conventions and the AppKit/UIKit equivalents (`setAccessibilityIdentifier(_:)` and `view.accessibilityIdentifier = …`).
+On SwiftUI apps without `.accessibilityIdentifier(…)` coverage, `peekaboo see` hangs at a 25-second hard cap walking unnamed descendants. **With identifiers the AX walk completes in ~1 second.** Adding identifiers is part of the Generator's responsibility when it touches interactive views. If your Claude setup has a convenience skill for accessibility-identifier conventions (e.g. one named `macos-accessibility-ids`, but actual names vary — scan `~/.claude/skills/`), load it; otherwise follow the standard AppKit/UIKit equivalents (`.accessibilityIdentifier(_:)` on SwiftUI, `setAccessibilityIdentifier(_:)` on AppKit, `view.accessibilityIdentifier = …` on UIKit).
 
 **Fallback for apps without identifiers (or non-native apps like Electron):**
 1. `peekaboo image` for the screenshot.
@@ -146,12 +146,12 @@ On SwiftUI apps without `.accessibilityIdentifier(…)` coverage, `peekaboo see`
 - Fuzzy `click "label"` can match across apps — always scope with `--window-title` or `--window-id`, or prefer menu-path clicks.
 - Focus stealing: every shell call reactivates the terminal as frontmost. Chain multi-step flows in a single `peekaboo run <script.peekaboo.json>` script, or re-focus the target before every click (`peekaboo app switch --to X --verify`).
 
-Full pattern catalog and troubleshooting: **`macos-peekaboo` skill** (user-level, installed at `~/.claude/skills/macos-peekaboo/`).
+Full pattern catalog and troubleshooting lives in the Peekaboo CLI's own docs (`peekaboo --help`) and optionally in a user-level convenience skill if your Claude setup has one (a common name is `macos-peekaboo` at `~/.claude/skills/macos-peekaboo/`, but actual names vary — scan `~/.claude/skills/` to see what's installed).
 
 #### When Peekaboo is NOT enough
 
 - **SwiftUI `#Preview` screenshots** via Xcode MCP `RenderPreview` are still the fastest verification for pure view rendering — use them first when the change is isolated to a single view's appearance. Peekaboo is for end-to-end flows against the running app.
-- **AppKit with no identifiers** — add them (`setAccessibilityIdentifier(_:)`) per the `macos-accessibility-ids` skill; `see` works on AppKit the moment identifiers are present.
+- **AppKit with no identifiers** — add them (`setAccessibilityIdentifier(_:)`); `see` works on AppKit the moment identifiers are present. If your setup has a helper skill for accessibility-identifier conventions (example name: `macos-accessibility-ids`), it offers faster naming guidance — but the API calls are standard AppKit regardless.
 - **Electron/Chromium-based apps** — AX tree is opaque; fall back to menu paths, hotkeys, and coordinate clicks. Check if the app has an official CLI/extension API first.
 
 ## Evaluation Checklist
@@ -171,7 +171,7 @@ Full pattern catalog and troubleshooting: **`macos-peekaboo` skill** (user-level
 
 ### When the Mission Involves UI (macOS)
 5m. **Build and launch** via `xcodebuildmcp macos build` then `xcodebuildmcp macos launch --app-path …` (or `peekaboo app launch`)
-6m. **Read the accessibility tree** via `peekaboo see --app X --window-id <id> --json --timeout-seconds 30` (requires `.accessibilityIdentifier` coverage on interactive views — if missing, add them per the `macos-accessibility-ids` skill)
+6m. **Read the accessibility tree** via `peekaboo see --app X --window-id <id> --json --timeout-seconds 30` (requires `.accessibilityIdentifier` coverage on interactive views — if missing, add them to the relevant views; your Claude setup may have a helper skill for naming conventions, e.g. something named `macos-accessibility-ids`, if not the standard SwiftUI/AppKit APIs work directly)
 7m. **Take screenshots** via `peekaboo image --app X --window-id <id> --path …`
 8m. **Test interaction flows** via `peekaboo click --on <id> --snapshot <snap>`, `peekaboo type`, `peekaboo hotkey`, `peekaboo menu click --path "…"`
 9m. **Verify backend side-effects via the app's own CLI/API** when the UI writes to a remote service (e.g., `asc` CLI for App Store Connect writes, database reads for DB writes) — do not trust the UI alone
@@ -225,7 +225,7 @@ During init, create `TandemKit/Evaluator.md` with:
 - Click / type: `peekaboo click --on <id> --snapshot <snap>`, `peekaboo type "…" --clear`
 - Menu navigation: `peekaboo menu click --app "[AppName]" --path "File > Save"`
 - Hotkeys: `peekaboo hotkey --keys "cmd,s"`
-- **Adding identifiers**: see the `macos-accessibility-ids` skill — unlocks fast `see` and stable targeting.
+- **Adding identifiers**: adding `.accessibilityIdentifier(_:)` (SwiftUI) or `setAccessibilityIdentifier(_:)` (AppKit) unlocks fast `see` and stable targeting. If your Claude setup has a convenience skill for naming conventions (e.g. one named like `macos-accessibility-ids`), it's worth a scan; otherwise the API calls are standard.
 - **DO NOT use `mcp__computer-use__*` on macOS** — unreliable on Tahoe.
 
 ### Shared
diff --git a/skills/generator/SKILL.md b/skills/generator/SKILL.md
@@ -101,7 +101,7 @@ The user invokes this skill with `/tandemkit:generator NNN-MissionName`. First r
 1. Read `TandemKit/Config.json` — verify the mission exists and is current
 2. **Read `TandemKit/Generator.md`** for project-specific context — this is mandatory, do not skip
 3. Read the mission's `Spec.md` — this is your source of truth
-4. **Scan `.claude/skills/` for skills relevant to this mission's topic.** List the skill names and descriptions. Load any that seem related — they may contain domain knowledge, conventions, or validation rules critical for correct implementation. If the Spec mentions specific skills, load those too.
+4. **Scan `.claude/skills/` for skills relevant to this mission's topic.** List the skill names and descriptions. Load any that seem related — they may contain domain knowledge, conventions, or validation rules critical for correct implementation. If the Spec's §8 "Possible Directions & Ideas" (or a similarly-named "Context the Generator Might Find Useful" section) lists suggested skills, **treat those as starting points, not contracts** — load what seems relevant to the approach you choose, ignore what doesn't match. A suggestion in that section is never a pass/fail criterion; only Acceptance Criteria and Scope are binding.
 5. Read `State.json`. If `phase` is `"ready-for-execution"` or `"planning"`:
    a. Check `evaluatorStatus`. If already `"watching"` → update State.json: `generatorStatus: "working"`, `phase: "generation"`. Proceed to step 6.
    b. If `evaluatorStatus` is `null` → the Evaluator is not ready yet. Update State.json: `generatorStatus: "researching"`. You may read files, investigate the codebase, and prepare — but do NOT create or modify implementation files.
diff --git a/skills/planner/SKILL.md b/skills/planner/SKILL.md
@@ -330,7 +330,7 @@ This section is the **most important rule** for keeping the spec requirement-foc
 2. **No complete code blocks** (full functions, full files, full type definitions). A 5-line snippet showing an exact API contract or a tricky edge case is fine. A 50-line block of "here's how the function should look" is not.
 3. **No step-by-step implementation procedures.** "First call `foo()`, then construct `Bar` with these args, then call `baz(...)` inside a `try` block" is HOW. Replace with WHAT: "The created entity must be visible to existing read tools and respect exclusion rules" — and let the Generator figure out the call sequence by reading the code you pointed to.
 4. **No acceptance criteria that prescribe implementation order or specific function calls.** AC is about observable outcomes. (See "What Makes a Good Acceptance Criterion" below.)
-5. **No "Style Guide Reminder" section telling the Generator which skills to load.** The Generator already has its own role file (`TandemKit/Generator.md`) that specifies which skills to load. The spec should not duplicate that or push role-specific instructions.
+5. **No "Style Guide Reminder" / "Skills to Load" section as a mandate.** The Generator already has its own role file (`TandemKit/Generator.md`) that specifies which skills to load. The spec should not duplicate that or push role-specific instructions as requirements. **Allowed exception — non-binding suggestions:** skill names MAY appear in the spec's §8 "Possible Directions & Ideas" (or a similarly-named "Context the Generator Might Find Useful" section) when framed as starting points the Generator can ignore. The distinction is mandate vs. suggestion: "the Generator MUST load `<skill-name>`" is banned as a contract; "`<skill-name>` is worth considering when writing `<feature>`" in the non-binding section is fine as a suggestion. Specific skill names are examples of what could be relevant — actual skills depend on the project. The binding WHAT/WHY stays in Acceptance Criteria + Scope.
 6. **No transcribed file contents.** If `auth_handler.py:42-78` is relevant, write that path and one sentence about WHY it's relevant. Don't paste 50 lines of that file into the spec.
 
 **The spec IS:**
diff --git a/skills/planner/templates/Spec-Format.md b/skills/planner/templates/Spec-Format.md
@@ -14,7 +14,7 @@ This rule is the difference between a 100-line spec the Generator can execute ag
 2. **No complete code blocks** — no full function bodies, full type definitions, full file contents. A 1–5 line snippet that pins down an exact API contract or shows a tricky edge case is fine. A 20-line block of "here's how the function should look" is not.
 3. **No step-by-step implementation procedures.** "First call `foo()`, then construct `Bar`, then call `baz()` inside a `try` block" is HOW. Replace with WHAT: "the operation must be atomic and respect existing exclusion rules" — and let the Generator find the call sequence by reading the file you pointed to.
 4. **No transcribed file contents.** If `auth_handler.py:42-78` is relevant, write that path and one sentence about WHY. Don't paste the 50 lines into the spec — the Generator will read the file.
-5. **No "Style Guide Reminder" / "Skills to Load" / "Generator MUST..." sections.** Role-specific instructions belong in `TandemKit/Generator.md`, not the spec. The spec is mission-specific; role files are role-specific.
+5. **No "Style Guide Reminder" / "Skills to Load" / "Generator MUST..." sections as mandates.** Role-specific *instructions* belong in `TandemKit/Generator.md`, not the spec. The spec is mission-specific; role files are role-specific. **Allowed exception — non-binding suggestions:** skill names and reference files that may be relevant MAY appear in §8 "Possible Directions & Ideas" (or a similarly-named "Context the Generator Might Find Useful" section), provided they are clearly framed as starting points the Generator can ignore. The distinction is mandate vs. suggestion: *"The Generator MUST load `<style-guide-skill>`"* is a mandate (banned in the spec body). *"`<style-guide-skill>` is worth considering when writing the <relevant feature>"* in a non-binding section is a suggestion (allowed). Whatever example skill names appear in this template are illustrative placeholders — actual skills vary by project. The binding WHAT/WHY stays in Acceptance Criteria + Scope; skill and file hints stay explicitly non-binding.
 6. **No acceptance criteria that prescribe implementation order or specific function calls.** ACs are about observable outcomes. (See §4 below for examples.)
 
 ### When implementation detail IS acceptable (rare exceptions)
@@ -216,16 +216,33 @@ Explicit boundaries. The Generator must NOT implement these. The Evaluator must
 
 ### 8. Possible Directions & Ideas (Optional)
 
-Soft suggestions from the Planner. Non-binding. The Generator can take these or ignore them.
+Soft suggestions from the Planner. Non-binding. The Generator can take these or ignore them. This is also the right place for **skill hints** and **reference files** — content that doesn't fit as a requirement but would save the Generator time if surfaced. Frame everything as "worth considering", not "must do".
+
+Typical contents (all optional, all non-binding — use whichever sub-headings fit the mission):
+- **Skills worth considering** — name the style-guide, testing, or domain skills that exist in this project and may help. The Generator decides whether to load them. Actual skill names depend on the project.
+- **Files and folders worth reading** — documentation, prior missions, adjacent patterns.
+- **Protocol or RFC references** — when the spec references a standard, point at it.
+- **Tactical hints** — possible implementation directions, testing approaches, edge cases to be aware of.
+- **Naming notes** — suggestions for tool names, branch names, enum values, with alternatives when any is fine.
+- **Suggested milestones** — a possible implementation sequence.
+
+Example — in a hypothetical auth-API mission:
 
 ```markdown
 ## Possible Directions & Ideas
 
-- Consider using the existing `RateLimiter.swift` middleware pattern as
-  a template for the auth middleware
+**Skills worth considering**
+- `<testing-skill-name>` — patterns for the test suite
+- `<api-style-skill-name>` — HTTP handler conventions
+(replace with the skills that actually exist in your project)
+
+**Files worth reading**
+- `Sources/Middleware/RateLimiter.swift` — reference pattern for the new middleware
+- `Documentation/Toolbox.md` — reusable solutions
+
+**Tactical hints**
 - The `swift-jwt` library already in the project supports RS256 natively
-- A `TokenService` actor might be a clean way to encapsulate token
-  creation/validation/rotation logic
+- A `TokenService` actor might cleanly encapsulate token lifecycle
 
 **Suggested milestones for the Generator:**
 1. Data model and token service
@@ -234,6 +251,8 @@ Soft suggestions from the Planner. Non-binding. The Generator can take these or
 4. Tests
 ```
 
+**Key framing**: every item in this section is a starting point. If the Generator finds a better approach by reading the codebase, taking that better approach is not a spec violation — the binding content is only what's in Acceptance Criteria, Scope, and Edge Cases. Skill names shown in examples above are placeholders; actual skill names depend on what exists in the project's `.claude/skills/` directory.
+
 ## Principles
 
 1. **Constrain deliverables, not implementation** — specify WHAT and WHY, never HOW. Re-read "The One Rule" at the top of this file before drafting any section.
@@ -252,6 +271,6 @@ Most well-scoped missions produce specs in the **150–400 line** range. If your
 - An "Implementation Sketch" section (delete it entirely)
 - Acceptance criteria that prescribe call sequences (rewrite as observable outcomes)
 - Transcribed file contents in the Context section (replace with file path + 1 sentence)
-- A "Style Guide" or "Skills to Load" section (delete — that belongs in `TandemKit/Generator.md`, not the spec)
+- A "Style Guide" or mandatory "Skills to Load" section in the spec body (delete — that belongs in `TandemKit/Generator.md`, not the spec). Non-binding skill/file *suggestions* are fine in §8 "Possible Directions & Ideas" — the distinction is mandate vs. suggestion, not whether skills can be mentioned at all.
 
 Before finalizing: scan your spec for any code block > 5 lines. For each one, ask "could the Generator have written this themselves after reading the codebase?" If yes — delete it.

Original file line number	Diff line number	Diff line change
`@@ -1,6 +1,6 @@`
`1`	`1`	`{`
`2`	`2`	`"name": "tandemkit",`
`3`		`- "version": "1.1.0",`
	`3`	`+ "version": "1.1.1",`
`4`	`4`	`"description": "Describe your goal, approve the spec, then step away — Claude and Codex loop together until it's right.",`
`5`	`5`	`"author": {`
`6`	`6`	`"name": "Cihat Gündüz",`