
Commit 66c8faf

susiejojo and claude authored
feat(wiki): add /suggest-next skill for knowledge-driven campaign recommendations (#275)
* feat(wiki): add /suggest-next skill for knowledge-driven campaign recommendations

Add retrieval script and Claude skill that recommend next experiment framings based on accumulated cross-campaign knowledge from the registry.

- scripts/retrieve_wiki_context.py: deterministic entity-scoped subgraph retrieval from per-campaign JSON files (principles, dead-ends, frontiers, interactions, cost context)
- .claude/commands/suggest-next.md: skill that selects campaigns/entities from registry, runs retrieval, synthesizes scored recommendations

Closes #274

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(suggest-next): add Phase E interactive campaign.yaml generation

After producing scored recommendations, the skill now offers to generate schema-valid campaign.yaml files directly from the recommendations. Users select which recommendations to convert, and the generated configs are written to ~/.nous/wiki/suggestions/campaigns/ with full traceability metadata linking back to the source suggestion.

Also fixes campaign/entity selection to use strict caps (3 campaigns, 6 entities) instead of ambiguous ranges, and documents /suggest-next in docs/nous-wiki.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(retrieve_wiki_context): defensive error handling for LLM-generated input

Address PR review feedback:

1. load_json() now warns on stderr when a file exists but can't be parsed, instead of silently returning None (data loss visibility)
2. All dict["key"] access replaced with .get() defaults throughout entity, concept, parameter, and principle iteration (prevents KeyError crashes)
3. JSONL metrics parsing wraps each line in try/except with per-line warning, and uses .get() for all field access (handles truncated campaign files)
4. Missing campaigns emit per-campaign stderr warnings, and if zero campaigns load successfully the script exits with an error instead of producing an empty context block

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 3499dbe commit 66c8faf

3 files changed

Lines changed: 777 additions & 0 deletions


.claude/commands/suggest-next.md

Lines changed: 344 additions & 0 deletions
@@ -0,0 +1,344 @@
Given a user's research intent, retrieve prior knowledge from the cross-campaign registry and recommend how to frame a new campaign.

## Usage

`/suggest-next <repo_path_or_project_name> <intent>`

Examples:
- `/suggest-next /path/to/inference-sim "improve admission control fairness across priority bands"`
- `/suggest-next inference-sim "reduce tail latency under burst workloads"`
- `/suggest-next` (no arguments — list available projects and ask)

## Argument Parsing

- If `$ARGUMENTS` is empty, read `~/.nous/wiki/registry.json`, list all projects (by name and path), and ask the user which project to use and what their research intent is.
- If the first token of `$ARGUMENTS` is a path (contains `/`) or matches a project name in the registry, use it as the project filter. Everything after it is the intent.
- If the first token doesn't match any project, treat the entire argument string as the intent and ask the user which project to use.

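As a rough illustration of the parsing rules above, here is a Python sketch (Python being the language of `retrieve_wiki_context.py`). The helper name `parse_arguments` and the use of `shlex` are assumptions, not part of the skill; `shlex.split` keeps a double-quoted intent together as one token, as in the usage examples.

```python
import shlex
from typing import Optional, Tuple


def parse_arguments(arguments: str, project_names: set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: split $ARGUMENTS into (project_filter, intent).

    (None, None)   -> no arguments: list projects and ask.
    (None, intent) -> first token is neither a path nor a known project.
    (project, None) -> project given but no intent: ask for the intent.
    """
    tokens = shlex.split(arguments)  # assumes POSIX-style quoting
    if not tokens:
        return None, None
    first, rest = tokens[0], tokens[1:]
    # A token containing "/" is treated as a repo path.
    if "/" in first or first in project_names:
        return first, " ".join(rest) or None
    # No project matched: the whole argument string is the intent.
    return None, " ".join(tokens)
```

For example, `parse_arguments('inference-sim "reduce tail latency under burst workloads"', {"inference-sim"})` returns `("inference-sim", "reduce tail latency under burst workloads")`. One caveat of this choice: an unquoted intent containing an apostrophe (such as `user's`) makes `shlex.split` raise `ValueError`, so a real implementation might fall back to plain whitespace splitting.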
## Algorithm

The algorithm has five phases: **A: Retrieval** (script-driven, deterministic), **B: Synthesis** (LLM reasoning over the retrieved context), **C: Output** (write markdown), **D: Format** (file structure), and **E: Campaign Generation** (interactive YAML creation). The LLM selects what to retrieve; the script does the mechanical graph traversal and filtering.

---

### Phase A: Retrieval

#### A1. Load Registry and Match Project

Read `~/.nous/wiki/registry.json`. Find the project entry matching the user's repo path or project name, trying each strategy in order:
- Try an exact path match against the `projects` keys
- Try a suffix match (the user might give just the repo name; match it against the end of each key)
- Try a fuzzy match against each project's `name` field

If not found, report: "No prior knowledge for this system. Available projects:" and list them. **STOP.**

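A minimal sketch of the three matching strategies, tried in order. It assumes the registry is shaped like `{"projects": {<repo path>: {"name": ...}}}`, which is what A1 implies but is not confirmed here; `match_project` is a hypothetical helper, and the `difflib` similarity cutoff of 0.6 is an arbitrary choice.

```python
import difflib
from typing import Optional


def match_project(registry: dict, query: str) -> Optional[str]:
    """Hypothetical helper: return the registry key (repo path) for `query`, or None."""
    projects = registry.get("projects", {})
    # 1. Exact path match against the `projects` keys.
    if query in projects:
        return query
    # 2. Suffix match: "inference-sim" matches "/home/u/src/inference-sim".
    wanted = query.rstrip("/")
    for key in projects:
        trimmed = key.rstrip("/")
        if trimmed == wanted or trimmed.endswith("/" + wanted):
            return key
    # 3. Fuzzy match against each project's `name` field (cutoff is an assumption).
    names = {entry.get("name", ""): key for key, entry in projects.items()}
    best = difflib.get_close_matches(query, list(names), n=1, cutoff=0.6)
    return names[best[0]] if best else None
```

A `None` result corresponds to the "No prior knowledge for this system" branch above.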
#### A2. Select Campaigns and Entities (LLM judgment)

From the matched project's registry entry, select:

- **Exactly 3 campaign names** (or all campaigns if fewer than 3 exist) — rank by relevance to the user's intent using `research_question`, `concepts[].name`, and `frontiers[].title`
- **Exactly 6 entity names** (or all entities if fewer than 6 exist) — from the project-level `entities` array, pick those whose `name` or `aliases` relate to the user's intent. Entities that appear in the selected campaigns may also be included if their role is relevant, but the total stays capped at 6.

#### A3. Run Retrieval Script

Call the retrieval script with the selected campaigns and entities:

```bash
python scripts/retrieve_wiki_context.py \
  -c <campaign-1> <campaign-2> ... \
  -e "<Entity Name 1>" "<Entity Name 2>" ... \
  -i "<user's research intent>"
```

The script:
1. Builds a knowledge graph from each campaign's `concepts.json` (nodes = entities/concepts/parameters, edges = shared principles)
2. Extracts the subgraph reachable from the specified entities (1-hop via principle overlap)
3. Loads principles from `principles.json` — only those referenced by the subgraph
4. Loads all dead-ends from `dead-ends.json`
5. Loads frontiers and interactions filtered by the scoped principle IDs
6. Outputs a structured context block to stdout

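The core of steps 1-3, the 1-hop expansion through shared principles, might look like the following. This is a sketch, not the actual `retrieve_wiki_context.py` code: it assumes each node from `concepts.json` is a dict with a `name` and a `principles` list of principle IDs (both field names are hypothetical), and it uses `.get()` defaults in the spirit of the defensive handling described in the commit message.

```python
from typing import Iterable


def scoped_subgraph(nodes: list[dict], seed_entities: Iterable[str]) -> tuple[set[str], set[str]]:
    """Hypothetical helper: return (reachable node names, scoped principle IDs).

    Two nodes are adjacent when they share at least one principle ID.
    """
    seeds = set(seed_entities)
    named = [n for n in nodes if n.get("name")]  # skip malformed nodes without a name
    # Principle IDs attached to the seed entities themselves.
    seed_principles = {
        pid for n in named if n["name"] in seeds for pid in n.get("principles", [])
    }
    # Seeds plus 1-hop neighbours: nodes sharing a principle with any seed.
    reachable = {
        n["name"] for n in named
        if n["name"] in seeds or seed_principles & set(n.get("principles", []))
    }
    # Scoped principles: every principle ID referenced by the reachable subgraph.
    scoped = {
        pid for n in named if n["name"] in reachable for pid in n.get("principles", [])
    }
    return reachable, scoped
```

With hypothetical nodes `Admission Queue` (RP-1, RP-2), `Priority Band` (RP-2, RP-5) and `KV Cache` (RP-5), seeding `Admission Queue` reaches `Priority Band` through RP-2 but not `KV Cache`, which is two hops away; the scoped principles are then RP-1, RP-2 and RP-5.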
#### A4. Read Script Output

Capture the script's stdout. This is the **Retrieved Context** block that feeds Phase B.

---

### Phase B: Synthesis

Using the assembled context block from Phase A, generate the **top 3 recommended campaign framings**. For each recommendation:

1. **Score it** on five dimensions, each from 0.00 to 1.00 (the weights sum to 1.0, so the weighted total is also on a 0-1 scale):
   - **Novelty (weight 0.25):** How far is this from known dead-ends? Does it explore genuinely new territory?
   - **Foundation (weight 0.20):** How many scoped principles does it build upon? Stronger foundation = higher confidence.
   - **Impact (weight 0.25):** Based on related results, what's the estimated effect size? Prioritize high-impact experiments.
   - **Testability (weight 0.15):** Can this be validated in a single campaign run? Concrete, bounded experiments score higher.
   - **Efficiency (weight 0.15):** How cost-effective is this experiment predicted to be? Score based on:
     - Predicted cost relative to predicted impact (low cost + high impact = high efficiency)
     - Whether the experiment can reuse cached context from prior runs (cache reads reduce cost)
     - Whether a cheaper model configuration could work (e.g., Sonnet-only for refinement campaigns vs Opus+Sonnet for exploratory)
     - Fewer predicted iterations = higher efficiency

2. **For each recommendation, provide:**
   - A suggested `research_question` (1-2 sentences, phrased as a testable question)
   - Which entities/concepts from the context block it builds on (with brief context)
   - Which frontiers it addresses (by ID and title)
   - Which interactions it could test (by ID and title)
   - Which dead-ends to explicitly avoid (by ID and brief reason)
   - Score breakdown (Novelty/Foundation/Impact/Testability/Efficiency + weighted total)
   - Predicted cost (iterations × cost/iter, with basis for estimate)
   - Suggested model configuration (which models for design/execute phases, with rationale)

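The weighted total is a plain weighted sum. A small sketch, assuming each dimension is scored on a 0-1 scale; `WEIGHTS` and `weighted_total` are illustrative names, not part of the skill:

```python
# Illustrative constants mirroring the weights listed above.
WEIGHTS = {
    "novelty": 0.25,
    "foundation": 0.20,
    "impact": 0.25,
    "testability": 0.15,
    "efficiency": 0.15,
}


def weighted_total(scores: dict[str, float]) -> float:
    """Hypothetical helper: combine per-dimension scores (each 0-1) into the total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{dim} score {value} is outside 0-1")
    # Rounded to two decimals to match the X.XX format of the scoring tables.
    return round(sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 2)
```

For instance, scores of 0.80, 0.70, 0.60, 0.90 and 0.50 (in the order listed above) give 0.20 + 0.14 + 0.15 + 0.135 + 0.075 = 0.70.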
### Phase C: Output File

Write the full recommendation to a markdown file at:

```
~/.nous/wiki/suggestions/<YYYY-MM-DD>-<slugified-intent>.md
```

- Create the `~/.nous/wiki/suggestions/` directory if it doesn't exist.
- Slugify the intent: lowercase, replace spaces with `-`, strip every character other than lowercase letters, digits, and `-`, and truncate to 50 chars.
- If the file already exists (same date + intent), append a numeric suffix (`-2`, `-3`, etc.).

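One way to implement the slug and collision rules, as a hedged sketch: `slugify` and `unique_path` are hypothetical helpers, and trimming a trailing `-` left by truncation is an extra choice not stated in the rules.

```python
import re
from pathlib import Path


def slugify(text: str, max_len: int = 50) -> str:
    """Hypothetical helper: lowercase, hyphenate spaces, drop other characters, truncate."""
    slug = text.lower().replace(" ", "-")
    slug = re.sub(r"[^a-z0-9-]", "", slug)
    # Extra choice (not in the rules): drop a dangling hyphen left by truncation.
    return slug[:max_len].rstrip("-")


def unique_path(directory: Path, stem: str, suffix: str = ".md") -> Path:
    """Hypothetical helper: return directory/stem+suffix, adding -2, -3, ... on collision."""
    candidate = directory / f"{stem}{suffix}"
    n = 2
    while candidate.exists():
        candidate = directory / f"{stem}-{n}{suffix}"
        n += 1
    return candidate
```

The same collision helper fits the Phase E campaign files: `unique_path(campaigns_dir, f"{prefix}-1", ".yaml")` yields a `-1-2.yaml` name when `-1.yaml` already exists (`campaigns_dir` and `prefix` are placeholder names).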
After writing the file, print a short summary to the terminal:

```
Wrote: ~/.nous/wiki/suggestions/<filename>.md

Top recommendations:
1. <title> — score: <total>/1.0
2. <title> — score: <total>/1.0
3. <title> — score: <total>/1.0
```

### Phase D: File Format

The markdown file should follow this structure. The scoring table is **required** for every recommendation — it is the primary decision-making artifact.

```markdown
# Suggest-Next: <project name>

**Date:** <YYYY-MM-DD>
**Research intent:** "<user's intent>"
**Prior campaigns:** <count>
**Total confirmed principles:** <count>
**Campaigns consulted:** <comma-separated names>
**Entities scoped:** <comma-separated names>

---

## Scoring Summary

| # | Recommendation | Novelty | Foundation | Impact | Testability | Efficiency | **Total** |
|---|---------------|---------|-----------|--------|-------------|------------|-----------|
| 1 | <short title> | X.XX | X.XX | X.XX | X.XX | X.XX | **X.XX** |
| 2 | <short title> | X.XX | X.XX | X.XX | X.XX | X.XX | **X.XX** |
| 3 | <short title> | X.XX | X.XX | X.XX | X.XX | X.XX | **X.XX** |

*Weights: Novelty 0.25, Foundation 0.20, Impact 0.25, Testability 0.15, Efficiency 0.15*

---

## Recommendation 1: <short title>

**Suggested research question:**
> <1-2 sentence testable question>

### Score Breakdown

**Weighted total: X.XX/1.0**

| Dimension | Weight | Score | Rationale |
|-------------|--------|-------|-----------|
| Novelty | 0.25 | X.XX | <brief — what makes this novel or not> |
| Foundation | 0.20 | X.XX | <brief — which principles it builds on> |
| Impact | 0.25 | X.XX | <brief — expected effect size and why> |
| Testability | 0.15 | X.XX | <brief — how bounded/measurable it is> |
| Efficiency | 0.15 | X.XX | <brief — cost/impact ratio reasoning> |

### Builds on
- <Entity/Concept name> — <how it's relevant>
- ...

### Addresses frontiers
- F-N: <title> — <how this experiment would push the boundary>
- ...

### Tests interactions
- I-N: <title> — <what combining these would reveal>
- ...

### Avoid (dead-ends)
- DE-N: <title> — <why this failed before>
- ...

### Predicted cost

| Metric | Estimate | Basis |
|--------|----------|-------|
| Iterations | N-M | <reasoning: refinement/exploratory, builds on N principles, etc.> |
| Cost/iter | ~$X.XX | Project historical average (adjusted if applicable) |
| Total | $XX-YY | iterations × cost/iter |
| Duration | ~Xh | Based on avg duration/iter from similar campaigns |

### Model configuration
- Design phase: <model> (<rationale>)
- Execute phase: <model> (<rationale>)
- Alternative: <cheaper/costlier option with savings estimate>

**Efficiency note:** <1 sentence on why this cost is justified relative to expected impact>

---

## Recommendation 2: <short title>

<same structure as Recommendation 1>

---

## Recommendation 3: <short title>

<same structure as Recommendation 1>

---

## Next Steps

To start a campaign from these recommendations, use the interactive generator below or manually:
1. Select recommendations to generate `campaign.yaml` files (Phase E prompt follows)
2. Review and adjust the generated config if needed
3. Run: `nous run <path-to-campaign.yaml>`
4. After completion, run `/post-campaign` to feed results back into the registry
```

### Phase E: Interactive Campaign Generation

After printing the terminal summary (end of Phase C), offer to generate executable `campaign.yaml` files from the recommendations.

#### E1. Ask the User

Use the `AskUserQuestion` tool to present these choices:

**Question:** "Which recommendations would you like to generate campaign.yaml files for?"

**Options:**
- "1" — Generate for recommendation 1 only
- "2" — Generate for recommendation 2 only
- "3" — Generate for recommendation 3 only
- "All" — Generate for all recommendations
- "None" — Skip campaign generation

Allow multi-select (the user can pick, e.g., "1" and "3").

If the user selects "None", print `No campaigns generated.` and **STOP**.

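A sketch of turning the multi-select answer into recommendation numbers. `resolve_selection` is hypothetical, and letting "None" win when it is combined with other choices is an assumption the rules above do not cover.

```python
def resolve_selection(choices: list[str], available: int = 3) -> list[int]:
    """Hypothetical helper: map selected options to sorted recommendation numbers.

    "All" selects every recommendation; "None" (or no answer) selects nothing.
    Assumption: "None" takes precedence if it is picked alongside other options.
    """
    normalized = {c.strip().lower() for c in choices}
    if not normalized or "none" in normalized:
        return []
    if "all" in normalized:
        return list(range(1, available + 1))
    # Keep only numeric choices that refer to an existing recommendation.
    picked = {int(c) for c in normalized if c.isdigit() and 1 <= int(c) <= available}
    return sorted(picked)
```

An empty result maps to the `No campaigns generated.` branch.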
#### E2. Generate campaign.yaml for Each Selected Recommendation

For each selected recommendation, produce a YAML document with these field mappings:

| campaign.yaml field | Source |
|---|---|
| `research_question` | Recommendation's suggested research question (verbatim from the `> <question>` block) |
| `run_id` | Slugified recommendation title (lowercase, hyphens, ≤50 chars) |
| `max_iterations` | Upper bound from the "Iterations" row in the Predicted cost table (e.g., "6-8" → 8) |
| `target_system.name` | From registry `projects[key].name` |
| `target_system.description` | Synthesized from registry project description + recommendation context |
| `target_system.repo_path` | The project key (path) from the registry |
| `target_system.observable_metrics` | Inferred from recommendation's Impact rationale (omit field entirely if not confidently inferable) |
| `target_system.controllable_knobs` | Parameter names from "Builds on" section (omit field entirely if not confidently inferable) |
| `prompts.methodology_layer` | `"prompts/methodology"` (standard default) |
| `prompts.domain_adapter_layer` | `null` |
| `models.design` | From recommendation's "Model configuration → Design phase" model name |
| `models.execute_analyze` | From recommendation's "Model configuration → Execute phase" model name |
| `metadata` | Traceability block (see E3) |

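The `max_iterations` mapping takes the upper bound of the estimate. A minimal sketch (the helper name is hypothetical); because it simply extracts all integers, it accepts a hyphen or an en dash between the bounds:

```python
import re


def max_iterations_from(estimate: str) -> int:
    """Hypothetical helper: upper bound of an "Iterations" estimate like "6-8" or "4"."""
    numbers = [int(n) for n in re.findall(r"\d+", estimate)]
    if not numbers:
        raise ValueError(f"no iteration count in {estimate!r}")
    return max(numbers)
```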
**Schema compliance rules:**
- Do NOT include any fields not in `orchestrator/schemas/campaign.schema.yaml`
- Root object: only `research_question`, `run_id`, `max_iterations`, `target_system`, `prompts`, `models`, `metadata`
- `target_system`: only `name`, `description`, `repo_path`, `observable_metrics`, `controllable_knobs`, `live_target`
- `prompts`: only `methodology_layer`, `domain_adapter_layer`
- `models`: only `design`, `execute_analyze`, `report`
- Omit optional fields rather than including empty values
- Default model values (used when the recommendation does not name a model): `claude-opus-4-6` (design), `claude-sonnet-4-6` (execute_analyze)

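The allow-lists above can be checked mechanically before writing the file. This sketch only encodes the rules listed here, not the full `campaign.schema.yaml` (which is not shown), and the helper names are hypothetical. `drop_empty` keeps `None` so that `domain_adapter_layer: null` from the mapping table survives.

```python
# Allow-lists copied from the compliance rules above (not from the schema file itself).
ROOT_FIELDS = {
    "research_question", "run_id", "max_iterations",
    "target_system", "prompts", "models", "metadata",
}
SECTION_FIELDS = {
    "target_system": {"name", "description", "repo_path", "observable_metrics",
                      "controllable_knobs", "live_target"},
    "prompts": {"methodology_layer", "domain_adapter_layer"},
    "models": {"design", "execute_analyze", "report"},
}


def unexpected_fields(campaign: dict) -> list[str]:
    """Hypothetical helper: dotted paths of keys the compliance rules do not allow."""
    problems = [key for key in campaign if key not in ROOT_FIELDS]
    for section, allowed in SECTION_FIELDS.items():
        for key in campaign.get(section) or {}:
            if key not in allowed:
                problems.append(f"{section}.{key}")
    return problems


def drop_empty(section: dict) -> dict:
    """Hypothetical helper: omit empty strings, lists, and mappings; keep None."""
    return {k: v for k, v in section.items() if v not in ("", [], {})}
```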
#### E3. Metadata Traceability Block

Include a `metadata` section for provenance tracking:

```yaml
metadata:
  source_suggestion: "<YYYY-MM-DD>-<slug>.md"
  recommendation_rank: <1|2|3>
  research_intent: "<user's original intent verbatim>"
  builds_on_frontiers: ["F-1", "F-3"]
  tests_interactions: ["I-2"]
  avoids_dead_ends: ["DE-1", "DE-4"]
  foundation_principles: ["RP-5", "RP-12"]
  composite_score: 0.XX
```

- Use the actual IDs from the recommendation's "Addresses frontiers", "Tests interactions", and "Avoid (dead-ends)" sections
- `foundation_principles`: principle IDs referenced in the Foundation score rationale
- `composite_score`: the weighted total from the scoring table

#### E4. Write Files

Write each generated YAML to:

```
~/.nous/wiki/suggestions/campaigns/<YYYY-MM-DD>-<slugified-intent>-<N>.yaml
```

Here `<N>` is the recommendation number (1, 2, or 3).

- The `<YYYY-MM-DD>-<slugified-intent>` prefix matches the suggestion markdown filename (without `.md`)
- If the file already exists, append a numeric suffix before `.yaml` (e.g., `-1-2.yaml`)
- Create `~/.nous/wiki/suggestions/campaigns/` if it doesn't exist

#### E5. Print Execution Instructions

After writing all campaign files, print:

```
Generated campaign files:
<N>. ~/.nous/wiki/suggestions/campaigns/<filename>.yaml
   Run: nous run <full-path>

...
```

Example:
```
Generated campaign files:
1. ~/.nous/wiki/suggestions/campaigns/2026-06-03-improve-fairness-1.yaml
   Run: nous run ~/.nous/wiki/suggestions/campaigns/2026-06-03-improve-fairness-1.yaml
3. ~/.nous/wiki/suggestions/campaigns/2026-06-03-improve-fairness-3.yaml
   Run: nous run ~/.nous/wiki/suggestions/campaigns/2026-06-03-improve-fairness-3.yaml
```

---

## Model Configuration Guidance

When suggesting models for a recommendation, use the **Cost Context** section from the retrieved context and apply these heuristics:

- **Opus design + Sonnet execute** (default): For campaigns exploring new territory, combining multiple approaches, or where the design phase needs to reason about complex interactions. Historical cost: ~$5.50-6.00/iter.
- **Sonnet design + Sonnet execute** (cheaper, ~45% savings vs. the default): For campaigns that are narrow refinements of known-good configurations — the design space is well-constrained by prior principles. Estimated cost: ~$3.00-3.50/iter.
- **Opus both** (expensive, ~80% increase vs. the default): Only for campaigns that need deep analysis in the execute phase (e.g., debugging subtle failures where Sonnet might miss root causes). Estimated cost: ~$10-11/iter.

Iteration count heuristics:
- **Refinement** (builds on 3+ confirmed principles, narrow scope): 4-6 iterations
- **Exploratory** (new territory, tests interactions, <2 confirmed principles to build on): 8-12 iterations
- **Standard** (mix of known and new): 6-8 iterations

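The iteration heuristics and the cost arithmetic from the Predicted cost table can be combined as follows. This is a simplified sketch: `classify` reduces the heuristics to three inputs, the helper names are hypothetical, and the per-iteration cost should come from the Cost Context block rather than the constants used in the example.

```python
from typing import Tuple

# Iteration ranges taken from the heuristics above.
ITERATION_RANGES = {
    "refinement": (4, 6),
    "standard": (6, 8),
    "exploratory": (8, 12),
}


def classify(confirmed_principles: int, new_territory: bool, narrow_scope: bool) -> str:
    """Hypothetical, simplified mapping from campaign traits to a campaign type."""
    if confirmed_principles >= 3 and narrow_scope and not new_territory:
        return "refinement"
    if new_territory and confirmed_principles < 2:
        return "exploratory"
    return "standard"  # mix of known and new


def cost_range(kind: str, cost_per_iter: float) -> Tuple[float, float]:
    """Total cost range = iterations x cost/iter, as in the Predicted cost table."""
    low, high = ITERATION_RANGES[kind]
    return round(low * cost_per_iter, 2), round(high * cost_per_iter, 2)
```

For example, a standard campaign at ~$5.75/iter (the midpoint of the historical ~$5.50-6.00 range) gives `cost_range("standard", 5.75) == (34.5, 46.0)`, i.e. roughly $35-46 in total.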
## Important Rules

- This skill **writes files only to `~/.nous/wiki/suggestions/`** — the suggestion markdown at the top level, and optionally campaign YAML files in the `campaigns/` subdirectory. It never modifies registry files, campaign data, or any other existing files.
- All reasoning happens in-context using the LLM's judgment — no external scripts beyond `retrieve_wiki_context.py`.
- If the registry is empty or the project has no campaigns, say so clearly and suggest the user run their first campaign manually.
- Always ground recommendations in specific prior data (principle IDs, frontier IDs, dead-end IDs). Never hallucinate IDs that don't exist in the loaded files.
- Keep recommendations actionable — each should be concrete enough to immediately write a `campaign.yaml` from.
- Prefer recommendations that combine insights from multiple campaigns over those that just extend a single campaign.
- Always use the Cost Context section to ground cost predictions in real data — never invent cost numbers without historical basis.
- **Scoring transparency is non-negotiable** — every recommendation must include its full score breakdown table with per-dimension rationale. The summary table at the top lets users compare at a glance.
