| 1 | +Given a user's research intent, retrieve prior knowledge from the cross-campaign registry and recommend how to frame a new campaign. |
| 2 | + |
| 3 | +## Usage |
| 4 | + |
| 5 | +`/suggest-next <repo_path_or_project_name> <intent>` |
| 6 | + |
| 7 | +Examples: |
| 8 | +- `/suggest-next /path/to/inference-sim "improve admission control fairness across priority bands"` |
| 9 | +- `/suggest-next inference-sim "reduce tail latency under burst workloads"` |
| 10 | +- `/suggest-next` (no arguments — list available projects and ask) |
| 11 | + |
| 12 | +## Argument Parsing |
| 13 | + |
| 14 | +- If `$ARGUMENTS` is empty, read `~/.nous/wiki/registry.json`, list all projects (by name and path), and ask the user which project and what their research intent is. |
| 15 | +- If `$ARGUMENTS` starts with a path (contains `/`) or matches a project name in the registry, use it as the project filter. Everything after it is the intent. |
| 16 | +- If `$ARGUMENTS` doesn't match any project, treat the entire argument as the intent and ask the user which project to use. |
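The three parsing rules above can be sketched in Python as follows. This is an illustrative sketch only: `parse_arguments` is a hypothetical helper, and the registry shape (`projects` keyed by path, each entry with a `name` field) is inferred from how the registry is described elsewhere in this document. It also assumes the intent is quoted, as in the usage examples.

```python
import shlex
from typing import Optional, Tuple


def parse_arguments(arguments: str, registry: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (project, intent). A None value means "ask the user"."""
    # shlex.split honours shell-style quoting, so a quoted intent stays one token.
    tokens = shlex.split(arguments)
    if not tokens:
        # Rule 1: no arguments, so list the projects and ask for both values.
        return None, None

    projects = registry.get("projects", {})
    names = {entry.get("name") for entry in projects.values()}
    first = tokens[0]

    if "/" in first or first in names:
        # Rule 2: the first token is the project filter, the rest is the intent.
        return first, " ".join(tokens[1:]) or None

    # Rule 3: nothing matched, so the whole argument string is the intent.
    return None, " ".join(tokens)
```

For example, `parse_arguments('inference-sim "reduce tail latency under burst workloads"', registry)` yields the project name and the intent as separate values, while a bare quoted intent yields `None` for the project.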
| 17 | + |
| 18 | +## Algorithm |
| 19 | + |
| 20 | +The algorithm has five phases: **A: Retrieval** (script-driven, deterministic), **B: Synthesis** (LLM reasoning over the retrieved context), **C: Output** (write markdown), **D: Format** (file structure), and **E: Campaign Generation** (interactive YAML creation). The LLM selects what to retrieve; the script does the mechanical graph traversal and filtering. |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +### Phase A: Retrieval |
| 25 | + |
| 26 | +#### A1. Load Registry and Match Project |
| 27 | + |
| 28 | +Read `~/.nous/wiki/registry.json`. Find the project entry matching the user's repo path or project name: |
| 29 | +- Try exact path match against `projects` keys |
| 30 | +- Try substring match (user might give just the repo name, match against the end of each key) |
| 31 | +- Try fuzzy match against project `name` fields |
| 32 | + |
| 33 | +If not found, report: "No prior knowledge for this system. Available projects:" and list them. **STOP.** |
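The matching order in A1 could look roughly like the sketch below. `match_project` is a hypothetical helper, the fuzzy step uses the standard library's `difflib.get_close_matches`, and the `0.6` cutoff is an assumption rather than a value taken from this skill.

```python
import difflib
from typing import Optional


def match_project(query: str, projects: dict) -> Optional[str]:
    """Return the matching registry key (a repo path), or None if nothing matches."""
    query = query.strip()
    if not query:
        return None

    # 1. Exact path match against the registry keys.
    if query in projects:
        return query

    # 2. Match against the end of each key (the user gave only the repo name).
    tail = query.strip("/")
    for key in projects:
        if key.rstrip("/").endswith(tail):
            return key

    # 3. Fuzzy match against each project's `name` field (case-insensitive).
    by_name = {str(entry.get("name", "")).lower(): key for key, entry in projects.items()}
    close = difflib.get_close_matches(query.lower(), list(by_name), n=1, cutoff=0.6)
    return by_name[close[0]] if close else None
```

A `None` result corresponds to the "No prior knowledge for this system" branch, at which point the available projects are listed and the skill stops.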
| 34 | + |
| 35 | +#### A2. Select Campaigns and Entities (LLM judgment) |
| 36 | + |
| 37 | +From the matched project's registry entry, select: |
| 38 | + |
| 39 | +- **Exactly 3 campaign names** (or all campaigns if fewer than 3 exist) — rank by relevance to the user's intent using `research_question`, `concepts[].name`, and `frontiers[].title` |
| 40 | +- **Exactly 6 entity names** (or all entities if fewer than 6 exist) — from the project-level `entities` array, pick those whose `name` or `aliases` relate to the user's intent. Also include entities that appear in the selected campaigns if their role is relevant. |
| 41 | + |
| 42 | +#### A3. Run Retrieval Script |
| 43 | + |
| 44 | +Call the retrieval script with the selected campaigns and entities: |
| 45 | + |
| 46 | +```bash |
| 47 | +python scripts/retrieve_wiki_context.py \ |
| 48 | + -c <campaign-1> <campaign-2> ... \ |
| 49 | + -e "<Entity Name 1>" "<Entity Name 2>" ... \ |
| 50 | + -i "<user's research intent>" |
| 51 | +``` |
| 52 | + |
| 53 | +The script: |
| 54 | +1. Builds a knowledge graph from each campaign's `concepts.json` (nodes = entities/concepts/parameters, edges = shared principles) |
| 55 | +2. Extracts the subgraph reachable from the specified entities (1-hop via principle overlap) |
| 56 | +3. Loads principles from `principles.json` — only those referenced by the subgraph |
| 57 | +4. Loads all dead-ends from `dead-ends.json` |
| 58 | +5. Loads frontiers and interactions filtered by the scoped principle IDs |
| 59 | +6. Outputs a structured context block to stdout |
| 60 | + |
| 61 | +#### A4. Read Script Output |
| 62 | + |
| 63 | +Capture the script's stdout. This is the **Retrieved Context** block that feeds Phase B. |
| 64 | + |
| 65 | +--- |
| 66 | + |
| 67 | +### Phase B: Synthesis |
| 68 | + |
| 69 | +Using the Retrieved Context block from Phase A, generate the **top 3 recommended campaign framings**. For each recommendation: |
| 70 | + |
| 71 | +1. **Score it** on five dimensions: |
| 72 | + - **Novelty (weight 0.25):** How far is this from known dead-ends? Does it explore genuinely new territory? |
| 73 | + - **Foundation (weight 0.20):** How many scoped principles does it build upon? Stronger foundation = higher confidence. |
| 74 | + - **Impact (weight 0.25):** Based on related results, what's the estimated effect size? Prioritize high-impact experiments. |
| 75 | + - **Testability (weight 0.15):** Can this be validated in a single campaign run? Concrete, bounded experiments score higher. |
| 76 | + - **Efficiency (weight 0.15):** How cost-effective is this experiment predicted to be? Score based on: |
| 77 | + - Predicted cost relative to predicted impact (low cost + high impact = high efficiency) |
| 78 | + - Whether the experiment can reuse cached context from prior runs (cache reads reduce cost) |
| 79 | + - Whether a cheaper model configuration could work (e.g., Sonnet-only for refinement campaigns vs Opus+Sonnet for exploratory) |
| 80 | + - Fewer predicted iterations = higher efficiency |
| 81 | + |
| 82 | +2. **For each recommendation, provide:** |
| 83 | + - A suggested `research_question` (1-2 sentences, phrased as a testable question) |
| 84 | + - Which entities/concepts from the context block it builds on (with brief context) |
| 85 | + - Which frontiers it addresses (by ID and title) |
| 86 | + - Which interactions it could test (by ID and title) |
| 87 | + - Which dead-ends to explicitly avoid (by ID and brief reason) |
| 88 | + - Score breakdown (Novelty/Foundation/Impact/Testability/Efficiency + weighted total) |
| 89 | + - Predicted cost (iterations × cost/iter, with basis for estimate) |
| 90 | + - Suggested model configuration (which models for design/execute phases, with rationale) |
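As a worked illustration of the weighted total, the sketch below combines the five dimension scores using the weights listed above. It assumes each dimension is scored on a 0.0 to 1.0 scale, which the `/1.0` total in the output template suggests but does not state outright; `weighted_total` is a hypothetical helper name.

```python
WEIGHTS = {
    "novelty": 0.25,
    "foundation": 0.20,
    "impact": 0.25,
    "testability": 0.15,
    "efficiency": 0.15,
}


def weighted_total(scores: dict) -> float:
    """Combine per-dimension scores (assumed to lie in 0.0-1.0) into one total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} score must be between 0.0 and 1.0")
    # The weights sum to 1.0, so the total stays on the same 0.0-1.0 scale.
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 2)


example = {"novelty": 0.8, "foundation": 0.5, "impact": 0.6,
           "testability": 1.0, "efficiency": 0.4}
total = weighted_total(example)  # 0.20 + 0.10 + 0.15 + 0.15 + 0.06
```

Because the scores themselves come from LLM judgment, the arithmetic is the only deterministic part; the per-dimension rationale still has to justify each input.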
| 91 | + |
| 92 | +### Phase C: Output File |
| 93 | + |
| 94 | +Write the full recommendation to a markdown file at: |
| 95 | + |
| 96 | +``` |
| 97 | +~/.nous/wiki/suggestions/<YYYY-MM-DD>-<slugified-intent>.md |
| 98 | +``` |
| 99 | + |
| 100 | +- Create the `~/.nous/wiki/suggestions/` directory if it doesn't exist. |
| 101 | +- Slugify the intent: lowercase, replace spaces with `-`, strip non-alphanumeric characters, truncate to 50 chars. |
| 102 | +- If the file already exists (same date + intent), append a numeric suffix (`-2`, `-3`, etc.). |
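The filename rules above might be implemented along these lines. Both helpers are hypothetical, and the sketch assumes that hyphens produced from spaces are kept when non-alphanumeric characters are stripped, since otherwise the slug would lose its word separators.

```python
import re
from pathlib import Path


def slugify(intent: str, max_len: int = 50) -> str:
    """Lowercase, turn spaces into hyphens, drop other symbols, then truncate."""
    slug = intent.lower().replace(" ", "-")
    slug = re.sub(r"[^a-z0-9-]", "", slug)  # assumption: hyphens survive stripping
    return slug[:max_len]


def unique_path(directory: Path, stem: str, suffix: str = ".md") -> Path:
    """Return directory/stem.suffix, appending -2, -3, ... if the file exists."""
    candidate = directory / f"{stem}{suffix}"
    counter = 2
    while candidate.exists():
        candidate = directory / f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate
```

A caller would create the directory first, for example with `directory.mkdir(parents=True, exist_ok=True)`, and then build the stem as `f"{date}-{slugify(intent)}"`.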
| 103 | + |
| 104 | +After writing the file, print a short summary to the terminal: |
| 105 | + |
| 106 | +``` |
| 107 | +Wrote: ~/.nous/wiki/suggestions/<filename>.md |
| 108 | + |
| 109 | +Top recommendations: |
| 110 | + 1. <title> — score: <total>/1.0 |
| 111 | + 2. <title> — score: <total>/1.0 |
| 112 | + 3. <title> — score: <total>/1.0 |
| 113 | +``` |
| 114 | + |
| 115 | +### Phase D: File Format |
| 116 | + |
| 117 | +The markdown file should follow this structure. The scoring table is **required** for every recommendation — it is the primary decision-making artifact. |
| 118 | + |
| 119 | +```markdown |
| 120 | +# Suggest-Next: <project name> |
| 121 | + |
| 122 | +**Date:** <YYYY-MM-DD> |
| 123 | +**Research intent:** "<user's intent>" |
| 124 | +**Prior campaigns:** <count> |
| 125 | +**Total confirmed principles:** <count> |
| 126 | +**Campaigns consulted:** <comma-separated names> |
| 127 | +**Entities scoped:** <comma-separated names> |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +## Scoring Summary |
| 132 | + |
| 133 | +| # | Recommendation | Novelty | Foundation | Impact | Testability | Efficiency | **Total** | |
| 134 | +|---|---------------|---------|-----------|--------|-------------|------------|-----------| |
| 135 | +| 1 | <short title> | X.XX | X.XX | X.XX | X.XX | X.XX | **X.XX** | |
| 136 | +| 2 | <short title> | X.XX | X.XX | X.XX | X.XX | X.XX | **X.XX** | |
| 137 | +| 3 | <short title> | X.XX | X.XX | X.XX | X.XX | X.XX | **X.XX** | |
| 138 | + |
| 139 | +*Weights: Novelty 0.25, Foundation 0.20, Impact 0.25, Testability 0.15, Efficiency 0.15* |
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +## Recommendation 1: <short title> |
| 144 | + |
| 145 | +**Suggested research question:** |
| 146 | +> <1-2 sentence testable question> |
| 147 | + |
| 148 | +### Score Breakdown |
| 149 | + |
| 150 | +**Weighted total: X.XX/1.0** |
| 151 | + |
| 152 | +| Dimension | Weight | Score | Rationale | |
| 153 | +|-------------|--------|-------|-----------| |
| 154 | +| Novelty | 0.25 | X.XX | <brief — what makes this novel or not> | |
| 155 | +| Foundation | 0.20 | X.XX | <brief — which principles it builds on> | |
| 156 | +| Impact | 0.25 | X.XX | <brief — expected effect size and why> | |
| 157 | +| Testability | 0.15 | X.XX | <brief — how bounded/measurable it is> | |
| 158 | +| Efficiency | 0.15 | X.XX | <brief — cost/impact ratio reasoning> | |
| 159 | + |
| 160 | +### Builds on |
| 161 | +- <Entity/Concept name> — <how it's relevant> |
| 162 | +- ... |
| 163 | + |
| 164 | +### Addresses frontiers |
| 165 | +- F-N: <title> — <how this experiment would push the boundary> |
| 166 | +- ... |
| 167 | + |
| 168 | +### Tests interactions |
| 169 | +- I-N: <title> — <what combining these would reveal> |
| 170 | +- ... |
| 171 | + |
| 172 | +### Avoid (dead-ends) |
| 173 | +- DE-N: <title> — <why this failed before> |
| 174 | +- ... |
| 175 | + |
| 176 | +### Predicted cost |
| 177 | + |
| 178 | +| Metric | Estimate | Basis | |
| 179 | +|--------|----------|-------| |
| 180 | +| Iterations | N-M | <reasoning: refinement/exploratory, builds on N principles, etc.> | |
| 181 | +| Cost/iter | ~$X.XX | Project historical average (adjusted if applicable) | |
| 182 | +| Total | $XX-YY | iterations × cost/iter | |
| 183 | +| Duration | ~Xh | Based on avg duration/iter from similar campaigns | |
| 184 | + |
| 185 | +### Model configuration |
| 186 | +- Design phase: <model> (<rationale>) |
| 187 | +- Execute phase: <model> (<rationale>) |
| 188 | +- Alternative: <cheaper/costlier option with savings estimate> |
| 189 | + |
| 190 | +**Efficiency note:** <1 sentence on why this cost is justified relative to expected impact> |
| 191 | + |
| 192 | +--- |
| 193 | + |
| 194 | +## Recommendation 2: <short title> |
| 195 | + |
| 196 | +<same structure as Recommendation 1> |
| 197 | + |
| 198 | +--- |
| 199 | + |
| 200 | +## Recommendation 3: <short title> |
| 201 | + |
| 202 | +<same structure as Recommendation 1> |
| 203 | + |
| 204 | +--- |
| 205 | + |
| 206 | +## Next Steps |
| 207 | + |
| 208 | +To start a campaign from these recommendations, use the interactive generator below or manually: |
| 209 | +1. Select recommendations to generate `campaign.yaml` files (Phase E prompt follows) |
| 210 | +2. Review and adjust the generated config if needed |
| 211 | +3. Run: `nous run <path-to-campaign.yaml>` |
| 212 | +4. After completion, run `/post-campaign` to feed results back into the registry |
| 213 | +``` |
| 214 | + |
| 215 | +### Phase E: Interactive Campaign Generation |
| 216 | + |
| 217 | +After printing the terminal summary (end of Phase C), offer to generate executable `campaign.yaml` files from the recommendations. |
| 218 | + |
| 219 | +#### E1. Ask the User |
| 220 | + |
| 221 | +Use AskUserQuestion to present choices: |
| 222 | + |
| 223 | +**Question:** "Which recommendations would you like to generate campaign.yaml files for?" |
| 224 | + |
| 225 | +**Options:** |
| 226 | +- "1" — Generate for recommendation 1 only |
| 227 | +- "2" — Generate for recommendation 2 only |
| 228 | +- "3" — Generate for recommendation 3 only |
| 229 | +- "All" — Generate for all recommendations |
| 230 | +- "None" — Skip campaign generation |
| 231 | + |
| 232 | +Allow multi-select (the user can pick e.g. "1" and "3"). |
| 233 | + |
| 234 | +If the user selects "None", print `No campaigns generated.` and **STOP**. |
| 235 | + |
| 236 | +#### E2. Generate campaign.yaml for Each Selected Recommendation |
| 237 | + |
| 238 | +For each selected recommendation, produce a YAML document with these field mappings: |
| 239 | + |
| 240 | +| campaign.yaml field | Source | |
| 241 | +|---|---| |
| 242 | +| `research_question` | Recommendation's suggested research question (verbatim from the `> <question>` block) | |
| 243 | +| `run_id` | Slugified recommendation title (lowercase, hyphens, ≤50 chars) | |
| 244 | +| `max_iterations` | Upper bound from the "Iterations" row in the Predicted cost table (e.g., "6-8" → 8) | |
| 245 | +| `target_system.name` | From registry `projects[key].name` | |
| 246 | +| `target_system.description` | Synthesized from registry project description + recommendation context | |
| 247 | +| `target_system.repo_path` | The project key (path) from the registry | |
| 248 | +| `target_system.observable_metrics` | Inferred from recommendation's Impact rationale (omit field entirely if not confidently inferable) | |
| 249 | +| `target_system.controllable_knobs` | Parameter names from "Builds on" section (omit field entirely if not confidently inferable) | |
| 250 | +| `prompts.methodology_layer` | `"prompts/methodology"` (standard default) | |
| 251 | +| `prompts.domain_adapter_layer` | `null` | |
| 252 | +| `models.design` | From recommendation's "Model configuration → Design phase" model name | |
| 253 | +| `models.execute_analyze` | From recommendation's "Model configuration → Execute phase" model name | |
| 254 | +| `metadata` | Traceability block (see E3) | |
| 255 | + |
| 256 | +**Schema compliance rules:** |
| 257 | +- Do NOT include any fields not in `orchestrator/schemas/campaign.schema.yaml` |
| 258 | +- Root object: only `research_question`, `run_id`, `max_iterations`, `target_system`, `prompts`, `models`, `metadata` |
| 259 | +- `target_system`: only `name`, `description`, `repo_path`, `observable_metrics`, `controllable_knobs`, `live_target` |
| 260 | +- `prompts`: only `methodology_layer`, `domain_adapter_layer` |
| 261 | +- `models`: only `design`, `execute_analyze`, `report` |
| 262 | +- Omit optional fields rather than including empty values |
| 263 | +- Model values default: `claude-opus-4-6` (design), `claude-sonnet-4-6` (execute_analyze) |
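A quick pre-write check for the allowed-field rules could be sketched as below. The field sets are copied from the rules above; `unexpected_fields` is a hypothetical helper, and it is not a substitute for validating against `orchestrator/schemas/campaign.schema.yaml` itself. Keys inside `metadata` are not checked because the rules above do not restrict them.

```python
from typing import Dict, List, Set

# Allowed keys per section, as listed in the schema compliance rules ("" = root).
ALLOWED_FIELDS: Dict[str, Set[str]] = {
    "": {"research_question", "run_id", "max_iterations",
         "target_system", "prompts", "models", "metadata"},
    "target_system": {"name", "description", "repo_path",
                      "observable_metrics", "controllable_knobs", "live_target"},
    "prompts": {"methodology_layer", "domain_adapter_layer"},
    "models": {"design", "execute_analyze", "report"},
}


def unexpected_fields(campaign: dict) -> List[str]:
    """Return dotted names of keys that the rules above do not allow."""
    problems = [key for key in campaign if key not in ALLOWED_FIELDS[""]]
    for section in ("target_system", "prompts", "models"):
        block = campaign.get(section)
        if isinstance(block, dict):
            problems += [f"{section}.{key}" for key in block
                         if key not in ALLOWED_FIELDS[section]]
    return sorted(problems)
```

An empty list means the generated document uses only permitted field names; any other result should be fixed before the YAML is written.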
| 264 | + |
| 265 | +#### E3. Metadata Traceability Block |
| 266 | + |
| 267 | +Include a `metadata` section for provenance tracking: |
| 268 | + |
| 269 | +```yaml |
| 270 | +metadata: |
| 271 | + source_suggestion: "<YYYY-MM-DD>-<slug>.md" |
| 272 | + recommendation_rank: <1|2|3> |
| 273 | + research_intent: "<user's original intent verbatim>" |
| 274 | + builds_on_frontiers: ["F-1", "F-3"] |
| 275 | + tests_interactions: ["I-2"] |
| 276 | + avoids_dead_ends: ["DE-1", "DE-4"] |
| 277 | + foundation_principles: ["RP-5", "RP-12"] |
| 278 | + composite_score: 0.XX |
| 279 | +``` |
| 280 | + |
| 281 | +- Use the actual IDs from the recommendation's "Addresses frontiers", "Tests interactions", "Avoid (dead-ends)" sections |
| 282 | +- `foundation_principles`: principle IDs referenced in the Foundation score rationale |
| 283 | +- `composite_score`: the weighted total from the scoring table |
| 284 | + |
| 285 | +#### E4. Write Files |
| 286 | + |
| 287 | +Write each generated YAML to: |
| 288 | + |
| 289 | +``` |
| 290 | +~/.nous/wiki/suggestions/campaigns/<YYYY-MM-DD>-<slugified-intent>-<N>.yaml |
| 291 | +``` |
| 292 | + |
| 293 | +Where `<N>` is the recommendation number (1, 2, or 3). |
| 294 | + |
| 295 | +- The `<YYYY-MM-DD>-<slugified-intent>` prefix matches the suggestion markdown filename (without `.md`) |
| 296 | +- If the file already exists, append a numeric suffix before `.yaml` (e.g., `-1-2.yaml`) |
| 297 | +- Create `~/.nous/wiki/suggestions/campaigns/` if it doesn't exist |
| 298 | + |
| 299 | +#### E5. Print Execution Instructions |
| 300 | + |
| 301 | +After writing all campaign files, print: |
| 302 | + |
| 303 | +``` |
| 304 | +Generated campaign files: |
| 305 | + <N>. ~/.nous/wiki/suggestions/campaigns/<filename>.yaml |
| 306 | + Run: nous run <full-path> |
| 307 | + |
| 308 | + ... |
| 309 | +``` |
| 310 | + |
| 311 | +Example: |
| 312 | +``` |
| 313 | +Generated campaign files: |
| 314 | + 1. ~/.nous/wiki/suggestions/campaigns/2026-06-03-improve-fairness-1.yaml |
| 315 | + Run: nous run ~/.nous/wiki/suggestions/campaigns/2026-06-03-improve-fairness-1.yaml |
| 316 | + 3. ~/.nous/wiki/suggestions/campaigns/2026-06-03-improve-fairness-3.yaml |
| 317 | + Run: nous run ~/.nous/wiki/suggestions/campaigns/2026-06-03-improve-fairness-3.yaml |
| 318 | +``` |
| 319 | + |
| 320 | +--- |
| 321 | + |
| 322 | +## Model Configuration Guidance |
| 323 | + |
| 324 | +When suggesting models for a recommendation, use the **Cost Context** section from the retrieved context and apply these heuristics: |
| 325 | + |
| 326 | +- **Opus design + Sonnet execute** (default): For campaigns exploring new territory, combining multiple approaches, or where the design phase needs to reason about complex interactions. Historical cost: ~$5.50-6.00/iter. |
| 327 | +- **Sonnet design + Sonnet execute** (cheaper, ~45% savings): For campaigns that are narrow refinements of known-good configurations, where the design space is already well constrained by prior principles. Estimated cost: ~$3.00-3.50/iter. |
| 328 | +- **Opus both** (expensive, ~80% increase): Only for campaigns that need deep analysis in the execute phase (e.g., debugging subtle failures where Sonnet might miss root causes). Estimated cost: ~$10-11/iter. |
| 329 | + |
| 330 | +Iteration count heuristics: |
| 331 | +- **Refinement** (builds on 3+ confirmed principles, narrow scope): 4-6 iterations |
| 332 | +- **Exploratory** (new territory, tests interactions, <2 confirmed principles to build on): 8-12 iterations |
| 333 | +- **Standard** (mix of known and new): 6-8 iterations |
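To make the cost arithmetic concrete, the sketch below turns an iteration range such as `6-8` into the `max_iterations` upper bound and a predicted total range (iterations × cost/iter). The helper names are hypothetical, and the dollar figures in the example are the default-configuration range quoted above, not new data.

```python
from typing import Tuple


def parse_range(text: str) -> Tuple[int, int]:
    """Parse "6-8" into (6, 8); a single value such as "8" becomes (8, 8)."""
    parts = [int(part) for part in text.strip().split("-")]
    return parts[0], parts[-1]


def predicted_total(iterations: str, cost_low: float, cost_high: float) -> Tuple[float, float]:
    """Cheapest case: fewest iterations at the low rate. Costliest: most at the high rate."""
    fewest, most = parse_range(iterations)
    return fewest * cost_low, most * cost_high


# A "Standard" campaign (6-8 iterations) on Opus design + Sonnet execute.
low, high = predicted_total("6-8", 5.50, 6.00)  # 6 * 5.50 and 8 * 6.00
max_iterations = parse_range("6-8")[1]          # upper bound, used for campaign.yaml
```

Real estimates should still come from the Cost Context section of the retrieved context; this only shows how the "Total" row of the Predicted cost table follows from the other two rows.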
| 334 | + |
| 335 | +## Important Rules |
| 336 | + |
| 337 | +- This skill **writes files only to `~/.nous/wiki/suggestions/`** — the suggestion markdown at the top level, and optionally campaign YAML files in the `campaigns/` subdirectory. It never modifies registry files, campaign data, or any other existing files. |
| 338 | +- All reasoning happens in-context using the LLM's judgment — no external scripts beyond `retrieve_wiki_context.py`. |
| 339 | +- If the registry is empty or the project has no campaigns, say so clearly and suggest the user run their first campaign manually. |
| 340 | +- Always ground recommendations in specific prior data (principle IDs, frontier IDs, dead-end IDs). Never hallucinate IDs that don't exist in the loaded files. |
| 341 | +- Keep recommendations actionable — each should be concrete enough to immediately write a `campaign.yaml` from. |
| 342 | +- Prefer recommendations that combine insights from multiple campaigns over those that just extend a single campaign. |
| 343 | +- Always use the Cost Context section to ground cost predictions in real data — never invent cost numbers without historical basis. |
| 344 | +- **Scoring transparency is non-negotiable** — every recommendation must include its full score breakdown table with per-dimension rationale. The summary table at the top lets users compare at a glance. |