Document `Research` as the benchmark-driven optimization surface rather than the default pre-execution research path for todos.

Clarify that `needs-bot-review` is the todo-specific research and planning handoff, and that the default execution flow is `Todo` -> `needs-bot-review` (when planning or research is needed) -> `Task` for direct execution or `Job` for staged execution.
`README.md` — 16 additions, 14 deletions
```diff
@@ -30,7 +30,7 @@ It connects three tightly linked layers:
 2. Execution and scheduling through `Tasks` and `Jobs`
 3. Tool and control-plane integration through `Research`, `MCP`, and optional agent surfaces
 
-The workflow is explicit on purpose. A `Todo` is the planning artifact. A `Task` is one executable unit. A `Job` is an orchestrated or scheduled run built from steps. `Research` is an exploratory context-building artifact when the system still needs evidence before execution.
+The workflow is explicit on purpose. A `Todo` is the planning artifact. `needs-bot-review` is the default planning and research handoff when a todo still needs analysis, outside evidence, or a clearer execution plan. A `Task` is one executable unit. A `Job` is an orchestrated or scheduled run built from steps. `Research` is a separate, bounded benchmark-and-optimization surface for repeated measured runs and self-improvement loops.
 
 That structure keeps the LLM as the native execution chat surface while Copilot Cockpit provides the approval, scheduling, and control layer around it. The goal is not less automation. The goal is accountable automation that can move from intake to execution without losing review, context, or ownership.
```
```diff
@@ -75,10 +75,11 @@ For the step-by-step walkthrough, open [docs/feature-tour.md](https://github.com
 The recommended default path is simple:
 
 1. Start with a `Todo` in `Todo Cockpit` for intake, planning, and triage.
-2. Use `Research` only when context is missing or the direction still needs evidence.
+2. Use `needs-bot-review` when the todo still needs research, planning, or a better handoff before execution.
 3. Promote approved work into a `Task` for one executable unit or a `Job` for an orchestrated run.
 4. Review the result before granting more autonomy or scheduling the next cycle.
-5. Add `MCP`, repo-local skills, or agent/control-plane features only when the core loop is already working.
+5. Use `Research` when the goal is benchmark-driven iteration or tool and agent improvement over repeated measured runs.
+6. Add `MCP`, repo-local skills, or agent/control-plane features only when the core loop is already working.
 
 This keeps the relationship collaborative: the workflow starts with planning, earns execution, and only then extends into higher-autonomy integrations.
```
```diff
@@ -135,7 +136,7 @@ These are the default path and the main product surface.
 
 ### Todo Cockpit
 
-`Todo Cockpit` is the planning and triage layer. A `Todo` stays a planning artifact: capture work, add comments, apply labels and workflow flags, and decide what should happen next.
+`Todo Cockpit` is the planning and triage layer. A `Todo` stays a planning artifact: capture work, add comments, apply labels and workflow flags, and decide what should happen next. When a todo needs analysis, outside evidence, or a more explicit plan, move it through `needs-bot-review` so the planning prompt carries the todo context forward into the handoff.
 
 Optional GitHub inbox triage also lives here. The `Settings` tab can save repo-local GitHub repository settings plus a reusable automation prompt, then expose a cached GitHub inbox at the top of the board with `Issues`, `Pull Requests`, and `Security Alerts`. Refresh uses your existing VS Code GitHub sign-in, inbox rows can create a plain Todo or `Create Todo + Review`, and repeat imports reuse the existing GitHub-sourced card instead of creating duplicates. For setup, storage, and current limits, see [docs/github-integration.md](https://github.com/goodguy1963/Copilot-Cockpit/blob/main/docs/github-integration.md).
```
```diff
@@ -153,9 +154,9 @@ Think of `Jobs` as deeper agentic workflows inside VS Code: research, decision s
 
 ### Research
 
-`Research` is the exploratory context-building layer. Use it when the system is missing context, needs outside evidence, or should iterate against a benchmark before you decide on execution.
+`Research` is the benchmark-driven iteration layer. Use it when the goal is repeated measured improvement: benchmark a prompt, tool, harness, or agent, extract a score, and iterate within explicit limits.
 
-Research is especially useful when work should pull in fresher outside knowledge first, through web search, Perplexity, scrapers, or other tooling, and then return that material for user review before implementation begins.
+This is not the default place to research a todo before execution. Todo-specific discovery and planning belong in `Todo Cockpit` and the `needs-bot-review` flow; `Research` is for bounded optimization loops such as the included AutoAgent-style benchmark example.
 
 ### Experimental and advanced playground capabilities
```
```diff
@@ -173,17 +174,17 @@ That also creates a control layer for cost: GitHub Copilot or OpenRouter can use
 
 ### How To Use
 
-`How To Use` is the built-in onboarding tab. Start there if you want the recommended path explained in order: `Todo` first, `Research` when context is missing, `Task` or `Job` for execution, then optional control-plane integration after the core loop is working.
+`How To Use` is the built-in onboarding tab. Start there if you want the recommended path explained in order: `Todo` first, `needs-bot-review` when planning or research is needed, `Task` or `Job` for execution, then `Research` for benchmark-driven iteration and optional control-plane integration after the core loop is working.
 
 ## Common Workflows
 
 ### Approval-First Work
 
 Capture work in `Todo Cockpit`, discuss it, move it into `ready`, and only then prepare the execution unit.
 
-### Research-First Collaboration
+### Todo Research And Planning
 
-Use `Research`, web search, or tool-assisted discovery to gather current information first. Review that output with the user, discuss changes, and only then convert the result into scheduled implementation work.
+Use `needs-bot-review` when a todo needs analysis, outside evidence, or planning before execution. That flow already carries the todo context, can mention the configured search and research providers in its guidance, and should end in a simpler downstream `Task` or `Job` handoff.
 
 ### Scheduled Execution
```
```diff
@@ -213,9 +214,9 @@ Start with one recurring loop that produces useful work instead of toy output.
 - `Delivery Risk and Security Watch (Daily)` looks for shipping, trust, and operational blind spots.
 - `Knowledge and Shipping Packager (Daily)` stages reusable docs, memory candidates, and release material for later curation.
 - `Project Intelligence and Delivery Prep` runs those steps in sequence and stops at a review checkpoint before anything turns into real execution.
-- `Onboarding Example Coverage Research` starts with a Todo Cockpit intake item, uses Research to gather or benchmark onboarding evidence, and then promotes approved follow-up into Tasks or Jobs.
+- `Onboarding Example Coverage Research` starts with a Todo Cockpit intake item, uses `needs-bot-review` for the todo-specific planning handoff, and then uses Research only when onboarding quality should be benchmarked and improved over repeated measured runs before promoting approved follow-up into Tasks or Jobs.
 
-Use that onboarding example when you want one concrete loop to demonstrate the product: start in Todo Cockpit, gather context with Research, promote approved work into Tasks or Jobs, and stop at a review checkpoint before autonomy expands.
+Use that onboarding example when you want one concrete loop to demonstrate the product: start in Todo Cockpit, plan or research the todo through `needs-bot-review`, promote approved work into Tasks or Jobs, and use Research separately when benchmark-driven optimization is the real goal.
 
 This is a good fit for a solo product, an internal tool, a small SaaS, or an actively maintained extension like this repo.
```
```diff
@@ -234,9 +235,10 @@ The point is not to overclaim autonomy. The point is to show recurring, inspecta
 1. Open Copilot Cockpit from the activity bar or run `Copilot Cockpit: Create Scheduled Prompt (GUI)` from the command palette. Or use the todo-list icon in the top right.
 2. Start in `How To Use` if you are new to the extension, or click the top-bar `Intro Tutorial` button for the same guided walkthrough.
 3. Capture or refine work in `Todo Cockpit` until the planning artifact is clear.
-4. Use `Research` if the work still needs exploratory context or outside evidence.
+4. Use `needs-bot-review` if the todo still needs research, planning, or outside evidence before execution.
 5. Move approved work into `ready`, then promote it into a `Task` for one executable unit or a `Job` for an orchestrated run.
-6. Open `Settings` to configure repo-local defaults and optional integrations such as the GitHub inbox flow. Add `MCP`, Copilot skills, starter agents, or other control-plane features only when you want those optional extensions.
+6. Use `Research` when you want bounded benchmark-driven iteration, such as improving a prompt, tool, or agent over repeated measured runs.
+7. Open `Settings` to configure repo-local defaults and optional integrations such as the GitHub inbox flow. Add `MCP`, Copilot skills, starter agents, or other control-plane features only when you want those optional extensions.
 
 In the same `Settings` tab you can also choose the scheduled task execution provider:
```
```diff
@@ -248,7 +250,7 @@ If you select `Codex` or `OpenCode`, install and authenticate those tools separa
 
 If you want the optional integration layers, the practical order is:
 
-1. Get the core `Todo` -> `Research` -> `Task` or `Job` loop working first.
+1. Get the core `Todo` -> `needs-bot-review` -> `Task` or `Job` loop working first.
 2. Use `Set Up MCP` to create or repair `.vscode/mcp.json` and activate the repo-local scheduler MCP server for this workspace.
 3. Add any separate third-party MCP servers you want, such as Tavily, Perplexity, or [Prefab by Max Health Inc.](https://github.com/Max-Health-Inc/prefab), to that same workspace MCP config. Those servers are separate from Copilot Cockpit's scheduler server and may need their own API keys or provider-specific setup.
 4. Optionally, but recommended if you want the full repo-local Copilot guidance layer, use `Sync Bundled Skills` to write the bundled Copilot skills into `.github/skills`. If the Prefab by Max Health Inc. MCP server is configured, that bundled path also adds the `prefab-ui` skill so installed users can route Prefab by Max Health Inc. UI and wire-format work through the shipped contract instead of keeping it as a repo-only extra.
```
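The integration steps above both touch the workspace MCP config, but this changeset never shows what `.vscode/mcp.json` contains. As a rough orientation only, the sketch below is an illustrative VS Code workspace MCP file with a hypothetical entry for the Cockpit scheduler server plus a Tavily server of the kind the third-party step describes; the server names, commands, and key handling here are assumptions, not the extension's actual generated output:

```jsonc
// .vscode/mcp.json — illustrative sketch only; the real scheduler entry
// is created or repaired by the `Set Up MCP` command.
{
  "inputs": [
    {
      // Prompt once for the third-party API key instead of committing it.
      "id": "tavily-key",
      "type": "promptString",
      "description": "Tavily API key",
      "password": true
    }
  ],
  "servers": {
    // Hypothetical name for the repo-local Copilot Cockpit scheduler server.
    "copilot-cockpit-scheduler": {
      "command": "node",
      "args": ["./.vscode/cockpit-scheduler-mcp.js"]
    },
    // Hypothetical third-party search server added alongside it.
    "tavily": {
      "command": "npx",
      "args": ["-y", "tavily-mcp"],
      "env": { "TAVILY_API_KEY": "${input:tavily-key}" }
    }
  }
}
```

If this matches the real layout, hand-editing should be limited to the separate third-party entries, since `Set Up MCP` owns the scheduler block.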
`docs/feature-tour.md` — 6 additions, 4 deletions
```diff
@@ -3,7 +3,7 @@
 Copilot Cockpit is easiest to understand as one operating loop:
 
 1. Plan the work.
-2. Research and refine the direction.
+2. Research or plan the todo in the right surface.
 3. Approve the handoff.
 4. Run the right execution unit.
 5. Review the result before granting more autonomy.
```
```diff
@@ -40,6 +40,7 @@ Caption: Plan and review work before it runs.
 
 - Use it to capture work before it becomes execution.
 - Keep comments, labels, flags, due dates, and approval state close to the item.
+- Use `needs-bot-review` when the todo needs research, evidence, or a more explicit execution plan before it should become a task or job.
 - Move work into `ready` when it should hand off into execution.
 - When optional GitHub integration is enabled, the top of the board also becomes a cached inbox for `Issues`, `Pull Requests`, and `Security Alerts`, with direct `Create Todo` and `Create Todo + Review` actions.
```
```diff
@@ -107,9 +108,9 @@ Caption: Improve against a benchmark, not by guesswork.
 - Let the system try repeated improvements against a metric.
 - Stop the loop with explicit limits instead of running indefinitely.
 
-Research can also act as a collaborative discovery phase before implementation: gather web knowledge, review the findings with the user, refine the direction, and only then turn the result into scheduled execution.
+Research is not the default place to investigate a todo before execution. Todo-specific discovery, outside evidence gathering, and planning handoff belong in `Todo Cockpit` through `needs-bot-review`, where the todo context is already attached.
 
-For onboarding, `Onboarding Example Coverage Research` shows the full loop in one pass: capture the gap in Todo Cockpit, use Research to benchmark the docs, promote approved fixes into Tasks or Jobs, and pause at a review checkpoint before broader autonomy.
+For onboarding, `Onboarding Example Coverage Research` shows the benchmarked version of the loop: capture the gap in Todo Cockpit, use `needs-bot-review` for the todo handoff, use Research to benchmark the docs when measured iteration is warranted, then promote approved fixes into Tasks or Jobs and pause at a review checkpoint before broader autonomy.
 
 
```
```diff
@@ -190,14 +191,15 @@ Best for: first-time users who want the operating model before the controls.
 ## Choosing The Right Surface
 
 - Use `Todo Cockpit` when the work still needs planning or approval.
+- Use `needs-bot-review` when that todo needs research, outside evidence, or a better plan before execution.
 - Use `Tasks` when one prompt and one schedule are enough.
 - Use `Jobs` when the work needs ordered stages or pause points.
 - Use `Research` when the goal is measured improvement against a benchmark.
 
 ## Working Style This Enables
 
 - Keep the human in the loop while still using AI for the heavy lifting.
-- Let research happen before implementation instead of after mistakes are made.
+- Let todo planning and review happen before implementation, and use benchmark research when optimization needs measurement instead of guesswork.
 - Run non-conflicting work in parallel while keeping risky work sequenced and visible.
 - Archive completed, rejected, or reviewed work so the project gains memory over time.
 - Use specialized agents, prompts, and models as a team of different experts instead of forcing one general agent to do every job.
```
`docs/getting-started.md` — 8 additions, 6 deletions
```diff
@@ -6,17 +6,18 @@ Copilot Cockpit works best when you treat it as one workflow stack with three la
 2. Execution and scheduling through `Tasks` and `Jobs`
 3. Optional tool/control-plane integration through `Research`, `MCP`, and repo-local agent surfaces
 
-The recommended path is: start with a `Todo`, use `Research` when context is missing, then promote approved work into a `Task` or `Job`.
+The recommended path is: start with a `Todo`, use `needs-bot-review` when the todo needs research or planning, then promote approved work into a `Task` or `Job`.
 
 ## Quick Start
 
 1. Open Copilot Cockpit from the activity bar or with `Copilot Cockpit: Create Scheduled Prompt (GUI)`.
 2. Start in `How To Use` if you are new to the extension, or click the top-bar `Intro Tutorial` button for the same walkthrough.
 3. Capture or refine work in `Todo Cockpit`. A `Todo` is the planning artifact and intake surface.
-4. Use `Research` if the work still needs exploratory context, outside evidence, or benchmarked iteration.
+4. Use `needs-bot-review` if the todo still needs analysis, outside evidence, or a better execution plan.
 5. Move approved work into `ready`, then promote it into a `Task` for one executable unit or a `Job` for an orchestrated or scheduled run.
-6. Open `Settings` to configure repo-local defaults and integrations. This is also where you choose the scheduled task execution provider: `GitHub Copilot Chat`, `OpenAI Codex CLI`, or `OpenCode CLI`.
-7. Use the top-bar `Plan Integration` button only when you want optional control-plane extensions such as MCP, skills, starter agents, or the GitHub inbox flow.
+6. Use `Research` when you want bounded benchmarked iteration, such as improving a prompt, tool, or agent over repeated measured runs.
+7. Open `Settings` to configure repo-local defaults and integrations. This is also where you choose the scheduled task execution provider: `GitHub Copilot Chat`, `OpenAI Codex CLI`, or `OpenCode CLI`.
+8. Use the top-bar `Plan Integration` button only when you want optional control-plane extensions such as MCP, skills, starter agents, or the GitHub inbox flow.
 
 ## Optional: Enable GitHub Inbox Triage
```
```diff
@@ -35,7 +36,8 @@ The GitHub inbox is repo-local, uses cached manual refreshes, and resolves crede
 - Use `Todo Cockpit` when the work still needs planning, comments, approval, or triage.
 - Use `Tasks` when one prompt and one schedule are enough for one executable unit.
 - Use `Jobs` when the work needs ordered stages, orchestration, or pause checkpoints.
-- Use `Research` when the work needs exploratory context or measured improvement before execution.
+- Use `needs-bot-review` when a todo needs research or planning before it becomes execution.
+- Use `Research` when the work needs measured improvement against a benchmark.
 
 ## Optional Extensions
```
```diff
@@ -61,7 +63,7 @@ Skip toy prompts. Start with one recurring loop that would still be worth keepin
 - For a company team, use the same pattern for product signals, security and release readiness, support queues, or operations follow-up.
 - If you also want to show the Research surface, add one benchmarked profile that scores onboarding or prompt quality against a simple command before you promote anything into execution.
 
-`Onboarding Example Coverage Research` is the simplest version of that pattern: log the onboarding gap in Todo Cockpit, use Research to gather examples or benchmark the docs, then turn the approved next step into Tasks for a direct doc pass or Jobs for a staged follow-up. Use it when you want a real onboarding loop that still stops at a review checkpoint before autonomy expands.
+`Onboarding Example Coverage Research` is the simplest version of that pattern: log the onboarding gap in Todo Cockpit, use `needs-bot-review` for the todo-specific planning handoff, and use Research only when you want to benchmark and improve the docs over repeated measured runs. Then turn the approved next step into Tasks for a direct doc pass or Jobs for a staged follow-up. Use it when you want a real onboarding loop that still stops at a review checkpoint before autonomy expands.
 
 That keeps the demo honest: the proof is useful output plus explicit review, not a claim that the system should run unchecked.
```
`docs/index.md` — 1 addition, 1 deletion
```diff
@@ -21,7 +21,7 @@ Use this folder for the detailed reference that used to live in the top-level RE
 - If you want to verify which visuals are live footage, which are illustrative mockups, and where retired media now lives, go to [Media Reference](./media-reference.md).
 - If you want to connect GitHub inbox triage to Todo Cockpit, go to [GitHub Integration](./github-integration.md).
 - If you want to understand the optional starter-agent orchestration layer, go to [Agent Workflow](./agent-workflow.md).
-- If you want to understand Todo Cockpit, Tasks, Jobs, and Research, go to [Workflows](./workflows.md).
+- If you want to understand the default `Todo` -> `needs-bot-review` -> `Task` or `Job` path, plus where benchmark-driven Research fits, go to [Workflows](./workflows.md).
 - If you want MCP, skills, Copilot, OpenRouter, Codex, or Telegram details, go to [Integrations](./integrations.md).
 - If you want persistence and repo-local boundary details, go to [Storage and Boundaries](./storage-and-boundaries.md).
 - If you want the design intent and fork background, go to [Architecture and Principles](./architecture-and-principles.md).
```