diff --git a/docs/CLI_TOOL_EXECUTION.md b/docs/CLI_TOOL_EXECUTION.md new file mode 100644 index 00000000..880f55db --- /dev/null +++ b/docs/CLI_TOOL_EXECUTION.md @@ -0,0 +1,160 @@ +# CLI Tool Execution for Spec Task Runner + +## Overview + +The Spec Task Runner can now execute tasks via external CLI tools (GitHub Copilot CLI, Claude Code, Gemini CLI, Codex) instead of exclusively through the built-in LLM provider. CLI tools run as external processes, work directly on the filesystem, and use the Backlog MCP server to manage task status — the same completion detection mechanism used by the LLM path. + +## Architecture + +### Strategy Pattern + +`SpecTaskRunnerService.submitNextTask()` delegates to one of two execution paths based on the user's selection: + +``` +submitNextTask() + ├── submitTaskViaLlm() → message bus → langchain4j agent loop → backlog_task_edit(Done) + └── submitTaskViaCli() → ProcessBuilder → CLI tool (with Backlog MCP) → task_edit(Done) +``` + +Both paths converge at `notifyPromptExecutionCompleted()` — the only difference is what triggers it. + +### Completion Detection (Both Paths) + +``` +LLM Path: prompt → langchain4j → backlog_task_edit(Done) → spec file changes + ↓ + onSpecsChanged() sets flag + ↓ + ActionButtonsPanel.enableButtons() → notifyPromptExecutionCompleted() + ↓ + advance immediately + +CLI Path: prompt → ProcessBuilder → CLI tool → task_edit(Done) → spec file changes + ↓ + onSpecsChanged() sets flag + ↓ + process.waitFor() exits → notifyPromptExecutionCompleted() + ↓ + advance immediately +``` + +**Important**: `ActionButtonsPanel.enableButtons()` guards its call to `notifyPromptExecutionCompleted()` with `!runner.isCliMode()` so it only fires for the LLM path. Without this guard, the LLM prompt lifecycle (which still runs in the background) would trigger a spurious completion notification immediately after the CLI process starts, causing every task to be skipped via the grace timer. 
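The guard described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: `CompletionGuardSketch`, its nested `Runner`, `completionCount()`, and `runCliAndNotify()` are hypothetical stand-ins, and only the names `isCliMode()`, `onSpecsChanged()`, `enableButtons()`, and `notifyPromptExecutionCompleted()` are taken from the text.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the completion wiring; not the plugin's real classes. */
final class CompletionGuardSketch {

    /** Minimal model of the runner state shared by the LLM and CLI paths. */
    static final class Runner {
        private final boolean cliMode;
        private volatile boolean taskMarkedDone;
        private final AtomicInteger completions = new AtomicInteger();

        Runner(boolean cliMode) { this.cliMode = cliMode; }

        boolean isCliMode() { return cliMode; }

        /** Called when the spec files change; records that the task reached Done. */
        void onSpecsChanged(boolean nowDone) { if (nowDone) taskMarkedDone = true; }

        boolean isTaskMarkedDone() { return taskMarkedDone; }

        /** Counts notifications so the sketch can show which path fired. */
        void notifyPromptExecutionCompleted() { completions.incrementAndGet(); }

        int completionCount() { return completions.get(); }
    }

    /** LLM-side hook: notifies only when the runner is NOT in CLI mode. */
    static void enableButtons(Runner runner) {
        if (!runner.isCliMode()) {
            runner.notifyPromptExecutionCompleted();
        }
    }

    /** CLI-side hook: blocks until the process exits, then notifies exactly once. */
    static int runCliAndNotify(List<String> command, Runner runner)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        runner.notifyPromptExecutionCompleted();
        return exitCode;
    }

    public static void main(String[] args) {
        Runner cli = new Runner(true);
        enableButtons(cli); // guarded: no notification while in CLI mode
        System.out.println("CLI mode, after enableButtons: " + cli.completionCount());

        Runner llm = new Runner(false);
        llm.onSpecsChanged(true);
        enableButtons(llm); // LLM path notifies as before
        System.out.println("LLM mode, after enableButtons: " + llm.completionCount());
    }
}
```

In this sketch the CLI path notifies only from `runCliAndNotify()`, after `waitFor()` returns, so a background LLM lifecycle event cannot advance the runner early.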
+ +If the CLI tool exits without marking the task Done, the existing grace timer fires after 3 seconds and skips to the next task. + +### Prompt Construction + +CLI tools receive the same structured prompt as the LLM provider: + +1. **Instruction block** (`SpecContextBuilder.buildCliInstruction()`) — task identification, step-by-step workflow using Backlog MCP tools (set In Progress, check criteria, append notes, write summary, set Done) +2. **Context block** (`SpecContextBuilder.buildContext()`) — the full task specification as XML, with metadata, description, acceptance criteria, definition of done, dependencies, references, implementation plan/notes +3. **Implementation request** suffix + +### Process Execution + +`CliTaskExecutorService` runs the CLI tool as an external process: + +- **Command format**: `[executablePath, promptFlag, prompt, ...extraArgs]` +- **Working directory**: project base path +- **Thread model**: pooled thread via `ApplicationManager.getApplication().executeOnPooledThread()` +- **Output streaming**: stdout/stderr streamed line-by-line to an IntelliJ `ConsoleView` in the Run tool window +- **Debug logging**: command details, process PID, exit code, elapsed time, and first 5 lines of output are logged to the IDE log for troubleshooting +- **Cancellation**: `Process.destroyForcibly()` on cancel + +## Configuration + +### CLI Tool Settings + +Settings > Spec Driven Development > CLI Runners + +Each CLI tool entry has: + +| Field | Description | Example | +|-------|-------------|---------| +| Name | Display name used in the mode selector | `copilot` | +| Executable Path | Absolute path to the CLI binary | `/opt/homebrew/bin/copilot` | +| Prompt Flag | Flag that precedes the prompt argument | `-p` | +| Extra Args | Additional arguments appended after the prompt | `--allow-all` | +| Enabled | Whether the tool appears in the mode selector | `true` | + +A default **copilot** entry is pre-populated: `/opt/homebrew/bin/copilot -p "..." 
--allow-all` + +### Execution Mode Selector + +The Spec Browser toolbar includes a ComboBox with: +- **LLM Provider** — uses the built-in langchain4j agent loop (default) +- **CLI: {name}** — one entry per enabled CLI tool + +Selection is persisted to `DevoxxGenieStateService` (`specRunnerMode` and `specSelectedCliTool`). The combo refreshes automatically when settings are applied (via `SETTINGS_CHANGED_TOPIC` subscription). + +## Prerequisites + +CLI tools **must have the Backlog MCP server installed** and configured. The tool needs access to the same backlog directory so it can: + +1. Set task status to "In Progress" +2. Check off acceptance criteria as it works +3. Append implementation notes +4. Write a final summary +5. Set task status to "Done" + +Without Backlog MCP, the CLI tool cannot mark tasks as complete, and the grace timer will skip tasks after 3 seconds. + +## Files + +### New Files + +| File | Type | Description | +|------|------|-------------| +| `model/spec/CliToolConfig.java` | Model | CLI tool configuration (name, path, flag, args, enabled) | +| `service/spec/CliTaskExecutorService.java` | Service | Runs CLI processes, streams output, notifies on exit | +| `service/spec/CliConsoleManager.java` | Service | Manages ConsoleView in the Run tool window | + +### Modified Files + +| File | Changes | +|------|---------| +| `service/spec/SpecTaskRunnerService.java` | Added `cliMode` flag (with getter), `submitTaskViaCli()`, `findCliTool()`, CLI cancel support, debug logging | +| `service/spec/SpecContextBuilder.java` | Added `buildCliInstruction()` for CLI-specific prompt instructions | +| `ui/settings/DevoxxGenieStateService.java` | Added `cliTools`, `specRunnerMode`, `specSelectedCliTool` fields | +| `ui/settings/spec/SpecSettingsComponent.java` | Added CLI Runners section with table, Add/Edit/Remove dialog | +| `ui/panel/spec/SpecBrowserPanel.java` | Added execution mode ComboBox in toolbar, subscribes to `SETTINGS_CHANGED_TOPIC` to refresh combo | +| 
`ui/panel/ActionButtonsPanel.java` | Guarded `notifyPromptExecutionCompleted()` with `!runner.isCliMode()` to prevent the LLM path from interfering with CLI execution | + +## Testing + +1. **Settings**: Open Settings > Spec Driven Development and verify that the CLI Runners section contains the pre-populated copilot entry +2. **Mode selector**: Open the Spec Browser and verify that the ComboBox shows "LLM Provider" and "CLI: copilot" +3. **Single task CLI run**: Select "CLI: copilot", check one To Do task, click Run Selected: + - Run tool window opens with console showing CLI output + - CLI tool receives prompt with full task spec and backlog instructions + - CLI tool uses Backlog MCP to mark task In Progress then Done + - `onSpecsChanged()` detects Done, process exits, runner advances +4. **Batch run**: Run multiple tasks to verify sequential execution with dependency ordering +5. **Cancel**: Click Cancel during CLI execution to verify that the process is killed +6. **Grace timer**: Let the CLI exit without marking the task Done and verify that the grace timer fires after 3 seconds and the task is skipped +7. **Error handling**: Configure an invalid executable path to verify graceful error notification + +## Troubleshooting + +### Debug Logging + +`CliTaskExecutorService` and `SpecTaskRunnerService` emit detailed logs to the IDE log (Help > Show Log in Finder on macOS; the menu item is named differently on other platforms). Key log lines to look for: + +| Log Pattern | What It Tells You | +|-------------|-------------------| +| `CLI execute: task=..., promptLength=...` | Command details and prompt size | +| `CLI process started successfully (pid=...)` | Process launched; if missing, `ProcessBuilder.start()` failed | +| `CLI process exited: ... exitCode=..., elapsed=...ms` | How long the process ran and whether it succeeded | +| `CLI stdout [...] line N: ...` | First 5 lines of stdout (look for error messages from the CLI tool) | +| `CLI stderr [...] 
line N: ...` | First 5 lines of stderr | +| `CLI stdout-reader finished: 0 lines total` | CLI produced no output — likely crashed or rejected the prompt | +| `CLI mode: specRunnerMode=..., specSelectedCliTool=...` | Confirms the correct mode and tool are selected | + +### Common Issues + +| Symptom | Likely Cause | +|---------|-------------| +| Process exits in < 1 second with no output | Prompt may exceed the OS argument length limit (stated here as ~256KB on macOS; the actual value varies by OS and version and can be checked with `getconf ARG_MAX`). Consider shortening task descriptions. | +| All tasks skipped as "not marked Done" | CLI tool doesn't have Backlog MCP installed, or MCP server is not pointing to the correct backlog directory. | +| ComboBox only shows "LLM Provider" | CLI tools not saved — open Settings > Spec Driven Development > CLI Runners, verify entries, click Apply. | +| Grace timer skips tasks immediately | `ActionButtonsPanel.enableButtons()` firing in CLI mode — verify the `!runner.isCliMode()` guard is present. | diff --git a/docusaurus/blog/2026-02-09-devoxxgenie-goes-agentic.md b/docusaurus/blog/2026-02-09-devoxxgenie-goes-agentic.md index cd40a61d..df75b444 100644 --- a/docusaurus/blog/2026-02-09-devoxxgenie-goes-agentic.md +++ b/docusaurus/blog/2026-02-09-devoxxgenie-goes-agentic.md @@ -76,7 +76,7 @@ The plugin prioritizes developer control across every dimension: ## What's Next -This is just the beginning. With [Spec Driven Development](/docs/features/spec-driven-development) (SDD) now available in v0.9.7, you can define tasks as structured specs and let the agent implement them autonomously — complete with acceptance criteria tracking and a visual Kanban board. +This is just the beginning. With [Spec-driven Development](/docs/features/spec-driven-development) (SDD) now available in v0.9.7, you can define tasks as structured specs and let the agent implement them autonomously — complete with acceptance criteria tracking and a visual Kanban board. We're not just chatting with AI anymore. We're collaborating with it. 
diff --git a/docusaurus/blog/2026-02-10-spec-driven-development.md b/docusaurus/blog/2026-02-10-spec-driven-development.md index 59c0d536..58ed07e4 100644 --- a/docusaurus/blog/2026-02-10-spec-driven-development.md +++ b/docusaurus/blog/2026-02-10-spec-driven-development.md @@ -1,21 +1,21 @@ --- slug: spec-driven-development -title: "Stop Prompting, Start Specifying: Introducing Spec Driven Development in DevoxxGenie" +title: "Stop Prompting, Start Specifying: Introducing Spec-driven Development in DevoxxGenie" authors: [stephanj] -tags: [spec driven development, SDD, agent mode, backlog.md, kanban, agentic AI, IntelliJ IDEA, LLM, Java, open source] +tags: [spec-driven development, SDD, agent mode, backlog.md, kanban, agentic AI, IntelliJ IDEA, LLM, Java, open source] date: 2026-02-10 -description: "Introducing Spec Driven Development (SDD) in DevoxxGenie v0.9.7 — define structured task specs with acceptance criteria and let the AI agent implement them." -keywords: [devoxxgenie, spec driven development, sdd, backlog.md, task specs, kanban board, agent mode, acceptance criteria, milestones, agentic programming] +description: "Introducing Spec-driven Development (SDD) in DevoxxGenie v0.9.7 — define structured task specs with acceptance criteria and let the AI agent implement them." +keywords: [devoxxgenie, spec-driven development, sdd, backlog.md, task specs, kanban board, agent mode, acceptance criteria, milestones, agentic programming] image: /img/devoxxgenie-social-card.jpg --- -# Stop Prompting, Start Specifying: Introducing Spec Driven Development in DevoxxGenie +# Stop Prompting, Start Specifying: Introducing Spec-driven Development in DevoxxGenie We've all been there. You open your AI coding assistant, type a prompt, get a result, realise it missed half the requirements, rephrase, try again. Rephrase and repeat. It works (kind of) but it doesn't scale and we lose history. 
What if instead of ad-hoc prompting, you could define exactly what needs to be built as a structured spec, and then let the AI agent implement it autonomously — checking off acceptance criteria as it goes? -That's the idea behind **Spec Driven Development (SDD)**, the latest feature in DevoxxGenie v0.9.7. +That's the idea behind **Spec-driven Development (SDD)**, the latest feature in DevoxxGenie v0.9.7. @@ -24,7 +24,7 @@ That's the idea behind **Spec Driven Development (SDD)**, the latest feature in width="100%" style={{aspectRatio: '16/9', maxWidth: '720px', borderRadius: '8px'}} src="https://www.youtube.com/embed/t1MOHCfsdvk" - title="Spec Driven Development Demo" + title="Spec-driven Development Demo" frameBorder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen @@ -105,7 +105,7 @@ I could also have used a local model like GLM-4.7-Flash via Ollama, which works ## The Bigger Picture -DevoxxGenie started as a simple LLM chat plugin. With [Agent Mode](/docs/features/agent-mode), [MCP support](/docs/features/mcp_expanded), and now [Spec Driven Development](/docs/features/spec-driven-development), it's evolving into something more — an AI-augmented development environment where structured specifications replace ad-hoc prompts, and autonomous agents replace copy-paste workflows. +DevoxxGenie started as a simple LLM chat plugin. With [Agent Mode](/docs/features/agent-mode), [MCP support](/docs/features/mcp_expanded), and now [Spec-driven Development](/docs/features/spec-driven-development), it's evolving into something more — an AI-augmented development environment where structured specifications replace ad-hoc prompts, and autonomous agents replace copy-paste workflows. We're not just chatting with AI anymore. We're collaborating with it. 
diff --git a/docusaurus/docs/features/agent-mode.md b/docusaurus/docs/features/agent-mode.md index 368e6cae..536d8ea1 100644 --- a/docusaurus/docs/features/agent-mode.md +++ b/docusaurus/docs/features/agent-mode.md @@ -263,7 +263,7 @@ Final Response (synthesized by main agent) ## Batch Task Execution (Agent Loop) -Agent Mode powers the [Agent Loop](sdd-agent-loop.md), which lets you run multiple tasks sequentially in a single batch. Combined with [Spec Driven Development](spec-driven-development.md), you can define structured task specs and have the agent implement them one after another — with dependency ordering, progress tracking, and automatic advancement to the next task. +Agent Mode powers the [Agent Loop](sdd-agent-loop.md), which lets you run multiple tasks sequentially in a single batch. Combined with [Spec-driven Development](spec-driven-development.md), you can define structured task specs and have the agent implement them one after another — with dependency ordering, progress tracking, and automatic advancement to the next task. ## Best Practices diff --git a/docusaurus/docs/features/cli-runners.md b/docusaurus/docs/features/cli-runners.md new file mode 100644 index 00000000..fb26a7a1 --- /dev/null +++ b/docusaurus/docs/features/cli-runners.md @@ -0,0 +1,105 @@ +--- +sidebar_position: 5 +title: CLI Runners +description: Execute spec tasks via external CLI tools like Claude Code, GitHub Copilot CLI, OpenAI Codex CLI, or Google Gemini CLI. DevoxxGenie manages the task lifecycle while your preferred tool does the implementation. +keywords: [devoxxgenie, cli runners, claude code, copilot, codex, gemini, cli tools, spec-driven development, sdd, agent loop] +image: /img/devoxxgenie-social-card.jpg +--- + +# CLI Runners + +Instead of using the built-in LLM provider, you can execute spec tasks via **external CLI tools** — such as Claude Code, GitHub Copilot CLI, OpenAI Codex CLI, or Google Gemini CLI. 
This lets you leverage the tool you already use and trust, while DevoxxGenie manages the task lifecycle. + +CLI Runners integrate with both [Spec-driven Development](spec-driven-development.md) (single task execution) and the [Agent Loop](sdd-agent-loop.md) (batch task execution). + +## How It Works + +Each CLI tool is launched as an external process, with your task prompt piped in via stdin or, for Codex, passed as a trailing argument. For tools that support MCP, a **Backlog MCP config** is auto-generated and passed to the CLI so it can read and update tasks using the same backlog tools as the built-in agent. The output streams live into an IntelliJ **Run** tool window console. + +``` +┌──────────────────┐ ┌──────────────────────┐ ┌──────────────────┐ +│ DevoxxGenie │────▶│ CLI Process │────▶│ Backlog MCP │ +│ (Task Runner) │ │ (Claude/Copilot/...) │ │ (task updates) │ +└──────────────────┘ └──────────────────────┘ └──────────────────┘ + │ │ │ + Launches process Implements the task Sets status to Done, + with prompt + MCP using its own tools checks off criteria + config via stdin via MCP tools +``` + +## Supported CLI Tools + +| CLI Tool | Prompt Delivery | MCP Support | Default Args | +|----------|----------------|-------------|--------------| +| **Claude Code** | stdin (`-p` flag) | Auto-generated `--mcp-config` | `-p --dangerously-skip-permissions --model opus --allowedTools Backlog.md` | +| **GitHub Copilot** | stdin | Auto-generated `--additional-mcp-config` (with `@` prefix) | `--allow-all` | +| **OpenAI Codex** | Trailing argument | Not supported | `exec --model gpt-5.3-codex --full-auto` | +| **Google Gemini** | stdin | Auto-generated `--mcp-config` | *(none)* | +| **Custom** | stdin | Configurable | *(user-defined)* | + +:::note +Codex CLI does not support MCP, so it cannot update task status directly. The task runner detects completion via the spec file watcher and terminates the Codex process automatically. +::: + +## Setup + +1. Open **Settings** > **Tools** > **DevoxxGenie** > **Spec Driven Dev** +2. 
Scroll to the **CLI Runners** section +3. Click **+** to add a new CLI tool +4. Select the **Type** from the dropdown — all fields are pre-filled with sensible defaults +5. Adjust the **Executable path** if your CLI is installed in a different location +6. Optionally add **Env vars** (e.g., `ANTHROPIC_API_KEY=sk-...`) for tools that need API keys when launched from IntelliJ +7. Click **Test Connection** to verify the tool is installed and authenticated +8. Click **OK**, then **Apply** + +You can configure **multiple CLI tools** — for example, Claude for complex tasks and Codex for quick fixes. Switch between them in the toolbar. + +![CLI Runners configuration list in Settings](/img/CLI-Runners-Setup.png) + +## Selecting the Execution Mode + +The **DevoxxGenie Specs** toolbar contains an execution mode dropdown: + +- **LLM Provider** — uses the built-in LLM agent (default) +- **CLI: Claude** / **CLI: Copilot** / etc. — uses the configured external CLI tool + +The selection is persisted across IDE restarts. When you click **Run Selected** or **Run All To Do**, tasks are executed using whichever mode is selected. + +![Selecting a CLI runner from the execution mode dropdown](/img/CLI-Runners-Selection.png) + +## Configuration Reference + +| Field | Description | +|-------|-------------| +| **Type** | Preset type (Claude, Copilot, Codex, Gemini, Custom). Selecting a type auto-fills the other fields. | +| **Executable path** | Absolute path to the CLI binary (e.g., `/opt/homebrew/bin/claude`) | +| **Extra args** | Command-line arguments passed to the CLI. These are split on whitespace — no shell quoting needed. | +| **Env vars** | Optional environment variables as `KEY=VALUE, KEY2=VALUE2`. Useful for API keys not inherited from the shell. | +| **MCP config flag** | Read-only. The CLI flag used to pass the auto-generated Backlog MCP config file. Set automatically per tool type. 
| +| **Enabled** | Toggle to enable/disable a tool without deleting its configuration | + +## Console Output + +When a CLI tool runs, its stdout and stderr stream into the **Run** tool window in IntelliJ. Each task execution shows: + +- A header with the task ID, title, and CLI tool name +- The full output from the CLI tool +- An exit summary with exit code and elapsed time + +## Adding a Custom CLI Tool + +If your CLI tool is not in the preset list: + +1. Select **Custom** as the Type +2. Enter the executable path and arguments manually +3. Set the **MCP config flag** if your tool supports an MCP config file (leave empty if not) +4. The prompt is piped via stdin by default — ensure your tool reads from stdin in non-interactive mode + +## Troubleshooting + +| Issue | Cause | Solution | +|-------|-------|----------| +| CLI tool fails immediately | Authentication error or wrong path | Check the executable path and env vars in **Settings → Spec Driven Dev → CLI Runners**, use **Test Connection** to verify | +| CLI process doesn't exit | Some CLI tools (e.g., Codex) don't self-exit after completing | The runner detects task completion via the file watcher and terminates the process automatically | +| No output in console | Process started but no stdout | Check that the executable path is correct and the tool is authenticated. Try running the command manually in a terminal | +| MCP tools not available | CLI tool doesn't receive the Backlog MCP config | Verify the tool type is set correctly — MCP config is auto-generated per tool type. 
Codex does not support MCP | diff --git a/docusaurus/docs/features/overview.md b/docusaurus/docs/features/overview.md index 1e37a1ae..46bc5cde 100644 --- a/docusaurus/docs/features/overview.md +++ b/docusaurus/docs/features/overview.md @@ -12,7 +12,7 @@ DevoxxGenie offers a comprehensive set of features designed to enhance your deve ## Core Features -### Spec Driven Development (SDD) +### Spec-driven Development (SDD) Define tasks as structured markdown specs with acceptance criteria, and let the LLM agent implement them autonomously: @@ -92,7 +92,7 @@ DevoxxGenie also includes experimental features that are being developed and ref For detailed information about specific features, check out the dedicated pages: -- [Spec Driven Development](spec-driven-development.md) +- [Spec-driven Development](spec-driven-development.md) - [Chat Interface](chat-interface.md) - [MCP Support](mcp_expanded.md) - [Agent Mode](agent-mode.md) diff --git a/docusaurus/docs/features/sdd-agent-loop.md b/docusaurus/docs/features/sdd-agent-loop.md index c5fed964..26de218e 100644 --- a/docusaurus/docs/features/sdd-agent-loop.md +++ b/docusaurus/docs/features/sdd-agent-loop.md @@ -2,7 +2,7 @@ sidebar_position: 4 title: Agent Loop — Batch Task Execution description: Run multiple SDD tasks sequentially with dependency ordering, progress tracking, and automatic task advancement. Each task gets a fresh conversation and the agent updates notes, summaries, and acceptance criteria as it works. -keywords: [devoxxgenie, sdd, agent loop, batch execution, dependencies, task runner, spec driven development] +keywords: [devoxxgenie, sdd, agent loop, batch execution, dependencies, task runner, spec-driven development, cli runners] image: /img/devoxxgenie-social-card.jpg --- @@ -12,6 +12,8 @@ The Agent Loop lets you run multiple tasks sequentially in a single batch. 
Each This is useful when you have a set of related tasks (e.g., "implement the auth module") and want the agent to work through them without manual intervention. +The Agent Loop works with both the **built-in LLM provider** and **external CLI tools** (Claude Code, Copilot, Codex, Gemini). See [CLI Runners](cli-runners.md) for setup instructions. +