Commit 01459a3

catlog22 committed
Add tests for CLI command generation and model alias resolution
- Implement `test-cli-command-gen.js` to verify the logic of `buildCliCommand` function.
- Create `test-e2e-model-alias.js` for end-to-end testing of model alias resolution in `ccw cli`.
- Add `test-model-alias.js` to test model alias resolution for different models.
- Introduce `test-model-alias.txt` for prompt testing with model alias.
- Develop `test-update-claude-command.js` to test command generation for `update_module_claude`.
- Create a test file in `test-update-claude/src` for future tests.
1 parent 6576886 commit 01459a3

193 files changed

Lines changed: 4799 additions & 9408 deletions

.codex/agents/action-planning-agent.md

Lines changed: 85 additions & 90 deletions
@@ -55,6 +55,17 @@ color: yellow
 **Step-by-step execution**:
 
 ```
+0. Load planning notes → Extract phase-level constraints (NEW)
+Commands: Read('.workflow/active/{session-id}/planning-notes.md')
+Output: Consolidated constraints from all workflow phases
+Structure:
+- User Intent: Original GOAL, KEY_CONSTRAINTS
+- Context Findings: Critical files, architecture notes, constraints
+- Conflict Decisions: Resolved conflicts, modified artifacts
+- Consolidated Constraints: Numbered list of ALL constraints (Phase 1-3)
+
+USAGE: This is the PRIMARY source of constraints. All task generation MUST respect these constraints.
+
 1. Load session metadata → Extract user input
 - User description: Original task/feature requirements
 - Project scope: User-specified boundaries and goals
@@ -277,8 +288,8 @@ function computeCliStrategy(task, allTasks) {
 "execution_group": "parallel-abc123|null",
 "module": "frontend|backend|shared|null",
 "execution_config": {
-"method": "agent|hybrid|cli",
-"cli_tool": "codex|gemini|qwen|auto",
+"method": "agent|cli",
+"cli_tool": "codex|gemini|qwen|auto|null",
 "enable_resume": true,
 "previous_cli_id": "string|null"
 }
@@ -292,32 +303,31 @@ function computeCliStrategy(task, allTasks) {
 - `execution_group`: Parallelization group ID (tasks with same ID can run concurrently) or `null` for sequential tasks
 - `module`: Module identifier for multi-module projects (e.g., `frontend`, `backend`, `shared`) or `null` for single-module
 - `execution_config`: CLI execution settings (MUST align with userConfig from task-generate-agent)
-- `method`: Execution method - `agent` (direct), `hybrid` (agent + CLI), `cli` (CLI only)
+- `method`: Execution method - `agent` (direct) or `cli` (CLI only). Only two values in final task JSON.
 - `cli_tool`: Preferred CLI tool - `codex`, `gemini`, `qwen`, `auto`, or `null` (for agent-only)
 - `enable_resume`: Whether to use `--resume` for CLI continuity (default: true)
 - `previous_cli_id`: Previous task's CLI execution ID for resume (populated at runtime)
 
 **execution_config Alignment Rules** (MANDATORY):
 ```
-userConfig.executionMethod → meta.execution_config + implementation_approach
+userConfig.executionMethod → meta.execution_config
 
 "agent"
 meta.execution_config = { method: "agent", cli_tool: null, enable_resume: false }
-implementation_approach steps: NO command field (agent direct execution)
-
-"hybrid"
-meta.execution_config = { method: "hybrid", cli_tool: userConfig.preferredCliTool }
-implementation_approach steps: command field ONLY on complex steps
+Execution: Agent executes pre_analysis, then directly implements implementation_approach
 
 "cli"
-meta.execution_config = { method: "cli", cli_tool: userConfig.preferredCliTool }
-implementation_approach steps: command field on ALL steps
+meta.execution_config = { method: "cli", cli_tool: userConfig.preferredCliTool, enable_resume: true }
+Execution: Agent executes pre_analysis, then hands off full context to CLI via buildCliHandoffPrompt()
+
+"hybrid"
+Per-task decision: set method to "agent" OR "cli" per task based on complexity
+- Simple tasks (≤3 files, straightforward logic) → { method: "agent", cli_tool: null, enable_resume: false }
+- Complex tasks (>3 files, complex logic, refactoring) → { method: "cli", cli_tool: userConfig.preferredCliTool, enable_resume: true }
+Final task JSON always has method = "agent" or "cli", never "hybrid"
 ```
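The new alignment rules can be read as a small mapping from `userConfig.executionMethod` to a per-task `meta.execution_config`. The sketch below is one possible reading, not code from this commit: the function name `resolveExecutionConfig` and the task fields `estimatedFiles` and `hasComplexLogic` are hypothetical stand-ins for however complexity is actually judged.

```javascript
// Sketch only: resolveExecutionConfig, estimatedFiles and hasComplexLogic
// are hypothetical names, not part of the committed agent spec.
function resolveExecutionConfig(userConfig, task) {
  const agentConfig = { method: "agent", cli_tool: null, enable_resume: false };
  const cliConfig = {
    method: "cli",
    cli_tool: userConfig.preferredCliTool,
    enable_resume: true,
  };

  switch (userConfig.executionMethod) {
    case "agent":
      return agentConfig;
    case "cli":
      return cliConfig;
    case "hybrid": {
      // Per-task decision: more than 3 files or complex logic goes to the CLI,
      // everything else stays with the agent. "hybrid" is never emitted.
      const isComplex = task.estimatedFiles > 3 || task.hasComplexLogic === true;
      return isComplex ? cliConfig : agentConfig;
    }
    default:
      throw new Error(`Unknown executionMethod: ${userConfig.executionMethod}`);
  }
}
```

Under this reading, `"hybrid"` exists only as a user-level preference; every generated task JSON ends up with `method` set to `"agent"` or `"cli"`.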
 
-**Consistency Check**: `meta.execution_config.method` MUST match presence of `command` fields:
-- `method: "agent"` → 0 steps have command field
-- `method: "hybrid"` → some steps have command field
-- `method: "cli"` → all steps have command field
+**IMPORTANT**: implementation_approach steps do NOT contain `command` fields. Execution routing is controlled by task-level `meta.execution_config.method` only.
 
 **Test Task Extensions** (for type="test-gen" or type="test-fix"):
 
@@ -336,7 +346,7 @@ userConfig.executionMethod → meta.execution_config + implementation_approach
 - `test_framework`: Existing test framework from project (required for test tasks)
 - `coverage_target`: Target code coverage percentage (optional)
 
-**Note**: CLI tool usage for test-fix tasks is now controlled via `flow_control.implementation_approach` steps with `command` fields, not via `meta.use_codex`.
+**Note**: CLI tool usage for test-fix tasks is now controlled via task-level `meta.execution_config.method`, not via `meta.use_codex`.
 
 #### Context Object
 
@@ -547,59 +557,45 @@ The examples above demonstrate **patterns**, not fixed requirements. Agent MUST:
 
 ##### Implementation Approach
 
-**Execution Modes**:
+**Execution Control**:
 
-The `implementation_approach` supports **two execution modes** based on the presence of the `command` field:
+The `implementation_approach` defines sequential implementation steps. Execution routing is controlled by **task-level `meta.execution_config.method`**, NOT by step-level `command` fields.
 
-1. **Default Mode (Agent Execution)** - `command` field **omitted**:
+**Two Execution Modes**:
+
+1. **Agent Mode** (`meta.execution_config.method = "agent"`):
 - Agent interprets `modification_points` and `logic_flow` autonomously
 - Direct agent execution with full context awareness
 - No external tool overhead
 - **Use for**: Standard implementation tasks where agent capability is sufficient
-- **Required fields**: `step`, `title`, `description`, `modification_points`, `logic_flow`, `depends_on`, `output`
-
-2. **CLI Mode (Command Execution)** - `command` field **included**:
-- Specified command executes the step directly
-- Leverages specialized CLI tools (codex/gemini/qwen) for complex reasoning
-- **Use for**: Large-scale features, complex refactoring, or when user explicitly requests CLI tool usage
-- **Required fields**: Same as default mode **PLUS** `command`, `resume_from` (optional)
-- **Command patterns** (with resume support):
-- `ccw cli -p '[prompt]' --tool codex --mode write --cd [path]`
-- `ccw cli -p '[prompt]' --resume ${previousCliId} --tool codex --mode write` (resume from previous)
-- `ccw cli -p '[prompt]' --tool gemini --mode write --cd [path]` (write mode)
-- **Resume mechanism**: When step depends on previous CLI execution, include `--resume` with previous execution ID
 
-**Semantic CLI Tool Selection**:
+2. **CLI Mode** (`meta.execution_config.method = "cli"`):
+- Agent executes `pre_analysis`, then hands off full context to CLI via `buildCliHandoffPrompt()`
+- CLI tool specified in `meta.execution_config.cli_tool` (codex/gemini/qwen)
+- Leverages specialized CLI tools for complex reasoning
+- **Use for**: Large-scale features, complex refactoring, or when userConfig.executionMethod = "cli"
 
-Agent determines CLI tool usage per-step based on user semantics and task nature.
-
-**Source**: Scan `metadata.task_description` from context-package.json for CLI tool preferences.
-
-**User Semantic Triggers** (patterns to detect in task_description):
-- "use Codex/codex" → Add `command` field with Codex CLI
-- "use Gemini/gemini" → Add `command` field with Gemini CLI
-- "use Qwen/qwen" → Add `command` field with Qwen CLI
-- "CLI execution" / "automated" → Infer appropriate CLI tool
+**Step Schema** (same for both modes):
+```json
+{
+"step": 1,
+"title": "Step title",
+"description": "What to implement (may use [variable] placeholders from pre_analysis)",
+"modification_points": ["Quantified changes: [list with counts]"],
+"logic_flow": ["Implementation sequence"],
+"depends_on": [0],
+"output": "variable_name"
+}
+```
 
-**Task-Based Selection** (when no explicit user preference):
-- **Implementation/coding**: Codex preferred for autonomous development
-- **Analysis/exploration**: Gemini preferred for large context analysis
-- **Documentation**: Gemini/Qwen with write mode (`--mode write`)
-- **Testing**: Depends on complexity - simple=agent, complex=Codex
+**Required fields**: `step`, `title`, `description`, `modification_points`, `logic_flow`, `depends_on`, `output`
 
-**Default Behavior**: Agent always executes the workflow. CLI commands are embedded in `implementation_approach` steps:
-- Agent orchestrates task execution
-- When step has `command` field, agent executes it via CCW CLI
-- When step has no `command` field, agent implements directly
-- This maintains agent control while leveraging CLI tool power
+**IMPORTANT**: Do NOT add `command` field to implementation_approach steps. Execution routing is determined by task-level `meta.execution_config.method` only.
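The step schema and the no-`command` rule lend themselves to a mechanical check. The following is a minimal sketch of such a check, assuming steps arrive as plain objects; `validateImplementationSteps` is a hypothetical helper name and is not introduced by this commit.

```javascript
// Sketch only: validateImplementationSteps is a hypothetical helper.
// The required field list is taken from the spec text above.
const REQUIRED_STEP_FIELDS = [
  "step",
  "title",
  "description",
  "modification_points",
  "logic_flow",
  "depends_on",
  "output",
];

function validateImplementationSteps(steps) {
  const errors = [];
  steps.forEach((stepObject, index) => {
    const label = `implementation_approach[${index}]`;
    // Every step must carry the full set of required fields.
    for (const field of REQUIRED_STEP_FIELDS) {
      if (!(field in stepObject)) {
        errors.push(`${label}: missing required field "${field}"`);
      }
    }
    // Routing lives in meta.execution_config.method, so step-level
    // "command" fields are rejected outright.
    if ("command" in stepObject) {
      errors.push(`${label}: "command" field is not allowed on steps`);
    }
  });
  return errors;
}
```

An empty array means the steps conform; any entry points at the offending step by index.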
 
-**Key Principle**: The `command` field is **optional**. Agent decides based on user semantics and task complexity.
-
-**Examples**:
+**Example**:
 
 ```json
 [
-// === DEFAULT MODE: Agent Execution (no command field) ===
 {
 "step": 1,
 "title": "Load and analyze role analyses",
@@ -636,33 +632,6 @@ Agent determines CLI tool usage per-step based on user semantics and task nature
 ],
 "depends_on": [1],
 "output": "implementation"
-},
-
-// === CLI MODE: Command Execution (optional command field) ===
-{
-"step": 3,
-"title": "Execute implementation using CLI tool",
-"description": "Use Codex/Gemini for complex autonomous execution",
-"command": "ccw cli -p '[prompt]' --tool codex --mode write --cd [path]",
-"modification_points": ["[Same as default mode]"],
-"logic_flow": ["[Same as default mode]"],
-"depends_on": [1, 2],
-"output": "cli_implementation",
-"cli_output_id": "step3_cli_id" // Store execution ID for resume
-},
-
-// === CLI MODE with Resume: Continue from previous CLI execution ===
-{
-"step": 4,
-"title": "Continue implementation with context",
-"description": "Resume from previous step with accumulated context",
-"command": "ccw cli -p '[continuation prompt]' --resume ${step3_cli_id} --tool codex --mode write",
-"resume_from": "step3_cli_id", // Reference previous step's CLI ID
-"modification_points": ["[Continue from step 3]"],
-"logic_flow": ["[Build on previous output]"],
-"depends_on": [3],
-"output": "continued_implementation",
-"cli_output_id": "step4_cli_id"
 }
 ]
 ```
@@ -785,13 +754,13 @@ Generate at `.workflow/active/{session_id}/TODO_LIST.md`:
 Use `analysis_results.complexity` or task count to determine structure:
 
 **Single Module Mode**:
-- **Simple Tasks** (≤5 tasks): Flat structure
-- **Medium Tasks** (6-12 tasks): Flat structure
-- **Complex Tasks** (>12 tasks): Re-scope required (maximum 12 tasks hard limit)
+- **Simple Tasks** (≤4 tasks): Flat structure
+- **Medium Tasks** (5-8 tasks): Flat structure
+- **Complex Tasks** (>8 tasks): Re-scope required (maximum 8 tasks hard limit)
 
 **Multi-Module Mode** (N+1 parallel planning):
-- **Per-module limit**: ≤9 tasks per module
-- **Total limit**: Sum of all module tasks ≤27 (3 modules × 9 tasks)
+- **Per-module limit**: ≤6 tasks per module
+- **Total limit**: No total limit (each module independently capped at 6 tasks)
 - **Task ID format**: `IMPL-{prefix}{seq}` (e.g., IMPL-A1, IMPL-B1)
 - **Structure**: Hierarchical by module in IMPL_PLAN.md and TODO_LIST.md
 
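The revised limits (8 tasks for a single module, 6 per module otherwise, no overall cap) can be checked by grouping tasks on `meta.module`. This is an illustrative sketch under that assumption: `checkTaskLimits` and its return shape are hypothetical, and a plan is treated as single-module only when every task has a `null` module.

```javascript
// Sketch only: checkTaskLimits and its return shape are hypothetical.
const SINGLE_MODULE_LIMIT = 8; // hard limit in Single Module Mode
const PER_MODULE_LIMIT = 6;    // per-module cap in Multi-Module Mode

function checkTaskLimits(tasks) {
  // Count tasks per module; a missing or null module counts as "no module".
  const countsByModule = new Map();
  for (const task of tasks) {
    const moduleName = task.meta?.module ?? null;
    countsByModule.set(moduleName, (countsByModule.get(moduleName) ?? 0) + 1);
  }

  const isSingleModule = [...countsByModule.keys()].every((name) => name === null);
  const limit = isSingleModule ? SINGLE_MODULE_LIMIT : PER_MODULE_LIMIT;

  // There is no total limit in multi-module mode, so only per-group
  // counts are compared against the applicable cap.
  const violations = [];
  for (const [moduleName, count] of countsByModule) {
    if (count > limit) {
      violations.push({ module: moduleName, count, limit });
    }
  }
  return { mode: isSingleModule ? "single" : "multi", violations };
}
```

A non-empty `violations` list corresponds to the "request re-scope" case in the guidelines checklist.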
@@ -852,9 +821,35 @@ Use `analysis_results.complexity` or task count to determine structure:
 - Proper linking between documents
 - Consistent navigation and references
 
-### 3.3 Guidelines Checklist
+### 3.3 N+1 Context Recording
+
+**Purpose**: Record decisions and deferred items for N+1 planning continuity.
+
+**When**: After task generation, update `## N+1 Context` in planning-notes.md.
+
+**What to Record**:
+- **Decisions**: Architecture/technology choices with rationale (mark `Revisit?` if may change)
+- **Deferred**: Items explicitly moved to N+1 with reason
+
+**Example**:
+```markdown
+## N+1 Context
+### Decisions
+| Decision | Rationale | Revisit? |
+|----------|-----------|----------|
+| JWT over Session | Stateless scaling | No |
+| CROSS::B::api → IMPL-B1 | B1 defines base | Yes |
+
+### Deferred
+- [ ] Rate limiting - Requires Redis (N+1)
+- [ ] API versioning - Low priority
+```
+
+### 3.4 Guidelines Checklist
 
 **ALWAYS:**
+- **Load planning-notes.md FIRST**: Read planning-notes.md before context-package.json. Use its Consolidated Constraints as primary constraint source for all task generation
+- **Record N+1 Context**: Update `## N+1 Context` section with key decisions and deferred items
 - **Search Tool Priority**: ACE (`mcp__ace-tool__search_context`) → CCW (`mcp__ccw-tools__smart_search`) / Built-in (`Grep`, `Glob`, `Read`)
 - Apply Quantification Requirements to all requirements, acceptance criteria, and modification points
 - Load IMPL_PLAN template: `Read(~/.claude/workflows/cli-templates/prompts/workflow/impl-plan-template.txt)` before generating IMPL_PLAN.md
@@ -865,7 +860,7 @@ Use `analysis_results.complexity` or task count to determine structure:
 - **Compute CLI execution strategy**: Based on `depends_on`, set `cli_execution.strategy` (new/resume/fork/merge_fork)
 - Map artifacts: Use artifacts_inventory to populate task.context.artifacts array
 - Add MCP integration: Include MCP tool steps in flow_control.pre_analysis when capabilities available
-- Validate task count: Maximum 12 tasks hard limit, request re-scope if exceeded
+- Validate task count: Maximum 8 tasks (single module) or 6 tasks per module (multi-module), request re-scope if exceeded
 - Use session paths: Construct all paths using provided session_id
 - Link documents properly: Use correct linking format (📋 for JSON, ✅ for summaries)
 - Run validation checklist: Verify all quantification requirements before finalizing task JSONs
@@ -879,7 +874,7 @@ Use `analysis_results.complexity` or task count to determine structure:
 - Load files directly (use provided context package instead)
 - Assume default locations (always use session_id in paths)
 - Create circular dependencies in task.depends_on
-- Exceed 12 tasks without re-scoping
+- Exceed 8 tasks (single module) or 6 tasks per module (multi-module) without re-scoping
 - Skip artifact integration when artifacts_inventory is provided
 - Ignore MCP capabilities when available
 - Use fixed pre-analysis steps without task-specific adaptation
