feat(prd): Add comprehensive PRD management commands and versioning #293
Implements a complete PRD management system for the codeframe CLI:
Core PRD functions (codeframe/core/prd.py):
- delete(workspace, prd_id) - Remove a PRD from workspace
- export_to_file(workspace, prd_id, path, force) - Export PRD to file
- create_new_version(workspace, prd_id, content, summary) - Create new version
- get_versions(workspace, prd_id) - List all versions of a PRD
- get_version(workspace, prd_id, version_number) - Get specific version
- diff_versions(workspace, prd_id, v1, v2) - Generate unified diff
CLI commands (codeframe/cli/app.py):
- prd list - List all PRDs with IDs and timestamps
- prd show [id] - Enhanced to accept optional PRD ID
- prd delete <id> [--force] - Delete PRD with confirmation
- prd export <id|latest> <file> [--force] - Export PRD to file
- prd versions <id> - Show version history
- prd diff <id> <v1> <v2> - Show diff between versions
- prd update <id> <file> -m <message> - Create new version
Database schema additions:
- version (INTEGER) - Version number for PRD
- parent_id (TEXT) - Links to previous version
- change_summary (TEXT) - Description of changes
Includes 68 tests covering core functions and CLI commands.
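The description lists diff_versions as generating a unified diff. A minimal sketch of that idea with Python's standard difflib is shown here; the helper name `unified_prd_diff` and the `v1`/`v2` file labels are illustrative assumptions, not the PR's actual implementation.

```python
import difflib


def unified_prd_diff(old_text: str, new_text: str, v1: int, v2: int) -> str:
    """Return a unified diff between two PRD version bodies (hypothetical helper)."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"v{v1}",
        tofile=f"v{v2}",
        lineterm="",  # inputs carry no trailing newlines, so emit none either
    )
    return "\n".join(diff_lines)
```

Identical inputs produce an empty string, which a CLI command could report as "no changes".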
Add PRD CLI commands for list, delete, export, versions, diff, and update and implement PRD versioning with chain_id in
    FOREIGN KEY (parent_id) REFERENCES prds(id)
)
""")
Migration for prds isn’t resilient: new columns aren’t added and ALTER TABLE assumes the table exists. Suggest checking for the table and missing columns (e.g., sqlite_master, PRAGMA table_info) and only ALTER when needed, or document why prds is guaranteed to exist.
# Migration: Add new columns to existing prds table
cursor.execute("PRAGMA table_info(prds)")
prds_columns = {row[1] for row in cursor.fetchall()}
if "version" not in prds_columns:
    cursor.execute("ALTER TABLE prds ADD COLUMN version INTEGER DEFAULT 1")
if "parent_id" not in prds_columns:
    cursor.execute("ALTER TABLE prds ADD COLUMN parent_id TEXT")
if "change_summary" not in prds_columns:
    cursor.execute("ALTER TABLE prds ADD COLUMN change_summary TEXT")
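The review comment asks for a migration that checks both for the table and for missing columns. One way this could look is sketched below, using sqlite_master and PRAGMA table_info; the helper name `ensure_prd_columns` and the choice to return the list of added columns are assumptions made for illustration.

```python
import sqlite3

# Column names and types come from the PR description; the helper is hypothetical.
NEW_PRD_COLUMNS = {
    "version": "INTEGER DEFAULT 1",
    "parent_id": "TEXT",
    "change_summary": "TEXT",
}


def ensure_prd_columns(conn: sqlite3.Connection) -> list[str]:
    """Add any missing versioning columns to prds and return the ones added."""
    cursor = conn.cursor()
    cursor.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'prds'"
    )
    if cursor.fetchone() is None:
        return []  # nothing to migrate; table creation is assumed to happen elsewhere
    cursor.execute("PRAGMA table_info(prds)")
    existing = {row[1] for row in cursor.fetchall()}  # row[1] is the column name
    added = []
    for name, ddl in NEW_PRD_COLUMNS.items():
        if name not in existing:
            cursor.execute(f"ALTER TABLE prds ADD COLUMN {name} {ddl}")
            added.append(name)
    conn.commit()
    return added
```

Running it twice is safe: the second call finds every column present and returns an empty list.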
if not start:
    return []

conn = get_db_connection(workspace)
DB connections can leak on exceptions. Suggest always closing via try/finally or with sqlite3.connect(...) as conn:.
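One caveat on the suggestion above: in Python's sqlite3 module, using a connection as a context manager commits or rolls back the transaction but does not close the connection. A sketch that does guarantee closing uses contextlib.closing; the function name `fetch_prd_ids` and the `db_path` parameter are hypothetical.

```python
import sqlite3
from contextlib import closing


def fetch_prd_ids(db_path: str) -> list[str]:
    """Return all PRD ids, closing the connection even if the query raises."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute("SELECT id FROM prds ORDER BY id").fetchall()
    return [row[0] for row in rows]
```

An explicit try/finally with conn.close() in the finally clause has the same effect.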
Review Summary
This PR adds comprehensive PRD versioning capabilities including deletion, export, version history, and diff functionality. The implementation follows the v2 headless architecture pattern and includes excellent test coverage (68 new tests).
✅ Strengths
Posted comprehensive review comment to the PR. The review covers code quality, performance considerations (missing indexes on |
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
Remaining Review Items
Following up on the comprehensive reviews already posted, here are the key issues still needing attention:
🚨 Must Fix Before Merge
1. Missing Index on
The
# Add after line 96 in _ensure_schema_upgrades():
cursor.execute("CREATE INDEX IF NOT EXISTS idx_prds_parent ON prds(parent_id)")
2. Missing Cycle Protection in
The "walk up" logic to find the root version has no cycle detection. While schema invariants should prevent cycles, defensive code is better:
# Replace lines 446-456 with:
current_id = prd_id
visited_up = set()
max_depth = 1000  # Safety limit
while len(visited_up) < max_depth:
    cursor.execute(
        "SELECT parent_id FROM prds WHERE workspace_id = ? AND id = ?",
        (workspace.id, current_id),
    )
    row = cursor.fetchone()
    if not row or not row[0]:
        break
    if row[0] in visited_up:
        # Cycle detected - treat current as root
        break
    visited_up.add(row[0])
    current_id = row[0]
📋 Documentation Gap
3. Missing CLI Documentation
Per AGENTS.md, this file is the authoritative command mapping. Add entries for:
Summary: The core implementation is solid with excellent test coverage. Addressing the index and cycle protection will improve robustness and performance. Documentation update ensures the command mapping stays authoritative.
for line in diff.splitlines():
    if line.startswith("+") and not line.startswith("+++"):
        console.print(f"[green]{line}[/green]")
    elif line.startswith("-") and not line.startswith("---"):
        console.print(f"[red]{line}[/red]")
    elif line.startswith("@@"):
        console.print(f"[cyan]{line}[/cyan]")
    else:
        console.print(line)
Diff lines may contain [/], which Rich treats as markup, causing wrong rendering or errors. Consider disabling markup on these prints and use style= for color.
Suggested change:
for line in diff.splitlines():
    if line.startswith("+") and not line.startswith("+++"):
        console.print(line, style="green", markup=False)
    elif line.startswith("-") and not line.startswith("---"):
        console.print(line, style="red", markup=False)
    elif line.startswith("@@"):
        console.print(line, style="cyan", markup=False)
    else:
        console.print(line, markup=False)
Review posted to PR #293.
- Add chain_id field to PrdRecord and prds table schema
- Add database indexes on parent_id and chain_id columns
- Make version number increment atomic with explicit transactions
- Optimize get_versions() to use single query with chain_id
- Add list_chains() function to list unique PRD chains
- Add delete validation with check_dependencies parameter
- Add PrdHasDependentTasksError exception for dependent tasks
- Update CLI_WIREFRAME.md with new PRD commands documentation
Fixes from code review:
1. Performance: Added idx_prds_parent and idx_prds_chain indexes
2. Architecture: Added chain_id for version grouping
3. Concurrency: Wrapped version creation in explicit transaction
4. N+1 queries: get_versions() now uses single query via chain_id
5. Documentation: Added 7 new PRD commands to CLI_WIREFRAME.md
6. Validation: delete() now checks for dependent tasks
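The concurrency fix in this commit wraps version creation in an explicit transaction. A rough sketch of how an atomic version increment might look in SQLite follows. The helper name `insert_next_version`, the reduced four-column table, and the use of BEGIN IMMEDIATE are all assumptions for illustration; the commit does not say which transaction mode it uses.

```python
import sqlite3
import uuid


def insert_next_version(conn: sqlite3.Connection, chain_id: str, content: str) -> int:
    """Insert the next version of a chain inside one explicit transaction."""
    # Assumes conn was opened with isolation_level=None so BEGIN/COMMIT are manual.
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX(version)
    try:
        row = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM prds WHERE chain_id = ?",
            (chain_id,),
        ).fetchone()
        next_version = row[0] + 1
        conn.execute(
            "INSERT INTO prds (id, chain_id, version, content) VALUES (?, ?, ?, ?)",
            (uuid.uuid4().hex, chain_id, next_version, content),
        )
        conn.execute("COMMIT")
    except Exception:
        try:
            conn.execute("ROLLBACK")
        except sqlite3.Error:
            pass  # keep the original exception visible
        raise
    return next_version
```

Because the read of MAX(version) and the insert happen under the same write lock, two concurrent writers cannot both compute the same next version.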
Follow-up Review: Most Issues Addressed ✅
I reviewed the current state of the code and found that most issues from previous reviews have been addressed. The implementation has evolved significantly since the initial reviews.
✅ Issues Fixed in Current Code
Review posted successfully. Found that most issues from earlier reviews have been addressed, with only one minor rendering issue remaining in the diff output.
INNER JOIN (
    SELECT chain_id, MAX(version) as max_version
    FROM prds
    WHERE workspace_id = ?
    GROUP BY chain_id
) latest ON p.chain_id = latest.chain_id AND p.version = latest.max_version
NULL chain_id is dropped by equality joins/filters. Consider a canonical key (e.g., COALESCE(chain_id, id)) used consistently, or branch with IS NULL.
Suggested change:
INNER JOIN (
    SELECT COALESCE(chain_id, id) AS chain_key, MAX(version) as max_version
    FROM prds
    WHERE workspace_id = ?
    GROUP BY chain_key
) latest ON COALESCE(p.chain_id, p.id) = latest.chain_key AND p.version = latest.max_version
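The effect described in this comment can be reproduced with an in-memory SQLite database. The sketch below uses a simplified three-column prds table and a hypothetical helper `latest_ids`; it only demonstrates that rows with a NULL chain_id drop out of an equality join and reappear when both sides use COALESCE(chain_id, id).

```python
import sqlite3


def latest_ids(conn: sqlite3.Connection, use_coalesce: bool) -> list[str]:
    """Return ids of the latest version per chain, with or without a NULL-safe key."""
    # Both fragments are fixed strings chosen here, never user input.
    key = "COALESCE(chain_id, id)" if use_coalesce else "chain_id"
    p_key = "COALESCE(p.chain_id, p.id)" if use_coalesce else "p.chain_id"
    sql = f"""
        SELECT p.id FROM prds p
        INNER JOIN (
            SELECT {key} AS chain_key, MAX(version) AS max_version
            FROM prds
            GROUP BY {key}
        ) latest ON {p_key} = latest.chain_key AND p.version = latest.max_version
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(sql)]
```

With rows ("a1", "c", 1), ("a2", "c", 2) and ("legacy", NULL, 1), the plain join returns only a2, while the COALESCE variant also returns the legacy row.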
except Exception:
    cursor.execute("ROLLBACK")
    raise
ROLLBACK in the except block can fail and mask the original error (e.g., if the connection is already closed). Consider wrapping ROLLBACK in its own try/except so the original exception is preserved.
Suggested change:
except Exception:
    try:
        cursor.execute("ROLLBACK")
    except Exception:
        pass
    raise
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
Final Review: Critical Markup Issue Found
Review comment posted successfully. The PR has an excellent implementation with one critical markup injection issue that should be fixed.
Review Summary
This PR implements comprehensive PRD versioning functionality with excellent test coverage (68 new tests). The code is well-structured and follows the v2 headless architecture pattern. Most issues from prior reviews have been addressed, with only a few remaining items.
✅ Previously Addressed Issues
🐛 Remaining Issues
1. Rich Markup Conflict in Diff Output (Priority: High)
The diff output uses inline Rich markup (
Fix: Use
for line in diff.splitlines():
    if line.startswith("+") and not line.startswith("+++"):
        console.print(line, style="green", markup=False)
    elif line.startswith("-") and not line.startswith("---"):
        console.print(line, style="red", markup=False)
    elif line.startswith("@@"):
        console.print(line, style="cyan", markup=False)
    else:
        console.print(line, markup=False)
2. NULL chain_id Handling (Priority: Low)
The
Fix: Use
SELECT p.id, p.workspace_id, p.title, p.content, p.metadata, p.created_at,
       p.version, p.parent_id, p.change_summary, p.chain_id
FROM prds p
INNER JOIN (
    SELECT COALESCE(chain_id, id) as chain_key, MAX(version) as max_version
    FROM prds
    WHERE workspace_id = ?
    GROUP BY chain_key
) latest ON COALESCE(p.chain_id, p.id) = latest.chain_key AND p.version = latest.max_version
WHERE p.workspace_id = ?
ORDER BY p.created_at DESC
3. ROLLBACK Error Masking (Priority: Low)
If
Fix: Wrap ROLLBACK in try/except:
except Exception:
    try:
        cursor.execute("ROLLBACK")
    except Exception:
        pass
    raise
📝 Overall Assessment
This is a high-quality PR with comprehensive functionality and excellent test coverage. The code follows v2 architecture principles well.
Recommended actions before merge:
Nice to have (can be addressed in follow-up):
The implementation is solid and ready to merge with the markup fix applied.
I've posted a comprehensive review to the PR. The review identifies:
The PR is in excellent shape with comprehensive test coverage and clean architecture. Most prior issues have been addressed, with only the markup conflict requiring attention before merge.
* chore(refactor) Quarantine web-ui and edit CLI_WIREFRAME
* Update AGENTS.md
* feat(core): implement Golden Path Phases 1-2 with CLI-first architecture
Phase 1 - Workspace & Events:
- New Typer CLI at codeframe/cli/app.py with domain-grouped commands
- Workspace management with SQLite state storage in .codeframe/
- Append-only event log for all workspace activity
- Updated pyproject.toml entry point
Phase 2 - PRD & Task Management:
- PRD storage with title extraction and metadata
- Task state machine (BACKLOG→READY→IN_PROGRESS→BLOCKED→DONE→MERGED)
- LLM-powered task generation from PRD (with simple fallback)
- Status transitions with validation
Test coverage:
- 28 state machine unit tests
- 17 workspace unit tests
- 11 integration tests covering full Phase 1-2 flow
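The task state machine named in this commit can be pictured as a lookup table of allowed transitions. The sketch below is only an illustration: the status names come from the commit message, but the transition table itself is an assumption, and the real rules in the codeframe task module may differ.

```python
from enum import Enum


class TaskStatus(Enum):
    BACKLOG = "BACKLOG"
    READY = "READY"
    IN_PROGRESS = "IN_PROGRESS"
    BLOCKED = "BLOCKED"
    DONE = "DONE"
    MERGED = "MERGED"


# Assumed transition table, written for illustration only.
ALLOWED_TRANSITIONS = {
    TaskStatus.BACKLOG: {TaskStatus.READY},
    TaskStatus.READY: {TaskStatus.IN_PROGRESS, TaskStatus.BACKLOG},
    TaskStatus.IN_PROGRESS: {TaskStatus.READY, TaskStatus.BLOCKED, TaskStatus.DONE},
    TaskStatus.BLOCKED: {TaskStatus.READY},
    TaskStatus.DONE: {TaskStatus.MERGED},
    TaskStatus.MERGED: set(),
}


def transition(current: TaskStatus, target: TaskStatus) -> TaskStatus:
    """Return the target status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Invalid transition: {current.value} -> {target.value}")
    return target
```

Under this table a self-transition such as DONE to DONE is rejected, which matches the kind of error the later fix commits in this log work around.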
* feat(cli): implement status command (Phase 4)
Shows workspace summary including:
- PRD info (title and date)
- Task counts by status with color-coding
- Recent activity from event log
- Configurable event count with --events/-e flag
Emits STATUS_VIEWED event for activity tracking.
* feat(core): implement work commands with runtime module (Phase 5)
New runtime module (codeframe/core/runtime.py):
- Run lifecycle management (start, stop, complete, fail, block, resume)
- RunStatus enum (RUNNING, COMPLETED, FAILED, BLOCKED)
- Stub agent execution loop that emits events
Work CLI commands:
- work start: Creates run, transitions task to IN_PROGRESS
- work stop: Gracefully stops run, returns task to READY
- work resume: Resumes a blocked run
- work status: Shows active runs
The --execute flag on work start runs the stub agent, emitting
AGENT_STEP_STARTED and AGENT_STEP_COMPLETED events for testing.
* feat(core): implement blocker commands (Phase 6)
New blockers module (codeframe/core/blockers.py):
- BlockerStatus enum (OPEN, ANSWERED, RESOLVED)
- Blocker CRUD operations
- Partial ID matching for convenience
Blocker CLI commands:
- blocker list: Show open blockers (--all for all)
- blocker show: View blocker details with question/answer
- blocker create: Manually create blockers for testing
- blocker answer: Provide answer to unblock work
- blocker resolve: Mark blocker as resolved
Emits BLOCKER_CREATED, BLOCKER_ANSWERED, BLOCKER_RESOLVED events.
* feat(core): implement review command with verification gates (Phase 7)
New gates module (codeframe/core/gates.py):
- Auto-detect available gates (pytest, ruff, mypy, npm-test, npm-lint)
- Run gates with configurable verbosity
- Capture output, exit codes, and timing
- GateStatus enum (PASSED, FAILED, SKIPPED, ERROR)
Review CLI command:
- codeframe review: Run all detected gates
- --gate/-g: Run specific gates only
- --verbose/-v: Show full gate output
Emits GATES_STARTED and GATES_COMPLETED events.
Also: Added .codeframe/ to .gitignore
* feat(core): implement patch and commit commands (Phase 8)
New artifacts module (codeframe/core/artifacts.py):
- export_patch: Export git diff as a .patch file
- create_commit: Create git commits with proper validation
- get_status: Get git status summary
- list_patches: List previously exported patches
Patch CLI commands:
- patch export: Export changes to .codeframe/patches/
- patch list: List exported patches
- patch status: Show git status summary
Commit CLI commands:
- commit create: Create commits with -m message
- commit create --all: Stage all changes before committing
Emits PATCH_EXPORTED and COMMIT_CREATED events.
* feat(core): implement checkpoint and summary commands (Phase 9)
Adds checkpoint module for state snapshots and updates summary command
to display workspace overview. Completes Golden Path CLI implementation.
* docs: add agent implementation task list
Tracks the work needed to replace execute_stub() with a fully
functional agent that can read context, plan, and execute code changes.
* feat(adapters): implement LLM adapter with Anthropic and Mock providers
Adds codeframe/adapters/llm/ with:
- base.py: Protocol, ModelSelector, LLMResponse, Tool/ToolCall types
- anthropic.py: Claude provider with tool use and streaming support
- mock.py: Test provider with call tracking and queued responses
Task-based model selection heuristic:
- Planning/reasoning → Sonnet
- Execution → Sonnet
- Generation → Haiku
* feat(core): implement task context loader for agent execution
Adds codeframe/core/context.py with:
- TaskContext: dataclass holding task, PRD, blockers, and file contents
- ContextLoader: loads and scores relevant files within token budget
- Keyword extraction and relevance scoring for file selection
- Token budgeting to maximize useful context
Also adds list_for_task() helper to blockers module.
* feat(core): implement agent planning module
Adds codeframe/core/planner.py with:
- Planner: transforms TaskContext into ImplementationPlan via LLM
- ImplementationPlan: structured plan with steps, files, complexity
- PlanStep: individual step with type, target, dependencies
- StepType enum: file_create, file_edit, shell_command, verification
Uses Purpose.PLANNING to select stronger model for reasoning tasks.
* feat(core): implement code execution engine
Adds codeframe/core/executor.py with:
- Executor: executes plan steps via LLM-driven code generation
- File operations: create, edit, delete with rollback tracking
- Shell commands: sandboxed execution with dangerous pattern blocking
- Dry-run mode for previewing changes without applying them
- Full rollback capability for all file changes
Uses Purpose.EXECUTION for balanced model selection during code generation.
* feat(core): implement agent orchestrator with blocker detection
Adds codeframe/core/agent.py with:
- Agent: main orchestrator coordinating context, planning, execution
- AgentState: serializable state for pause/resume
- Blocker detection: creates blockers for failures needing human input
- Gate integration: runs verification after file changes
- Event emission: callback-based event system for monitoring
Patterns detected for blocker creation:
- Consecutive failures exceeding threshold
- 'not found', 'missing', 'credentials' errors
- Verification failures after max attempts
* feat(runtime): wire agent orchestrator into work start command
Adds execute_agent() to runtime.py:
- Integrates full agent orchestration (context, plan, execute, verify)
- Requires ANTHROPIC_API_KEY for real execution
- Emits workspace events for monitoring
Updates CLI work start command:
- --execute: runs the real AI agent
- --dry-run: preview changes without applying
- --stub: legacy stub execution for testing
The Golden Path is now fully functional from PRD to committed code.
* fix(agent): correct GateResult attribute access
- GateResult has `passed` (bool), not `status`
- GateCheck has `name`, not `gate`
Fixes AttributeError during agent execution verification.
* fix(agent): remove duplicate task status update
Task status is now only updated by runtime.complete_run(),
avoiding DONE -> DONE transition error.
* docs: mark agent implementation tasks complete
* fix(runtime): avoid READY->READY transition in stop_run
* fix(agent): remove duplicate BLOCKED status updates
* fix(executor): handle verification steps intelligently
- Python files: check existence and syntax
- Commands: execute as shell
- Other paths: check existence
Fixes issue where 'task_tracker.py' was run as a command instead of verified.
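A sketch of the path-based part of this verification logic, assuming a hypothetical helper `verify_target` that returns a pass flag and a message: Python files are checked for existence and then parsed with ast, and any other path is only checked for existence. The shell-command branch is omitted here.

```python
import ast
from pathlib import Path


def verify_target(path: Path) -> tuple[bool, str]:
    """Check that a path exists and, for .py files, that it parses."""
    if not path.exists():
        return False, f"{path} does not exist"
    if path.suffix == ".py":
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            return False, f"syntax error at line {exc.lineno}: {exc.msg}"
    return True, "ok"
```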
* docs(readme): update for v2 agent implementation
- Update status badge to reflect v2 completion
- Add "What's New" section for v2 agent implementation
- Document CLI-first workflow as recommended approach
- Update architecture diagram to show CLI/Agent orchestrator
- Add complete CLI command reference
- Move previous updates to collapsible sections
- Update roadmap with completed items
- Add links to v2 documentation (Golden Path, Agent Tasks)
* docs(claude): update for v2 agent implementation complete
- Update status to v2 Agent Implementation Complete
- Add agent system architecture section with component table
- Add execution flow diagram for agent orchestration
- Document critical state separation pattern (Agent→AgentState, Runtime→TaskStatus)
- Add recent updates section with bug fixes
* feat(agent): add error classification and self-correction for technical errors
Previously, the agent would create blockers for any error matching patterns
like "not found" or "missing". This caused technical errors (syntax errors,
file not found, import errors) to block execution when the agent should
solve them automatically.
Changes:
- Add HUMAN_INPUT_PATTERNS for genuine human-needed situations (credentials,
unclear requirements, design decisions)
- Add TECHNICAL_ERROR_PATTERNS for errors agent can self-correct (file not
found, syntax errors, import errors)
- Add _classify_error() to categorize errors
- Add _attempt_self_correction() to use LLM to fix technical errors
- Update _execute_plan() to try self-correction before creating blockers
- Update tests to reflect new behavior
The agent now:
1. Classifies errors as "technical" or "human"
2. For technical errors: tries self-correction (up to 2 attempts)
3. Only creates blockers for human-input-needed situations or after
exhausting self-correction attempts
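The classification step described above could be sketched as two pattern lists checked in order. The list names match the commit message, but the concrete patterns, the standalone function name `classify_error`, and the choice to default to "technical" are assumptions made for this example.

```python
import re

# Illustrative patterns only; the real lists live in the agent module.
HUMAN_INPUT_PATTERNS = [
    r"credential",
    r"api key",
    r"unclear requirement",
    r"design decision",
]
TECHNICAL_ERROR_PATTERNS = [
    r"file not found",
    r"no such file",
    r"syntax ?error",
    r"importerror",
    r"modulenotfounderror",
]


def classify_error(message: str) -> str:
    """Classify an error message as 'human' or 'technical'."""
    text = message.lower()
    if any(re.search(pattern, text) for pattern in HUMAN_INPUT_PATTERNS):
        return "human"
    if any(re.search(pattern, text) for pattern in TECHNICAL_ERROR_PATTERNS):
        return "technical"
    # Assumed default: let the agent attempt self-correction before escalating.
    return "technical"
```

Checking the human-input patterns first means a message such as "missing credentials" is escalated even though it also contains a generic word like "missing".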
* feat(blockers): auto-reset task to READY when blocker is answered
When a blocker is answered, the associated task is now automatically
reset to READY status. This eliminates the need for separate "work stop"
and "work resume" commands.
Flow is now:
1. Task runs → hits blocker → status becomes BLOCKED
2. User answers blocker: `cf blocker answer <id> "answer"`
3. Task automatically resets to READY
4. User can restart: `cf work start <id> --execute`
The blocker answer includes the user's input, so the agent will have
access to it when the task is restarted.
* fix(agent): prevent infinite loop when self-correction returns None
The previous code used Python's while...else construct, but when
_attempt_self_correction returned None, we'd break out of the loop
and skip the else block, which meant current_step was never incremented
and the same step would be retried forever.
Fixed by using a flag to track self-correction success and handling
the failure case unconditionally after the loop ends.
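The flag-based fix can be illustrated with a small loop. In Python, the else clause of a while loop is skipped whenever the loop exits through break, so any bookkeeping placed there is lost on that path. The sketch below uses a hypothetical `attempt_fix` callback that returns True (fixed), False (try again) or None (give up); it is not the agent's actual code.

```python
from typing import Callable, Optional


def run_step_with_correction(
    attempt_fix: Callable[[int], Optional[bool]],
    max_attempts: int = 2,
) -> tuple[bool, int]:
    """Try up to max_attempts corrections and always report an outcome."""
    corrected = False
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        result = attempt_fix(attempts)
        if result is None:
            break  # a while...else clause would be skipped on this path
        if result:
            corrected = True
            break
    # Outcome is handled unconditionally, whether the loop broke or ran out.
    return corrected, attempts
```

The caller can then advance to the next step or create a blocker based on the returned flag, regardless of how the loop ended.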
* fix(agent): trigger self-correction when verification fails after file edit
Previously, when a file was written successfully but verification (ruff)
detected a syntax error, the agent would:
1. Try ruff --fix (which can't fix syntax errors)
2. Just increment consecutive_failures and move on
This left broken code in the file and continued to the next step.
Now the agent:
1. Detects verification failure after successful file write
2. Triggers self-correction to fix the syntax/code error
3. Re-runs verification after each correction attempt
4. Creates a blocker if self-correction can't fix it
This ensures syntax errors caught by linting get the same self-correction
treatment as other technical errors.
* fix(agent): convert failed VERIFICATION steps to FILE_EDIT for self-correction
When a VERIFICATION step fails (e.g., ast.parse catches a syntax error),
we were trying to "self-correct" the verification step itself, which
doesn't make sense. Now we convert it to a FILE_EDIT step targeting
the same file, so self-correction actually fixes the broken code.
This fixes the case where:
1. File is written with syntax error
2. Ruff doesn't catch it (ruff misses some errors that ast catches)
3. Verification step catches the syntax error
4. Self-correction can now actually fix the file
* docs: add batch execution implementation plan
- Add BATCH_EXECUTION_PLAN.md with phased approach:
- Phase 1: Serial batch execution via conductor
- Phase 2: Parallel execution with dependency analysis
- Phase 3: Observability and websocket streaming
- Update CLI_WIREFRAME.md:
- Add conductor.py and dependency_analyzer.py to module layout
- Add cf work batch commands (batch, status, cancel)
- Update implementation order with batch phases
Design decisions:
- Subprocess-based execution (isolation, crash-safe)
- No server required (CLI-first)
- Serial by default, parallel opt-in
* docs: organize planning docs, mark Golden Path complete
- Move completed planning docs to docs/finished/:
- AGENT_IMPLEMENTATION_TASKS.md (all 8 tasks done)
- REFACTOR_PLAN_FOR_AGENT.md (Steps 0-6 complete)
- Update GOLDEN_PATH.md:
- Mark acceptance checklist as complete (2025-01-14)
- Reference BATCH_EXECUTION_PLAN.md as next phase
- Add docs/finished/README.md explaining folder purpose
Active docs remaining:
- GOLDEN_PATH.md (architecture contract)
- CLI_WIREFRAME.md (command reference)
- BATCH_EXECUTION_PLAN.md (next phase)
* feat(batch): implement Phase 1 batch execution
Add multi-task batch execution support with serial execution strategy.
New components:
- core/conductor.py: Batch orchestration with subprocess execution
- BatchRun model with status tracking (PENDING, RUNNING, COMPLETED, PARTIAL, FAILED, CANCELLED)
- On-failure behavior (continue or stop)
CLI commands:
- cf work batch <task-ids...> - Execute multiple tasks
- cf work batch --all-ready - Execute all READY tasks
- cf work batch-status [batch-id] - Show batch status
- cf work batch-cancel <batch-id> - Cancel running batch
Schema updates:
- batch_runs table with auto-migration for existing workspaces
- Batch event types (BATCH_STARTED, BATCH_TASK_*, BATCH_COMPLETED, etc.)
Tests:
- 23 new tests for conductor module (all passing)
Phase 2 will add parallel execution with dependency analysis.
* refactor(cli): restructure batch commands to use subcommand group
Changed from hyphenated commands to proper subcommand structure:
- cf work batch-status -> cf work batch status
- cf work batch-cancel -> cf work batch cancel
- cf work batch <ids> -> cf work batch run <ids>
Created batch_app Typer subcommand group with run, status, cancel.
Updated CLI_WIREFRAME.md and BATCH_EXECUTION_PLAN.md to reflect changes.
Marked Phase 1 as complete in both docs.
* test(conductor): add integration tests for batch failure scenarios
Added 9 new tests in TestBatchExecution class:
- test_all_tasks_succeed: verifies COMPLETED status
- test_some_tasks_fail_continue: PARTIAL status with on_failure=continue
- test_task_fails_stop: stops execution with on_failure=stop
- test_all_tasks_fail: FAILED status when all tasks fail
- test_task_blocked: handles BLOCKED tasks correctly
- test_mixed_results: tracks COMPLETED, FAILED, BLOCKED together
- test_first_task_fails_stop: stops immediately on first failure
- test_batch_completed_at_set: timestamp set after execution
- test_on_event_callback_called: callback receives all events
Total: 32 tests (was 23)
* feat(agent): add self-correction capabilities and model flexibility
LLM adapter changes:
- Add CORRECTION purpose for self-correction (uses stronger model)
- Add environment variable overrides for all model selections:
CODEFRAME_PLANNING_MODEL, CODEFRAME_EXECUTION_MODEL,
CODEFRAME_GENERATION_MODEL, CODEFRAME_CORRECTION_MODEL
- Default correction model: claude-opus-4-5 for fixing errors
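A sketch of how the environment variable overrides might be resolved. The variable names and the claude-opus-4-5 correction default come from the commit message; the function `select_model`, the `env` parameter, and the other three default model names are placeholders invented for this example.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODELS = {
    "planning": "placeholder-planning-model",      # placeholder, not from the source
    "execution": "placeholder-execution-model",    # placeholder, not from the source
    "generation": "placeholder-generation-model",  # placeholder, not from the source
    "correction": "claude-opus-4-5",               # default named in the commit message
}


def select_model(purpose: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model for a purpose, preferring CODEFRAME_<PURPOSE>_MODEL."""
    env = os.environ if env is None else env
    override = env.get(f"CODEFRAME_{purpose.upper()}_MODEL")
    return override or DEFAULT_MODELS[purpose]
```

Passing `env` explicitly keeps the lookup easy to test without touching the real process environment.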
Agent changes:
- Add _extract_file_from_command() to parse verification targets
- Add debug logging capability with --debug flag
- Improve self-correction flow when verification fails
- Convert failed VERIFICATION steps to FILE_EDIT for re-attempt
These changes support automatic error recovery during batch execution.
* docs: add retry/self-correction future enhancements section
Updated BATCH_EXECUTION_PLAN.md:
- Added "Future Enhancements: Retry & Self-Correction" section
- Documented three retry options: --retry flag, resume command, escalation
- Added decision points for Phase 2 planning
- Updated references to point to finished/ folder
Updated CLI_WIREFRAME.md:
- Renamed Phase 2 to "Parallel Execution & Retry"
- Added --retry N flag and batch resume command to roadmap
- Renumbered Phase 3 items
* feat(batch): implement batch resume command
Added resume_batch() function to conductor.py:
- Re-runs failed/blocked tasks from a previous batch
- --force flag re-runs all tasks including completed ones
- Merges results into existing batch record
- Preserves completed task results when not using force
Added CLI command:
- cf work batch resume <batch-id> [--force]
- Supports partial batch ID matching
- Shows helpful output about what will be re-run
Added 9 tests for resume scenarios:
- Resume PARTIAL/FAILED batches
- Force mode re-runs all tasks
- Handles blocked tasks
- Preserves completed results
- Edge cases (no failed tasks, still failing)
Updated docs:
- CLI_WIREFRAME.md with resume command details
- BATCH_EXECUTION_PLAN.md marks Option B as implemented
- Phase 2 shows resume as complete
Total tests: 41 (was 32)
* feat(batch): add --retry N flag for automatic task retry
- Add _execute_retries() function in conductor.py for retry loop
- Add max_retries parameter to start_batch()
- Add --retry/-r option to CLI batch run command
- Retry only FAILED tasks (not BLOCKED which need human intervention)
- Stop early if all tasks succeed before exhausting retries
- Add 8 tests for retry functionality (49 total conductor tests)
- Update docs to mark retry flag as implemented
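The retry behaviour listed above (retry only FAILED tasks, leave BLOCKED ones alone, stop early once nothing is failing) can be sketched as follows. The function name `execute_with_retries` and the `run_task` callback that returns a status string are assumptions; the real `_execute_retries()` in conductor.py is not shown in this log.

```python
from typing import Callable, Sequence


def execute_with_retries(
    task_ids: Sequence[str],
    run_task: Callable[[str], str],
    max_retries: int,
) -> dict[str, str]:
    """Run every task once, then retry only FAILED tasks up to max_retries times."""
    results = {task_id: run_task(task_id) for task_id in task_ids}
    for _ in range(max_retries):
        failed = [tid for tid, status in results.items() if status == "FAILED"]
        if not failed:
            break  # stop early: nothing left to retry
        for tid in failed:
            results[tid] = run_task(tid)
    return results
```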
* feat(tasks): add depends_on field for task dependencies
- Add depends_on field to Task dataclass (default empty list)
- Add depends_on column to tasks table schema with migration
- Add update_depends_on() function to modify task dependencies
- Add get_dependents() function to find tasks that depend on a given task
- Validate against self-references and nonexistent dependencies
- Add 15 tests for dependency functionality
- Update docs to mark this Phase 2 item as complete
* feat(batch): add dependency graph analysis for parallel execution
- Create dependency_graph.py module for DAG operations
- Implement build_graph() to construct dependency graph from tasks
- Implement detect_cycle() for circular dependency detection
- Implement topological_sort() for execution order
- Implement group_by_level() for parallel execution groups
- Create ExecutionPlan dataclass with groups, task_order, and graph
- Add validate_dependencies() for pre-execution validation
- Add CycleDetectedError exception class
- Add 34 tests for all graph operations
- Update docs to mark this Phase 2 item as complete
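Grouping tasks by level and detecting cycles can be done in one pass with a Kahn-style algorithm. The sketch below reuses the names `group_by_level` and `CycleDetectedError` from the commit message, but the signature (a plain dict mapping task id to its dependencies) and the decision to ignore unknown dependency ids are assumptions, not the module's actual API.

```python
class CycleDetectedError(Exception):
    """Raised when the dependency graph contains a cycle."""


def group_by_level(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Return tasks grouped so each group depends only on earlier groups."""
    # Unknown dependency ids are ignored here; validation is assumed to happen earlier.
    remaining = {
        task: {dep for dep in deps if dep in depends_on}
        for task, deps in depends_on.items()
    }
    levels: list[list[str]] = []
    while remaining:
        ready = sorted(task for task, deps in remaining.items() if not deps)
        if not ready:
            raise CycleDetectedError(f"cycle among: {sorted(remaining)}")
        levels.append(ready)
        for task in ready:
            del remaining[task]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels
```

Concatenating the groups in order gives one valid topological ordering, and each group is a set of tasks that could run in parallel.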
* feat(batch): implement parallel execution with worker pool
- Add _execute_parallel() using ThreadPoolExecutor for concurrent tasks
- Create execution plan using dependency graph to group tasks by level
- Tasks in the same group run in parallel, groups execute sequentially
- Add _execute_single_task() and _execute_group_parallel() helpers
- Respect max_parallel limit for worker pool size
- Fall back to serial execution if circular dependencies detected
- Add 7 tests for parallel execution scenarios
- Update existing test that expected "not implemented" warning
- Update docs to mark parallel execution as complete
Phase 2 now complete: batch resume, retry, depends_on, dependency
graph, and parallel execution all implemented and tested.
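The scheme described in this commit, with tasks in the same group running in parallel and groups running one after another, can be sketched with ThreadPoolExecutor. The function name `execute_groups` and the `run_task` callback are hypothetical stand-ins for the conductor's internal helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def execute_groups(
    groups: Sequence[Sequence[str]],
    run_task: Callable[[str], str],
    max_parallel: int = 4,
) -> dict[str, str]:
    """Run each group concurrently, finishing one group before starting the next."""
    results: dict[str, str] = {}
    for group in groups:
        # Leaving the with-block waits for every task in this group to finish.
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            for task_id, status in zip(group, pool.map(run_task, group)):
                results[task_id] = status
    return results
```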
* feat(batch): add --strategy auto for LLM-based dependency inference
Adds intelligent dependency analysis using LLM to automatically infer
task dependencies from descriptions when --strategy auto is used.
- Add dependency_analyzer.py with LLM-powered task analysis
- Integrate auto strategy into conductor with fallback to serial
- Update CLI help text to describe strategy options
- Mark Phase 2 as complete in documentation
* docs: update all v2 documentation for Phase 2 completion
- Update status badges and test counts in README.md
- Add Phase 2 batch features to "What's New" section
- Add batch execution CLI commands to both README.md and CLAUDE.md
- Update roadmap to show Phase 2 complete, Phase 3 in progress
- Add new modules (conductor, dependency_graph, dependency_analyzer) to repo structure
- Mark Phase 2 acceptance criteria as complete in BATCH_EXECUTION_PLAN.md
* feat(batch): add live streaming via batch_follow command
Phase 3 observability features:
- BatchProgress class for ETA calculation based on task durations
- `cf work batch follow <id>` for real-time terminal streaming
- Rich Live display with progress panel and event log
- Handles terminal events (COMPLETED, FAILED, PARTIAL, CANCELLED)
- 27 unit tests for BatchProgress class
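The commit says the ETA is calculated from task durations but does not give the formula. One simple possibility, shown purely as an assumption, is to multiply the average duration of completed tasks by the number of tasks still to run; the function name `estimate_eta_seconds` is hypothetical.

```python
from typing import Optional, Sequence


def estimate_eta_seconds(
    completed_durations: Sequence[float],
    remaining_tasks: int,
) -> Optional[float]:
    """Estimate remaining seconds as average completed duration times tasks left."""
    if not completed_durations:
        return None  # no data yet, so no estimate
    average = sum(completed_durations) / len(completed_durations)
    return average * remaining_tasks
```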
* feat(cli): add bulk status update with --all and --from flags
New usage:
cf tasks set status READY --all # All tasks to READY
cf tasks set status READY --all --from BACKLOG # Only BACKLOG -> READY
cf tasks set status READY abc123 # Single task (unchanged)
Skips tasks already at target status and reports counts.
* test(cli): add comprehensive tests for tasks set bulk operations
Tests for --all and --from flags:
- Bulk update all tasks to a status
- Filter updates by source status with --from
- Skip tasks already at target status
- Single task updates (backward compatibility)
- Error handling (missing args, invalid status, empty workspace)
Also fixes typer.Exit being caught by generic exception handler.
* feat(cli): add Deps column to tasks list output
Shows task dependencies in the table:
- "-" for no dependencies
- Short IDs (6 chars) for 1-2 dependencies
- "N tasks" for 3+ dependencies
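The three display rules map onto a small formatter. This is a sketch only: `format_deps` is a hypothetical name, and joining two short IDs with a comma is an assumption about the table layout.

```python
def format_deps(dep_ids: list[str]) -> str:
    """Render a task's dependencies for the Deps column."""
    if not dep_ids:
        return "-"
    if len(dep_ids) <= 2:
        # Short IDs: the first 6 characters of each dependency ID.
        return ", ".join(dep_id[:6] for dep_id in dep_ids)
    return f"{len(dep_ids)} tasks"
```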
* feat(tasks): add delete command and generate --overwrite flag
New functionality:
- `cf tasks delete <id>` - delete single task (with --force to skip confirm)
- `cf tasks delete --all` - delete all tasks (with confirmation)
- `cf tasks generate --overwrite` - clear existing tasks before generating
Core module additions:
- tasks.delete(workspace, task_id) -> bool
- tasks.delete_all(workspace) -> int
The delete command warns when deleting tasks that others depend on.
Without --overwrite, tasks generate appends (supports multi-PRD projects).
15 new tests covering all CRUD operations.
* test: add v2 marker for CLI-first tests
- Register `v2` marker in pytest.ini
- Auto-mark all tests/core/ as v2 via conftest.py
- Add pytestmark to v2 CLI test files
- Document convention in CLAUDE.md
Run v2 tests only: `uv run pytest -m v2`
Currently 411 v2 tests covering headless functionality.
* fix(cli): correct argument order for tasks set status command
The command now uses natural order: `cf tasks set status <task_id> <value>`
instead of `<value> <task_id>`. This matches user expectations and other
CLI conventions.
Changes:
- Swap task_id and value argument positions in function signature
- Add argument parsing logic to handle both single task and --all modes
- Fix variable references from task_id to actual_task_id
- Update tests to use corrected argument order
* feat(agent): add autonomous decision-making and AGENTS.md support
Add comprehensive improvements to reduce false blockers and enable
autonomous agent decision-making for tactical code decisions.
Key changes:
- Add AGENTS.md/CLAUDE.md preferences loading (agents_config.py)
- Split blocker patterns into tactical/human/technical categories
- Add autonomy directives to planning and execution prompts
- Add Purpose.SUPERVISION for supervisor model selection
- Add --all-blocked option to batch run command
- Add --reset flag to batch resume command
- Add reset_blocked_run() to clear blocked runs for re-execution
Agents now make autonomous decisions for tactical choices like:
- File handling (overwrite, merge, extend)
- Package manager and version selection
- Test framework configuration
- Code style decisions
Blockers are only created for true requirements ambiguity,
access/credential issues, or technical errors after exhausting
self-correction attempts.
* fix(agent): prevent tactical questions from becoming blockers
The previous implementation still created blockers for tactical decisions
because:
1. _generate_blocker_question didn't tell the LLM to avoid tactical questions
2. _create_verification_blocker always created blockers for pytest failures
3. No filtering of generated questions before creating blockers
Fixes:
- Update _generate_blocker_question prompt to explicitly instruct LLM to:
- Return "RESOLVE_AUTONOMOUSLY: <decision>" for tactical decisions
- Return "TECHNICAL_FIX: <fix>" for technical issues
- Only generate questions for true human-required decisions
- Update _create_blocker_from_failure to:
- Detect RESOLVE_AUTONOMOUSLY and TECHNICAL_FIX directives
- Filter tactical patterns (venv, pip, pytest.ini, fixture scope, etc.)
- Auto-resolve instead of creating blockers
- Update _create_verification_blocker to:
- Mark verification failures as FAILED (not BLOCKED)
- Let retry mechanism handle technical test failures
- Stop creating "pytest failed, what should I do?" blockers
This should eliminate blockers for:
- Virtual environment creation questions
- Package manager choices
- Asyncio fixture scope configuration
- Pytest verification failures
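One way to detect the two directives in the LLM's reply is a simple prefix check, sketched below. The function name and the returned labels (`auto_resolve`, `technical_fix`, `ask_human`) are hypothetical; only the directive prefixes come from the description above.

```python
DIRECTIVES = (
    ("RESOLVE_AUTONOMOUSLY:", "auto_resolve"),
    ("TECHNICAL_FIX:", "technical_fix"),
)


def classify_blocker_response(response: str) -> tuple[str, str]:
    """Split an LLM reply into (kind, payload).

    Replies without a known directive are treated as questions for a human.
    """
    text = response.strip()
    for prefix, kind in DIRECTIVES:
        if text.startswith(prefix):
            return kind, text[len(prefix):].strip()
    return "ask_human", text
```

Only the `ask_human` case would go on to create a blocker.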
* feat(conductor): add supervisor-level blocker resolution
Add SupervisorResolver to handle tactical blockers at the conductor level
instead of letting each worker agent create blockers independently.
Key changes:
- Add SupervisorResolver class with:
- Decision cache for deduplication across workers
- Pattern-based tactical question detection
- Supervision model classification for uncertain cases
- Auto-answer with cached decisions
- Integrate supervisor into all execution paths:
- _execute_serial: intercepts BLOCKED, tries resolution, retries
- _execute_single_task: same pattern for parallel execution
- execute_agent (runtime.py): single task execution also uses supervisor
- Benefits:
- No duplicate questions (cached per workspace)
- Stronger model (SUPERVISION) makes classification decisions
- Workers create blockers, supervisor filters tactical ones
- Only true human-required decisions surface as blockers
Flow: Worker -> BLOCKED -> Supervisor evaluates ->
Tactical? Auto-resolve + retry : Surface to user
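The flow can be sketched with a pattern table and a decision cache. Everything here is illustrative: `SupervisorSketch`, the pattern list, and the canned answer are assumptions, and the real SupervisorResolver also consults a supervision model for uncertain cases.

```python
from typing import Optional

# Assumption: each pattern maps to a shared cache key, so "venv" and
# "virtualenv" questions reuse the same decision.
TACTICAL_PATTERNS = {
    "virtualenv": "venv",
    "venv": "venv",
    "package manager": "package_manager",
}


class SupervisorSketch:
    def __init__(self) -> None:
        self.cache: dict[str, str] = {}

    def cache_key(self, question: str) -> Optional[str]:
        lowered = question.lower()
        for pattern, key in TACTICAL_PATTERNS.items():
            if pattern in lowered:
                return key
        return None

    def resolve(self, question: str) -> Optional[str]:
        """Return an auto-answer for tactical questions, else None."""
        key = self.cache_key(question)
        if key is None:
            return None  # not tactical: surface to the user
        if key not in self.cache:
            self.cache[key] = f"Use the project default for {key}"
        return self.cache[key]
```

A worker that receives a non-`None` answer would retry with it; `None` leaves the blocker in place for a human.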
* test(supervisor): add comprehensive tests for SupervisorResolver
Adds 27 tests covering:
- Tactical pattern detection (venv, package managers, config, questions)
- Decision cache key generation for deduplication
- Tactical resolution generation
- Blocker resolution with cache usage
- Supervisor singleton management
- LLM classification fallback with graceful error handling
Also fixes cache key generation to recognize "virtualenv" pattern.
* feat(batch): add stop command with graceful and force modes
Adds `cf work batch stop <id>` command to interrupt running batches:
- Graceful stop (default): Sets batch to CANCELLED, current task finishes
- Force stop (--force): Terminates running processes with SIGTERM immediately
Implementation details:
- Added process tracking via _active_processes dict in conductor.py
- Modified _execute_task_subprocess to use Popen and track processes
- Added stop_batch() function with force parameter
- Added 6 tests for stop functionality
This allows users to safely interrupt stuck batches from another terminal.
* refactor(cli): remove duplicate batch cancel command
The 'batch stop' command supersedes 'batch cancel':
- stop (default): graceful stop, same behavior as the old cancel
- stop --force: terminates running processes
Keeping cancel_batch() in conductor.py for internal use.
* fix(runtime): add FAILED status and fix fail_run() state management
- Add FAILED status to TaskStatus enum with transitions to READY/IN_PROGRESS
- Fix fail_run() to update task status (was leaving tasks stuck in IN_PROGRESS)
- Add supervisor handling for FAILED tasks with auto-retry on tactical errors
- Fix load_preferences() to fall back to defaults when no AGENTS.md exists
- Add new tactical patterns: externally-managed, no module named, __main__
- Add --review flag to batch run for verification gates after completion
- Add CLI test report and quickstart guide documentation
* fix(planner): include AGENTS.md preferences in planning prompt
The preferences from ~/.codeframe/AGENTS.md were being loaded into
the TaskContext but never included in the prompt sent to the LLM.
This meant agents were using pip instead of uv despite the global
config specifying uv as the package manager.
Now the planner's _build_prompt() includes the preferences section
from context.preferences.to_prompt_section() right after the task
information, ensuring the LLM sees tooling preferences like:
- package_manager: uv
- Commands: uv sync, uv run pytest, etc.
* fix(runtime): extract error message from AgentState correctly
AgentState doesn't have an 'error' attribute. The fix now extracts
error info from:
1. state.blocker.reason if there's a blocker
2. Last step result's error/output
3. Gate results failure output
This fixes the AttributeError when supervisor tries to help with
failed tasks.
* fix(schema): add FAILED status to tasks table CHECK constraint
The state_machine.py was updated with FAILED status but the database
CHECK constraint in workspace.py wasn't updated, causing
IntegrityError when trying to set task status to FAILED.
* fix(runtime): remove invalid context parameter from blockers.create()
* feat(agent): implement verification self-correction loop
Add LLM-powered self-correction during final verification:
- Convert _run_final_verification to use retry loop with max_attempts
- Add _attempt_verification_fix method that collects gate errors and uses
LLM to generate targeted file edits
- Try ruff --fix first for quick lint fixes
- LLM generates JSON fix plan with file edits
- Apply fixes and re-run verification in loop
- Gracefully give up when LLM can't generate more fixes
Also adds diagnostic logging to runtime.py for supervisor intervention
analysis.
The self-correction loop now:
1. Detects verification failures (pytest, ruff)
2. Calls LLM with error messages for targeted fixes
3. Applies fixes (file edits/creates)
4. Re-runs verification up to max_attempts
5. Falls through to FAILED if unfixable
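The five steps above reduce to a bounded retry loop. In this sketch `run_gates` and `attempt_fix` are hypothetical callables standing in for the verification gates and the LLM fix step; the default of 3 attempts is an assumption.

```python
from typing import Callable


def verify_with_self_correction(
    run_gates: Callable[[], list[str]],
    attempt_fix: Callable[[list[str]], bool],
    max_attempts: int = 3,
) -> str:
    """Run gates, try a fix on failure, and re-run up to max_attempts times.

    run_gates returns error messages (empty means pass); attempt_fix returns
    False when no further fix could be generated.
    """
    for attempt in range(1, max_attempts + 1):
        errors = run_gates()
        if not errors:
            return "PASSED"
        # Stop if this was the last allowed run or no fix is available.
        if attempt == max_attempts or not attempt_fix(errors):
            break
    return "FAILED"
```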
* feat(cli): add --verbose flag for self-correction diagnostics
Add --verbose / -v flag to control diagnostic output:
- CLI: work start --verbose prints detailed verification progress
- Agent: _verbose_print() helper for conditional output
- Runtime: pass verbose flag through to agent
Diagnostic messages now only appear when --verbose is enabled:
- [VERIFY] verification attempt status
- [SELFCORRECT] LLM fix generation progress
This keeps normal output clean while allowing detailed tracing when needed.
* docs(readme): update for self-correction loop and verbose mode
- Add 2026-01-16 "What's New" section with self-correction features
- Document --verbose flag for observability
- Move batch execution to collapsible "Previous" section
- Add QUICKSTART.md and CLI_V2_TEST_REPORT.md to documentation links
- Update Key Features with self-correction and verbose mode
- Update roadmap with completed items and current phase focus
* docs(claude.md): update for self-correction loop and verbose mode
- Update status to Phase 2+ with self-correction and observability
- Add new features: verbose mode, self-correction loop, FAILED status
- Update execution flow diagram with self-correction details
- Add --verbose flag to CLI commands section
- Add 2026-01-16 Recent Updates section with new methods
* docs: add comprehensive feature roadmap for v2
Planned outward from existing functionality toward fully autonomous
agentic coding system. 10 phases covering:
- Phase 3: Agent Reliability (env config, error surfacing, self-correction)
- Phase 4: Continuous Execution (watch mode, streaming, graceful interrupts)
- Phase 5: Idea → PRD Generation (interactive creation, config collection)
- Phase 6: Git Integration (passthrough, smart defaults, PR workflow)
- Phase 7: Multi-Agent Coordination (roles, handoff, parallel execution)
- Phase 8: Observability & History (timeline, replay, debug)
- Phase 9: TUI Dashboard (Rich/Textual, interactive control)
- Phase 10: Remote Access & Metrics (webhooks, API, cost tracking)
Key decisions: CLI-first, user-configured environment, branch-per-batch,
git passthrough over reimplementation, multi-agent before TUI, FastAPI
only for webhooks/external access.
* chore(beads): add Phase 3 Agent Reliability issues
Closed all v1 legacy issues (superseded by v2 roadmap).
Created Phase 3 epic with 4 features and 14 tasks:
- 3.1 Environment Configuration (4 tasks)
- 3.2 Error Surfacing (3 tasks)
- 3.3 Smarter Context Loading (2 tasks)
- 3.4 Enhanced Self-Correction (3 tasks)
- Phase 3 test coverage (1 task)
All dependencies configured for proper execution order.
* feat(config): add v2 environment configuration with YAML support
Implements EnvironmentConfig dataclass for project environment settings:
- Package manager (uv, pip, poetry, npm, pnpm, yarn)
- Python/Node version configuration
- Test framework (pytest, jest, vitest, etc.)
- Lint tools (ruff, eslint, prettier, etc.)
- Context loading limits (max_files, max_tokens)
- Custom command overrides
Features:
- YAML serialization/deserialization (.codeframe/config.yaml)
- Validation for known values with helpful error messages
- Command generation (get_install_command, get_test_command, get_lint_command)
- Coexists with legacy v1 JSON config
31 tests passing covering all functionality.
Closes: codeframe-5r7n
* feat(cli): add config subcommand for v2 environment configuration
Add cf config init|show|set commands for managing project environment
configuration stored in .codeframe/config.yaml:
- config init: Interactive or auto-detect setup (--detect, --force flags)
- config show: Display current configuration
- config set: Set individual config values (package_manager, test_framework, etc.)
Includes auto-detection for package managers (uv/pip/poetry/npm/yarn/pnpm),
test frameworks (pytest/jest/vitest), and lint tools (ruff/eslint/prettier).
* feat(agent): integrate environment config into agent execution
Updates context loader and planner to use project environment configuration:
- context.py: Load EnvironmentConfig as part of TaskContext
- context.py: Include environment section in to_prompt_context()
- planner.py: Include config in planning prompt with exact commands
- Tests: Add 3 new tests for environment config integration
The agent now knows the correct package manager, test framework,
and lint commands to use based on .codeframe/config.yaml.
* docs: add environment configuration documentation
Update all key documentation to explain the new config workflow:
- README.md: Add config commands to CLI section, "What's New", Quick Start
- QUICKSTART.md: Add Step 2 for environment configuration
- CLAUDE.md: Add Phase 3.1 update, config commands in CLI section
- CLI_WIREFRAME.md: Add Configuration section with command mapping
The happy path now includes:
1. cf init
2. cf config init --detect (auto-detect package manager, test framework)
3. cf prd add
4. cf tasks generate
5. cf work start --execute
* fix(config): improve UX for greenfield projects with no files to detect
- Refactor _detect_environment_config() to return tuple (config, detected_items)
- Track what was actually detected vs defaulted
- Show different messages based on detection results:
- When detected: "Detected from project files:" with bullet list
- When nothing found: "No project files found to detect from."
with guidance on using defaults and customization options
* refactor(config): replace structured config with natural language tech_stack
- Add tech_stack field to Workspace model with database migration
- Add --tech-stack, --detect, --tech-stack-interactive flags to init command
- Remove cf config subcommand entirely (was Python-centric)
- Update TaskContext and Planner to use natural language tech_stack
- Simplify configuration: users describe stack, agent adapts
Design philosophy: Instead of hardcoded package_manager, test_framework,
lint_tools enums, users describe their stack in natural language
(e.g., "Rust project using cargo", "TypeScript monorepo with pnpm").
Works with any technology without code changes.
Future work: Multi-round interactive discovery (bead: codeframe-8d80)
* feat(agent): add enhanced self-correction with fix tracking and quick fixes
Implements three capabilities to improve agent self-correction:
1. Fix Attempt Tracking (fix_tracker.py):
- Normalize and hash errors for deduplication
- Track attempted fixes to prevent repeating failures
- Escalation thresholds: 3 same-error, 3 same-file, 5 total
2. Pattern-Based Quick Fixes (quick_fixes.py):
- Match common errors without LLM calls
- ModuleNotFoundError → install package (with package aliases)
- ImportError/NameError → add missing imports
- SyntaxError/IndentationError → apply common fixes
- Auto-detect package manager (uv, pip, npm, yarn, etc.)
3. Escalation to Blocker:
- Create informative blockers when self-correction exhausted
- Include error type, attempted fixes, and guidance questions
- Prevents infinite fix loops
Closes: codeframe-5ned, codeframe-4tjy, codeframe-l2lm, codeframe-uwbu
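A rough sketch of the fix-attempt tracking idea: normalize an error message, hash it, and count attempts against the three thresholds. The normalization rules, class name, and the use of `>=` at the thresholds are assumptions, not the contents of fix_tracker.py.

```python
import hashlib
import re
from collections import Counter


def error_key(message: str) -> str:
    """Normalize volatile details (case, line numbers, spacing) and hash."""
    normalized = re.sub(r"line \d+", "line N", message.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]


class FixTrackerSketch:
    MAX_SAME_ERROR = 3
    MAX_SAME_FILE = 3
    MAX_TOTAL = 5

    def __init__(self) -> None:
        self.by_error: Counter = Counter()
        self.by_file: Counter = Counter()
        self.total = 0

    def record(self, message: str, file_path: str) -> None:
        self.by_error[error_key(message)] += 1
        self.by_file[file_path] += 1
        self.total += 1

    def should_escalate(self, message: str, file_path: str) -> bool:
        """True once any threshold is reached, so a blocker can be raised."""
        return (
            self.by_error[error_key(message)] >= self.MAX_SAME_ERROR
            or self.by_file[file_path] >= self.MAX_SAME_FILE
            or self.total >= self.MAX_TOTAL
        )
```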
* feat(agent): enhanced self-correction with project context and shell commands
Self-correction improvements:
- Add _build_self_correction_context() to include project structure,
config files, tech stack, and modified files in fix prompts
- Add FixScope enum (LOCAL/GLOBAL) and _classify_fix_scope() to
determine coordination requirements for parallel agents
- Enable shell command execution during self-correction (uv pip install, etc.)
- Fix StepResult attribute access (file_changes instead of files_created)
Coordination infrastructure:
- Add GlobalFixCoordinator class for thread-safe fix deduplication
- Coordinator tracks pending/completed fixes to prevent conflicts
- Wire coordinator through runtime.execute_agent()
Gate fixes:
- Update _run_ruff() to use 'uv run ruff' like pytest does
- Ensures ruff runs in target project's environment, not system-wide
* docs: add agent tool system to roadmap (codeframe-p77g)
- Mark Phase 3.4 Enhanced Self-Correction as complete
- Document shell command execution and FixScope classification
- Add Phase 3.5 placeholder for future Agent Tool System
- References bead codeframe-p77g for full spec
* feat: Transform CodeFRAME v2 MVP from basic task automation to AI-driven development orchestration
## 🎯 Enhanced MVP Definition
- Replace basic "Add a PRD" with AI-driven interactive PRD generation
- Upgrade single-task execution to intelligent batch orchestration
- Integrate complete Git workflow with PR management instead of basic artifact export
- Add comprehensive checkpointing with state restoration capabilities
## 🚀 Key Architectural Shifts
### AI-Driven Project Discovery
- Interactive AI sessions gather requirements, constraints, and success criteria
- Generates comprehensive PRD with technical specs, user stories, and acceptance criteria
- Supports iterative refinement with versioning and change tracking
- Enhanced `prd generate`, `prd refine` commands replace basic `prd add`
### Batch-First Execution Model
- Main orchestrator agent coordinates multiple tasks (not single task execution)
- Dependency-aware scheduling with serial/parallel/auto strategies
- Real-time progress monitoring with event streaming
- Inter-task communication and resource management
### Integrated Git/PR Workflow
- Automatic branch creation per task/batch with naming conventions
- AI-generated comprehensive PR descriptions with business impact analysis
- Automated verification gates and multi-strategy merging
- New `pr create`, `pr merge`, enhanced `work start --create-branch` commands
### Enhanced Quality Gates & Checkpointing
- Comprehensive test suite: unit, integration, security, performance
- AI-assisted code review with best practices compliance
- Rich checkpoint snapshots with complete workspace state and git refs
- Executive reporting with progress metrics and risk assessment
## 📋 Updated State Machine
- Added IN_REVIEW, MERGED, FAILED statuses for complete lifecycle
- Comprehensive transition mapping for PR workflow integration
- Automated state transitions triggered by Git/PR operations
## 🔄 Implementation Priority Reordering
- Phase 0: Enhanced PRD & Discovery (NEW HIGH PRIORITY)
- Phase 1: Enhanced Task Generation (NEW HIGH PRIORITY)
- Phase 2: Git Integration & PR Workflow (NEW HIGH PRIORITY)
- Maintains backward compatibility with existing Golden Path features
## 📚 Documentation Updates
- GOLDEN_PATH.md: Transforms from 7-step basic workflow to 9-phase advanced MVP
- CLI_WIREFRAME.md: Adds new commands and reorders implementation priorities
- Enhanced acceptance checklist with 28 detailed validation criteria
- Complete module layout updates including `git_integration.py`
This redefines CodeFRAME v2 from a task automation tool to an AI-driven
development orchestration platform capable of end-to-end software project management.
* analysis: Identify critical CLI workflow gaps and implementation roadmap
## 🔍 Gap Analysis Summary
**Most Critical Finding**: Missing credential management system would impact 100% of users
- Authentication failures at PRD generation, batch execution, and PR creation
- Users must manually manage API keys across multiple providers
- No validation or health checking for configured credentials
## 📊 Complete Gap Matrix
### Critical (Showstopper) Issues:
1. **Credential Management** - No auth setup/list/validate commands
2. **Environment Validation** - No pre-flight tool checking
3. **Real-time State Backup** - No auto-checkpointing during batches
4. **Partial Recovery** - Only full rollback, no granular recovery
### Medium (High Frustration) Issues:
5. **Dependency Conflict Resolution** - Circular/hard dependency handling
6. **Integration Testing** - No pre-PR validation of changes
### Quality (Minor Annoyance) Issues:
7. **Rich Monitoring** - Limited debugging for failed tasks
8. **Template Management** - No reusable configurations
9. **Workflow Automation** - No pattern reuse capabilities
## 🚀 4-Week Implementation Plan
### Week 1-2: Foundation Infrastructure
- Week 1: Comprehensive credential management system (`codeframe auth`)
- Week 2: Environment validation + incremental state persistence
### Week 3-4: Robustness Enhancements
- Week 3: Granular recovery + dependency conflict resolution
- Week 4: Integration testing + enhanced monitoring
## 📋 Key Implementation Files
**Core Modules to Create**:
- `codeframe/core/credentials.py` - Secure credential storage
- `codeframe/core/environment.py` - Tool validation & auto-install
- `codeframe/core/integration_testing.py` - Pre-PR validation
**CLI Commands to Add**:
- `codeframe auth setup/list/validate/rotate/remove`
- `codeframe env check/doctor/auto-install`
- `codeframe rollback task/last/batch`
- `codeframe test integration/compatibility/breaking-changes`
## 🎯 Expected Impact
**Before**: Theoretically complete MVP but practically frustrating
**After**: Both theoretically complete AND practically reliable CLI
This addresses the critical gap between documented workflow and usable tool.
* update: Accurate CLI workflow implementation status for enhanced MVP
## 📋 Implementation Status Assessment
**Analysis Method**: Examined actual CLI functionality vs. checklist requirements
- Reviewed CLI command implementations in `/codeframe/cli/app.py`
- Verified core functionality by running commands directly
- Identified working features and missing gaps
## ✅ Confirmed Working Components
### Core Infrastructure
- [x] `codeframe init` - Basic and enhanced (detect, interactive) modes
- [x] `codeframe status` - Comprehensive workspace display with PRD, tasks, events
- [x] Core workspace management - State persistence and recovery
- [x] Event system - Rich logging and streaming capabilities
### Basic PRD & Task Management
- [x] `codeframe prd add <file.md>` - File-based PRD storage
- [x] `codeframe tasks generate` - LLM and simple extraction modes
- [x] `codeframe tasks list` - Task listing with status filtering
- [x] `codeframe tasks set status` - Manual state transitions
- [x] Task CRUD operations (create, update, delete)
- [x] Dependency management with state machine enforcement
### Batch Execution Framework
- [x] `codeframe work batch run` - Multi-strategy execution (serial, parallel, auto)
- [x] `codeframe work batch status` - Batch monitoring and reporting
- [x] `codeframe work batch follow` - Real-time event streaming
- [x] `codeframe work batch resume` - Failed task recovery
- [x] `codeframe work start <task-id>` - Individual task execution
- [x] `codeframe work stop/resume/status` - Task lifecycle management
- [x] Main orchestrator with comprehensive failure handling
- [x] Event-driven progress tracking and ETA calculation
### Quality Gates & Verification
- [x] `codeframe review` - Multi-gate execution framework
- [x] `codeframe summary` - Comprehensive workspace reporting
- [x] Gate framework with extensible architecture
- [x] Test execution with coverage and reporting
### Checkpointing & State Management
- [x] `codeframe checkpoint create` - Rich state snapshots
- [x] `codeframe checkpoint list/show/restore` - Complete checkpoint lifecycle
- [x] Git reference integration for branch tracking
- [x] State restoration and recovery procedures
### Human-in-the-Loop Features
- [x] `codeframe blockers list` - Rich blocker context display
- [x] `codeframe blocker answer <id>` - Interactive resolution system
- [x] Blocker learning and pattern recognition
- [x] Integration with task lifecycle management
### Cross-Cutting Requirements
- [x] **CLI-first operation** - All commands work without FastAPI dependency
- [x] **Event logging** - Comprehensive audit trail and observability
- [x] **Error handling** - Graceful failure recovery and user guidance
- [x] **Performance** - Efficient batch processing and parallel execution
## ⚠️ Identified Gaps (Critical vs. Minor)
### 🔥 Critical Gaps (Would Block Workflow)
1. **No `codeframe prd generate`** - Enhanced MVP requires AI-driven PRD generation
- **Current Status**: Only basic `prd add` exists
- **Impact**: 100% of users would hit this gap immediately
2. **No comprehensive `codeframe auth` system** - Credential management infrastructure
- **Current Status**: Basic auth commands exist but lack comprehensive management
- **Impact**: Authentication failures would block entire workflow
3. **No environment validation** - Pre-flight tool checking
- **Current Status**: No validation commands exist
- **Impact**: Batch failures mid-execution due to missing tools
### ⚡ Medium Gaps (High Frustration)
4. **No `codeframe pr create/merge`** - Git/PR workflow CLI commands
- **Current Status**: GitHub integration exists but no CLI commands
- **Impact**: Manual PR creation required for final workflow step
5. **Limited dependency conflict resolution** - Advanced task dependency management
- **Current Status**: Basic dependency analysis exists
- **Impact**: Complex projects may have unresolvable dependency loops
### 🔧 Quality Gaps (Minor Annoyance)
6. **No AI-assisted code review** - Enhanced quality gates
- **Current Status**: Basic verification only
- **Impact**: Missed opportunities for automated code improvement
7. **No enhanced monitoring/debugging** - Rich CLI experience
- **Current Status**: Basic event streaming exists
- **Impact**: Difficult to debug complex failures
## 🎯 Overall Assessment
### Current State: **~60% Complete**
- **Foundation**: Strong - Core CLI, basic PRD, tasks, batch execution ✅
- **Enhanced Features**: Missing - AI PRD generation, Git/PR CLI, auth management ⚠️
- **Robustness**: Partial - Basic recovery exists, advanced recovery missing ⚠️
- **Quality**: Basic - Verification works, enhanced features missing ⚠️
### Critical Path Forward
1. **Immediate (Week 1-2)**: Implement `codeframe prd generate` and credential management
2. **Short-term (Week 3-4)**: Add Git/PR CLI commands and environment validation
3. **Medium-term (Month 2)**: Enhanced monitoring, AI code review, advanced recovery
**Assessment**: The enhanced MVP has a solid foundation, but the critical gaps must be filled before the CLI workflow is truly usable.
## 📚 Recommendation
**Proceed with the gap analysis implementation plan** - Address the critical authentication and PRD generation gaps first, then advance to Git/PR integration.
The CLI foundation is production-ready for basic workflows but needs enhanced features to meet full MVP goals.
* docs: Add comprehensive implementation roadmap for enhanced MVP completion
Consolidate gap analysis into phase-wise implementation plan addressing critical credential management, AI-driven PRD generation, and advanced workflow automation features.
## Phase 1 (Weeks 1-2): Foundation Infrastructure
- AI-driven PRD generation system
- Comprehensive credential management
- Enhanced environment validation
## Phase 2 (Weeks 3-4): Core Enhancement
- Advanced task generation with dependency analysis
- Production-ready batch execution
- Enhanced quality gates with AI-assisted review
## Phase 3 (Weeks 5-6): User Experience
- Enhanced blocker resolution with AI suggestions
- Rich monitoring and debugging capabilities
- Performance profiling and observability
## Phase 4 (Weeks 7-8): Integration & Automation
- Complete Git/PR workflow automation
- Template and profile management systems
- Workflow automation and predictive analytics
Transforms CodeFRAME from a basic automation tool into a comprehensive AI development platform.
* feat(prd): Add comprehensive PRD management commands and versioning (#293)
* feat(prd): Add comprehensive PRD management commands and versioning
Implements a complete PRD management system for the codeframe CLI:
Core PRD functions (codeframe/core/prd.py):
- delete(workspace, prd_id) - Remove a PRD from workspace
- export_to_file(workspace, prd_id, path, force) - Export PRD to file
- create_new_version(workspace, prd_id, content, summary) - Create new version
- get_versions(workspace, prd_id) - List all versions of a PRD
- get_version(workspace, prd_id, version_number) - Get specific version
- diff_versions(workspace, prd_id, v1, v2) - Generate unified diff
CLI commands (codeframe/cli/app.py):
- prd list - List all PRDs with IDs and timestamps
- prd show [id] - Enhanced to accept optional PRD ID
- prd delete <id> [--force] - Delete PRD with confirmation
- prd export <id|latest> <file> [--force] - Export PRD to file
- prd versions <id> - Show version history
- prd diff <id> <v1> <v2> - Show diff between versions
- prd update <id> <file> -m <message> - Create new version
Database schema additions:
- version (INTEGER) - Version number for PRD
- parent_id (TEXT) - Links to previous version
- change_summary (TEXT) - Description of changes
Includes 68 tests covering core functions and CLI commands.
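For reference, a unified diff between two version bodies can be produced with the standard library's `difflib`. This is a sketch of the general technique, not the body of `diff_versions`; the helper name and the `v1`/`v2` file labels are assumptions.

```python
import difflib


def unified_prd_diff(old: str, new: str, v1: int, v2: int) -> str:
    """Return a unified diff between two PRD texts, labeled by version."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"v{v1}",
        tofile=f"v{v2}",
    )
    return "".join(lines)
```

Identical inputs yield an empty string, which a CLI could report as "no changes".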
* Update codeframe/core/prd.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* fix: Address code review issues for PRD versioning
- Add chain_id field to PrdRecord and prds table schema
- Add database indexes on parent_id and chain_id columns
- Make version number increment atomic with explicit transactions
- Optimize get_versions() to use single query with chain_id
- Add list_chains() function to list unique PRD chains
- Add delete validation with check_dependencies parameter
- Add PrdHasDependentTasksError exception for dependent tasks
- Update CLI_WIREFRAME.md with new PRD commands documentation
Fixes from code review:
1. Performance: Added idx_prds_parent and idx_prds_chain indexes
2. Architecture: Added chain_id for version grouping
3. Concurrency: Wrapped version creation in explicit transaction
4. N+1 queries: get_versions() now uses single query via chain_id
5. Documentation: Added 7 new PRD commands to CLI_WIREFRAME.md
6. Validation: delete() now checks for dependent tasks
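The concurrency and N+1 fixes can be illustrated with SQLite directly. The sketch below is a simplified stand-in, not the code in prd.py: the table is reduced to the columns named above plus an assumed `content` column, and `open_db`, `add_version`, and `list_chain_versions` are hypothetical helpers.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE prds (
    id TEXT PRIMARY KEY,
    chain_id TEXT NOT NULL,
    version INTEGER NOT NULL,
    parent_id TEXT,
    content TEXT NOT NULL,
    change_summary TEXT,
    FOREIGN KEY (parent_id) REFERENCES prds(id)
);
CREATE INDEX idx_prds_parent ON prds(parent_id);
CREATE INDEX idx_prds_chain ON prds(chain_id);
"""


def open_db() -> sqlite3.Connection:
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def add_version(conn, chain_id: str, content: str, summary: str):
    """Insert the next version in a chain and return (prd_id, version)."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, version FROM prds WHERE chain_id = ? "
            "ORDER BY version DESC LIMIT 1",
            (chain_id,),
        ).fetchone()
        parent_id, version = (row[0], row[1] + 1) if row else (None, 1)
        prd_id = uuid.uuid4().hex
        conn.execute(
            "INSERT INTO prds "
            "(id, chain_id, version, parent_id, content, change_summary) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (prd_id, chain_id, version, parent_id, content, summary),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return prd_id, version


def list_chain_versions(conn, chain_id: str):
    """Fetch every version of a chain with a single indexed query."""
    return conn.execute(
        "SELECT version, change_summary FROM prds "
        "WHERE chain_id = ? ORDER BY version",
        (chain_id,),
    ).fetchall()
```

Reading the latest version and inserting the next one inside the same `BEGIN IMMEDIATE` transaction is what keeps two concurrent writers from computing the same version number.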
* Update codeframe/cli/app.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/core/prd.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
---------
Co-authored-by: Test User <test@example.com>
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* feat(credentials): Add comprehensive credential management system (#294)
* feat(credentials): Add comprehensive credential management system
Implement secure credential storage and management for CodeFRAME:
Core Module (codeframe/core/credentials.py):
- CredentialProvider enum with env var mappings and display names
- Credential dataclass with expiration, masking, and serialization
- CredentialStore with keyring-first + encrypted file fallback
- CredentialManager as high-level API with env var priority
CLI Commands (codeframe/cli/auth_commands.py):
- setup: Interactive credential configuration with validation
- list: Show all configured credentials with masked values
- validate: Test credential with provider APIs
- rotate: Replace credential atomically with optional validation
- remove: Delete stored credential with confirmation
Workflow Validation (codeframe/core/credential_validator.py):
- Pre-workflow credential checks by workflow type
- require_credential() helper for fail-fast scenarios
- check_llm_credentials() for any-LLM-provider validation
Audit Logging (codeframe/core/credential_audit.py):
- Comprehensive audit trail for all credential operations
- Sensitive value filtering (never logs actual credentials)
- Log rotation support (10MB default)
Integration:
- AnthropicProvider accepts optional credential_manager
- GitHubIntegration accepts optional credential_manager
- Full backward compatibility with environment variables
Tests: 78 new tests covering all functionality
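As a rough illustration of the "expiration, masking" behaviour listed for the Credential dataclass, here is a minimal sketch. The class name `CredentialSketch`, its fields, and the masking format (first and last four characters) are assumptions for illustration and not the actual definitions in codeframe/core/credentials.py.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CredentialSketch:
    """Hypothetical stand-in for a stored credential."""

    provider: str
    value: str
    expires_at: Optional[datetime] = None

    def masked(self) -> str:
        """Return a display-safe form that never exposes the full secret."""
        if len(self.value) <= 8:
            # Too short to show any part without leaking most of it.
            return "*" * len(self.value)
        return f"{self.value[:4]}...{self.value[-4:]}"

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        """A credential without an expiry date never expires."""
        if self.expires_at is None:
            return False
        now = now or datetime.now(timezone.utc)
        return now >= self.expires_at
```

A `list`-style command would print `masked()` rather than `value`, which matches the commit's stated goal of showing configured credentials with masked values.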
* fix(credentials): Address PR review feedback for security and code quality
Security improvements:
- Add chmod after atomic rename to ensure 600 permissions on all filesystems
- Enhance machine ID derivation to use /etc/machine-id (Linux) or registry
GUID (Windows) for more stable encryption keys
- Replace broad exception handling with specific handlers (InvalidToken,
JSONDecodeError, PermissionError, OSError) with actionable error messages
Code quality fixes:
- Update validate_credential_format() to check actual prefixes (sk-ant-,
sk-, glpat-) as documented in comments, with minimum length of 20 chars
- Clarify list_providers() docstring about keyring enumeration limitation
Bug fixes:
- Improve validation functions to distinguish auth failures from network
errors, timeouts, and rate limiting for better user feedback
- Update tests with appropriately long test credentials
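The prefix and length rule described for validate_credential_format() could look roughly like the sketch below. The prefixes (sk-ant-, sk-, glpat-) and the 20-character minimum come from the commit message. The provider keys, the mapping of prefix to provider, and the function name `looks_like_valid_credential` are assumptions.

```python
MIN_LENGTH = 20

# Assumed provider-to-prefix mapping; only the prefixes themselves are
# taken from the commit message.
PREFIXES = {
    "anthropic": "sk-ant-",
    "openai": "sk-",
    "gitlab": "glpat-",
}


def looks_like_valid_credential(provider: str, value: str) -> bool:
    """Cheap format check; it does not prove the credential actually works."""
    value = value.strip()
    prefix = PREFIXES.get(provider)
    if prefix is None or not value:
        return False
    return len(value) >= MIN_LENGTH and value.startswith(prefix)
```

A check like this only catches obvious paste errors. Distinguishing a rejected key from a network error or rate limit still requires calling the provider, as the bug-fix items above note.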
* Update codeframe/core/credentials.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/core/credential_audit.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/core/credentials.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* fix(credentials): Address remaining PR review issues
High priority fixes:
- Reject empty/whitespace-only credential values in setup command
- Fix remove command to check credential source before reporting success
(now warns when credential is only set via environment variable)
Medium priority fixes:
- Add salt file validation (must be exactly 16 bytes)
- Add error handling for malformed credential data in from_dict calls
(prevents crashes from corrupted keyring or encrypted store data)
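The two medium-priority fixes can be sketched as follows, assuming the salt lives in a file and stored credentials are dictionaries. `load_salt` and `parse_credential` are hypothetical names, and the record layout is an assumption. Only the 16-byte requirement and the idea of tolerating malformed data come from the commit message.

```python
from pathlib import Path
from typing import Optional

SALT_SIZE = 16  # bytes, per the commit message


def load_salt(path: Path) -> bytes:
    """Read the salt file and refuse anything that is not exactly 16 bytes."""
    salt = path.read_bytes()
    if len(salt) != SALT_SIZE:
        raise ValueError(
            f"salt file {path} has {len(salt)} bytes, expected {SALT_SIZE}"
        )
    return salt


def parse_credential(data: object) -> Optional[dict]:
    """Return a normalized record, or None when the stored data is malformed."""
    try:
        return {"provider": str(data["provider"]), "value": str(data["value"])}
    except (KeyError, TypeError):
        # Missing keys or a non-mapping payload: treat as corrupted, not fatal.
        return None
```

Failing loudly on a bad salt while returning None for a bad record reflects a difference in severity: a wrong salt makes every entry undecryptable, whereas one corrupted entry should not block access to the rest.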
* Update codeframe/core/credentials.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/core/credentials.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/core/credential_audit.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/core/credential_audit.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/core/credential_audit.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/core/credentials.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/cli/auth_commands.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/cli/auth_commands.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* Update codeframe/cli/auth_commands.py
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
---------
Co-authored-by: Test User <test@example.com>
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* (via frankbria): Fix _load_encrypted_store to raise exceptions on read errors to prevent data loss (#295)
* Fix _load_encrypted_store to raise exceptions on read errors to prevent data loss
* Remove global keyring disable on store failure in CredentialStore.store()
---------
Co-authored-by: Test User <test@example.com>
Co-authored-by: Frank Bria <frank.bria@proton.me>
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
* fix(cli): ensure consistent 'codeframe' usage in help text
Add __main__.py files to enable python -m invocation with proper
program name. Uses Typer's prog_name parameter for reliable usage
line display regardless of invocation method.
- Add codeframe/__main__.py for python -m codeframe
- Add codeframe/cli/__main__.py for python -m codeframe.cli
- Update legacy CLI __main__ blocks with sys.argv[0] fix
- Clarify expire_blockers.py is an internal scheduled task
* fix: address code review findings across LLM adapters and core modules
LLM Adapters:
- Fix conductor import (get_llm_provider -> get_provider)
- Fix _convert_messages to preserve user text with tool_results
- Fix ModelSelector __post_init__ to respect constructor values
- Update model constants to valid Anthropic identifiers
Core Security & Safety:
- Add file deletion safeguards in agent.py (path traversal protection)
- Add safe shell command parsing with allowlist validation
- Improve dangerous command detection in executor.py (regex patterns)
- Add timeout to git subprocess in checkpoints.py
Core Reliability:
- Fix dependency_analyzer to always update deps (clear stale edges)
- Fix dependency_analyzer to use valid loaded task IDs
- Fix f-string prefix insertion in quick_fixes.py
- Fix DB connection handling in runtime.py (try/finally)
- Fix DB connection in tasks.py and use LLM adapter
- Replace debug prints with logging in runtime.py supervisor block
Schema:
- Add depends_on column migration for prds table in workspace.py
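The path traversal protection mentioned for agent.py can be illustrated with a small containment check. This is a sketch of the general technique, not the project's actual helper. `is_inside_workspace` is a hypothetical name, and rejecting the workspace root itself is an added assumption that suits deletion.

```python
from pathlib import Path


def is_inside_workspace(workspace: Path, candidate: str) -> bool:
    """True only if candidate resolves to a path strictly below workspace."""
    root = workspace.resolve()
    # Joining with an absolute candidate discards root, and resolve()
    # collapses ".." segments and symlinks, so both escape routes are caught.
    target = (root / candidate).resolve()
    return root in target.parents
```

A deletion routine would call this before touching the filesystem and refuse the action when it returns False.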
* fix(core): Address code review findings for security and reliability
agent.py:
- Fix _try_auto_fix to check ruff returncode and log failures
- Add path safety validation to create/edit actions using _is_path_safe
- Reject shell commands when _parse_command_safely returns requires_shell=True
checkpoints.py:
- Add try/finally blocks to all DB operations for reliable connection cleanup
conductor.py:
- Add _active_processes_lock for thread-safe process tracking
- Add _batch_db_lock for thread-safe batch DB writes in _save_batch
- Fix misleading comment about "temporary" dependencies (they persist)
dependency_analyzer.py:
- Only update dependencies when inferred list is non-empty (preserve existing)
executor.py:
- Use shell=False with shlex.split when no shell operators are present
- Fall back to shell=True only for commands with pipes, redirects, etc.
quick_fixes.py:
- Fix Poetry detection by checking poetry.lock before pyproject.toml
- Add handling for unicode 'u' prefix (don't add 'f' to u-strings)
workspace.py:
- Add depends_on column to initial prds schema creation
- Add idx_prds_depends_on index to initial schema
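The executor.py change (shell=False with shlex.split unless shell operators are present) might be sketched like this. The operator list and the names `build_invocation` and `run_command` are assumptions. The real executor may detect operators differently.

```python
import shlex
import subprocess

# Assumed set of tokens that require a real shell to interpret.
SHELL_OPERATORS = ("|", "&", ";", "<", ">", "$(", "`")


def build_invocation(command: str):
    """Return (args, use_shell) for subprocess.run."""
    # Conservative: an operator character inside quotes also triggers the
    # shell path, which is less safe but never breaks a valid command.
    if any(op in command for op in SHELL_OPERATORS):
        return command, True
    return shlex.split(command), False


def run_command(command: str, timeout: float = 30.0):
    """Run a command, avoiding the shell whenever it is not needed."""
    args, use_shell = build_invocation(command)
    return subprocess.run(
        args, shell=use_shell, capture_output=True, text=True, timeout=timeout
    )
```

Passing an argument list with shell=False means quoting is handled once by shlex.split, so a stray metacharacter in an argument cannot be reinterpreted by a shell.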
* style: fix ruff lint errors across codebase
- Fix E741 ambiguous variable name 'l' → 'line' in artifacts.py and gates.py
- Fix E402 module-level import order in test_tasks_crud.py and test_tasks_set_bulk.py
- Remove F401 unused imports across 13 test files and 2 core modules
* ci: disable frontend tests during v2 CLI-first refactor
- Comment out frontend-tests, e2e-smoke-tests jobs (web-ui is legacy)
- Remove Node.js setup from code-quality job
- Add skip checks for web-ui/src in hardcoded-urls job
- Update test-summary to remove frontend-tests dependency
The web-ui package.json is missing; re-enable these jobs when
the frontend is restored.
* fix(core): Address code review findings for reliability and consistency
artifacts.py:
- Track which diff was actually used when falling back from staged to
unstaged, ensuring stats match the exported patch content
dependency_graph.py:
- Remove dead no-op loop in topological_sort that computed in_degree
but only contained pass statements
events.py:
- Add try/finally to emit() to ensure DB connection closes on exception
- Add try/finally to emit_for_workspace() for same reason
- Add try/finally to list_recent() to ensure DB connection closes
gates.py:
- Add ERROR status count to GateResult.summary property
- Fix GATES_STARTED event to report the actual empty list instead of ["auto"]
- Make unknown gates FAILED (not SKIPPED) when explicitly requested,
with helpful error message listing valid gate names
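The try/finally pattern applied to emit() and list_recent() in events.py looks roughly like the following. The events table layout and the function signatures here are assumptions for illustration. Only the idea of closing the connection on every exit path comes from the commit message.

```python
import json
import sqlite3


def emit(db_path: str, event_type: str, payload: dict) -> None:
    """Record an event; the connection is closed even if a step raises."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "event_type TEXT NOT NULL, payload TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT INTO events (event_type, payload) VALUES (?, ?)",
            (event_type, json.dumps(payload)),  # may raise TypeError
        )
        conn.commit()
    finally:
        conn.close()


def list_recent(db_path: str, limit: int = 10):
    """Return the newest events first, always releasing the connection."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT event_type, payload FROM events ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return [(etype, json.loads(body)) for etype, body in rows]
    finally:
        conn.close()
```

Note that sqlite3's own `with conn:` block manages transactions only and does not close the connection, which is why an explicit finally (or contextlib.closing) is needed.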
* fix(cli): Register auth_app and fix test failures
- Register auth_app from auth_commands.py in main CLI app
- Fix test_credential_commands.py tests to mock get_credential_source
- Skip test_serve_command.py tests (serve is stub during v2 refactor)
- Skip test_cli_session.py tests (session management not in v2 Golden Path)
* test: skip WebSocket integration tests during v2 refactor
These tests require a running FastAPI server with full WebSocket support,
but the v2 serve command is a stub. The server adapter will be implemented
post-Golden Path.
* fix(core): Improve stats accuracy and handle empty dependency lists
artifacts.py:
- When falling back to plain unstaged diff (git diff without HEAD),
parse stats directly from patch content via _parse_patch_content_stats()
- _get_diff_stats with staged_only=False runs "git diff HEAD --stat"
which may return zeros for pure working tree changes
dependency_graph.py:
- Fix ValueError when max() is called on empty generator in calculate_level()
- Use max(dep_levels, default=-1) to handle nodes with deps not in graph
- Nodes with no valid in-graph deps are treated as level 0 (root nodes)
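The calculate_level() fix relies on the `default` argument of max(). A minimal sketch of the idea follows, assuming the graph is a dict mapping each node to its dependency list. `calculate_levels` is a hypothetical reimplementation, not the function in dependency_graph.py.

```python
from typing import Dict, List, Set


def calculate_levels(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Level 0 for roots; otherwise one more than the deepest in-graph dep."""
    levels: Dict[str, int] = {}

    def level_of(node: str, visiting: Set[str]) -> int:
        if node in levels:
            return levels[node]
        if node in visiting:
            raise ValueError(f"dependency cycle at {node!r}")
        visiting.add(node)
        # Dependencies that are not in the graph are skipped. If none
        # remain, max() would raise ValueError without default=-1.
        dep_levels = (level_of(d, visiting) for d in graph[node] if d in graph)
        levels[node] = max(dep_levels, default=-1) + 1
        visiting.discard(node)
        return levels[node]

    for name in graph:
        level_of(name, set())
    return levels
```

With default=-1, a node whose only dependencies are missing from the graph ends up at level 0, the same as a node with no dependencies at all.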
* test: skip dashboard integration tests during …
Summary
Changes
Core PRD Functions (codeframe/core/prd.py)
- delete(workspace, prd_id)
- export_to_file(workspace, prd_id, path, force)
- create_new_version(workspace, prd_id, content, summary)
- get_versions(workspace, prd_id)
- get_version(workspace, prd_id, version_number)
- diff_versions(workspace, prd_id, v1, v2)
CLI Commands (codeframe prd)
- prd list
- prd show [id]
- prd delete <id> [--force]
- prd export <id|latest> <file> [--force]
- prd versions <id>
- prd diff <id> <v1> <v2>
- prd update <id> <file> -m <msg>
Database Schema
Added versioning columns to the prds table:
- version (INTEGER DEFAULT 1) - Version number
- parent_id (TEXT) - Links to previous version
- change_summary (TEXT) - Change description
Includes automatic migration for existing databases.
Test plan
- tests/core/test_prd.py
- tests/cli/test_prd_commands.py
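The PR description says diff_versions generates a unified diff between two versions. A minimal sketch of that step using the standard library follows. It takes the two version bodies directly rather than a workspace and PRD ID, and the name `unified_prd_diff` and the `v1`/`v2` labels are assumptions.

```python
import difflib


def unified_prd_diff(old_text: str, new_text: str, v1: int, v2: int) -> str:
    """Return a unified diff between two PRD bodies, or '' if identical."""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"v{v1}",
        tofile=f"v{v2}",
    )
    return "".join(lines)
```

A command such as `prd diff <id> <v1> <v2>` would then only need to look up the two stored versions and print the returned string.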