feat(prd): Add comprehensive PRD management commands and versioning #293

Merged
frankbria merged 5 commits into v2-refactor from feature/prd-command-enhancements
Jan 19, 2026

Conversation

@frankbria

Owner

Summary

  • Implements complete PRD management system following TDD principles
  • Adds core functions for PRD deletion, export, and versioning
  • Adds CLI commands for managing PRDs with version history support
  • Includes database schema migration for versioning columns

Changes

Core PRD Functions (codeframe/core/prd.py)

Function                                                 Purpose
delete(workspace, prd_id)                                Remove a PRD from workspace
export_to_file(workspace, prd_id, path, force)           Export PRD to file
create_new_version(workspace, prd_id, content, summary)  Create new version
get_versions(workspace, prd_id)                          List all versions of a PRD
get_version(workspace, prd_id, version_number)           Get specific version
diff_versions(workspace, prd_id, v1, v2)                 Generate unified diff

CLI Commands (codeframe prd)

Command                                  Description
prd list                                 List all PRDs with truncated IDs and timestamps
prd show [id]                            Show PRD (optionally by specific ID)
prd delete <id> [--force]                Delete PRD with confirmation
prd export <id|latest> <file> [--force]  Export to file
prd versions <id>                        Show version history
prd diff <id> <v1> <v2>                  Show diff between versions
prd update <id> <file> -m <msg>          Create new version from file

Database Schema

Added versioning columns to prds table:

  • version (INTEGER DEFAULT 1) - Version number
  • parent_id (TEXT) - Links to previous version
  • change_summary (TEXT) - Change description

Includes automatic migration for existing databases.

Test plan

  • 38 core PRD tests pass (tests/core/test_prd.py)
  • 30 CLI command tests pass (tests/cli/test_prd_commands.py)
  • Full v2 test suite passes (635 tests)
  • Ruff lint checks pass
  • Manual testing of CLI commands
  • Verify database migration on existing workspace

Implements a complete PRD management system for the codeframe CLI:

Core PRD functions (codeframe/core/prd.py):
- delete(workspace, prd_id) - Remove a PRD from workspace
- export_to_file(workspace, prd_id, path, force) - Export PRD to file
- create_new_version(workspace, prd_id, content, summary) - Create new version
- get_versions(workspace, prd_id) - List all versions of a PRD
- get_version(workspace, prd_id, version_number) - Get specific version
- diff_versions(workspace, prd_id, v1, v2) - Generate unified diff

CLI commands (codeframe/cli/app.py):
- prd list - List all PRDs with IDs and timestamps
- prd show [id] - Enhanced to accept optional PRD ID
- prd delete <id> [--force] - Delete PRD with confirmation
- prd export <id|latest> <file> [--force] - Export PRD to file
- prd versions <id> - Show version history
- prd diff <id> <v1> <v2> - Show diff between versions
- prd update <id> <file> -m <message> - Create new version

Database schema additions:
- version (INTEGER) - Version number for PRD
- parent_id (TEXT) - Links to previous version
- change_summary (TEXT) - Description of changes

Includes 68 tests covering core functions and CLI commands.
@coderabbitai

coderabbitai Bot commented Jan 18, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • develop

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

@macroscopeapp

macroscopeapp Bot commented Jan 18, 2026

Contributor

Add PRD CLI commands for list, delete, export, versions, diff, and update and implement PRD versioning with chain_id in codeframe/cli/app.py, codeframe/core/prd.py, and codeframe/core/workspace.py

Introduce PRD versioning (version, parent_id, change_summary, chain_id) with schema upgrades and indexes, add CLI handlers for listing, deleting, exporting, viewing versions, diffing versions, and updating PRDs, and emit EventType.PRD_DELETED and PRD_UPDATED events. Key changes start in app.py and core logic in prd.py with workspace migrations in workspace.py.

📍Where to Start

Start with PRD command handlers in app.py, then review versioning data model and operations in prd.py, followed by schema upgrades in workspace.py.


Macroscope summarized 8109343.

Comment thread codeframe/core/prd.py
Comment thread codeframe/core/prd.py Outdated
FOREIGN KEY (parent_id) REFERENCES prds(id)
)
""")

Contributor

Migration for prds isn’t resilient: new columns aren’t added and ALTER TABLE assumes the table exists. Suggest checking for the table and missing columns (e.g., sqlite_master, PRAGMA table_info) and only ALTER when needed, or document why prds is guaranteed to exist.

Suggested change
# Migration: Add new columns to existing prds table
cursor.execute("PRAGMA table_info(prds)")
prds_columns = {row[1] for row in cursor.fetchall()}
if "version" not in prds_columns:
    cursor.execute("ALTER TABLE prds ADD COLUMN version INTEGER DEFAULT 1")
if "parent_id" not in prds_columns:
    cursor.execute("ALTER TABLE prds ADD COLUMN parent_id TEXT")
if "change_summary" not in prds_columns:
    cursor.execute("ALTER TABLE prds ADD COLUMN change_summary TEXT")

Comment thread codeframe/core/prd.py
if not start:
    return []

conn = get_db_connection(workspace)

Contributor

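DB connections can leak on exceptions. Suggest always closing via try/finally or contextlib.closing(sqlite3.connect(...)). Note that with sqlite3.connect(...) as conn: only scopes the transaction (commit or rollback); it does not close the connection.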
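
A minimal sketch of the close-on-exit pattern, using only the standard library. The helper name count_prds and the one-column table are hypothetical; the PR's get_db_connection(workspace) is not reproduced here. sqlite3's own "with conn:" block commits or rolls back but leaves the connection open, so contextlib.closing is what guarantees the close.

```python
import sqlite3
from contextlib import closing


def count_prds(db_path: str) -> int:
    """Hypothetical helper: count rows in prds, always closing the connection."""
    # closing() calls conn.close() on exit, even if a query raises.
    with closing(sqlite3.connect(db_path)) as conn:
        # "with conn" scopes the transaction: commit on success, rollback on error.
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS prds (id TEXT PRIMARY KEY)")
            row = conn.execute("SELECT COUNT(*) FROM prds").fetchone()
            return row[0]
```

The same shape works with an explicit try/finally around conn.close() if nesting two with blocks reads poorly.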

@github-actions

Contributor

Review Summary

This PR adds comprehensive PRD versioning capabilities including deletion, export, version history, and diff functionality. The implementation follows the v2 headless architecture pattern and includes excellent test coverage (68 new tests).

✅ Strengths

  1. Test Coverage: 38 core tests + 30 CLI tests - comprehensive coverage of new functionality
  2. Clean Code: Well-structured, readable code with good docstrings
  3. Event System: Properly emits PRD_DELETED event for tracking
  4. Schema Migration: Handles adding versioning columns to existing databases
  5. Headless Design: No FastAPI/HTTP dependencies, follows core-first pattern
  6. Error Handling: Good use of exceptions and graceful None returns
  7. Idempotent Operations: Database operations can be safely retried

⚠️ Issues to Address

1. Performance: Missing Database Index

Location: codeframe/core/prd.py:424-509

The get_versions() function queries by parent_id to find children, but there's no index on this column. For workspaces with many PRD versions, this could be slow.

Recommendation: Add index in workspace.py:

cursor.execute("CREATE INDEX IF NOT EXISTS idx_prds_parent ON prds(parent_id)")
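
One way to check that such an index is actually picked up is to ask SQLite for its query plan. The sketch below uses a simplified two-column stand-in for the prds table (an assumption; the real table has more columns) and a hypothetical helper name.

```python
import sqlite3


def plan_for_parent_lookup(create_index: bool) -> str:
    """Return SQLite's query plan text for a lookup by parent_id."""
    conn = sqlite3.connect(":memory:")
    try:
        # Simplified stand-in for the prds table.
        conn.execute("CREATE TABLE prds (id TEXT PRIMARY KEY, parent_id TEXT)")
        if create_index:
            conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_prds_parent ON prds(parent_id)"
            )
        rows = conn.execute(
            "EXPLAIN QUERY PLAN SELECT id FROM prds WHERE parent_id = ?", ("x",)
        ).fetchall()
        # The last column of each row holds the human-readable plan step.
        return " | ".join(row[-1] for row in rows)
    finally:
        conn.close()
```

With the index present the plan text should name idx_prds_parent; without it SQLite falls back to scanning the table. The exact wording of the plan varies between SQLite versions.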

2. Architecture: No chain_id for Version Grouping

Location: codeframe/core/prd.py:360-421

The versioning system uses parent_id to chain versions, but there's no way to identify that multiple PRD records are versions of the "same" PRD. This makes it difficult to list all PRD chains in a workspace.

Recommendation: Add a chain_id field that stays constant across versions:

  • store() sets chain_id = prd_id
  • create_new_version() copies parent.chain_id
  • Add query: list_chains(workspace) to get all PRD chains

3. Concurrency: Version Number Not Atomic

Location: codeframe/core/prd.py:384

new_version = parent.version + 1  # Read
# ... later ...
(new_version, ...)  # Write

If two create_new_version calls run concurrently, they could get the same version number. While SQLite's transaction isolation helps, explicit transactions would be clearer.

Recommendation: Wrap the insert in an explicit transaction:

with conn:  # the context manager commits on success and rolls back on error
    cursor.execute(...)
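
A sketch of one way to make the read and the write a single atomic step, assuming a simplified prds schema and a hypothetical create_version helper (this is not the PR's create_new_version). In Python's default sqlite3 mode a plain "with conn:" block does not open a transaction before a SELECT, so this version issues BEGIN IMMEDIATE itself and derives the next number inside that transaction.

```python
import sqlite3
import uuid


def create_version(conn: sqlite3.Connection, chain_id: str, content: str) -> int:
    """Hypothetical sketch: insert the next version of a chain atomically."""
    # BEGIN IMMEDIATE takes the write lock up front, so the read of
    # MAX(version) and the INSERT cannot interleave with another writer.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM prds WHERE chain_id = ?",
            (chain_id,),
        ).fetchone()
        new_version = row[0] + 1
        conn.execute(
            "INSERT INTO prds (id, chain_id, version, content) VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), chain_id, new_version, content),
        )
        conn.execute("COMMIT")
        return new_version
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE prds (id TEXT PRIMARY KEY, chain_id TEXT, version INTEGER, content TEXT)"
)
first = create_version(conn, "chain-a", "draft")
second = create_version(conn, "chain-a", "revised")
```

A UNIQUE constraint on (chain_id, version) would be a reasonable extra safeguard, since it turns any remaining race into a visible error.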

4. Multiple Queries for Version Chain

Location: codeframe/core/prd.py:448-503

The get_versions() function makes N+1 database queries (one per version in the chain). For long chains, this is inefficient.

Recommendation: Consider adding a chain_id field and using a single query:

SELECT * FROM prds WHERE workspace_id = ? AND chain_id = ? ORDER BY version DESC
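
To illustrate the single-query lookup, here is a small sketch against an in-memory database. The four-column table and the helper name get_version_ids are assumptions for illustration; only the WHERE/ORDER BY shape comes from the recommendation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prds (id TEXT PRIMARY KEY, workspace_id TEXT, chain_id TEXT, version INTEGER)"
)
conn.executemany(
    "INSERT INTO prds VALUES (?, ?, ?, ?)",
    [
        ("a1", "ws", "a1", 1),  # root: chain_id equals its own id
        ("a2", "ws", "a1", 2),  # later versions copy the root's chain_id
        ("a3", "ws", "a1", 3),
        ("b1", "ws", "b1", 1),  # unrelated chain
    ],
)


def get_version_ids(workspace_id: str, chain_id: str) -> list:
    """Fetch every version of one chain in a single query, newest first."""
    rows = conn.execute(
        "SELECT id FROM prds WHERE workspace_id = ? AND chain_id = ? "
        "ORDER BY version DESC",
        (workspace_id, chain_id),
    ).fetchall()
    return [row[0] for row in rows]
```

Because every version carries the same chain_id, no parent_id walk is needed and the query count stays at one regardless of chain length.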

5. Documentation: Commands Not in CLI_WIREFRAME.md

The following new commands are not documented in docs/CLI_WIREFRAME.md, which is the authoritative command mapping:

  • prd list
  • prd show [id]
  • prd delete
  • prd export
  • prd versions
  • prd diff
  • prd update

Recommendation: Update CLI_WIREFRAME.md to document these commands and their core function mappings.

6. Edge Case: diff_versions Inconsistent Return

Location: codeframe/core/prd.py:576-571

When versions are identical, difflib.unified_diff() yields nothing and the function returns an empty string. When either version doesn't exist, it returns None. The CLI code at app.py:980-982 checks for None and prints "No differences" for an empty string, which is the correct behavior.

This works as intended, but consider making the API contract explicit:

  • Return None when either version doesn't exist
  • Return empty string when versions are identical (current behavior is fine)
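
The contract can be sketched as follows. diff_texts is a hypothetical stand-in that takes the two contents directly (None meaning "version not found"); the PR's diff_versions looks them up by workspace and version number instead.

```python
import difflib
from typing import Optional


def diff_texts(old: Optional[str], new: Optional[str]) -> Optional[str]:
    """Return None if a version is missing, "" if identical, else a unified diff."""
    if old is None or new is None:
        return None
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="v1",
        tofile="v2",
    )
    # unified_diff yields nothing for identical inputs, so join gives "".
    return "".join(lines)
```

Callers can then distinguish the two cases with "is None" versus an emptiness check, which is what the CLI behavior described above relies on.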

7. Missing Validation on Delete

Location: codeframe/core/prd.py:289-313

def delete(workspace: Workspace, prd_id: str) -> bool:
    # ... deletes PRD ...

Doesn't check if tasks depend on this PRD. Deleting a PRD that has generated tasks leaves orphaned task records.

Recommendation: Add a check:

# Check for dependent tasks
cursor.execute("SELECT COUNT(*) FROM tasks WHERE prd_id = ?", (prd_id,))
task_count = cursor.fetchone()[0]
if task_count > 0:
    raise ValueError(f"Cannot delete PRD with {task_count} dependent tasks")
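
A runnable sketch of the dependency check. The names PrdHasDependentTasksError and check_dependencies are taken from the follow-up commit message later in this PR; the function body, the delete_prd name, and the two-table schema are assumptions for illustration.

```python
import sqlite3


class PrdHasDependentTasksError(Exception):
    """Raised when a PRD still has tasks that reference it."""


def delete_prd(conn: sqlite3.Connection, prd_id: str, check_dependencies: bool = True) -> bool:
    """Delete a PRD; return True if a row was removed, False if it did not exist."""
    if check_dependencies:
        count = conn.execute(
            "SELECT COUNT(*) FROM tasks WHERE prd_id = ?", (prd_id,)
        ).fetchone()[0]
        if count > 0:
            raise PrdHasDependentTasksError(
                f"Cannot delete PRD with {count} dependent tasks"
            )
    with conn:  # commit on success, rollback on error
        cur = conn.execute("DELETE FROM prds WHERE id = ?", (prd_id,))
    return cur.rowcount > 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prds (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, prd_id TEXT)")
conn.executemany("INSERT INTO prds VALUES (?)", [("p1",), ("p2",)])
conn.execute("INSERT INTO tasks VALUES ('t1', 'p1')")
conn.commit()
```

A dedicated exception type lets the CLI show a targeted message, while check_dependencies=False leaves room for a forced delete.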

📋 Code Quality Observations

  • ✅ No SQL injection (parameterized queries used everywhere)
  • ✅ Type hints present on all functions
  • ✅ Proper exception handling (FileNotFoundError, FileExistsError, etc.)
  • ✅ Good use of dataclasses for domain models
  • ✅ Follows repository conventions (no comments, headless design)

🧪 Testing

The test coverage is excellent with 68 new tests. However, consider adding:

  1. Concurrent version creation tests: Verify version number uniqueness
  2. Performance tests: Test with large version chains (100+ versions)
  3. Migration tests: Verify _ensure_schema_upgrades handles edge cases

🔒 Security

No security concerns identified:

  • No user input directly in SQL queries
  • File operations use Path (safe)
  • No secrets or credentials handled
  • Follows principle of least privilege

Summary

This is a high-quality implementation of PRD versioning features that adds significant functionality to CodeFRAME. The code is well-tested, clean, and follows the v2 architecture principles.

Recommended actions before merge:

  1. Add index on prds.parent_id for performance
  2. Update docs/CLI_WIREFRAME.md to document new commands
  3. Consider adding chain_id field for better version grouping (can be a follow-up)
  4. Add validation to prevent deleting PRDs with dependent tasks

Nice to have (follow-up):

  • Optimize get_versions() to use single query with chain_id
  • Add explicit transactions for version creation
  • Add performance tests for large version chains

Overall, this PR is ready to merge with minor improvements for performance and documentation completeness.

@github-actions

Contributor

Posted comprehensive review comment to the PR. The review covers code quality, performance considerations (missing indexes on parent_id), architecture concerns (no chain_id for version grouping), concurrency issues, documentation gaps in CLI_WIREFRAME.md, edge cases, and validation gaps. The PR is well-implemented with excellent test coverage and follows v2 architecture principles, but would benefit from the recommended performance optimizations and documentation updates before merging.

@frankbria frankbria linked an issue Jan 18, 2026 that may be closed by this pull request
22 tasks
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
@github-actions

Contributor

Remaining Review Items

Following up on the comprehensive reviews already posted, here are the key issues still needing attention:

🚨 Must Fix Before Merge

1. Missing Index on parent_id (Performance)
Location: codeframe/core/workspace.py

The get_versions() function queries by parent_id to find children. Without an index, this degrades linearly with the number of PRD versions.

# Add after line 96 in _ensure_schema_upgrades():
cursor.execute("CREATE INDEX IF NOT EXISTS idx_prds_parent ON prds(parent_id)")

2. Missing Cycle Protection in get_versions()
Location: codeframe/core/prd.py:446-456

The "walk up" logic to find the root version has no cycle detection. While schema invariants should prevent cycles, defensive code is better:

# Replace lines 446-456 with:
current_id = prd_id
visited_up = set()
max_depth = 1000  # Safety limit

while len(visited_up) < max_depth:
    cursor.execute(
        "SELECT parent_id FROM prds WHERE workspace_id = ? AND id = ?",
        (workspace.id, current_id),
    )
    row = cursor.fetchone()
    if not row or not row[0]:
        break
    if row[0] in visited_up:
        # Cycle detected - treat current as root
        break
    visited_up.add(row[0])
    current_id = row[0]

📋 Documentation Gap

3. Missing CLI Documentation
Location: docs/CLI_WIREFRAME.md

Per AGENTS.md, this file is the authoritative command mapping. Add entries for:

  • prd list → list_prds()
  • prd show [id] → get_by_id() / get_latest()
  • prd delete <id> → delete()
  • prd export <id|latest> <file> → export_to_file()
  • prd versions <id> → get_versions()
  • prd diff <id> <v1> <v2> → diff_versions()
  • prd update <id> <file> -m <msg> → create_new_version()

Summary: The core implementation is solid with excellent test coverage. Addressing the index and cycle protection will improve robustness and performance. Documentation update ensures the command mapping stays authoritative.

Comment thread codeframe/cli/app.py
Comment on lines +986 to +994
for line in diff.splitlines():
    if line.startswith("+") and not line.startswith("+++"):
        console.print(f"[green]{line}[/green]")
    elif line.startswith("-") and not line.startswith("---"):
        console.print(f"[red]{line}[/red]")
    elif line.startswith("@@"):
        console.print(f"[cyan]{line}[/cyan]")
    else:
        console.print(line)

Contributor

Diff lines may contain [/], which Rich treats as markup, causing wrong rendering or errors. Consider disabling markup on these prints and use style= for color.

Suggested change
Current code:

for line in diff.splitlines():
    if line.startswith("+") and not line.startswith("+++"):
        console.print(f"[green]{line}[/green]")
    elif line.startswith("-") and not line.startswith("---"):
        console.print(f"[red]{line}[/red]")
    elif line.startswith("@@"):
        console.print(f"[cyan]{line}[/cyan]")
    else:
        console.print(line)

Recommended fix:

for line in diff.splitlines():
    if line.startswith("+") and not line.startswith("+++"):
        console.print(line, style="green", markup=False)
    elif line.startswith("-") and not line.startswith("---"):
        console.print(line, style="red", markup=False)
    elif line.startswith("@@"):
        console.print(line, style="cyan", markup=False)
    else:
        console.print(line, markup=False)

@github-actions

Contributor

Review posted to PR #293.

- Add chain_id field to PrdRecord and prds table schema
- Add database indexes on parent_id and chain_id columns
- Make version number increment atomic with explicit transactions
- Optimize get_versions() to use single query with chain_id
- Add list_chains() function to list unique PRD chains
- Add delete validation with check_dependencies parameter
- Add PrdHasDependentTasksError exception for dependent tasks
- Update CLI_WIREFRAME.md with new PRD commands documentation

Fixes from code review:
1. Performance: Added idx_prds_parent and idx_prds_chain indexes
2. Architecture: Added chain_id for version grouping
3. Concurrency: Wrapped version creation in explicit transaction
4. N+1 queries: get_versions() now uses single query via chain_id
5. Documentation: Added 7 new PRD commands to CLI_WIREFRAME.md
6. Validation: delete() now checks for dependent tasks
@github-actions

Contributor

Follow-up Review: Most Issues Addressed ✅

I reviewed the current state of the code and found that most issues from previous reviews have been addressed. The implementation has evolved significantly since the initial reviews.

✅ Issues Fixed in Current Code

  1. Missing index on parent_id - FIXED

    • Index added; an index on chain_id was also added at line 280
  2. No chain_id for version grouping - FIXED

    • chain_id column fully implemented and used throughout
    • New function list_chains() added for listing PRD chains
  3. Concurrency: Version number not atomic - FIXED

    • create_new_version() now uses an explicit transaction (prd.py:476)
  4. Multiple queries for version chain - FIXED

    • get_versions() now uses a single query with chain_id (prd.py:581-590)
  5. Missing validation on delete - FIXED

    • Added PrdHasDependentTasksError exception
    • delete() function has a check_dependencies parameter
  6. Migration resilience - FIXED

    • Migration code checks for existing columns before each ALTER (workspace.py:257-276)
    • Includes backfill for chain_id for existing PRDs
  7. lineterm diff issue - FIXED

    • Uses lineterm="\n" instead of empty string (prd.py:667)
  8. DB connection leaks - FIXED

    • Addressed at prd.py:545

⚠️ One Remaining Issue

Diff Lines May Contain Markup Characters
Location: codeframe/cli/app.py

Diff lines can contain [ and ] characters (e.g., in code snippets), which Rich treats as markup. This can cause rendering issues or errors.

Recommended fix: print each diff line with markup=False and pass the color through style=, as proposed in the inline comment on codeframe/cli/app.py. This ensures diff output is rendered correctly regardless of content.

📝 Documentation

CLI_WIREFRAME.md was updated in this PR (+118/-3 lines), which addresses the documentation gap noted in earlier reviews.


Summary

This is excellent work! The implementation has matured well, addressing all major concerns from earlier reviews. The code now has:

  • Proper transaction handling for version creation
  • Efficient single-query version lookup using chain_id
  • Comprehensive dependency checking for deletions
  • Resilient database migrations with backfill
  • Proper connection management

Only one minor issue remains with diff output rendering. Otherwise, this PR is in excellent shape for merging.

@github-actions

Contributor

Review posted successfully. Found that most issues from earlier reviews have been addressed, with only one minor rendering issue remaining in the diff output.

Comment thread codeframe/cli/app.py Outdated
Comment thread codeframe/core/prd.py
Comment on lines +331 to +336
INNER JOIN (
SELECT chain_id, MAX(version) as max_version
FROM prds
WHERE workspace_id = ?
GROUP BY chain_id
) latest ON p.chain_id = latest.chain_id AND p.version = latest.max_version

Contributor

NULL chain_id is dropped by equality joins/filters. Consider a canonical key (e.g., COALESCE(chain_id, id)) used consistently, or branch with IS NULL.

Suggested change
Current code:

INNER JOIN (
    SELECT chain_id, MAX(version) as max_version
    FROM prds
    WHERE workspace_id = ?
    GROUP BY chain_id
) latest ON p.chain_id = latest.chain_id AND p.version = latest.max_version

Recommended fix:

INNER JOIN (
    SELECT COALESCE(chain_id, id) AS chain_key, MAX(version) as max_version
    FROM prds
    WHERE workspace_id = ?
    GROUP BY chain_key
) latest ON COALESCE(p.chain_id, p.id) = latest.chain_key AND p.version = latest.max_version

Comment thread codeframe/core/prd.py
Comment on lines +541 to +543
except Exception:
    cursor.execute("ROLLBACK")
    raise

Contributor

ROLLBACK in the except block can fail and mask the original error (e.g., if the connection is already closed). Consider wrapping ROLLBACK in its own try/except so the original exception is preserved.

Suggested change
Current code:

except Exception:
    cursor.execute("ROLLBACK")
    raise

Recommended fix:

except Exception:
    try:
        cursor.execute("ROLLBACK")
    except Exception:
        pass
    raise

Comment thread codeframe/core/prd.py Outdated
frankbria and others added 2 commits January 18, 2026 18:41
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
@frankbria frankbria merged commit 900fcec into v2-refactor Jan 19, 2026
5 checks passed
@frankbria frankbria deleted the feature/prd-command-enhancements branch January 19, 2026 01:42
@github-actions

Contributor

Final Review: Critical Markup Issue Found ⚠️

This PR is well-implemented with excellent test coverage (81 tests). Most issues from prior reviews have been addressed. However, one critical issue remains that could affect user experience:

🐛 Critical: Diff Output Markup Injection

Location: codeframe/cli/app.py:990-997

Diff lines can contain [ and ] characters (e.g., in code snippets or markdown), which Rich interprets as markup tags. This causes rendering errors or incorrect formatting.

Fix: Add markup=False parameter to prevent markup interpretation:

  • Line 991: console.print(line, style="green", markup=False)
  • Line 993: console.print(line, style="red", markup=False)
  • Line 995: console.print(line, style="cyan", markup=False)
  • Line 997: console.print(line, markup=False)

✅ Confirmed Fixed Issues

  • Database indexes added on parent_id and chain_id
  • Version creation uses explicit transactions
  • Single-query version lookup with chain_id
  • Task dependency validation on delete
  • Delete result checked in CLI
  • Documentation updated in CLI_WIREFRAME.md
  • Resilient schema migration with backfill

📝 Minor Defensive Issues (Low Priority)

  1. ROLLBACK error masking (prd.py:542): If ROLLBACK fails, it could mask the original exception. Wrap in its own try/except for defense in depth.

  2. NULL chain_id filtering (prd.py:331-336): INNER JOIN filters records with NULL chain_id. This is defensive since migration backfill and store() both set chain_id, but could use COALESCE(chain_id, id) for extra safety.


Summary: Excellent implementation with one user-facing bug that should be fixed. The markup injection issue can cause confusing output when diffing PRDs containing code or markdown.

@github-actions

Contributor

Review comment posted successfully. The PR has excellent implementation with one critical markup injection issue that should be fixed.

@github-actions

Contributor

Review Summary

This PR implements comprehensive PRD versioning functionality with excellent test coverage (68 new tests). The code is well-structured and follows the v2 headless architecture pattern. Most issues from prior reviews have been addressed, with only a few remaining items.

✅ Previously Addressed Issues

  • ✅ Index on parent_id and chain_id added in workspace.py
  • chain_id field implemented for version grouping
  • list_chains() function added
  • ✅ Transaction handling in create_new_version()
  • ✅ Single-query version lookup using chain_id
  • ✅ Dependency checking with PrdHasDependentTasksError
  • ✅ Resilient database migrations with backfill
  • ✅ Delete command checks return value and handles not-found case
  • lineterm="\n" is correct with splitlines(keepends=True)

🐛 Remaining Issues

1. Rich Markup Conflict in Diff Output (Priority: High)
Location: codeframe/cli/app.py:990-997

The diff output uses inline Rich markup ([green]{line}[/green]), which can cause rendering issues when the diff content contains [ or ] characters (common in code snippets such as [1, 2, 3]).

Fix: Use style= parameter instead of inline markup and add markup=False:

for line in diff.splitlines():
    if line.startswith("+") and not line.startswith("+++"):
        console.print(line, style="green", markup=False)
    elif line.startswith("-") and not line.startswith("---"):
        console.print(line, style="red", markup=False)
    elif line.startswith("@@"):
        console.print(line, style="cyan", markup=False)
    else:
        console.print(line, markup=False)
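
The fix can be exercised outside the CLI by pointing a Rich Console at an in-memory buffer. This is a sketch under assumptions: render_diff is a hypothetical helper, and emoji=False and highlight=False are extra precautions not mentioned in the review (with markup off, Rich may still replace :name: emoji codes unless emoji is disabled as well).

```python
import io

from rich.console import Console


def render_diff(diff: str) -> str:
    """Render a unified diff with per-line styles and markup parsing disabled."""
    buf = io.StringIO()
    # Writing to a StringIO with no color system yields plain text, which
    # keeps this sketch easy to inspect; a real CLI would print to the terminal.
    console = Console(file=buf, color_system=None, width=200)
    for line in diff.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            style = "green"
        elif line.startswith("-") and not line.startswith("---"):
            style = "red"
        elif line.startswith("@@"):
            style = "cyan"
        else:
            style = None
        # markup=False makes Rich treat "[" and "]" as literal characters.
        console.print(line, style=style, markup=False, emoji=False, highlight=False)
    return buf.getvalue()
```

A line such as "+items[/]" passes through unchanged here, whereas embedding it in an f-string like "[green]{line}[/green]" lets Rich read the "[/]" inside the content as a closing tag and end the green span early.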

2. NULL chain_id Handling (Priority: Low)
Location: codeframe/core/prd.py:336

The list_chains() function uses INNER JOIN with p.chain_id = latest.chain_id. If chain_id is NULL (possible in legacy edge cases), rows will be dropped since NULL != NULL in SQL.

Fix: Use COALESCE(chain_id, id) for robust handling:

SELECT p.id, p.workspace_id, p.title, p.content, p.metadata, p.created_at,
       p.version, p.parent_id, p.change_summary, p.chain_id
FROM prds p
INNER JOIN (
    SELECT COALESCE(chain_id, id) as chain_key, MAX(version) as max_version
    FROM prds
    WHERE workspace_id = ?
    GROUP BY chain_key
) latest ON COALESCE(p.chain_id, p.id) = latest.chain_key AND p.version = latest.max_version
WHERE p.workspace_id = ?
ORDER BY p.created_at DESC
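
The effect of NULL in the join condition can be reproduced with a three-row table. The sketch below drops the workspace filter and most columns for brevity (an assumption); the row named "legacy" stands for a hypothetical pre-migration record with no chain_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prds (id TEXT PRIMARY KEY, chain_id TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO prds VALUES (?, ?, ?)",
    [("a1", "a1", 1), ("a2", "a1", 2), ("legacy", None, 1)],
)

STRICT_JOIN = """
SELECT p.id FROM prds p
INNER JOIN (
    SELECT chain_id, MAX(version) AS max_version
    FROM prds
    GROUP BY chain_id
) latest ON p.chain_id = latest.chain_id AND p.version = latest.max_version
ORDER BY p.id
"""

COALESCE_JOIN = """
SELECT p.id FROM prds p
INNER JOIN (
    SELECT COALESCE(chain_id, id) AS chain_key, MAX(version) AS max_version
    FROM prds
    GROUP BY chain_key
) latest ON COALESCE(p.chain_id, p.id) = latest.chain_key
        AND p.version = latest.max_version
ORDER BY p.id
"""

# NULL = NULL evaluates to NULL (not true), so the strict join drops "legacy".
strict_ids = [row[0] for row in conn.execute(STRICT_JOIN)]
# COALESCE gives every row a non-NULL key, so "legacy" forms its own chain.
coalesce_ids = [row[0] for row in conn.execute(COALESCE_JOIN)]
```

Using the same COALESCE expression on both sides of the join is what keeps the key canonical; applying it on only one side would still lose the row.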

3. ROLLBACK Error Masking (Priority: Low)
Location: codeframe/core/prd.py:542-543

If cursor.execute("ROLLBACK") fails (e.g., connection closed), it will raise a new exception and mask the original error.

Fix: Wrap ROLLBACK in try/except:

except Exception:
    try:
        cursor.execute("ROLLBACK")
    except Exception:
        pass
    raise
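
The masking behavior is easy to demonstrate with a fake cursor. FlakyCursor and run_in_transaction are hypothetical names for this sketch; the point is only that the bare raise after the inner try/except re-raises the original exception, not the rollback failure.

```python
class FlakyCursor:
    """Hypothetical cursor whose statements fail and whose ROLLBACK also fails."""

    def execute(self, sql: str) -> None:
        if sql == "ROLLBACK":
            raise RuntimeError("connection already closed")
        raise ValueError("insert failed")


def run_in_transaction(cursor, sql: str) -> None:
    """Run one statement, attempting a rollback on failure without masking the error."""
    try:
        cursor.execute(sql)
    except Exception:
        try:
            cursor.execute("ROLLBACK")
        except Exception:
            # Swallow the rollback failure so the original error propagates.
            pass
        raise
```

Logging the swallowed rollback error instead of a bare pass may be preferable in practice, since it keeps the secondary failure visible without hiding the first one.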

📝 Overall Assessment

This is a high-quality PR with comprehensive functionality and excellent test coverage. The code follows v2 architecture principles well.

Recommended actions before merge:

  1. Fix the Rich markup conflict (Issue #1) - This is a real bug that could cause rendering problems

Nice to have (can be addressed in follow-up):
2. Add COALESCE handling for NULL chain_id (Issue #2)
3. Wrap ROLLBACK in try/except (Issue #3)

The implementation is solid and ready to merge with the markup fix applied.

@github-actions

Contributor

I've posted a comprehensive review to the PR. The review identifies:

  1. High priority bug: Rich markup conflict in diff output that could cause rendering issues with code containing [ or ] characters
  2. Low priority improvement: NULL chain_id handling in list_chains()
  3. Low priority improvement: ROLLBACK error masking in exception handling

The PR is in excellent shape with comprehensive test coverage and clean architecture. Most prior issues have been addressed, with only the markup conflict requiring attention before merge.

frankbria added a commit that referenced this pull request Jan 22, 2026
* chore(refactor) Quarantine web-ui and edit CLI_WIREFRAME

* Update AGENTS.md

* feat(core): implement Golden Path Phases 1-2 with CLI-first architecture

Phase 1 - Workspace & Events:
- New Typer CLI at codeframe/cli/app.py with domain-grouped commands
- Workspace management with SQLite state storage in .codeframe/
- Append-only event log for all workspace activity
- Updated pyproject.toml entry point

Phase 2 - PRD & Task Management:
- PRD storage with title extraction and metadata
- Task state machine (BACKLOG→READY→IN_PROGRESS→BLOCKED→DONE→MERGED)
- LLM-powered task generation from PRD (with simple fallback)
- Status transitions with validation

Test coverage:
- 28 state machine unit tests
- 17 workspace unit tests
- 11 integration tests covering full Phase 1-2 flow

* feat(cli): implement status command (Phase 4)

Shows workspace summary including:
- PRD info (title and date)
- Task counts by status with color-coding
- Recent activity from event log
- Configurable event count with --events/-e flag

Emits STATUS_VIEWED event for activity tracking.

* feat(core): implement work commands with runtime module (Phase 5)

New runtime module (codeframe/core/runtime.py):
- Run lifecycle management (start, stop, complete, fail, block, resume)
- RunStatus enum (RUNNING, COMPLETED, FAILED, BLOCKED)
- Stub agent execution loop that emits events

Work CLI commands:
- work start: Creates run, transitions task to IN_PROGRESS
- work stop: Gracefully stops run, returns task to READY
- work resume: Resumes a blocked run
- work status: Shows active runs

The --execute flag on work start runs the stub agent, emitting
AGENT_STEP_STARTED and AGENT_STEP_COMPLETED events for testing.

* feat(core): implement blocker commands (Phase 6)

New blockers module (codeframe/core/blockers.py):
- BlockerStatus enum (OPEN, ANSWERED, RESOLVED)
- Blocker CRUD operations
- Partial ID matching for convenience

Blocker CLI commands:
- blocker list: Show open blockers (--all for all)
- blocker show: View blocker details with question/answer
- blocker create: Manually create blockers for testing
- blocker answer: Provide answer to unblock work
- blocker resolve: Mark blocker as resolved

Emits BLOCKER_CREATED, BLOCKER_ANSWERED, BLOCKER_RESOLVED events.

* feat(core): implement review command with verification gates (Phase 7)

New gates module (codeframe/core/gates.py):
- Auto-detect available gates (pytest, ruff, mypy, npm-test, npm-lint)
- Run gates with configurable verbosity
- Capture output, exit codes, and timing
- GateStatus enum (PASSED, FAILED, SKIPPED, ERROR)
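
As a rough illustration of "capture output, exit codes, and timing", a gate runner might wrap `subprocess.run` as below. `GateRun` and `run_gate` are hypothetical names for this sketch, not the actual gates.py API.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class GateRun:
    name: str
    exit_code: int
    output: str
    duration_s: float

    @property
    def passed(self):
        # Assumption: a gate passes when its command exits with code 0.
        return self.exit_code == 0


def run_gate(name, command):
    """Run one gate command, capturing combined output, exit code, and wall time."""
    start = time.monotonic()
    proc = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return GateRun(name, proc.returncode, proc.stdout + proc.stderr, elapsed)


# Usage: a stand-in command instead of a real pytest or ruff invocation.
demo = run_gate("demo", [sys.executable, "-c", "print('ok')"])
failing = run_gate("failing", [sys.executable, "-c", "import sys; sys.exit(3)"])
```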

Review CLI command:
- codeframe review: Run all detected gates
- --gate/-g: Run specific gates only
- --verbose/-v: Show full gate output

Emits GATES_STARTED and GATES_COMPLETED events.

Also: Added .codeframe/ to .gitignore

* feat(core): implement patch and commit commands (Phase 8)

New artifacts module (codeframe/core/artifacts.py):
- export_patch: Export git diff as a .patch file
- create_commit: Create git commits with proper validation
- get_status: Get git status summary
- list_patches: List previously exported patches

Patch CLI commands:
- patch export: Export changes to .codeframe/patches/
- patch list: List exported patches
- patch status: Show git status summary

Commit CLI commands:
- commit create: Create commits with -m message
- commit create --all: Stage all changes before committing

Emits PATCH_EXPORTED and COMMIT_CREATED events.

* feat(core): implement checkpoint and summary commands (Phase 9)

Adds checkpoint module for state snapshots and updates summary command
to display workspace overview. Completes Golden Path CLI implementation.

* docs: add agent implementation task list

Tracks the work needed to replace execute_stub() with a fully
functional agent that can read context, plan, and execute code changes.

* feat(adapters): implement LLM adapter with Anthropic and Mock providers

Adds codeframe/adapters/llm/ with:
- base.py: Protocol, ModelSelector, LLMResponse, Tool/ToolCall types
- anthropic.py: Claude provider with tool use and streaming support
- mock.py: Test provider with call tracking and queued responses

Task-based model selection heuristic:
- Planning/reasoning → Sonnet
- Execution → Sonnet
- Generation → Haiku

* feat(core): implement task context loader for agent execution

Adds codeframe/core/context.py with:
- TaskContext: dataclass holding task, PRD, blockers, and file contents
- ContextLoader: loads and scores relevant files within token budget
- Keyword extraction and relevance scoring for file selection
- Token budgeting to maximize useful context
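
One way to picture relevance scoring under a token budget is a greedy selection like the sketch below. The scoring rule (keyword occurrence count) and the four-characters-per-token estimate are assumptions for illustration; the actual ContextLoader may score and count tokens differently.

```python
def select_files(files, keywords, token_budget):
    """Pick the highest-scoring files that fit the budget.

    files maps path -> content. Returns the chosen paths in score order.
    """

    def score(content):
        text = content.lower()
        return sum(text.count(keyword.lower()) for keyword in keywords)

    def estimate_tokens(content):
        # Rough heuristic (assumption): about 4 characters per token.
        return max(len(content) // 4, 1)

    ranked = sorted(files.items(), key=lambda item: score(item[1]), reverse=True)
    chosen, used = [], 0
    for path, content in ranked:
        cost = estimate_tokens(content)
        # Skip irrelevant files and files that would overflow the budget.
        if score(content) > 0 and used + cost <= token_budget:
            chosen.append(path)
            used += cost
    return chosen
```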

Also adds list_for_task() helper to blockers module.

* feat(core): implement agent planning module

Adds codeframe/core/planner.py with:
- Planner: transforms TaskContext into ImplementationPlan via LLM
- ImplementationPlan: structured plan with steps, files, complexity
- PlanStep: individual step with type, target, dependencies
- StepType enum: file_create, file_edit, shell_command, verification

Uses Purpose.PLANNING to select stronger model for reasoning tasks.

* feat(core): implement code execution engine

Adds codeframe/core/executor.py with:
- Executor: executes plan steps via LLM-driven code generation
- File operations: create, edit, delete with rollback tracking
- Shell commands: sandboxed execution with dangerous pattern blocking
- Dry-run mode for previewing changes without applying them
- Full rollback capability for all file changes
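
Rollback tracking can be sketched by remembering each file's previous content before writing. `FileChangeLog` is a hypothetical helper for this example, not the executor's actual class.

```python
import tempfile
from pathlib import Path


class FileChangeLog:
    """Record prior file contents so every write can be undone."""

    def __init__(self):
        self._originals = []  # (path, previous content, or None if the file was new)

    def write(self, path, content):
        path = Path(path)
        previous = path.read_text() if path.exists() else None
        self._originals.append((path, previous))
        path.write_text(content)

    def rollback(self):
        # Undo in reverse order so repeated edits to one file unwind correctly.
        for path, previous in reversed(self._originals):
            if previous is None:
                path.unlink(missing_ok=True)
            else:
                path.write_text(previous)
        self._originals.clear()


# Usage: edit one existing file, create another, then roll both back.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "existing.txt"
    created = Path(tmp) / "created.txt"
    existing.write_text("old")
    log = FileChangeLog()
    log.write(existing, "new")
    log.write(created, "fresh")
    log.rollback()
    restored_content = existing.read_text()
    created_was_removed = not created.exists()
```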

Uses Purpose.EXECUTION for balanced model selection during code generation.

* feat(core): implement agent orchestrator with blocker detection

Adds codeframe/core/agent.py with:
- Agent: main orchestrator coordinating context, planning, execution
- AgentState: serializable state for pause/resume
- Blocker detection: creates blockers for failures needing human input
- Gate integration: runs verification after file changes
- Event emission: callback-based event system for monitoring

Patterns detected for blocker creation:
- Consecutive failures exceeding threshold
- 'not found', 'missing', 'credentials' errors
- Verification failures after max attempts

* feat(runtime): wire agent orchestrator into work start command

Adds execute_agent() to runtime.py:
- Integrates full agent orchestration (context, plan, execute, verify)
- Requires ANTHROPIC_API_KEY for real execution
- Emits workspace events for monitoring

Updates CLI work start command:
- --execute: runs the real AI agent
- --dry-run: preview changes without applying
- --stub: legacy stub execution for testing

The Golden Path is now fully functional from PRD to committed code.

* fix(agent): correct GateResult attribute access

- GateResult has `passed` (bool), not `status`
- GateCheck has `name`, not `gate`

Fixes AttributeError during agent execution verification.

* fix(agent): remove duplicate task status update

Task status is now only updated by runtime.complete_run(),
avoiding DONE -> DONE transition error.

* docs: mark agent implementation tasks complete

* fix(runtime): avoid READY->READY transition in stop_run

* fix(agent): remove duplicate BLOCKED status updates

* fix(executor): handle verification steps intelligently

- Python files: check existence and syntax
- Commands: execute as shell
- Other paths: check existence

Fixes issue where 'task_tracker.py' was run as a command instead of verified.

* docs(readme): update for v2 agent implementation

- Update status badge to reflect v2 completion
- Add "What's New" section for v2 agent implementation
- Document CLI-first workflow as recommended approach
- Update architecture diagram to show CLI/Agent orchestrator
- Add complete CLI command reference
- Move previous updates to collapsible sections
- Update roadmap with completed items
- Add links to v2 documentation (Golden Path, Agent Tasks)

* docs(claude): update for v2 agent implementation complete

- Update status to v2 Agent Implementation Complete
- Add agent system architecture section with component table
- Add execution flow diagram for agent orchestration
- Document critical state separation pattern (Agent→AgentState, Runtime→TaskStatus)
- Add recent updates section with bug fixes

* feat(agent): add error classification and self-correction for technical errors

Previously, the agent would create blockers for any error matching patterns
like "not found" or "missing". This caused technical errors (syntax errors,
file not found, import errors) to block execution when the agent should
solve them automatically.

Changes:
- Add HUMAN_INPUT_PATTERNS for genuine human-needed situations (credentials,
  unclear requirements, design decisions)
- Add TECHNICAL_ERROR_PATTERNS for errors agent can self-correct (file not
  found, syntax errors, import errors)
- Add _classify_error() to categorize errors
- Add _attempt_self_correction() to use LLM to fix technical errors
- Update _execute_plan() to try self-correction before creating blockers
- Update tests to reflect new behavior

The agent now:
1. Classifies errors as "technical" or "human"
2. For technical errors: tries self-correction (up to 2 attempts)
3. Only creates blockers for human-input-needed situations or after
   exhausting self-correction attempts
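
The classification flow above can be sketched as follows. The pattern lists are illustrative placeholders (the real HUMAN_INPUT_PATTERNS and TECHNICAL_ERROR_PATTERNS are not shown in this log), and treating unmatched errors as "human" is an assumption.

```python
import re

# Hypothetical patterns for illustration only.
HUMAN_INPUT_PATTERNS = [r"credential", r"api key", r"unclear requirement", r"design decision"]
TECHNICAL_ERROR_PATTERNS = [r"file not found", r"syntax ?error", r"importerror", r"no module named"]

MAX_CORRECTION_ATTEMPTS = 2  # "up to 2 attempts" per the commit message


def classify_error(message):
    """Return 'human' or 'technical' for an error message."""
    text = message.lower()
    if any(re.search(pattern, text) for pattern in HUMAN_INPUT_PATTERNS):
        return "human"
    if any(re.search(pattern, text) for pattern in TECHNICAL_ERROR_PATTERNS):
        return "technical"
    return "human"  # assumption: unknown errors are escalated to a person


def handle_error(message, try_fix):
    """Return 'corrected' or 'blocker'. try_fix(message) returns True on success."""
    if classify_error(message) == "human":
        return "blocker"
    for _ in range(MAX_CORRECTION_ATTEMPTS):
        if try_fix(message):
            return "corrected"
    return "blocker"  # self-correction exhausted
```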

* feat(blockers): auto-reset task to READY when blocker is answered

When a blocker is answered, the associated task is now automatically
reset to READY status. This eliminates the need for separate "work stop"
and "work resume" commands.

Flow is now:
1. Task runs → hits blocker → status becomes BLOCKED
2. User answers blocker: `cf blocker answer <id> "answer"`
3. Task automatically resets to READY
4. User can restart: `cf work start <id> --execute`

The blocker answer includes the user's input, so the agent will have
access to it when the task is restarted.

* fix(agent): prevent infinite loop when self-correction returns None

The previous code used Python's while...else construct, but when
_attempt_self_correction returned None, we'd break out of the loop
and skip the else block, which meant current_step was never incremented
and the same step would be retried forever.

Fixed by using a flag to track self-correction success and handling
the failure case unconditionally after the loop ends.
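
For context, the `else` clause of a Python `while` loop runs only when the loop condition becomes false, not when the loop exits via `break`. A sketch of the flag-based shape described above, with hypothetical `attempt_fix` and `apply_fix` callables standing in for the agent's methods:

```python
def correct_step(attempt_fix, apply_fix, max_attempts=2):
    """Return True if a fix was generated and applied successfully."""
    corrected = False
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        fix = attempt_fix()
        if fix is None:
            break  # no fix available; a while...else block would be skipped here
        if apply_fix(fix):
            corrected = True
            break
    return corrected


def execute_plan(steps, attempt_fix, apply_fix):
    """Walk every step; the failure case is handled after the loop, unconditionally."""
    outcomes = []
    current_step = 0
    while current_step < len(steps):
        corrected = correct_step(attempt_fix, apply_fix)
        outcomes.append("corrected" if corrected else "failed")
        current_step += 1  # always advance, so a None fix cannot cause a retry loop
    return outcomes
```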

* fix(agent): trigger self-correction when verification fails after file edit

Previously, when a file was written successfully but verification (ruff)
detected a syntax error, the agent would:
1. Try ruff --fix (which can't fix syntax errors)
2. Just increment consecutive_failures and move on

This left broken code in the file and continued to the next step.

Now the agent:
1. Detects verification failure after successful file write
2. Triggers self-correction to fix the syntax/code error
3. Re-runs verification after each correction attempt
4. Creates a blocker if self-correction can't fix it

This ensures syntax errors caught by linting get the same self-correction
treatment as other technical errors.

* fix(agent): convert failed VERIFICATION steps to FILE_EDIT for self-correction

When a VERIFICATION step fails (e.g., ast.parse catches a syntax error),
we were trying to "self-correct" the verification step itself, which
doesn't make sense. Now we convert it to a FILE_EDIT step targeting
the same file, so self-correction actually fixes the broken code.

This fixes the case where:
1. File is written with syntax error
2. Ruff doesn't catch it (ruff misses some errors that ast catches)
3. Verification step catches the syntax error
4. Self-correction can now actually fix the file

* docs: add batch execution implementation plan

- Add BATCH_EXECUTION_PLAN.md with phased approach:
  - Phase 1: Serial batch execution via conductor
  - Phase 2: Parallel execution with dependency analysis
  - Phase 3: Observability and websocket streaming

- Update CLI_WIREFRAME.md:
  - Add conductor.py and dependency_analyzer.py to module layout
  - Add cf work batch commands (batch, status, cancel)
  - Update implementation order with batch phases

Design decisions:
- Subprocess-based execution (isolation, crash-safe)
- No server required (CLI-first)
- Serial by default, parallel opt-in

* docs: organize planning docs, mark Golden Path complete

- Move completed planning docs to docs/finished/:
  - AGENT_IMPLEMENTATION_TASKS.md (all 8 tasks done)
  - REFACTOR_PLAN_FOR_AGENT.md (Steps 0-6 complete)

- Update GOLDEN_PATH.md:
  - Mark acceptance checklist as complete (2026-01-14)
  - Reference BATCH_EXECUTION_PLAN.md as next phase

- Add docs/finished/README.md explaining folder purpose

Active docs remaining:
- GOLDEN_PATH.md (architecture contract)
- CLI_WIREFRAME.md (command reference)
- BATCH_EXECUTION_PLAN.md (next phase)

* feat(batch): implement Phase 1 batch execution

Add multi-task batch execution support with serial execution strategy.

New components:
- core/conductor.py: Batch orchestration with subprocess execution
- BatchRun model with status tracking (PENDING, RUNNING, COMPLETED, PARTIAL, FAILED, CANCELLED)
- On-failure behavior (continue or stop)

CLI commands:
- cf work batch <task-ids...> - Execute multiple tasks
- cf work batch --all-ready - Execute all READY tasks
- cf work batch-status [batch-id] - Show batch status
- cf work batch-cancel <batch-id> - Cancel running batch

Schema updates:
- batch_runs table with auto-migration for existing workspaces
- Batch event types (BATCH_STARTED, BATCH_TASK_*, BATCH_COMPLETED, etc.)

Tests:
- 23 new tests for conductor module (all passing)

Phase 2 will add parallel execution with dependency analysis.

* refactor(cli): restructure batch commands to use subcommand group

Changed from hyphenated commands to proper subcommand structure:
- cf work batch-status -> cf work batch status
- cf work batch-cancel -> cf work batch cancel
- cf work batch <ids> -> cf work batch run <ids>

Created batch_app Typer subcommand group with run, status, cancel.
Updated CLI_WIREFRAME.md and BATCH_EXECUTION_PLAN.md to reflect changes.
Marked Phase 1 as complete in both docs.

* test(conductor): add integration tests for batch failure scenarios

Added 9 new tests in TestBatchExecution class:
- test_all_tasks_succeed: verifies COMPLETED status
- test_some_tasks_fail_continue: PARTIAL status with on_failure=continue
- test_task_fails_stop: stops execution with on_failure=stop
- test_all_tasks_fail: FAILED status when all tasks fail
- test_task_blocked: handles BLOCKED tasks correctly
- test_mixed_results: tracks COMPLETED, FAILED, BLOCKED together
- test_first_task_fails_stop: stops immediately on first failure
- test_batch_completed_at_set: timestamp set after execution
- test_on_event_callback_called: callback receives all events

Total: 32 tests (was 23)

* feat(agent): add self-correction capabilities and model flexibility

LLM adapter changes:
- Add CORRECTION purpose for self-correction (uses stronger model)
- Add environment variable overrides for all model selections:
  CODEFRAME_PLANNING_MODEL, CODEFRAME_EXECUTION_MODEL,
  CODEFRAME_GENERATION_MODEL, CODEFRAME_CORRECTION_MODEL
- Default correction model: claude-opus-4-5 for fixing errors
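
A sketch of how the environment variable overrides might be resolved. Only the variable names and the correction default come from this commit; `select_model` and the other default values are placeholders, not real model identifiers or the adapter's actual API.

```python
import os

# Only "claude-opus-4-5" is taken from the commit message; the rest are placeholders.
DEFAULT_MODELS = {
    "planning": "planning-model-placeholder",
    "execution": "execution-model-placeholder",
    "generation": "generation-model-placeholder",
    "correction": "claude-opus-4-5",
}


def select_model(purpose, env=None):
    """Return the override from CODEFRAME_<PURPOSE>_MODEL if set, else the default."""
    env = os.environ if env is None else env
    override = env.get(f"CODEFRAME_{purpose.upper()}_MODEL")
    return override or DEFAULT_MODELS[purpose]
```

Passing `env` explicitly keeps the lookup easy to test without mutating the process environment.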

Agent changes:
- Add _extract_file_from_command() to parse verification targets
- Add debug logging capability with --debug flag
- Improve self-correction flow when verification fails
- Convert failed VERIFICATION steps to FILE_EDIT for re-attempt

These changes support automatic error recovery during batch execution.

* docs: add retry/self-correction future enhancements section

Updated BATCH_EXECUTION_PLAN.md:
- Added "Future Enhancements: Retry & Self-Correction" section
- Documented three retry options: --retry flag, resume command, escalation
- Added decision points for Phase 2 planning
- Updated references to point to finished/ folder

Updated CLI_WIREFRAME.md:
- Renamed Phase 2 to "Parallel Execution & Retry"
- Added --retry N flag and batch resume command to roadmap
- Renumbered Phase 3 items

* feat(batch): implement batch resume command

Added resume_batch() function to conductor.py:
- Re-runs failed/blocked tasks from a previous batch
- --force flag re-runs all tasks including completed ones
- Merges results into existing batch record
- Preserves completed task results when not using force

Added CLI command:
- cf work batch resume <batch-id> [--force]
- Supports partial batch ID matching
- Shows helpful output about what will be re-run

Added 9 tests for resume scenarios:
- Resume PARTIAL/FAILED batches
- Force mode re-runs all tasks
- Handles blocked tasks
- Preserves completed results
- Edge cases (no failed tasks, still failing)

Updated docs:
- CLI_WIREFRAME.md with resume command details
- BATCH_EXECUTION_PLAN.md marks Option B as implemented
- Phase 2 shows resume as complete

Total tests: 41 (was 32)

* feat(batch): add --retry N flag for automatic task retry

- Add _execute_retries() function in conductor.py for retry loop
- Add max_retries parameter to start_batch()
- Add --retry/-r option to CLI batch run command
- Retry only FAILED tasks (not BLOCKED which need human intervention)
- Stop early if all tasks succeed before exhausting retries
- Add 8 tests for retry functionality (49 total conductor tests)
- Update docs to mark retry flag as implemented
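
The retry rules above (retry only FAILED tasks, stop early once nothing is failing) can be sketched like this. `run_task` is a hypothetical callable returning a status string; the real `_execute_retries()` signature is not shown in this log.

```python
def execute_with_retries(task_ids, run_task, max_retries):
    """Run each task once, then re-run FAILED tasks up to max_retries rounds."""
    results = {task_id: run_task(task_id) for task_id in task_ids}
    for _ in range(max_retries):
        failed = [task_id for task_id, status in results.items() if status == "FAILED"]
        if not failed:
            break  # stop early: nothing left to retry
        for task_id in failed:
            results[task_id] = run_task(task_id)
    return results
```

BLOCKED tasks are left alone because they wait on human input, so re-running them would not change the outcome.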

* feat(tasks): add depends_on field for task dependencies

- Add depends_on field to Task dataclass (default empty list)
- Add depends_on column to tasks table schema with migration
- Add update_depends_on() function to modify task dependencies
- Add get_dependents() function to find tasks that depend on a given task
- Validate against self-references and nonexistent dependencies
- Add 15 tests for dependency functionality
- Update docs to mark this Phase 2 item as complete

* feat(batch): add dependency graph analysis for parallel execution

- Create dependency_graph.py module for DAG operations
- Implement build_graph() to construct dependency graph from tasks
- Implement detect_cycle() for circular dependency detection
- Implement topological_sort() for execution order
- Implement group_by_level() for parallel execution groups
- Create ExecutionPlan dataclass with groups, task_order, and graph
- Add validate_dependencies() for pre-execution validation
- Add CycleDetectedError exception class
- Add 34 tests for all graph operations
- Update docs to mark this Phase 2 item as complete
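
A compact sketch of level grouping with cycle detection, in the spirit of `group_by_level()` and `detect_cycle()`. The signature is an assumption, and it presumes every dependency id also appears as a key (that is, dependencies were validated beforehand).

```python
class CycleDetectedError(Exception):
    """Raised when the dependency map contains a circular dependency."""


def group_by_level(depends_on):
    """Group tasks into levels; each level depends only on earlier levels.

    depends_on maps task id -> iterable of ids it depends on.
    """
    remaining = {task: set(deps) for task, deps in depends_on.items()}
    groups = []
    while remaining:
        ready = sorted(task for task, deps in remaining.items() if not deps)
        if not ready:
            # Every remaining task still waits on another one: a cycle.
            raise CycleDetectedError(f"Cycle among: {sorted(remaining)}")
        groups.append(ready)
        for task in ready:
            del remaining[task]
        for deps in remaining.values():
            deps.difference_update(ready)
    return groups
```

Flattening the returned groups in order yields one valid topological order.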

* feat(batch): implement parallel execution with worker pool

- Add _execute_parallel() using ThreadPoolExecutor for concurrent tasks
- Create execution plan using dependency graph to group tasks by level
- Tasks in the same group run in parallel, groups execute sequentially
- Add _execute_single_task() and _execute_group_parallel() helpers
- Respect max_parallel limit for worker pool size
- Fall back to serial execution if circular dependencies detected
- Add 7 tests for parallel execution scenarios
- Update existing test that expected "not implemented" warning
- Update docs to mark parallel execution as complete
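
The "parallel within a group, sequential across groups" scheme might look like the sketch below. `execute_groups` and `run_task` are hypothetical names; the real helpers run each task in a subprocess.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_groups(groups, run_task, max_parallel=4):
    """Run each group's tasks concurrently; start a group only after the previous one ends."""
    results = {}
    for group in groups:
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            # pool.map yields results in input order, so zip pairs ids with statuses.
            for task_id, status in zip(group, pool.map(run_task, group)):
                results[task_id] = status
        # Leaving the with-block waits for all workers, so groups never overlap.
    return results
```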

Phase 2 now complete: batch resume, retry, depends_on, dependency
graph, and parallel execution all implemented and tested.

* feat(batch): add --strategy auto for LLM-based dependency inference

Adds intelligent dependency analysis using LLM to automatically infer
task dependencies from descriptions when --strategy auto is used.

- Add dependency_analyzer.py with LLM-powered task analysis
- Integrate auto strategy into conductor with fallback to serial
- Update CLI help text to describe strategy options
- Mark Phase 2 as complete in documentation

* docs: update all v2 documentation for Phase 2 completion

- Update status badges and test counts in README.md
- Add Phase 2 batch features to "What's New" section
- Add batch execution CLI commands to both README.md and CLAUDE.md
- Update roadmap to show Phase 2 complete, Phase 3 in progress
- Add new modules (conductor, dependency_graph, dependency_analyzer) to repo structure
- Mark Phase 2 acceptance criteria as complete in BATCH_EXECUTION_PLAN.md

* feat(batch): add live streaming via batch_follow command

Phase 3 observability features:
- BatchProgress class for ETA calculation based on task durations
- `cf work batch follow <id>` for real-time terminal streaming
- Rich Live display with progress panel and event log
- Handles terminal events (COMPLETED, FAILED, PARTIAL, CANCELLED)
- 27 unit tests for BatchProgress class
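
An ETA based on task durations could be as simple as the mean completed duration times the number of remaining tasks. This is an assumed formula for illustration; the actual BatchProgress calculation is not shown here.

```python
def estimate_eta_seconds(completed_durations, total_tasks):
    """Return estimated seconds remaining, or None before any task has finished."""
    if not completed_durations:
        return None  # no data yet, so no estimate
    average = sum(completed_durations) / len(completed_durations)
    remaining = max(total_tasks - len(completed_durations), 0)
    return average * remaining
```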

* feat(cli): add bulk status update with --all and --from flags

New usage:
  cf tasks set status READY --all              # All tasks to READY
  cf tasks set status READY --all --from BACKLOG  # Only BACKLOG -> READY
  cf tasks set status READY abc123             # Single task (unchanged)

Skips tasks already at target status and reports counts.

* test(cli): add comprehensive tests for tasks set bulk operations

Tests for --all and --from flags:
- Bulk update all tasks to a status
- Filter updates by source status with --from
- Skip tasks already at target status
- Single task updates (backward compatibility)
- Error handling (missing args, invalid status, empty workspace)

Also fixes typer.Exit being caught by generic exception handler.

* feat(cli): add Deps column to tasks list output

Shows task dependencies in the table:
- "-" for no dependencies
- Short IDs (6 chars) for 1-2 dependencies
- "N tasks" for 3+ dependencies
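
The display rule above fits in a few lines. The function name and the comma separator are assumptions for this sketch.

```python
def format_deps(depends_on):
    """Render a task's dependency list for the Deps column."""
    if not depends_on:
        return "-"
    if len(depends_on) <= 2:
        # Show shortened IDs (first 6 characters) for one or two dependencies.
        return ", ".join(dep_id[:6] for dep_id in depends_on)
    return f"{len(depends_on)} tasks"
```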

* feat(tasks): add delete command and generate --overwrite flag

New functionality:
- `cf tasks delete <id>` - delete single task (with --force to skip confirm)
- `cf tasks delete --all` - delete all tasks (with confirmation)
- `cf tasks generate --overwrite` - clear existing tasks before generating

Core module additions:
- tasks.delete(workspace, task_id) -> bool
- tasks.delete_all(workspace) -> int

The delete command warns when deleting tasks that others depend on.
Without --overwrite, tasks generate appends (supports multi-PRD projects).

15 new tests covering all CRUD operations.

* test: add v2 marker for CLI-first tests

- Register `v2` marker in pytest.ini
- Auto-mark all tests/core/ as v2 via conftest.py
- Add pytestmark to v2 CLI test files
- Document convention in CLAUDE.md

Run v2 tests only: `uv run pytest -m v2`
Currently 411 v2 tests covering headless functionality.

* fix(cli): correct argument order for tasks set status command

The command now uses natural order: `cf tasks set status <task_id> <value>`
instead of `<value> <task_id>`. This matches user expectations and other
CLI conventions.

Changes:
- Swap task_id and value argument positions in function signature
- Add argument parsing logic to handle both single task and --all modes
- Fix variable references from task_id to actual_task_id
- Update tests to use corrected argument order

* feat(agent): add autonomous decision-making and AGENTS.md support

Add comprehensive improvements to reduce false blockers and enable
autonomous agent decision-making for tactical code decisions.

Key changes:
- Add AGENTS.md/CLAUDE.md preferences loading (agents_config.py)
- Split blocker patterns into tactical/human/technical categories
- Add autonomy directives to planning and execution prompts
- Add Purpose.SUPERVISION for supervisor model selection
- Add --all-blocked option to batch run command
- Add --reset flag to batch resume command
- Add reset_blocked_run() to clear blocked runs for re-execution

Agents now make autonomous decisions for tactical choices like:
- File handling (overwrite, merge, extend)
- Package manager and version selection
- Test framework configuration
- Code style decisions

Blockers are only created for true requirements ambiguity,
access/credential issues, or technical errors after exhausting
self-correction attempts.

* fix(agent): prevent tactical questions from becoming blockers

The previous implementation still created blockers for tactical decisions
because:
1. _generate_blocker_question didn't tell the LLM to avoid tactical questions
2. _create_verification_blocker always created blockers for pytest failures
3. No filtering of generated questions before creating blockers

Fixes:
- Update _generate_blocker_question prompt to explicitly instruct LLM to:
  - Return "RESOLVE_AUTONOMOUSLY: <decision>" for tactical decisions
  - Return "TECHNICAL_FIX: <fix>" for technical issues
  - Only generate questions for true human-required decisions

- Update _create_blocker_from_failure to:
  - Detect RESOLVE_AUTONOMOUSLY and TECHNICAL_FIX directives
  - Filter tactical patterns (venv, pip, pytest.ini, fixture scope, etc.)
  - Auto-resolve instead of creating blockers

- Update _create_verification_blocker to:
  - Mark verification failures as FAILED (not BLOCKED)
  - Let retry mechanism handle technical test failures
  - Stop creating "pytest failed, what should I do?" blockers

This should eliminate blockers for:
- Virtual environment creation questions
- Package manager choices
- Asyncio fixture scope configuration
- Pytest verification failures

* feat(conductor): add supervisor-level blocker resolution

Add SupervisorResolver to handle tactical blockers at the conductor level
instead of letting each worker agent create blockers independently.

Key changes:
- Add SupervisorResolver class with:
  - Decision cache for deduplication across workers
  - Pattern-based tactical question detection
  - Supervision model classification for uncertain cases
  - Auto-answer with cached decisions

- Integrate supervisor into all execution paths:
  - _execute_serial: intercepts BLOCKED, tries resolution, retries
  - _execute_single_task: same pattern for parallel execution
  - execute_agent (runtime.py): single task execution also uses supervisor

- Benefits:
  - No duplicate questions (cached per workspace)
  - Stronger model (SUPERVISION) makes classification decisions
  - Workers create blockers, supervisor filters tactical ones
  - Only true human-required decisions surface as blockers

Flow: Worker -> BLOCKED -> Supervisor evaluates ->
      Tactical? Auto-resolve + retry : Surface to user
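
A stripped-down sketch of the cache-then-decide flow. The pattern table and the `decide` callable (standing in for the supervision model) are hypothetical; this reuses the class name only for readability and is not the real implementation.

```python
class SupervisorResolver:
    # Hypothetical subset: maps a question substring to a shared cache key.
    TACTICAL_KEYS = {
        "virtualenv": "venv",
        "venv": "venv",
        "package manager": "package_manager",
    }

    def __init__(self, decide):
        self._decide = decide  # callable: question -> decision text
        self._cache = {}

    def resolve(self, question):
        """Return a decision for tactical questions, or None to surface to the user."""
        text = question.lower()
        key = next((k for pattern, k in self.TACTICAL_KEYS.items() if pattern in text), None)
        if key is None:
            return None  # not tactical: leave the blocker for a human
        if key not in self._cache:
            self._cache[key] = self._decide(question)
        return self._cache[key]
```

Because equivalent questions share a key, several workers asking about the same topic trigger only one decision.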

* test(supervisor): add comprehensive tests for SupervisorResolver

Adds 27 tests covering:
- Tactical pattern detection (venv, package managers, config, questions)
- Decision cache key generation for deduplication
- Tactical resolution generation
- Blocker resolution with cache usage
- Supervisor singleton management
- LLM classification fallback with graceful error handling

Also fixes cache key generation to recognize "virtualenv" pattern.

* feat(batch): add stop command with graceful and force modes

Adds `cf work batch stop <id>` command to interrupt running batches:
- Graceful stop (default): Sets batch to CANCELLED, current task finishes
- Force stop (--force): Terminates running processes with SIGTERM immediately

Implementation details:
- Added process tracking via _active_processes dict in conductor.py
- Modified _execute_task_subprocess to use Popen and track processes
- Added stop_batch() function with force parameter
- Added 6 tests for stop functionality

This allows users to safely interrupt stuck batches from another terminal.

* refactor(cli): remove duplicate batch cancel command

The 'batch stop' command supersedes 'batch cancel':
- stop (default): graceful stop, same as cancel was
- stop --force: terminates running processes

Keeping cancel_batch() in conductor.py for internal use.

* fix(runtime): add FAILED status and fix fail_run() state management

- Add FAILED status to TaskStatus enum with transitions to READY/IN_PROGRESS
- Fix fail_run() to update task status (was leaving tasks stuck in IN_PROGRESS)
- Add supervisor handling for FAILED tasks with auto-retry on tactical errors
- Fix load_preferences() to fall back to defaults when no AGENTS.md exists
- Add new tactical patterns: externally-managed, no module named, __main__
- Add --review flag to batch run for verification gates after completion
- Add CLI test report and quickstart guide documentation

* fix(planner): include AGENTS.md preferences in planning prompt

The preferences from ~/.codeframe/AGENTS.md were being loaded into
the TaskContext but never included in the prompt sent to the LLM.
This meant agents were using pip instead of uv despite the global
config specifying uv as the package manager.

Now the planner's _build_prompt() includes the preferences section
from context.preferences.to_prompt_section() right after the task
information, ensuring the LLM sees tooling preferences like:
- package_manager: uv
- Commands: uv sync, uv run pytest, etc.

* fix(runtime): extract error message from AgentState correctly

AgentState doesn't have an 'error' attribute. The fix now extracts
error info from:
1. state.blocker.reason if there's a blocker
2. Last step result's error/output
3. Gate results failure output

This fixes the AttributeError when supervisor tries to help with
failed tasks.

* fix(schema): add FAILED status to tasks table CHECK constraint

The state_machine.py was updated with FAILED status but the database
CHECK constraint in workspace.py wasn't updated, causing
IntegrityError when trying to set task status to FAILED.

* fix(runtime): remove invalid context parameter from blockers.create()

* feat(agent): implement verification self-correction loop

Add LLM-powered self-correction during final verification:
- Convert _run_final_verification to use retry loop with max_attempts
- Add _attempt_verification_fix method that collects gate errors and uses
  LLM to generate targeted file edits
- Try ruff --fix first for quick lint fixes
- LLM generates JSON fix plan with file edits
- Apply fixes and re-run verification in loop
- Gracefully give up when LLM can't generate more fixes

Also adds diagnostic logging to runtime.py for supervisor intervention
analysis.

The self-correction loop now:
1. Detects verification failures (pytest, ruff)
2. Calls LLM with error messages for targeted fixes
3. Applies fixes (file edits/creates)
4. Re-runs verification up to max_attempts
5. Falls through to FAILED if unfixable
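
The loop above can be sketched with three hypothetical callables standing in for the gates, the LLM fix generator, and the file editor. The default `max_attempts` value is an assumption.

```python
def verify_with_self_correction(run_gates, generate_fix, apply_fix, max_attempts=3):
    """Return 'PASSED' or 'FAILED'.

    run_gates() returns a list of error strings (empty means all gates passed).
    generate_fix(errors) returns a fix object, or None when no fix can be produced.
    """
    errors = run_gates()
    attempts = 0
    while errors and attempts < max_attempts:
        attempts += 1
        fix = generate_fix(errors)
        if fix is None:
            break  # give up gracefully when no further fix is available
        apply_fix(fix)
        errors = run_gates()  # re-verify after every applied fix
    return "PASSED" if not errors else "FAILED"
```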

* feat(cli): add --verbose flag for self-correction diagnostics

Add --verbose / -v flag to control diagnostic output:
- CLI: work start --verbose prints detailed verification progress
- Agent: _verbose_print() helper for conditional output
- Runtime: pass verbose flag through to agent

Diagnostic messages now only appear when --verbose is enabled:
- [VERIFY] verification attempt status
- [SELFCORRECT] LLM fix generation progress

This keeps normal output clean while allowing detailed tracing when needed.

* docs(readme): update for self-correction loop and verbose mode

- Add 2026-01-16 "What's New" section with self-correction features
- Document --verbose flag for observability
- Move batch execution to collapsible "Previous" section
- Add QUICKSTART.md and CLI_V2_TEST_REPORT.md to documentation links
- Update Key Features with self-correction and verbose mode
- Update roadmap with completed items and current phase focus

* docs(claude.md): update for self-correction loop and verbose mode

- Update status to Phase 2+ with self-correction and observability
- Add new features: verbose mode, self-correction loop, FAILED status
- Update execution flow diagram with self-correction details
- Add --verbose flag to CLI commands section
- Add 2026-01-16 Recent Updates section with new methods

* docs: add comprehensive feature roadmap for v2

Planned outward from existing functionality toward a fully autonomous
agentic coding system. The roadmap runs through Phase 10; the new phases cover:

- Phase 3: Agent Reliability (env config, error surfacing, self-correction)
- Phase 4: Continuous Execution (watch mode, streaming, graceful interrupts)
- Phase 5: Idea → PRD Generation (interactive creation, config collection)
- Phase 6: Git Integration (passthrough, smart defaults, PR workflow)
- Phase 7: Multi-Agent Coordination (roles, handoff, parallel execution)
- Phase 8: Observability & History (timeline, replay, debug)
- Phase 9: TUI Dashboard (Rich/Textual, interactive control)
- Phase 10: Remote Access & Metrics (webhooks, API, cost tracking)

Key decisions: CLI-first, user-configured environment, branch-per-batch,
git passthrough over reimplementation, multi-agent before TUI, FastAPI
only for webhooks/external access.

* chore(beads): add Phase 3 Agent Reliability issues

Closed all v1 legacy issues (superseded by v2 roadmap).

Created Phase 3 epic with 4 features and 14 tasks:
- 3.1 Environment Configuration (4 tasks)
- 3.2 Error Surfacing (3 tasks)
- 3.3 Smarter Context Loading (2 tasks)
- 3.4 Enhanced Self-Correction (3 tasks)
- Phase 3 test coverage (1 task)

All dependencies configured for proper execution order.

* feat(config): add v2 environment configuration with YAML support

Implements EnvironmentConfig dataclass for project environment settings:
- Package manager (uv, pip, poetry, npm, pnpm, yarn)
- Python/Node version configuration
- Test framework (pytest, jest, vitest, etc.)
- Lint tools (ruff, eslint, prettier, etc.)
- Context loading limits (max_files, max_tokens)
- Custom command overrides

Features:
- YAML serialization/deserialization (.codeframe/config.yaml)
- Validation for known values with helpful error messages
- Command generation (get_install_command, get_test_command, get_lint_command)
- Coexists with legacy v1 JSON config

31 tests passing covering all functionality.

Closes: codeframe-5r7n

* feat(cli): add config subcommand for v2 environment configuration

Add cf config init|show|set commands for managing project environment
configuration stored in .codeframe/config.yaml:

- config init: Interactive or auto-detect setup (--detect, --force flags)
- config show: Display current configuration
- config set: Set individual config values (package_manager, test_framework, etc.)

Includes auto-detection for package managers (uv/pip/poetry/npm/yarn/pnpm),
test frameworks (pytest/jest/vitest), and lint tools (ruff/eslint/prettier).

* feat(agent): integrate environment config into agent execution

Updates context loader and planner to use project environment configuration:

- context.py: Load EnvironmentConfig as part of TaskContext
- context.py: Include environment section in to_prompt_context()
- planner.py: Include config in planning prompt with exact commands
- Tests: Add 3 new tests for environment config integration

The agent now knows the correct package manager, test framework,
and lint commands to use based on .codeframe/config.yaml.

* docs: add environment configuration documentation

Update all key documentation to explain the new config workflow:

- README.md: Add config commands to CLI section, "What's New", Quick Start
- QUICKSTART.md: Add Step 2 for environment configuration
- CLAUDE.md: Add Phase 3.1 update, config commands in CLI section
- CLI_WIREFRAME.md: Add Configuration section with command mapping

The happy path now includes:
1. cf init
2. cf config init --detect (auto-detect package manager, test framework)
3. cf prd add
4. cf tasks generate
5. cf work start --execute

* fix(config): improve UX for greenfield projects with no files to detect

- Refactor _detect_environment_config() to return tuple (config, detected_items)
- Track what was actually detected vs defaulted
- Show different messages based on detection results:
  - When detected: "Detected from project files:" with bullet list
  - When nothing found: "No project files found to detect from."
    with guidance on using defaults and customization options
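
A minimal sketch of the `(config, detected_items)` return shape described above. The marker files, default values, and function names here are hypothetical; only the tuple contract and the two user-facing messages come from this commit message.

```python
from pathlib import Path

# Hypothetical marker files; the real detection rules may differ.
MARKERS = [
    ("uv.lock", "package_manager", "uv"),
    ("poetry.lock", "package_manager", "poetry"),
    ("package-lock.json", "package_manager", "npm"),
    ("pytest.ini", "test_framework", "pytest"),
    ("ruff.toml", "lint_tool", "ruff"),
]
DEFAULTS = {"package_manager": "uv", "test_framework": "pytest", "lint_tool": "ruff"}


def detect_environment_config(root: Path) -> tuple[dict[str, str], list[str]]:
    """Return (config, detected_items); detected_items is empty for greenfield projects."""
    config = dict(DEFAULTS)
    detected: list[str] = []
    seen: set[str] = set()
    for filename, key, value in MARKERS:
        # The first marker found for a key wins; later markers are ignored.
        if key not in seen and (root / filename).exists():
            config[key] = value
            seen.add(key)
            detected.append(f"{key}: {value} (from {filename})")
    return config, detected


def describe(detected: list[str]) -> str:
    """Pick the message based on what was actually detected vs defaulted."""
    if not detected:
        return "No project files found to detect from."
    bullets = "\n".join(f"  - {item}" for item in detected)
    return "Detected from project files:\n" + bullets
```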

* refactor(config): replace structured config with natural language tech_stack

- Add tech_stack field to Workspace model with database migration
- Add --tech-stack, --detect, --tech-stack-interactive flags to init command
- Remove cf config subcommand entirely (was Python-centric)
- Update TaskContext and Planner to use natural language tech_stack
- Simplify configuration: users describe stack, agent adapts

Design philosophy: Instead of hardcoded package_manager, test_framework,
lint_tools enums, users describe their stack in natural language
(e.g., "Rust project using cargo", "TypeScript monorepo with pnpm").
Works with any technology without code changes.

Future work: Multi-round interactive discovery (bead: codeframe-8d80)

* feat(agent): add enhanced self-correction with fix tracking and quick fixes

Implements three capabilities to improve agent self-correction:

1. Fix Attempt Tracking (fix_tracker.py):
   - Normalize and hash errors for deduplication
   - Track attempted fixes to prevent repeating failures
   - Escalation thresholds: 3 same-error, 3 same-file, 5 total

2. Pattern-Based Quick Fixes (quick_fixes.py):
   - Match common errors without LLM calls
   - ModuleNotFoundError → install package (with package aliases)
   - ImportError/NameError → add missing imports
   - SyntaxError/IndentationError → apply common fixes
   - Auto-detect package manager (uv, pip, npm, yarn, etc.)

3. Escalation to Blocker:
   - Create informative blockers when self-correction exhausted
   - Include error type, attempted fixes, and guidance questions
   - Prevents infinite fix loops
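
The tracking and escalation rules above can be sketched as follows. The thresholds (3 same-error, 3 same-file, 5 total) are taken from this commit message; the normalization rules, class shape, and method names are assumptions and may not match `fix_tracker.py`.

```python
import hashlib
import re
from collections import Counter

# Thresholds from the commit message above.
MAX_SAME_ERROR = 3
MAX_SAME_FILE = 3
MAX_TOTAL = 5


def normalize_error(message: str) -> str:
    """Strip volatile details (line numbers, hex addresses) so repeats hash alike."""
    message = re.sub(r"line \d+", "line N", message)
    message = re.sub(r"0x[0-9a-fA-F]+", "0xADDR", message)
    return " ".join(message.split()).lower()


def error_hash(message: str) -> str:
    return hashlib.sha256(normalize_error(message).encode()).hexdigest()[:16]


class FixTracker:
    def __init__(self) -> None:
        self.by_error = Counter()  # error hash -> number of attempts
        self.by_file = Counter()   # file path -> number of attempts
        self.attempted = set()     # (error hash, fix description) pairs
        self.total = 0

    def was_attempted(self, error: str, fix: str) -> bool:
        return (error_hash(error), fix) in self.attempted

    def record(self, error: str, file_path: str, fix: str) -> None:
        key = error_hash(error)
        self.attempted.add((key, fix))
        self.by_error[key] += 1
        self.by_file[file_path] += 1
        self.total += 1

    def should_escalate(self, error: str, file_path: str) -> bool:
        """True once any threshold is reached, signalling that a blocker is due."""
        return (
            self.by_error[error_hash(error)] >= MAX_SAME_ERROR
            or self.by_file[file_path] >= MAX_SAME_FILE
            or self.total >= MAX_TOTAL
        )
```

Hashing the normalized message means the same `NameError` reported at two different line numbers counts as one error, which is what keeps the agent from retrying a fix it has already tried.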

Closes: codeframe-5ned, codeframe-4tjy, codeframe-l2lm, codeframe-uwbu

* feat(agent): enhanced self-correction with project context and shell commands

Self-correction improvements:
- Add _build_self_correction_context() to include project structure,
  config files, tech stack, and modified files in fix prompts
- Add FixScope enum (LOCAL/GLOBAL) and _classify_fix_scope() to
  determine coordination requirements for parallel agents
- Enable shell command execution during self-correction (uv pip install, etc.)
- Fix StepResult attribute access (file_changes instead of files_created)

Coordination infrastructure:
- Add GlobalFixCoordinator class for thread-safe fix deduplication
- Coordinator tracks pending/completed fixes to prevent conflicts
- Wire coordinator through runtime.execute_agent()
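
One possible shape for thread-safe fix deduplication is sketched below. Only the class name and the pending/completed tracking come from this commit message; the method names and the string fix key are hypothetical.

```python
import threading


class GlobalFixCoordinator:
    """Let exactly one agent apply a given global fix; other callers are refused."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending = set()
        self._completed = set()

    def try_claim(self, fix_key: str) -> bool:
        """Return True only for the first caller to claim this fix."""
        with self._lock:
            if fix_key in self._pending or fix_key in self._completed:
                return False
            self._pending.add(fix_key)
            return True

    def mark_completed(self, fix_key: str) -> None:
        with self._lock:
            self._pending.discard(fix_key)
            self._completed.add(fix_key)

    def is_completed(self, fix_key: str) -> bool:
        with self._lock:
            return fix_key in self._completed
```

The check and the insert happen under one lock, so two parallel agents cannot both see the key as unclaimed.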

Gate fixes:
- Update _run_ruff() to use 'uv run ruff' like pytest does
- Ensures ruff runs in target project's environment, not system-wide

* docs: add agent tool system to roadmap (codeframe-p77g)

- Mark Phase 3.4 Enhanced Self-Correction as complete
- Document shell command execution and FixScope classification
- Add Phase 3.5 placeholder for future Agent Tool System
- References bead codeframe-p77g for full spec

* feat: Transform CodeFRAME v2 MVP from basic task automation to AI-driven development orchestration

## 🎯 Enhanced MVP Definition
- Replace basic "Add a PRD" with AI-driven interactive PRD generation
- Upgrade single-task execution to intelligent batch orchestration
- Integrate complete Git workflow with PR management instead of basic artifact export
- Add comprehensive checkpointing with state restoration capabilities

## 🚀 Key Architectural Shifts

### AI-Driven Project Discovery
- Interactive AI sessions gather requirements, constraints, and success criteria
- Generates comprehensive PRD with technical specs, user stories, and acceptance criteria
- Supports iterative refinement with versioning and change tracking
- Enhanced `prd generate`, `prd refine` commands replace basic `prd add`

### Batch-First Execution Model
- Main orchestrator agent coordinates multiple tasks (not single task execution)
- Dependency-aware scheduling with serial/parallel/auto strategies
- Real-time progress monitoring with event streaming
- Inter-task communication and resource management
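
As an illustration of dependency-aware scheduling, the sketch below groups tasks into waves using the standard library's `graphlib`. The function name and task names are hypothetical, and the actual orchestrator may schedule differently.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all dependencies satisfied.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

A serial strategy would run the waves flattened in order, while a parallel strategy could run each wave concurrently and wait before starting the next one.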

### Integrated Git/PR Workflow
- Automatic branch creation per task/batch with naming conventions
- AI-generated comprehensive PR descriptions with business impact analysis
- Automated verification gates and multi-strategy merging
- New `pr create`, `pr merge`, enhanced `work start --create-branch` commands

### Enhanced Quality Gates & Checkpointing
- Comprehensive test suite: unit, integration, security, performance
- AI-assisted code review with best practices compliance
- Rich checkpoint snapshots with complete workspace state and git refs
- Executive reporting with progress metrics and risk assessment

## 📋 Updated State Machine
- Added IN_REVIEW, MERGED, FAILED statuses for complete lifecycle
- Comprehensive transition mapping for PR workflow integration
- Automated state transitions triggered by Git/PR operations

## 🔄 Implementation Priority Reordering
- Phase 0: Enhanced PRD & Discovery (NEW HIGH PRIORITY)
- Phase 1: Enhanced Task Generation (NEW HIGH PRIORITY)
- Phase 2: Git Integration & PR Workflow (NEW HIGH PRIORITY)
- Maintains backward compatibility with existing Golden Path features

## 📚 Documentation Updates
- GOLDEN_PATH.md: Transforms from 7-step basic workflow to 9-phase advanced MVP
- CLI_WIREFRAME.md: Adds new commands and reorders implementation priorities
- Enhanced acceptance checklist with 28 detailed validation criteria
- Complete module layout updates including `git_integration.py`

This redefines CodeFRAME v2 from a task automation tool to an AI-driven
development orchestration platform capable of end-to-end software project management.

* analysis: Identify critical CLI workflow gaps and implementation roadmap

## 🔍 Gap Analysis Summary

**Most Critical Finding**: Missing credential management system would impact 100% of users
- Authentication failures at PRD generation, batch execution, and PR creation
- Users must manually manage API keys across multiple providers
- No validation or health checking for configured credentials

## 📊 Complete Gap Matrix

### Critical (Showstopper) Issues:
1. **Credential Management** - No auth setup/list/validate commands
2. **Environment Validation** - No pre-flight tool checking
3. **Real-time State Backup** - No auto-checkpointing during batches
4. **Partial Recovery** - Only full rollback, no granular recovery

### Medium (High Frustration) Issues:
5. **Dependency Conflict Resolution** - Circular/hard dependency handling
6. **Integration Testing** - No pre-PR validation of changes

### Quality (Minor Annoyance) Issues:
7. **Rich Monitoring** - Limited debugging for failed tasks
8. **Template Management** - No reusable configurations
9. **Workflow Automation** - No pattern reuse capabilities

## 🚀 4-Week Implementation Plan

### Week 1-2: Foundation Infrastructure
- Week 1: Comprehensive credential management system (`codeframe auth`)
- Week 2: Environment validation + incremental state persistence

### Week 3-4: Robustness Enhancements
- Week 3: Granular recovery + dependency conflict resolution
- Week 4: Integration testing + enhanced monitoring

## 📋 Key Implementation Files

**Core Modules to Create**:
- `codeframe/core/credentials.py` - Secure credential storage
- `codeframe/core/environment.py` - Tool validation & auto-install
- `codeframe/core/integration_testing.py` - Pre-PR validation

**CLI Commands to Add**:
- `codeframe auth setup/list/validate/rotate/remove`
- `codeframe env check/doctor/auto-install`
- `codeframe rollback task/last/batch`
- `codeframe test integration/compatibility/breaking-changes`

## 🎯 Expected Impact

**Before**: An MVP that is complete on paper but frustrating in practice
**After**: A CLI that is both complete on paper and reliable in practice

This addresses the critical gap between documented workflow and usable tool.

* update: Accurate CLI workflow implementation status for enhanced MVP

## 📋 Implementation Status Assessment

**Analysis Method**: Examined actual CLI functionality vs. checklist requirements
- Reviewed CLI command implementations in `/codeframe/cli/app.py`
- Verified core functionality by running commands directly
- Identified working features and missing gaps

## ✅ Confirmed Working Components

### Core Infrastructure
- [x] `codeframe init` - Basic and enhanced (detect, interactive) modes
- [x] `codeframe status` - Comprehensive workspace display with PRD, tasks, events
- [x] Core workspace management - State persistence and recovery
- [x] Event system - Rich logging and streaming capabilities

### Basic PRD & Task Management
- [x] `codeframe prd add <file.md>` - File-based PRD storage
- [x] `codeframe tasks generate` - LLM and simple extraction modes
- [x] `codeframe tasks list` - Task listing with status filtering
- [x] `codeframe tasks set status` - Manual state transitions
- [x] Task CRUD operations (create, update, delete)
- [x] Dependency management with state machine enforcement

### Batch Execution Framework
- [x] `codeframe work batch run` - Multi-strategy execution (serial, parallel, auto)
- [x] `codeframe work batch status` - Batch monitoring and reporting
- [x] `codeframe work batch follow` - Real-time event streaming
- [x] `codeframe work batch resume` - Failed task recovery
- [x] `codeframe work start <task-id>` - Individual task execution
- [x] `codeframe work stop/resume/status` - Task lifecycle management
- [x] Main orchestrator with comprehensive failure handling
- [x] Event-driven progress tracking and ETA calculation

### Quality Gates & Verification
- [x] `codeframe review` - Multi-gate execution framework
- [x] `codeframe summary` - Comprehensive workspace reporting
- [x] Gate framework with extensible architecture
- [x] Test execution with coverage and reporting

### Checkpointing & State Management
- [x] `codeframe checkpoint create` - Rich state snapshots
- [x] `codeframe checkpoint list/show/restore` - Complete checkpoint lifecycle
- [x] Git reference integration for branch tracking
- [x] State restoration and recovery procedures

### Human-in-the-Loop Features
- [x] `codeframe blockers list` - Rich blocker context display
- [x] `codeframe blocker answer <id>` - Interactive resolution system
- [x] Blocker learning and pattern recognition
- [x] Integration with task lifecycle management

### Cross-Cutting Requirements
- [x] **CLI-first operation** - All commands work without FastAPI dependency
- [x] **Event logging** - Comprehensive audit trail and observability
- [x] **Error handling** - Graceful failure recovery and user guidance
- [x] **Performance** - Efficient batch processing and parallel execution

## ⚠️ Identified Gaps (Critical vs. Minor)

### 🔥 Critical Gaps (Would Block Workflow)
1. **No `codeframe prd generate`** - Enhanced MVP requires AI-driven PRD generation
   - **Current Status**: Only basic `prd add` exists
   - **Impact**: 100% of users would hit this gap immediately

2. **No `codeframe auth` system** - Credential management infrastructure
   - **Current Status**: Basic auth commands exist but lack comprehensive management
   - **Impact**: Authentication failures would block entire workflow

3. **No environment validation** - Pre-flight tool checking
   - **Current Status**: No validation commands exist
   - **Impact**: Batch failures mid-execution due to missing tools

### ⚡ Medium Gaps (High Frustration)
4. **No `codeframe pr create/merge`** - Git/PR workflow CLI commands
   - **Current Status**: GitHub integration exists but no CLI commands
   - **Impact**: Manual PR creation required for final workflow step

5. **Limited dependency conflict resolution** - Advanced task dependency management
   - **Current Status**: Basic dependency analysis exists
   - **Impact**: Complex projects may have unresolvable dependency loops

### 🔧 Quality Gaps (Minor Annoyance)
6. **No AI-assisted code review** - Enhanced quality gates
   - **Current Status**: Basic verification only
   - **Impact**: Missed opportunities for automated code improvement

7. **No enhanced monitoring/debugging** - Rich CLI experience
   - **Current Status**: Basic event streaming exists
   - **Impact**: Difficult to debug complex failures

## 🎯 Overall Assessment

### Current State: **~60% Complete**
- **Foundation**: Strong - Core CLI, basic PRD, tasks, batch execution ✅
- **Enhanced Features**: Missing - AI PRD generation, Git/PR CLI, auth management ⚠️
- **Robustness**: Partial - Basic recovery exists, advanced recovery missing ⚠️
- **Quality**: Basic - Verification works, enhanced features missing ⚠️

### Critical Path Forward
1. **Immediate (Week 1-2)**: Implement `codeframe prd generate` and credential management
2. **Short-term (Week 3-4)**: Add Git/PR CLI commands and environment validation
3. **Medium-term (Month 2)**: Enhanced monitoring, AI code review, advanced recovery

**Assessment**: The enhanced MVP has a solid foundation, but the critical gaps above must be filled before the CLI workflow is truly usable.

## 📚 Recommendation

**Proceed with gap analysis implementation plan** - Address critical authentication and PRD generation gaps first, then advance to Git/PR integration.

The CLI foundation is production-ready for basic workflows but needs enhanced features to meet full MVP goals.

* docs: Add comprehensive implementation roadmap for enhanced MVP completion

Consolidate gap analysis into phase-wise implementation plan addressing critical credential management, AI-driven PRD generation, and advanced workflow automation features.

## Phase 1 (Weeks 1-2): Foundation Infrastructure
- AI-driven PRD generation system
- Comprehensive credential management
- Enhanced environment validation

## Phase 2 (Weeks 3-4): Core Enhancement
- Advanced task generation with dependency analysis
- Production-ready batch execution
- Enhanced quality gates with AI-assisted review

## Phase 3 (Weeks 5-6): User Experience
- Enhanced blocker resolution with AI suggestions
- Rich monitoring and debugging capabilities
- Performance profiling and observability

## Phase 4 (Weeks 7-8): Integration & Automation
- Complete Git/PR workflow automation
- Template and profile management systems
- Workflow automation and predictive analytics

Transforms CodeFRAME from basic automation tool to comprehensive AI development platform.

* feat(prd): Add comprehensive PRD management commands and versioning (#293)

* feat(prd): Add comprehensive PRD management commands and versioning

Implements a complete PRD management system for the codeframe CLI:

Core PRD functions (codeframe/core/prd.py):
- delete(workspace, prd_id) - Remove a PRD from workspace
- export_to_file(workspace, prd_id, path, force) - Export PRD to file
- create_new_version(workspace, prd_id, content, summary) - Create new version
- get_versions(workspace, prd_id) - List all versions of a PRD
- get_version(workspace, prd_id, version_number) - Get specific version
- diff_versions(workspace, prd_id, v1, v2) - Generate unified diff
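
For the unified diff, a minimal sketch using the standard library's `difflib` is shown below. It takes the two version bodies directly rather than `(workspace, prd_id, v1, v2)`, and the function name is hypothetical; the real `diff_versions` may be implemented differently.

```python
import difflib


def unified_prd_diff(old: str, new: str, v1: int, v2: int) -> str:
    """Return a unified diff between two PRD version bodies (empty if identical)."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"v{v1}",
        tofile=f"v{v2}",
    )
    return "".join(lines)
```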

CLI commands (codeframe/cli/app.py):
- prd list - List all PRDs with IDs and timestamps
- prd show [id] - Enhanced to accept optional PRD ID
- prd delete <id> [--force] - Delete PRD with confirmation
- prd export <id|latest> <file> [--force] - Export PRD to file
- prd versions <id> - Show version history
- prd diff <id> <v1> <v2> - Show diff between versions
- prd update <id> <file> -m <message> - Create new version

Database schema additions:
- version (INTEGER) - Version number for PRD
- parent_id (TEXT) - Links to previous version
- change_summary (TEXT) - Description of changes

Includes 68 tests covering core functions and CLI commands.

* Update codeframe/core/prd.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* fix: Address code review issues for PRD versioning

- Add chain_id field to PrdRecord and prds table schema
- Add database indexes on parent_id and chain_id columns
- Make version number increment atomic with explicit transactions
- Optimize get_versions() to use single query with chain_id
- Add list_chains() function to list unique PRD chains
- Add delete validation with check_dependencies parameter
- Add PrdHasDependentTasksError exception for dependent tasks
- Update CLI_WIREFRAME.md with new PRD commands documentation

Fixes from code review:
1. Performance: Added idx_prds_parent and idx_prds_chain indexes
2. Architecture: Added chain_id for version grouping
3. Concurrency: Wrapped version creation in explicit transaction
4. N+1 queries: get_versions() now uses single query via chain_id
5. Documentation: Added 7 new PRD commands to CLI_WIREFRAME.md
6. Validation: delete() now checks for dependent tasks
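
The schema and transaction changes above might look like the following `sqlite3` sketch. The column and index names come from this commit message; the `content` column, the function signatures (which take a connection and a `chain_id` rather than a workspace and `prd_id`), and the use of `BEGIN IMMEDIATE` are assumptions.

```python
import sqlite3
import uuid


def connect(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT are explicit.
    return sqlite3.connect(path, isolation_level=None)


def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS prds (
            id TEXT PRIMARY KEY,
            chain_id TEXT NOT NULL,
            parent_id TEXT,
            version INTEGER NOT NULL DEFAULT 1,
            content TEXT NOT NULL,
            change_summary TEXT
        );
        CREATE INDEX IF NOT EXISTS idx_prds_parent ON prds(parent_id);
        CREATE INDEX IF NOT EXISTS idx_prds_chain ON prds(chain_id);
        """
    )


def create_new_version(conn, chain_id: str, content: str, summary: str) -> int:
    """Read the latest version and insert the next one inside a single transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, version FROM prds WHERE chain_id = ? "
            "ORDER BY version DESC LIMIT 1",
            (chain_id,),
        ).fetchone()
        parent_id, version = (row[0], row[1] + 1) if row else (None, 1)
        conn.execute(
            "INSERT INTO prds (id, chain_id, parent_id, version, content, change_summary) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (uuid.uuid4().hex, chain_id, parent_id, version, content, summary),
        )
        conn.execute("COMMIT")
        return version
    except Exception:
        conn.execute("ROLLBACK")
        raise


def get_versions(conn, chain_id: str) -> list[tuple[int, str]]:
    """One query per chain instead of walking parent_id links row by row."""
    return conn.execute(
        "SELECT version, change_summary FROM prds WHERE chain_id = ? ORDER BY version",
        (chain_id,),
    ).fetchall()
```

A `UNIQUE(chain_id, version)` constraint would be a reasonable extra safeguard, though the commit message does not say whether one exists.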

* Update codeframe/cli/app.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/core/prd.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

---------

Co-authored-by: Test User <test@example.com>
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* feat(credentials): Add comprehensive credential management system (#294)

* feat(credentials): Add comprehensive credential management system

Implement secure credential storage and management for CodeFRAME:

Core Module (codeframe/core/credentials.py):
- CredentialProvider enum with env var mappings and display names
- Credential dataclass with expiration, masking, and serialization
- CredentialStore with keyring-first + encrypted file fallback
- CredentialManager as high-level API with env var priority
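
A small sketch of masking, expiration, and environment-variable priority follows. The class names match this commit message, but the fields, method names, masking rule, and the plain dict standing in for the keyring or encrypted file are all assumptions.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Credential:
    provider: str
    value: str
    expires_at: Optional[datetime] = None

    def masked(self) -> str:
        """Show only the last four characters; short values are fully hidden."""
        if len(self.value) <= 8:
            return "*" * len(self.value)
        return "*" * (len(self.value) - 4) + self.value[-4:]

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return self.expires_at is not None and self.expires_at <= now


class CredentialManager:
    """Environment variables take priority over stored credentials."""

    def __init__(self, env_vars: dict[str, str], store: dict[str, Credential]):
        self._env_vars = env_vars  # provider -> environment variable name
        self._store = store        # stand-in for keyring / encrypted file

    def get(self, provider: str) -> Optional[str]:
        env_name = self._env_vars.get(provider)
        if env_name and os.environ.get(env_name):
            return os.environ[env_name]
        cred = self._store.get(provider)
        if cred is None or cred.is_expired():
            return None
        return cred.value
```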

CLI Commands (codeframe/cli/auth_commands.py):
- setup: Interactive credential configuration with validation
- list: Show all configured credentials with masked values
- validate: Test credential with provider APIs
- rotate: Replace credential atomically with optional validation
- remove: Delete stored credential with confirmation

Workflow Validation (codeframe/core/credential_validator.py):
- Pre-workflow credential checks by workflow type
- require_credential() helper for fail-fast scenarios
- check_llm_credentials() for any-LLM-provider validation

Audit Logging (codeframe/core/credential_audit.py):
- Comprehensive audit trail for all credential operations
- Sensitive value filtering (never logs actual credentials)
- Log rotation support (10MB default)

Integration:
- AnthropicProvider accepts optional credential_manager
- GitHubIntegration accepts optional credential_manager
- Full backward compatibility with environment variables

Tests: 78 new tests covering all functionality

* fix(credentials): Address PR review feedback for security and code quality

Security improvements:
- Add chmod after atomic rename to ensure 600 permissions on all filesystems
- Enhance machine ID derivation to use /etc/machine-id (Linux) or registry
  GUID (Windows) for more stable encryption keys
- Replace broad exception handling with specific handlers (InvalidToken,
  JSONDecodeError, PermissionError, OSError) with actionable error messages
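
The "chmod after atomic rename" step above can be sketched as below. The function name and temp-file handling are hypothetical; only the rename followed by an explicit `0o600` comes from this commit message.

```python
import os
import tempfile


def write_secret_file(path: str, data: bytes) -> None:
    """Write atomically, then force owner-only permissions on the final file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Re-apply permissions after the rename rather than trusting the temp file's mode.
    os.chmod(path, 0o600)
```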

Code quality fixes:
- Update validate_credential_format() to check actual prefixes (sk-ant-,
  sk-, glpat-) as documented in comments, with minimum length of 20 chars
- Clarify list_providers() docstring about keyring enumeration limitation
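
A hedged sketch of prefix and length checking follows. The prefixes and the 20-character minimum come from this commit message; the provider keys used to look them up and the return shape are assumptions, not the real `validate_credential_format()` signature.

```python
MIN_LENGTH = 20

# Assumed provider-to-prefix mapping, for illustration only.
PREFIXES = {
    "anthropic": "sk-ant-",
    "openai": "sk-",
    "gitlab": "glpat-",
}


def validate_credential_format(provider: str, value: str) -> tuple[bool, str]:
    """Return (is_valid, message) without ever echoing the credential itself."""
    value = value.strip()
    if len(value) < MIN_LENGTH:
        return False, f"Credential must be at least {MIN_LENGTH} characters"
    prefix = PREFIXES.get(provider)
    if prefix and not value.startswith(prefix):
        return False, f"Expected {provider} credential to start with {prefix!r}"
    # "sk-ant-..." also starts with "sk-", so reject it explicitly for the generic prefix.
    if provider == "openai" and value.startswith("sk-ant-"):
        return False, "This looks like an Anthropic key, not an OpenAI key"
    return True, "ok"
```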

Bug fixes:
- Improve validation functions to distinguish auth failures from network
  errors, timeouts, and rate limiting for better user feedback
- Update tests with appropriately long test credentials

* Update codeframe/core/credentials.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/core/credential_audit.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/core/credentials.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* fix(credentials): Address remaining PR review issues

High priority fixes:
- Reject empty/whitespace-only credential values in setup command
- Fix remove command to check credential source before reporting success
  (now warns when credential is only set via environment variable)

Medium priority fixes:
- Add salt file validation (must be exactly 16 bytes)
- Add error handling for malformed credential data in from_dict calls
  (prevents crashes from corrupted keyring or encrypted store data)

* Update codeframe/core/credentials.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/core/credentials.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/core/credential_audit.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/core/credential_audit.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/core/credential_audit.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/core/credentials.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/cli/auth_commands.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/cli/auth_commands.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* Update codeframe/cli/auth_commands.py

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

---------

Co-authored-by: Test User <test@example.com>
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* (via frankbria): Fix _load_encrypted_store to raise exceptions on read errors to prevent data loss (#295)

* Fix _load_encrypted_store to raise exceptions on read errors to prevent data loss

* Remove global keyring disable on store failure in CredentialStore.store()

---------

Co-authored-by: Test User <test@example.com>
Co-authored-by: Frank Bria <frank.bria@proton.me>
Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>

* fix(cli): ensure consistent 'codeframe' usage in help text

Add __main__.py files to enable python -m invocation with proper
program name. Uses Typer's prog_name parameter for reliable usage
line display regardless of invocation method.

- Add codeframe/__main__.py for python -m codeframe
- Add codeframe/cli/__main__.py for python -m codeframe.cli
- Update legacy CLI __main__ blocks with sys.argv[0] fix
- Clarify expire_blockers.py is an internal scheduled task

* fix: address code review findings across LLM adapters and core modules

LLM Adapters:
- Fix conductor import (get_llm_provider -> get_provider)
- Fix _convert_messages to preserve user text with tool_results
- Fix ModelSelector __post_init__ to respect constructor values
- Update model constants to valid Anthropic identifiers

Core Security & Safety:
- Add file deletion safeguards in agent.py (path traversal protection)
- Add safe shell command parsing with allowlist validation
- Improve dangerous command detection in executor.py (regex patterns)
- Add timeout to git subprocess in checkpoints.py
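
The path traversal protection mentioned for agent.py above could be approximated as in this sketch. The function name and the exact rules are assumptions; the real safeguard may apply additional checks.

```python
from pathlib import Path


def is_path_safe(workspace_root: Path, candidate: str) -> bool:
    """Return True only if candidate resolves to a location strictly inside the root."""
    root = workspace_root.resolve()
    # Joining with an absolute candidate discards the root, and resolve() collapses
    # ".." segments and symlinks, so both escape routes are caught by the check below.
    target = (root / candidate).resolve()
    return root in target.parents
```

Requiring the root to be a strict parent also refuses the workspace directory itself, which seems a sensible default for a deletion guard.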

Core Reliability:
- Fix dependency_analyzer to always update deps (clear stale edges)
- Fix dependency_analyzer to use valid loaded task IDs
- Fix f-string prefix insertion in quick_fixes.py
- Fix DB connection handling in runtime.py (try/finally)
- Fix DB connection in tasks.py and use LLM adapter
- Replace debug prints with logging in runtime.py supervisor block

Schema:
- Add depends_on column migration for prds table in workspace.py

* fix(core): Address code review findings for security and reliability

agent.py:
- Fix _try_auto_fix to check ruff returncode and log failures
- Add path safety validation to create/edit actions using _is_path_safe
- Reject shell commands when _parse_command_safely returns requires_shell=True

checkpoints.py:
- Add try/finally blocks to all DB operations for reliable connection cleanup

conductor.py:
- Add _active_processes_lock for thread-safe process tracking
- Add _batch_db_lock for thread-safe batch DB writes in _save_batch
- Fix misleading comment about "temporary" dependencies (they persist)

dependency_analyzer.py:
- Only update dependencies when inferred list is non-empty (preserve existing)

executor.py:
- Use shell=False with shlex.split when no shell operators are present
- Fall back to shell=True only for commands with pipes, redirects, etc.
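
The shell fallback rule described for executor.py can be sketched as follows. The operator list and function names are hypothetical, and this naive check also triggers on operators that appear inside quoted arguments.

```python
import shlex
import subprocess

# Assumed set of shell operators; the real detection in executor.py may differ.
SHELL_OPERATORS = ("|", "&", ";", "<", ">", "$(", "`")


def needs_shell(command: str) -> bool:
    return any(op in command for op in SHELL_OPERATORS)


def run_command(command: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    if needs_shell(command):
        # Fallback path: pipes and redirects only work through a shell.
        return subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    # Default path: no shell, so arguments are never re-interpreted.
    return subprocess.run(
        shlex.split(command), shell=False, capture_output=True, text=True, timeout=timeout
    )
```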

quick_fixes.py:
- Fix Poetry detection by checking poetry.lock before pyproject.toml
- Add handling for unicode 'u' prefix (don't add 'f' to u-strings)

workspace.py:
- Add depends_on column to initial prds schema creation
- Add idx_prds_depends_on index to initial schema

* style: fix ruff lint errors across codebase

- Fix E741 ambiguous variable name 'l' → 'line' in artifacts.py and gates.py
- Fix E402 module-level import order in test_tasks_crud.py and test_tasks_set_bulk.py
- Remove F401 unused imports across 13 test files and 2 core modules

* ci: disable frontend tests during v2 CLI-first refactor

- Comment out frontend-tests, e2e-smoke-tests jobs (web-ui is legacy)
- Remove Node.js setup from code-quality job
- Add skip checks for web-ui/src in hardcoded-urls job
- Update test-summary to remove frontend-tests dependency

The web-ui package.json is missing; re-enable these jobs when
the frontend is restored.

* fix(core): Address code review findings for reliability and consistency

artifacts.py:
- Track which diff was actually used when falling back from staged to
  unstaged, ensuring stats match the exported patch content

dependency_graph.py:
- Remove dead no-op loop in topological_sort that computed in_degree
  but only contained pass statements

events.py:
- Add try/finally to emit() to ensure DB connection closes on exception
- Add try/finally to emit_for_workspace() for same reason
- Add try/finally to list_recent() to ensure DB connection closes
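
The try/finally pattern applied in events.py looks roughly like this. The `events` table layout, the `init_db` helper, and the `emit` signature are hypothetical stand-ins for the real code.

```python
import json
import sqlite3


def init_db(db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY, event_type TEXT, payload TEXT)"
        )
        conn.commit()
    finally:
        conn.close()


def emit(db_path: str, event_type: str, payload: dict) -> int:
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "INSERT INTO events (event_type, payload) VALUES (?, ?)",
            (event_type, json.dumps(payload)),
        )
        conn.commit()
        return cur.lastrowid
    finally:
        conn.close()  # runs on success and on any exception raised above
```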

gates.py:
- Add ERROR status count to GateResult.summary property
- Fix GATES_STARTED event to report actual empty list vs ["auto"]
- Make unknown gates FAILED (not SKIPPED) when explicitly requested,
  with helpful error message listing valid gate names
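
A sketch of the gate behaviour described above: unknown gates that were explicitly requested fail with a message listing valid names, and the summary counts every status including ERROR. The `GateCheck` type, `KNOWN_GATES` registry, and `run_gates` function are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class GateStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"
    ERROR = "error"


@dataclass
class GateCheck:
    name: str
    status: GateStatus
    message: str = ""


@dataclass
class GateResult:
    checks: list[GateCheck] = field(default_factory=list)

    @property
    def summary(self) -> str:
        counts = {s: sum(c.status is s for c in self.checks) for s in GateStatus}
        return ", ".join(f"{counts[s]} {s.value}" for s in GateStatus)


KNOWN_GATES = {"pytest", "ruff"}  # hypothetical registry


def run_gates(requested: list[str]) -> GateResult:
    result = GateResult()
    for name in requested:
        if name not in KNOWN_GATES:
            # An explicitly requested but unknown gate fails rather than being skipped.
            valid = ", ".join(sorted(KNOWN_GATES))
            result.checks.append(
                GateCheck(name, GateStatus.FAILED, f"Unknown gate {name!r}. Valid gates: {valid}")
            )
            continue
        # Real gate execution is elided; this sketch marks known gates as passed.
        result.checks.append(GateCheck(name, GateStatus.PASSED))
    return result
```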

* fix(cli): Register auth_app and fix test failures

- Register auth_app from auth_commands.py in main CLI app
- Fix test_credential_commands.py tests to mock get_credential_source
- Skip test_serve_command.py tests (serve is stub during v2 refactor)
- Skip test_cli_session.py tests (session management not in v2 Golden Path)

* test: skip WebSocket integration tests during v2 refactor

These tests require a running FastAPI server with full WebSocket support,
but the v2 serve command is a stub. The server adapter will be implemented
post-Golden Path.

* fix(core): Improve stats accuracy and handle empty dependency lists

artifacts.py:
- When falling back to plain unstaged diff (git diff without HEAD),
  parse stats directly from patch content via _parse_patch_content_stats()
- _get_diff_stats with staged_only=False runs "git diff HEAD --stat"
  which may return zeros for pure working tree changes
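
Parsing stats straight from the patch text might be done as in the sketch below. This is a guess at what `_parse_patch_content_stats()` does, with an assumed return shape. One known limitation is that a removed line whose own text begins with `-- ` is misread as a file header.

```python
def parse_patch_content_stats(patch: str) -> dict[str, int]:
    """Count files, insertions, and deletions directly from unified diff text."""
    files = insertions = deletions = 0
    for line in patch.splitlines():
        if line.startswith("diff --git "):
            files += 1
        elif line.startswith("+++ ") or line.startswith("--- "):
            continue  # file headers, not content changes
        elif line.startswith("+"):
            insertions += 1
        elif line.startswith("-"):
            deletions += 1
    return {"files_changed": files, "insertions": insertions, "deletions": deletions}
```

Because the numbers come from the same text that gets exported, they cannot disagree with the patch the way a separate `git diff --stat` call can.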

dependency_graph.py:
- Fix ValueError when max() is called on empty generator in calculate_level()
- Use max(dep_levels, default=-1) to handle nodes with deps not in graph
- Nodes with no valid in-graph deps are treated as level 0 (root nodes)
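
The `max(..., default=-1)` fix can be illustrated with this sketch. The graph representation (a dict mapping each node to its dependencies) and the memoization are assumptions, and the input is assumed to be acyclic.

```python
def calculate_level(node, graph, cache=None):
    """Level 0 for roots; otherwise 1 + the deepest dependency that is in the graph."""
    cache = {} if cache is None else cache
    if node in cache:
        return cache[node]
    dep_levels = (
        calculate_level(dep, graph, cache)
        for dep in graph.get(node, ())
        if dep in graph  # ignore dependencies that are not part of this graph
    )
    # default=-1 makes an empty generator yield level 0 instead of raising ValueError.
    level = max(dep_levels, default=-1) + 1
    cache[node] = level
    return level
```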

* test: skip dashboard integration tests during …
Development

Successfully merging this pull request may close these issues.

Implement AI-driven PRD generation system
