code-surgeon operates as a sequential pipeline, with state saved after each phase for resumable sessions.
User Input (GitHub issue or requirement)
↓
┌──────────────────────────────────────────┐
│ PHASE 1: Analysis (2 minutes) │
│ ├─ Issue Analyzer: Parse requirements │
│ ├─ Framework Detector: Identify tech │
│ └─ Output: Type, requirements, frameworks│
└──────────────────────────────────────────┘
↓ [State Saved: Phase 1 Complete]
┌──────────────────────────────────────────┐
│ PHASE 2: Context Research (5 minutes) │
│ ├─ Analyze file structure │
│ ├─ Build dependency graph │
│ ├─ Extract patterns │
│ ├─ Find team guidelines │
│ └─ Smart file selection (3-tier) │
└──────────────────────────────────────────┘
↓ [State Saved: Phase 2 Complete]
┌──────────────────────────────────────────┐
│ PHASE 3: Planning (3 minutes) │
│ ├─ Generate 6-section plan │
│ ├─ Analyze breaking changes │
│ ├─ Order tasks logically │
│ └─ Estimate effort │
└──────────────────────────────────────────┘
↓ [State Saved: Phase 3 Complete]
┌──────────────────────────────────────────┐
│ PHASE 4: Surgical Prompts (2 minutes) │
│ ├─ Create 9-section prompts per task │
│ ├─ Apply framework-specific guidance │
│ ├─ Validate against team guidelines │
│ └─ Scan for PII/secrets │
└──────────────────────────────────────────┘
↓ [State Saved: Phase 4 Complete]
┌──────────────────────────────────────────┐
│ PHASE 5: Output Formatting (1 minute) │
│ ├─ Generate PLAN.md (human-readable) │
│ ├─ Generate plan.json (machine-readable)│
│ └─ Generate interactive.json (CLI mode) │
└──────────────────────────────────────────┘
↓
Three Ready-to-Use Output Files
code-surgeon orchestrates 5 specialized sub-skills:
Role: Parse requirements and detect issue type
Input: GitHub URL or plain text description
Output:
{
"type": "feature" | "bug" | "refactor" | "perf",
"requirements": ["req1", "req2", ...],
"deadline": "2025-02-20",
"file_hints": ["src/auth.ts", ...]
}
Role: Identify frameworks, versions, and tech stack
Scans: package.json, pyproject.toml, go.mod, Gemfile, Cargo.toml, etc.
Output:
{
"frameworks": ["react", "express", ...],
"primary_language": "typescript",
"versions": {"react": "18.2.0", ...},
"is_monorepo": true,
"monorepo_info": {...}
}
Role: Analyze codebase and intelligently select files
Algorithm: 3-Tier File Selection
Tier 1 (Direct): Files directly mentioned in requirements
Tier 2 (Dependencies): Files imported by Tier 1
Tier 3 (Patterns): Files matching architectural patterns
Token Budget Aware:
- QUICK: 30K tokens (Tier 1 only)
- STANDARD: 60K tokens (Tier 1 + 2)
- DEEP: 90K tokens (Tier 1 + 2 + 3)
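The tier and budget rules above can be sketched as a greedy selection. This is a minimal illustration, not code-surgeon's actual algorithm: the `Candidate` type, the per-file token estimates, and the policy of skipping any file that no longer fits are all assumptions made for the example.

```python
from dataclasses import dataclass

# Budgets and tier sets come from the list above; the rest is hypothetical.
DEPTH_MODES = {
    "QUICK": {"budget": 30_000, "tiers": (1,)},
    "STANDARD": {"budget": 60_000, "tiers": (1, 2)},
    "DEEP": {"budget": 90_000, "tiers": (1, 2, 3)},
}


@dataclass(frozen=True)
class Candidate:
    path: str
    tier: int    # 1 = direct, 2 = dependency, 3 = pattern
    tokens: int  # estimated token cost of including the file


def select_files(candidates, mode):
    """Pick files tier by tier until the mode's token budget is spent."""
    config = DEPTH_MODES[mode]
    remaining = config["budget"]
    selected = {"tier_1": [], "tier_2": [], "tier_3": []}
    # Lower tiers go first so directly mentioned files are never crowded out.
    for candidate in sorted(candidates, key=lambda c: c.tier):
        if candidate.tier not in config["tiers"]:
            continue
        if candidate.tokens > remaining:
            continue  # assumed policy: skip files that do not fit
        selected[f"tier_{candidate.tier}"].append(candidate.path)
        remaining -= candidate.tokens
    return selected
```

The returned dictionary uses the same `tier_1` / `tier_2` / `tier_3` keys as the `files` object in the output.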
Output:
{
"files": {
"tier_1": [...],
"tier_2": [...],
"tier_3": [...]
},
"dependency_graph": {...},
"patterns": [...],
"team_conventions": {...}
}
Role: Create comprehensive 6-section implementation plan
6 Sections:
- Summary (strategy overview)
- Research (codebase findings)
- Design Choices (decisions + rationale)
- Phases (logical work chunks)
- Tasks (granular work items + dependencies)
- Verification (testing checklist)
Breaking Change Detection:
- API breaking changes
- Data schema breaking changes
- Behavior breaking changes
- Dependency breaking changes
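The Tasks section pairs each work item with its dependencies, which suggests one way to "order tasks logically": a topological sort. The sketch below assumes tasks are identified by string IDs and that dependencies arrive as a mapping. The source does not say how code-surgeon actually orders tasks, so treat this as an illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_tasks(dependencies):
    """Return task IDs so that every task follows the tasks it depends on.

    `dependencies` maps a task ID to the set of task IDs it depends on.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For example, `order_tasks({"add-tests": {"add-endpoint"}, "add-endpoint": {"add-model"}, "add-model": set()})` places `add-model` first and `add-tests` last.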
Role: Create precise, actionable prompts per task
9-Section Prompt Structure:
- Objective
- Context
- Scope
- Approach
- Patterns
- Constraints
- Breaking Changes
- Success Criteria
- Common Mistakes
Framework-Specific:
- React: Hook patterns, component typing
- Django: Model patterns, view patterns
- Express: Middleware, routing patterns
- Rails: Model-View-Controller patterns
- Etc. for 35+ frameworks
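As a rough illustration of the 9-section structure, a prompt could be rendered from a per-task mapping in a fixed section order. The `render_prompt` helper and the plain-text layout are hypothetical; only the section names come from the list above.

```python
PROMPT_SECTIONS = [
    "Objective", "Context", "Scope", "Approach", "Patterns",
    "Constraints", "Breaking Changes", "Success Criteria", "Common Mistakes",
]


def render_prompt(task):
    """Render a task mapping into text, one block per section, in fixed order."""
    missing = [name for name in PROMPT_SECTIONS if name not in task]
    if missing:
        # Refuse to emit a partial prompt rather than silently drop a section.
        raise ValueError(f"missing sections: {', '.join(missing)}")
    return "\n\n".join(f"{name}:\n{task[name]}" for name in PROMPT_SECTIONS)
```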
.claude/planning/sessions/surgeon-20250212-abc123xyz/
├── state.json ← Complete state (resumable)
├── PLAN.md ← Human-readable plan
├── plan.json ← Machine-readable plan
├── interactive.json ← CLI mode data
└── logs/
└── execution.log ← Detailed logs
- After each phase completes, state is saved atomically
- If interrupted, session is recoverable from last completed phase
- Resume command:
/code-surgeon-resume surgeon-20250212-abc123xyz
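A common way to make a state save atomic is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that technique. It assumes `state.json` is plain JSON; the function name and the `completed_phase` field used in the usage note are hypothetical, not code-surgeon's actual schema.

```python
import json
import os
import tempfile


def save_state_atomically(state, path):
    """Write `state` as JSON so readers see the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic replacement of the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call such as `save_state_atomically({"completed_phase": 2}, ".claude/planning/sessions/<session-id>/state.json")` would then run at the end of each phase.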
.claude/planning/cache/
├── file-structure-<hash>.json ← Repo structure
├── dependency-graph-<hash>.json ← Dependency map
└── patterns-<hash>.json ← Pattern extraction
Caches save 25-30% of tokens on repeated analyses of the same repo.
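The source does not say how the `<hash>` in the cache file names is computed. One plausible scheme, shown here purely as an assumption, fingerprints the repository from each file's relative path, size, and modification time, so the cache is invalidated whenever a file changes. `repo_fingerprint`, `cache_path`, and the skipped directories are hypothetical.

```python
import hashlib
import os

# Skip VCS data and code-surgeon's own output so writing a cache
# file does not change the fingerprint it is keyed on.
SKIP_DIRS = {".git", ".claude"}


def repo_fingerprint(repo_root):
    """Hash the relative path, size, and mtime of every file under repo_root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune and sort in place so the walk order is deterministic.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            info = os.lstat(full)
            rel = os.path.relpath(full, repo_root)
            digest.update(f"{rel}\0{info.st_size}\0{info.st_mtime_ns}\n".encode())
    return digest.hexdigest()[:16]


def cache_path(repo_root, kind):
    """Build a path such as .claude/planning/cache/patterns-<hash>.json."""
    name = f"{kind}-{repo_fingerprint(repo_root)}.json"
    return os.path.join(repo_root, ".claude", "planning", "cache", name)
```

Hashing file contents instead of metadata would be more robust against clock changes, at the cost of reading every file on each run.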
QUICK mode:
- Phase 1: Normal
- Phase 2: Skip Tier 3 pattern extraction
- Phase 3: Reduce to 2-4 tasks
- Phase 4: 5-section prompts
- Phase 5: Markdown only
STANDARD mode:
- Phase 1: Normal
- Phase 2: Load Tier 1 + 2 + 3, extract 3-5 patterns
- Phase 3: Full 6-section plan, 5-8 tasks
- Phase 4: Full 9-section prompts
- Phase 5: All 3 output formats
DEEP mode:
- Phase 1: Normal
- Phase 2: Include file history, full dependency graph, 5-10 patterns
- Phase 3: 2-3 alternatives per decision, comprehensive breaking change analysis
- Phase 4: Extended prompts with 10-15 line code examples
- Phase 5: All 3 output formats with extended context
| Scenario | Detection | Recovery | Result |
|---|---|---|---|
| Empty requirement | Validation | Show error, stop | User provides requirement |
| Sub-skill timeout | Phase execution | Retry once, then save | User resumes |
| Token budget exceeded | Phase 2-3 | Offer options | User chooses action |
| PII/secrets detected | Phase 4 validation | Block generation | User sanitizes code |
| Output validation fails | Phase output | Retry once | User resumes or restarts |
| Repo not found | Phase 2 init | Show error, stop | User fixes path |
| User interrupt | Any time | Save state immediately | User resumes |
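The "retry once, then save" recovery in the table can be sketched as a small wrapper. This is a hedged illustration: `run_with_retry`, the `save_state` callback, and the use of Python's built-in `TimeoutError` to signal a sub-skill timeout are all assumptions.

```python
def run_with_retry(phase_fn, save_state, retries=1):
    """Run a phase; on timeout retry up to `retries` times, then save state and re-raise."""
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return phase_fn()
        except TimeoutError:
            if attempt == attempts:
                # Out of retries: persist progress so the user can resume later.
                save_state()
                raise
```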
- User interrupts or error occurs
- State is saved atomically
- Session ID provided: surgeon-<date>-<random>
- User can resume: /code-surgeon-resume surgeon-<id>
- System loads state, finds highest completed phase
- Continues from next phase, reuses prior outputs
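The resume flow above amounts to "find the highest completed phase, then run the rest with the earlier outputs in hand". A minimal sketch follows. The phase identifiers, the `completed_phase` and `outputs` fields, and the `save_state` callback are hypothetical, not the actual `state.json` schema.

```python
# Hypothetical identifiers for the five phases, in pipeline order.
PHASES = [
    "analysis",
    "context_research",
    "planning",
    "surgical_prompts",
    "output_formatting",
]


def resume(state, phase_fns, save_state=lambda snapshot: None):
    """Run every phase after the highest completed one, reusing saved outputs."""
    outputs = dict(state.get("outputs", {}))
    completed = state.get("completed_phase", 0)  # 0 means nothing finished yet
    for number in range(completed + 1, len(PHASES) + 1):
        name = PHASES[number - 1]
        # Each phase function receives the outputs of all earlier phases.
        outputs[name] = phase_fns[name](outputs)
        save_state({"completed_phase": number, "outputs": dict(outputs)})
    return {"completed_phase": len(PHASES), "outputs": outputs}
```

A session interrupted after Phase 2 would pass `{"completed_phase": 2, "outputs": {...}}` and only Phases 3 to 5 would run.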
1. Check if .claude/team-guidelines.md exists
2. If found: Load and parse guidelines
3. If not found: Continue without guidelines
4. Apply guidelines to:
- Surgical prompt generation
- Breaking change detection
- Code example generation
5. Flag violations in output
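Steps 1 to 3 can be sketched as follows, assuming the guidelines file uses `##` section headings followed by `-` bullets. The `load_team_guidelines` helper and its return shape are hypothetical, and a real parser would likely need to handle nested lists and free-form prose.

```python
from pathlib import Path


def load_team_guidelines(repo_root):
    """Return {section heading: [rules]}, or None when no guidelines file exists."""
    path = Path(repo_root) / ".claude" / "team-guidelines.md"
    if not path.is_file():
        return None  # step 3: continue without guidelines
    sections = {}
    current = None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections
```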
# Team Guidelines
## Code Style
- [language-specific rules]
## Architecture Patterns
- [patterns team uses]
## Security & Compliance
- [requirements]
- Scan package manager files (package.json, pyproject.toml, etc.)
- Parse dependencies and versions
- Match against known framework signatures
- Determine primary language
- Check for monorepo indicators
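For a single manifest type, the detection steps above might look like the sketch below. It handles only `package.json`, and the signature table is a small hypothetical sample rather than code-surgeon's real list. Treating a `workspaces` field as the monorepo indicator and a `typescript` dependency as the language signal are assumptions, and the versions returned are the declared specifiers, not resolved versions.

```python
import json

# Hypothetical signature table: npm dependency name -> framework label.
NPM_SIGNATURES = {
    "react": "react",
    "express": "express",
    "next": "nextjs",
    "vue": "vue",
}


def detect_from_package_json(text):
    """Detect frameworks, language, and monorepo status from package.json text."""
    manifest = json.loads(text)
    deps = {
        **manifest.get("dependencies", {}),
        **manifest.get("devDependencies", {}),
    }
    versions = {
        NPM_SIGNATURES[name]: spec
        for name, spec in deps.items()
        if name in NPM_SIGNATURES
    }
    return {
        "frameworks": sorted(versions),
        "primary_language": "typescript" if "typescript" in deps else "javascript",
        "versions": versions,
        "is_monorepo": "workspaces" in manifest,
    }
```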
if framework == "React":
- Use Hook patterns in examples
- Reference React best practices
- Include TypeScript patterns if detected
if framework == "Django":
- Use Model-View patterns
- Reference Django ORM
- Include migration guidance
[Similar logic for 35+ frameworks]
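The pseudocode above could be expressed as a lookup table rather than a chain of `if` statements, which scales more easily to many frameworks. The sketch below covers only the two frameworks shown; `FRAMEWORK_GUIDANCE` and `guidance_for` are hypothetical names.

```python
# Guidance lines taken from the pseudocode above, keyed by lowercase framework name.
FRAMEWORK_GUIDANCE = {
    "react": [
        "Use Hook patterns in examples",
        "Reference React best practices",
    ],
    "django": [
        "Use Model-View patterns",
        "Reference Django ORM",
        "Include migration guidance",
    ],
}


def guidance_for(frameworks, uses_typescript=False):
    """Collect guidance lines for each detected framework, in the order given."""
    lines = []
    for framework in frameworks:
        key = framework.lower()
        lines.extend(FRAMEWORK_GUIDANCE.get(key, []))
        # The TypeScript hint applies to React only when TypeScript was detected.
        if key == "react" and uses_typescript:
            lines.append("Include TypeScript patterns")
    return lines
```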
- Expert knowledge first — 82% of content is expert-only
- Minimal activation content — Reminders kept brief
- No redundancy — Never explain what Claude already knows
- Smart layering — Load only what's needed per phase
- Caching — Avoid re-analyzing same codebase
Average invocation:
├─ SKILL.md loading: ~12K tokens (once per session)
├─ Phase execution: ~8K tokens (current phase only)
├─ User context: ~2-3K tokens
└─ Total context: ~22K tokens of ~100K available
Result: 78% context window free for work
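The 78% figure follows directly from the numbers above. A small check, taking the lower end of the 2-3K user-context range and the ~100K window as stated (the `context_usage` helper is hypothetical):

```python
def context_usage(skill_tokens, phase_tokens, user_tokens, window=100_000):
    """Return (tokens used, percent of the context window still free)."""
    used = skill_tokens + phase_tokens + user_tokens
    free_percent = round(100 * (window - used) / window)
    return used, free_percent


used, free_percent = context_usage(12_000, 8_000, 2_000)
print(used, free_percent)  # 22000 78
```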
| Phase | Time | Tokens | CPU |
|---|---|---|---|
| Phase 1 | 2 min | ~4K | Low |
| Phase 2 | 5 min | ~12K | Medium |
| Phase 3 | 3 min | ~8K | Medium |
| Phase 4 | 2 min | ~6K | Medium |
| Phase 5 | 1 min | ~2K | Low |
| Total | 13 min | ~32K | Medium |
(Varies by depth mode and codebase size)
Future enhancements:
- Custom framework templates
- Plugin system for specialized analysis
- Integration with CI/CD pipelines
- Advanced pattern detection
- Machine learning-based risk assessment
See README.md for user-facing documentation.