1 | 1 | # CodeFlash AI Agent Instructions |
2 | 2 |
3 | | -This file provides comprehensive guidance to any coding agent (Warp, GitHub Copilot, Claude, Gemini, etc.) when working with the CodeFlash repository. |
4 | | - |
5 | 3 | ## Project Overview |
6 | 4 |
7 | | -CodeFlash is an AI-powered Python code optimizer that automatically improves code performance while maintaining correctness. It uses LLMs to analyze code, generate optimization ideas, validate correctness through comprehensive testing, benchmark performance improvements, and create merge-ready pull requests. |
8 | | - |
9 | | -**Key Capabilities:** |
10 | | -- Optimize entire codebases with `codeflash --all` |
11 | | -- Optimize specific files or functions with targeted commands |
12 | | -- End-to-end workflow optimization with `codeflash optimize script.py` |
13 | | -- Automated GitHub Actions integration for CI/CD pipelines |
14 | | -- Comprehensive benchmarking and performance analysis |
15 | | -- Git worktree isolation for safe optimization |
16 | | - |
17 | | -## Core Architecture |
18 | | - |
19 | | -### Data Flow Pipeline |
20 | | -Discovery → Context → Optimization → Verification → Benchmarking → PR |
21 | | - |
22 | | -1. **Discovery** (`codeflash/discovery/`) - Find optimizable functions via static analysis or execution tracing |
23 | | -2. **Context Extraction** (`codeflash/context/`) - Extract dependencies, imports, and related code |
24 | | -3. **Optimization** (`codeflash/optimization/`) - Generate optimized code via AI service calls |
25 | | -4. **Verification** (`codeflash/verification/`) - Run deterministic tests with custom pytest plugin |
26 | | -5. **Benchmarking** (`codeflash/benchmarking/`) - Performance measurement and comparison |
27 | | -6. **GitHub Integration** (`codeflash/github/`) - Automated PR creation with detailed analysis |
28 | | - |
29 | | -### Key Components |
30 | | - |
31 | | -**Main Entry Points:** |
32 | | -- `codeflash/main.py` - CLI entry point and main orchestration |
33 | | -- `codeflash/cli_cmds/cli.py` - Command-line argument parsing and validation |
34 | | - |
35 | | -**Core Optimization Pipeline:** |
36 | | -- `codeflash/optimization/optimizer.py` - Main optimization orchestrator |
37 | | -- `codeflash/optimization/function_optimizer.py` - Individual function optimization |
38 | | -- `codeflash/tracing/` - Function call tracing and profiling |
39 | | - |
40 | | -**Code Analysis & Manipulation:** |
41 | | -- `codeflash/code_utils/` - Code parsing, AST manipulation, static analysis |
42 | | -- `codeflash/context/` - Code context extraction and analysis |
43 | | -- `codeflash/verification/` - Code correctness verification through testing |
44 | | - |
45 | | -**External Integrations:** |
46 | | -- `codeflash/api/aiservice.py` - LLM communication with rate limiting and retries |
47 | | -- `codeflash/github/` - GitHub integration for PR creation |
48 | | -- `codeflash/benchmarking/` - Performance benchmarking and measurement |
| 5 | +CodeFlash is an AI-powered Python code optimizer that automatically improves code performance while maintaining correctness. |
49 | 6 |
50 | | -**Supporting Systems:** |
51 | | -- `codeflash/models/models.py` - Pydantic models and type definitions |
52 | | -- `codeflash/telemetry/` - Usage analytics (PostHog) and error reporting (Sentry) |
53 | | -- `codeflash/ui/` - User interface components (Rich console output) |
54 | | -- `codeflash/lsp/` - Language Server Protocol support for IDE integration |
| 7 | +## Architecture |
55 | 8 |
56 | | -### Key Optimization Workflows |
57 | | - |
58 | | -**1. Full Codebase Optimization (`--all`)** |
59 | | -- Discovers all optimizable functions in the project |
60 | | -- Runs benchmarks if configured |
61 | | -- Optimizes functions in parallel |
62 | | -- Creates PRs for successful optimizations |
63 | | - |
64 | | -**2. Targeted Optimization (`--file`, `--function`)** |
65 | | -- Focuses on specific files or functions |
66 | | -- Performs detailed analysis and context extraction |
67 | | -- Applies targeted optimizations |
68 | | - |
69 | | -**3. Workflow Tracing (`optimize`)** |
70 | | -- Traces Python script execution |
71 | | -- Identifies performance bottlenecks |
72 | | -- Generates optimizations for traced functions |
73 | | -- Uses checkpoint system to resume interrupted runs |
| 9 | +``` |
| 10 | +codeflash/ |
| 11 | +├── main.py # CLI entry point |
| 12 | +├── cli_cmds/ # Command handling, console output (Rich) |
| 13 | +├── discovery/ # Find optimizable functions |
| 14 | +├── context/ # Extract code dependencies and imports |
| 15 | +├── optimization/ # Generate optimized code via AI |
| 16 | +├── verification/ # Run deterministic tests (pytest plugin) |
| 17 | +├── benchmarking/ # Performance measurement |
| 18 | +├── github/ # PR creation |
| 19 | +├── api/ # AI service communication |
| 20 | +├── code_utils/ # Code parsing, git utilities |
| 21 | +├── models/ # Pydantic models and types |
| 22 | +├── tracing/ # Function call tracing |
| 23 | +├── lsp/ # IDE integration |
| 24 | +├── telemetry/ # Sentry, PostHog |
| 25 | +└── either.py # Functional error handling |
| 26 | +``` |
74 | 27 |
75 | 28 | ## Critical Development Patterns |
76 | 29 |
77 | | -### Package Management with uv (NOT pip) |
| 30 | +### Use uv, not pip |
78 | 31 | ```bash |
79 | | -# Always use uv, never pip |
80 | 32 | uv sync # Install dependencies |
81 | | -uv sync --group dev # Install dev dependencies |
| 33 | +uv sync --group dev # Dev dependencies |
82 | 34 | uv run pytest # Run commands |
83 | | -uv add package # Add new packages |
84 | | -uv build # Build package |
| 35 | +uv add package # Add packages |
85 | 36 | ``` |
86 | 37 |
87 | | -### Code Manipulation with LibCST (NOT ast) |
88 | | -Always use `libcst` for code parsing/modification to preserve formatting: |
89 | | -```python |
90 | | -from libcst import parse_module, PartialPythonCodeGen |
91 | | -# Never use ast module for code transformations |
92 | | -``` |
93 | | - |
94 | | -### Testing with Deterministic Execution |
95 | | -Custom pytest plugin (`codeflash/verification/pytest_plugin.py`) ensures reproducible tests: |
96 | | -- Patches time, random, uuid for deterministic behavior |
97 | | -- Environment variables: `CODEFLASH_TEST_MODULE`, `CODEFLASH_TEST_CLASS`, `CODEFLASH_TEST_FUNCTION` |
98 | | -- Always use `uv run pytest`, never `python -m pytest` |
99 | | - |
100 | | -### Git Worktree Isolation |
101 | | -Optimizations run in isolated git worktrees to avoid affecting main repo: |
102 | | -```python |
103 | | -from codeflash.code_utils.git_utils import create_detached_worktree, remove_worktree |
104 | | -# Pattern: create_detached_worktree() → optimize → create_diff_patch_from_worktree() |
105 | | -``` |
| 38 | +### Use libcst, not ast |
| 39 | +Always use `libcst` for code parsing/modification to preserve formatting. |
106 | 40 |
107 | | -### Error Handling with Either Pattern |
108 | | -Use functional error handling instead of exceptions: |
| 41 | +### Use Either pattern for errors |
109 | 42 | ```python |
110 | | -from codeflash.either import is_successful, Either |
| 43 | +from codeflash.either import is_successful |
111 | 44 | result = aiservice_client.call_llm(...) |
112 | 45 | if is_successful(result): |
113 | 46 |     optimized_code = result.value |
114 | 47 | else: |
115 | 48 |     error = result.error |
116 | 49 | ``` |
117 | 50 |
118 | | -## Configuration |
119 | | - |
120 | | -All configuration in `pyproject.toml` under `[tool.codeflash]`: |
121 | | -```toml |
122 | | -[tool.codeflash] |
123 | | -module-root = "codeflash" # Source code location |
124 | | -tests-root = "tests" # Test directory |
125 | | -benchmarks-root = "tests/benchmarks" # Benchmark tests |
126 | | -test-framework = "pytest" # Always pytest |
127 | | -formatter-cmds = [ # Auto-formatting commands |
128 | | - "uvx ruff check --exit-zero --fix $file", |
129 | | - "uvx ruff format $file", |
130 | | -] |
131 | | -``` |
132 | | - |
133 | | -## Development Commands |
134 | | - |
135 | | -### Environment Setup |
136 | | -```bash |
137 | | -# Install dependencies (always use uv) |
138 | | -uv sync |
139 | | - |
140 | | -# Install development dependencies |
141 | | -uv sync --group dev |
142 | | - |
143 | | -# Install pre-commit hooks |
144 | | -uv run pre-commit install |
145 | | -``` |
146 | | - |
147 | | -### Code Quality & Linting |
148 | | -```bash |
149 | | -# Run linting and formatting with ruff (primary tool) |
150 | | -uv run ruff check --fix . |
151 | | -uv run ruff format . |
152 | | - |
153 | | -# Type checking with mypy (strict mode) |
154 | | -uv run mypy . |
155 | | - |
156 | | -# Clean Python cache files |
157 | | -uvx pyclean . |
158 | | -``` |
159 | | - |
160 | | -### Testing |
161 | | -```bash |
162 | | -# Run all tests |
163 | | -uv run pytest |
164 | | - |
165 | | -# Run tests with coverage |
166 | | -uv run coverage run -m pytest tests/ |
167 | | - |
168 | | -# Run specific test file |
169 | | -uv run pytest tests/test_code_utils.py |
170 | | - |
171 | | -# Run tests with verbose output |
172 | | -uv run pytest -v |
173 | | - |
174 | | -# Run benchmarks |
175 | | -uv run pytest tests/benchmarks/ |
176 | | - |
177 | | -# Run end-to-end tests |
178 | | -uv run pytest tests/scripts/ |
179 | | - |
180 | | -# Run with specific markers |
181 | | -uv run pytest -m "not ci_skip" |
182 | | -``` |
183 | | - |
184 | | -### Running CodeFlash |
185 | | -```bash |
186 | | -# Initialize CodeFlash in a project |
187 | | -uv run codeflash init |
188 | | - |
189 | | -# Optimize entire codebase |
190 | | -uv run codeflash --all |
191 | | - |
192 | | -# Optimize specific file |
193 | | -uv run codeflash --file path/to/file.py |
194 | | - |
195 | | -# Optimize specific function |
196 | | -uv run codeflash --file path/to/file.py --function function_name |
197 | | - |
198 | | -# Trace and optimize a workflow |
199 | | -uv run codeflash optimize script.py |
200 | | - |
201 | | -# Verify setup with test optimization |
202 | | -uv run codeflash --verify-setup |
203 | | - |
204 | | -# Run with verbose logging |
205 | | -uv run codeflash --verbose --all |
206 | | - |
207 | | -# Run with benchmarking enabled |
208 | | -uv run codeflash --benchmark --file target_file.py |
209 | | - |
210 | | -# Use replay tests for debugging |
211 | | -uv run codeflash --replay-test tests/specific_test.py |
212 | | -``` |
213 | | - |
214 | | -## Development Guidelines |
215 | | - |
216 | | -### Code Style |
217 | | -- Uses Ruff for linting and formatting (configured in pyproject.toml) |
218 | | -- Strict mypy type checking enabled |
219 | | -- Pre-commit hooks enforce code quality |
220 | | -- Line length: 120 characters |
221 | | -- Python 3.10+ syntax |
222 | | -- Keep comments and docstrings to a minimum |
223 | | -- **Inline comments**: Only add where the logic is not self-evident (explain "why", not "what") |
224 | | -- **Docstrings**: Do not add to functions unless explicitly requested - function names should be self-explanatory |
225 | | -- Code should be self-documenting through clear naming and structure |
226 | | - |
227 | | -### Testing Strategy |
228 | | -- Primary test framework: pytest |
229 | | -- Tests located in `tests/` directory |
230 | | -- End-to-end tests in `tests/scripts/` |
231 | | -- Benchmarks in `tests/benchmarks/` |
232 | | -- Extensive use of `@pytest.mark.parametrize` |
233 | | -- Shared fixtures in conftest.py |
234 | | -- Test isolation via custom pytest plugin |
235 | | - |
236 | | -### Key Dependencies |
237 | | -- **Core**: `libcst`, `jedi`, `gitpython`, `pydantic` |
238 | | -- **Testing**: `pytest`, `coverage`, `crosshair-tool` |
239 | | -- **Performance**: `line_profiler`, `timeout-decorator` |
240 | | -- **UI**: `rich`, `inquirer`, `click` |
241 | | -- **AI**: Custom API client for LLM interactions |
242 | | - |
243 | | -### Data Models & Types |
244 | | -- `codeflash/models/models.py` - Pydantic models for all data structures |
245 | | -- Extensive use of `@dataclass(frozen=True)` for immutable data |
246 | | -- Core types: `FunctionToOptimize`, `ValidCode`, `BenchmarkKey` |
247 | | - |
248 | | -## AI Service Integration |
249 | | - |
250 | | -### Rate Limiting & Retries |
251 | | -- Built-in rate limiting and exponential backoff |
252 | | -- Handle `Either` return types for error handling |
253 | | -- AI service endpoint: `codeflash/api/aiservice.py` |
254 | | - |
255 | | -### Telemetry & Monitoring |
256 | | -- **Sentry**: Error tracking with `codeflash.telemetry.sentry` |
257 | | -- **PostHog**: Usage analytics with `codeflash.telemetry.posthog_cf` |
258 | | -- **Environment Variables**: `CODEFLASH_EXPERIMENT_ID` for testing modes |
259 | | - |
260 | | -## Performance & Benchmarking |
261 | | - |
262 | | -### Line Profiler Integration |
263 | | -- Uses `line_profiler` for detailed performance analysis |
264 | | -- Instruments functions with `@profile` decorator |
265 | | -- Generates before/after profiling reports |
266 | | -- Calculates precise speedup measurements |
267 | | - |
268 | | -### Benchmark Test Framework |
269 | | -- Custom benchmarking in `tests/benchmarks/` |
270 | | -- Generates replay tests from execution traces |
271 | | -- Validates performance improvements statistically |
272 | | - |
273 | | -## Debugging & Development |
274 | | - |
275 | | -### Verbose Logging |
276 | | -```bash |
277 | | -uv run codeflash --verbose --file target_file.py |
| 51 | +### Git worktree isolation |
| 52 | +Optimizations run in isolated worktrees: |
| 53 | +```python |
| 54 | +from codeflash.code_utils.git_worktree_utils import create_detached_worktree, remove_worktree |
278 | 55 | ``` |
279 | 56 |
280 | | -### Important Environment Variables |
281 | | -- `CODEFLASH_TEST_MODULE` - Current test module during verification |
282 | | -- `CODEFLASH_TEST_CLASS` - Current test class during verification |
283 | | -- `CODEFLASH_TEST_FUNCTION` - Current test function during verification |
284 | | -- `CODEFLASH_LOOP_INDEX` - Current iteration in pytest loops |
285 | | -- `CODEFLASH_EXPERIMENT_ID` - Enables local AI service for testing |
286 | | - |
287 | | -### LSP Integration |
288 | | -Language Server Protocol support in `codeflash/lsp/` enables IDE integration during optimization. |
289 | | - |
290 | | -### Common Debugging Patterns |
291 | | -1. Use verbose logging to trace optimization flow |
292 | | -2. Check git worktree operations for isolation issues |
293 | | -3. Verify deterministic test execution with environment variables |
294 | | -4. Use replay tests to debug specific optimization scenarios |
295 | | -5. Monitor AI service calls with rate limiting logs |
296 | | - |
297 | | -## Best Practices |
298 | | - |
299 | | -### Path Handling |
300 | | -- Always use absolute paths |
301 | | -- Handle encoding explicitly (UTF-8) |
302 | | -- Extensive path validation and cleanup utilities in `codeflash/code_utils/` |
303 | | - |
304 | | -### Git Operations |
305 | | -- All optimizations run in isolated worktrees |
306 | | -- Never modify the main repository directly |
307 | | -- Use git utilities in `codeflash/code_utils/git_utils.py` |
308 | | - |
309 | | -### Code Transformations |
310 | | -- Always use libcst, never ast module |
311 | | -- Preserve code formatting and comments |
312 | | -- Validate transformations with deterministic tests |
313 | | - |
314 | | -### Error Handling |
315 | | -- Use Either pattern for functional error handling |
316 | | -- Log errors to Sentry for monitoring |
317 | | -- Provide clear user feedback via Rich console |
| 57 | +## Code Style & Conventions |
318 | 58 |
319 | | -### Performance Optimization |
320 | | -- Profile before and after changes |
321 | | -- Use benchmarks to validate improvements |
322 | | -- Generate detailed performance reports |
| 59 | +- **Tooling**: Ruff for linting/formatting, mypy strict mode, pre-commit hooks |
| 60 | +- **Line length**: 120 characters |
| 61 | +- **Python**: 3.9+ syntax |
| 62 | +- **Comments**: Minimal - only explain "why", not "what" |
| 63 | +- **Docstrings**: Do not add unless explicitly requested |
| 64 | +- **Naming**: Prefer public functions (no leading underscore) - Python doesn't have true private functions |
| 65 | +- **Paths**: Always use absolute paths, handle encoding explicitly (UTF-8) |
323 | 66 |
324 | | -### PR Review Comments |
325 | | -When reviewing pull requests: |
326 | | -- **Limit your review to the changes in the PR** - only review the code that was actually modified in the commits, not other parts of the codebase |
327 | | -- Make a **single comment** per PR review, consolidating all feedback into one comment |
328 | | -- If an existing review comment from you already exists on the PR, **edit that existing comment** instead of creating a new one |
329 | | -- Never spam PRs with multiple separate comments |
| 67 | +## PR Review Guidelines |
330 | 68 |
331 | | -### Naming Conventions |
332 | | -- Prefer public function names over private (no leading underscore) |
333 | | -- Python doesn't have true private functions, so avoid `_function_name` patterns unless there's a strong reason |
334 | | -- Use clear, descriptive names that make the code self-documenting |
| 69 | +- **Limit review scope** - only review code actually modified in the PR, not other parts of the codebase |
| 70 | +- **Single comment** - consolidate all feedback into one comment per review |
| 71 | +- **Edit existing comments** - if you already commented on the PR, edit that comment instead of creating a new one |
335 | 72 |
336 | 73 | # Agent Rules <!-- tessl-managed --> |
337 | 74 |
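The new version of the file keeps the Either pattern for errors, but only `is_successful`, `result.value` and `result.error` are visible in this diff. The rest of `codeflash.either` is not shown, so what follows is a minimal standalone sketch: the `Success` and `Failure` classes and the `parse_speedup` function are hypothetical and only illustrate returning failures as values instead of raising.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Success(Generic[T]):
    value: T


@dataclass(frozen=True)
class Failure(Generic[E]):
    error: E


# Hypothetical alias: a result is either a Success or a Failure, never both
Either = Union[Success[T], Failure[E]]


def is_successful(result: Either[T, E]) -> bool:
    return isinstance(result, Success)


def parse_speedup(raw: str) -> Either[float, str]:
    # Hypothetical example: bad input is reported as a Failure value instead of an exception
    try:
        speedup = float(raw)
    except ValueError:
        return Failure(f"not a number: {raw!r}")
    if speedup <= 0:
        return Failure(f"speedup must be positive, got {speedup}")
    return Success(speedup)


for raw in ["1.8", "fast", "-2"]:
    result = parse_speedup(raw)
    if is_successful(result):
        print("ok", result.value)
    else:
        print("error", result.error)
```

Because `is_successful` here returns a plain `bool`, a type checker such as mypy will not narrow `result` inside the branches; annotating it with `typing.TypeGuard` (Python 3.10+) is one way to let it narrow the success branch.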
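The rule to use `libcst` rather than `ast` is easiest to justify by looking at what an `ast` round trip loses. This standard-library-only snippet (the sample source is made up; `ast.unparse` needs Python 3.9+) parses and unparses a small module:

```python
import ast

# Made-up sample: a trailing comment, double quotes, and a blank line between statements
source = 'greeting = "hi"  # keep me\n\nprint(greeting)\n'

# Parse to an AST and turn it back into source text
round_tripped = ast.unparse(ast.parse(source))

print(round_tripped)
# The comment is gone, and layout details such as the blank line and quote style are not preserved
print("comment preserved:", "# keep me" in round_tripped)
```

By contrast, `libcst.parse_module(source).code` returns the original text byte for byte, which is why it is the safer base for rewriting user code.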
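The new file keeps the rule that optimizations run in isolated worktrees and imports `create_detached_worktree` and `remove_worktree` from `codeflash.code_utils.git_worktree_utils`, but their signatures are not shown. As a rough sketch using only the `git` CLI (assumed to be on `PATH`), the hypothetical helpers `run_git` and `isolated_worktree` create a detached worktree at `HEAD` in a temporary directory and always remove it afterwards:

```python
import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def run_git(cwd: Path, *args: str) -> str:
    # check=True raises CalledProcessError if git exits with a non-zero status
    completed = subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True, text=True)
    return completed.stdout


@contextmanager
def isolated_worktree(repo: Path) -> Iterator[Path]:
    # Hypothetical helper: detached checkout of HEAD in a throwaway directory
    parent = Path(tempfile.mkdtemp(prefix="worktree-"))
    worktree = parent / "checkout"
    run_git(repo, "worktree", "add", "--detach", str(worktree), "HEAD")
    try:
        yield worktree
    finally:
        # --force also discards uncommitted edits left in the worktree
        run_git(repo, "worktree", "remove", "--force", str(worktree))
        shutil.rmtree(parent, ignore_errors=True)
```

Edits made inside the `with` block touch only the worktree's files; `run_git(worktree, "diff")` returns them as a patch, loosely mirroring the old `create_detached_worktree() → optimize → create_diff_patch_from_worktree()` pattern without modifying the original checkout.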
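The old version of the file describes a custom pytest plugin that patches `time`, `random` and `uuid` so test runs are reproducible. The general technique can be sketched with `unittest.mock`; this is not the code in `codeflash/verification/pytest_plugin.py`, and the `deterministic` context manager, the `sample` function and the fixed values are hypothetical.

```python
import random
import time
import uuid
from contextlib import contextmanager
from typing import Iterator
from unittest import mock

# Hypothetical fixed values; any constants would do
FIXED_TIME = 1_700_000_000.0
FIXED_UUID = uuid.UUID("12345678-1234-5678-1234-567812345678")
FIXED_SEED = 42


@contextmanager
def deterministic() -> Iterator[None]:
    # Save the global RNG state so seeding does not leak into code outside the block
    saved_state = random.getstate()
    random.seed(FIXED_SEED)
    try:
        with mock.patch("time.time", return_value=FIXED_TIME), mock.patch("uuid.uuid4", return_value=FIXED_UUID):
            yield
    finally:
        random.setstate(saved_state)


def sample() -> tuple[float, float, uuid.UUID]:
    # Stand-in for code under test whose output would normally differ between runs
    return time.time(), random.random(), uuid.uuid4()


with deterministic():
    first = sample()
with deterministic():
    second = sample()
print("identical runs:", first == second)
```

Patching `time.time` and `uuid.uuid4` only affects callers that look them up through the module at call time; code that ran `from time import time` keeps its own reference to the real function.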
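The old version also mentions built-in rate limiting and exponential backoff for AI service calls. A generic retry helper with capped exponential backoff and full jitter can be sketched as follows; it is not taken from `codeflash/api/aiservice.py`, and `call_with_backoff`, its defaults and the choice of `ConnectionError` as the only retryable error are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    # Hypothetical helper: retry func, doubling the maximum wait after each failure
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2**attempt)
            # Full jitter: a random wait in [0, delay] spreads out retries from many clients
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real waits; by default it falls back to `time.sleep`.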