chore: improve skills to 100% review score and bump to v0.2.0
- Add trigger hints and code snippets to both skills
- Add checkpoints after each step
- Extract module reference and troubleshooting into linked files
- Bump codeflash-skills tile to 0.2.0
-description: Step-by-step workflow for adding a new feature to the codeflash codebase
+description: >
+  Guides implementation of new functionality in the codeflash optimization engine.
+  Use when adding a feature, building new functionality, implementing a new
+  optimization strategy, adding a language backend, creating an API endpoint,
+  extending the verification pipeline, or developing any new codeflash capability.
+  Covers module identification, Result type patterns, config, types, tests, and
+  quality checks.
 ---

 # Add Codeflash Feature

-Use this workflow when implementing a new feature in the codeflash codebase.
+Use this workflow when implementing new functionality in the codeflash codebase: new optimization strategies, language backends, API endpoints, CLI commands, config options, or pipeline extensions.

 ## Step 1: Identify Target Modules

-Determine which module(s) need modification based on the feature:
+Determine which module(s) need modification. See [MODULE_REFERENCE.md](MODULE_REFERENCE.md) for the full mapping of feature areas to modules and key files.

-| Feature area | Primary module | Key files |
-|-------------|----------------|-----------|
-| New optimization strategy | `optimization/` | `function_optimizer.py`, `optimizer.py` |
-| New test type | `verification/`, `models/` | `test_runner.py`, `pytest_plugin.py`, `test_type.py` |
-| New AI service endpoint | `api/` | `aiservice.py` |
-| New language support | `languages/` | Create new `languages/<lang>/support.py` |
+**Checkpoint**: Read the target files and understand existing patterns before writing any code. Look for similar features already implemented as reference.

 ## Step 2: Follow Result Type Pattern

@@ -43,33 +39,76 @@ if not is_successful(result):
 value = result.unwrap()
 ```

+**Checkpoint**: Verify your function signatures match the `Result` pattern used in surrounding code. Not all functions use `Result`; match the convention of the module you're modifying.
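
The early-exit pattern in this step can be illustrated with a self-contained sketch. Only `is_successful(result)` and `result.unwrap()` appear in the source; the `Success` and `Failure` classes, the `Result` alias, and both example functions below are hypothetical stand-ins written for this illustration, not codeflash's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Success(Generic[T]):
    """Hypothetical success wrapper holding a value."""

    value: T

    def unwrap(self) -> T:
        return self.value


@dataclass(frozen=True)
class Failure(Generic[E]):
    """Hypothetical failure wrapper holding an error description."""

    error: E

    def unwrap(self) -> None:
        raise ValueError(f"called unwrap() on a Failure: {self.error}")


Result = Union[Success[T], Failure[E]]


def is_successful(result) -> bool:
    return isinstance(result, Success)


def parse_positive_int(text: str) -> Result[int, str]:
    """Hypothetical function that returns a Result instead of raising."""
    try:
        number = int(text)
    except ValueError:
        return Failure(f"not an integer: {text!r}")
    if number <= 0:
        return Failure(f"not positive: {number}")
    return Success(number)


def double_or_none(text: str) -> int | None:
    result = parse_positive_int(text)
    if not is_successful(result):
        return None  # exit early on failure before touching the value
    value = result.unwrap()
    return value * 2
```

The caller checks `is_successful` before calling `unwrap()`, so the failure path never raises. Whether a given module follows this convention should be confirmed by reading its neighbours.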
+
 ## Step 3: Add Configuration Constants

 If the feature needs configurable thresholds or limits:

 1. Add constants to `code_utils/config_consts.py`
-2. If effort-dependent, add to `EFFORT_VALUES` dict with values for `LOW`, `MEDIUM`, `HIGH`
-3. Add a corresponding `EffortKeys` enum entry
-4. Access via `get_effort_value(EffortKeys.MY_KEY, effort_level)`
+2. If effort-dependent, add to `EFFORT_VALUES` dict with values for all three levels:
**Checkpoint**: Run the new tests in isolation before proceeding: `uv run pytest tests/path/to/test_file.py -x`

 ## Step 6: Run Quality Checks

@@ -86,11 +125,22 @@ uv run mypy codeflash/
 uv run pytest tests/path/to/relevant/tests -x
 ```

+**If checks fail**:
+- `prek run` failures: Fix formatting/lint issues reported by ruff, then re-run
+- `mypy` failures: Fix type errors. Common issues are missing return types, wrong `Optional` usage, or missing imports in `TYPE_CHECKING` block
+- Test failures: Fix the failing test or the implementation, then re-run
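
As a rough illustration of the three `mypy` issues named above, the sketch below shows an explicit return type, an `Optional` parameter with a `None` check, and an import kept inside a `TYPE_CHECKING` block. The `describe` function is hypothetical and exists only for this example.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Evaluated only by the type checker, so the import never runs at
    # program start and cannot take part in a circular import.
    from pathlib import Path


def describe(path: Optional[Path] = None) -> str:
    """Hypothetical helper: explicit return type, Optional handled before use."""
    if path is None:
        return "<no path>"
    return str(path)
```

With `from __future__ import annotations`, the `Path` annotation is not evaluated at runtime, which is what allows the import to stay inside the `TYPE_CHECKING` block.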
+
 ## Step 7: Language Support Considerations

 If the feature needs to work across languages:

-1. Check if the feature uses language-specific APIs: use `get_language_support(identifier)` from `languages/registry.py`
+1. Use `get_language_support(identifier)` from `languages/registry.py`; never import language classes directly
 2. Current language is a singleton: `set_current_language()` / `current_language()` from `languages/current.py`
 3. Use `is_python()` / `is_javascript()` guards for language-specific branches
-4. New language support classes must use `@register_language` decorator
+4. New language support classes must use `@register_language` decorator and be instantiable without arguments
+
+**Checkpoint**: Skip this step if the feature is Python-only. Most features don't need multi-language support.
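
To show why a registry lookup and a no-argument constructor go together, here is a minimal sketch of a decorator-based registry. The names `register_language`, `get_language_support`, and `UnsupportedLanguageError` come from the source, but every implementation detail below (the `_REGISTRY` dict, the `language` attribute, the `PythonSupport` class) is an assumption made for illustration and may differ from the real `languages/registry.py`.

```python
from __future__ import annotations

# Hypothetical registry mapping a language identifier to its support class.
_REGISTRY: dict[str, type] = {}


class UnsupportedLanguageError(Exception):
    """Raised when no support class is registered for an identifier."""


def register_language(cls):
    """Register a support class under the identifier its instance reports."""
    instance = cls()  # this call is why a no-argument constructor is required
    _REGISTRY[instance.language] = cls
    return cls


def get_language_support(identifier: str):
    """Return a fresh support object without the caller importing its class."""
    try:
        cls = _REGISTRY[identifier]
    except KeyError:
        raise UnsupportedLanguageError(identifier) from None
    return cls()


@register_language
class PythonSupport:
    language = "python"  # hypothetical attribute used as the registry key
    file_extensions = (".py",)
```

Callers depend only on the identifier string, so adding a language means adding one decorated class rather than editing every call site.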
+
+## Troubleshooting
+
+If you run into issues, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for common problems and fixes (circular imports, `UnsupportedLanguageError`, CI path failures, Pydantic validation errors, token limit exceeded).
+| Circular import at startup | Importing from `models/` in a module loaded early | Move import into `TYPE_CHECKING` block or use lazy import |
+| `UnsupportedLanguageError` | Language modules not registered yet | Call `_ensure_languages_registered()` or use `get_language_support()` which does it automatically |
+| Tests pass locally but fail in CI | Path differences (absolute vs relative) | Always use `.resolve()` on Path objects |
+| `ValidationError` from Pydantic | Invalid code passed to `CodeString` | Check that generated code passes syntax validation for the target language |
+| `encoded_tokens_len` exceeds limit | Context too large | Reduce helper functions or split into read-only vs read-writable |
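
The path row in the table can be made concrete with a small sketch. It assumes nothing about codeflash itself; `same_file` and the example paths are hypothetical, and the point is only that a relative and an absolute spelling of one location compare unequal until both are resolved.

```python
from pathlib import Path


def same_file(a: Path, b: Path) -> bool:
    """Compare two paths after resolving both to absolute, normalized form."""
    return a.resolve() == b.resolve()


# Two spellings of the same (hypothetical) location.
relative = Path("example_dir") / ".." / "example_dir" / "test_example.py"
absolute = Path.cwd() / "example_dir" / "test_example.py"
```

Comparing `relative == absolute` directly is false, while `same_file(relative, absolute)` is true, which is the kind of mismatch that can pass locally and fail when CI runs from a different working directory.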
2. Verify the function file is under the configured `module-root` in `pyproject.toml`
+3. Check if the function was previously optimized by looking for it in the optimization history

-**If not discovered**: Check config patterns, file location, and function size.
+**Checkpoint**: If the function doesn't appear in discovery output, fix config patterns or file location before proceeding.

 ## Step 2: Check Ranking

 If trace data is used, check if the function was ranked high enough.

-1. Look at `benchmarking/function_ranker.py` output
-2. The function's **addressable time** must exceed `DEFAULT_IMPORTANCE_THRESHOLD=0.001`
-3. Addressable time = own time + callee time / call count
+1. Look at `benchmarking/function_ranker.py` output for the function's addressable time
+2. The function must exceed `DEFAULT_IMPORTANCE_THRESHOLD=0.001`:
+   ```python
+   # Addressable time = own time + callee time / call count
+   # Grep for the function in ranking output:
+   # grep -i "function_name" in ranking logs
+   ```
+3. Functions below the threshold are silently skipped

-**If ranked too low**: The function doesn't spend enough time to be worth optimizing.
+**Checkpoint**: If ranked too low, the function doesn't spend enough time to be worth optimizing. No fix needed; this is expected.
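
The threshold check in this step can be sketched as follows. This is a literal reading of "own time + callee time / call count" using ordinary operator precedence, which is an assumption; the function names, the handling of a zero call count, and the units of the inputs are all hypothetical, and `benchmarking/function_ranker.py` remains the authority on the real computation.

```python
DEFAULT_IMPORTANCE_THRESHOLD = 0.001  # value quoted in the step above


def addressable_time(own_time: float, callee_time: float, call_count: int) -> float:
    """Literal reading of: own time + callee time / call count."""
    if call_count <= 0:
        return own_time  # assumption: with no calls, only own time counts
    return own_time + callee_time / call_count


def is_ranked_high_enough(
    own_time: float,
    callee_time: float,
    call_count: int,
    threshold: float = DEFAULT_IMPORTANCE_THRESHOLD,
) -> bool:
    """True when the function would not be skipped under this reading."""
    return addressable_time(own_time, callee_time, call_count) > threshold
```

For example, `is_ranked_high_enough(0.0005, 0.004, 4)` is true under this reading, while `is_ranked_high_enough(0.0002, 0.0004, 4)` is false and such a function would be skipped.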

 ## Step 3: Check Context Token Limits

 Verify the function's context fits within token limits.

-1. Check `OPTIMIZATION_CONTEXT_TOKEN_LIMIT=16000` and `TESTGEN_CONTEXT_TOKEN_LIMIT=16000` in `code_utils/config_consts.py`
-2. Token counting is done by `encoded_tokens_len()` in `code_utils/code_utils.py`
-3. Large helper function chains or deep dependency trees can blow the limit
+1. Check thresholds in `code_utils/config_consts.py`:
+   ```python
+   OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000  # tokens
+   TESTGEN_CONTEXT_TOKEN_LIMIT = 16000  # tokens
+   ```
+2. Token counting uses `encoded_tokens_len()` from `code_utils/code_utils.py`
+3. Common causes: large helper function chains, deep dependency trees, large class hierarchies

-**If context too large**: The function has too many dependencies. Consider refactoring to reduce context size.
+**Checkpoint**: If context exceeds limits, the function is rejected. Consider refactoring to reduce dependencies or splitting large modules.
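
A quick way to get a feel for the limit before running the real pipeline is a rough estimate like the one below. The `approx_tokens_len` function is a hypothetical stand-in that counts words and punctuation marks; it is not `encoded_tokens_len()` and will not match a real tokenizer's counts, so treat the result as an order-of-magnitude check only.

```python
import re

OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000  # tokens, as quoted in the step above


def approx_tokens_len(text: str) -> int:
    """Rough stand-in for a tokenizer: one token per word or punctuation mark."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def fits_context(code: str, limit: int = OPTIMIZATION_CONTEXT_TOKEN_LIMIT) -> bool:
    """True when the estimated token count does not exceed the limit."""
    return approx_tokens_len(code) <= limit
```

Running `fits_context` over the function plus its helper sources gives an early hint of whether the dependency chain is the problem.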

 ## Step 4: Check AI Service Response

 Verify the AI service returned valid candidates.

-1. Check logs for `AiServiceClient` request/response
-2. Look for HTTP errors (non-200 status codes)
-3. Verify `_get_valid_candidates()` parsed the response; empty `code_strings` means invalid markdown code blocks
-4. Check if all candidates were filtered out during parsing
+1. Look for HTTP errors in logs:
+   ```
+   # Error patterns to search for:
+   "Error generating optimized candidates"
+   "Error generating jit rewritten candidate"
+   "cli-optimize-error-caught"
+   "cli-optimize-error-response"
+   ```
+2. Check `_get_valid_candidates()` in `api/aiservice.py`; empty `code_strings` after `CodeStringsMarkdown.parse_markdown_code()` means the LLM returned malformed code blocks
+3. Verify API key is valid (`get_codeflash_api_key()`)

-**If no candidates returned**: Check API key, network connectivity, and service status.
+**Checkpoint**: If no candidates returned, check API key, network, and service status before proceeding.
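
Searching a captured log for the error strings listed in this step can be scripted. The sketch below assumes the log is available as plain text; the four patterns are taken from the step above, while `find_ai_service_errors` is a hypothetical helper.

```python
from __future__ import annotations

# Error strings quoted in the step above.
ERROR_PATTERNS = (
    "Error generating optimized candidates",
    "Error generating jit rewritten candidate",
    "cli-optimize-error-caught",
    "cli-optimize-error-response",
)


def find_ai_service_errors(log_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for lines containing a known pattern."""
    hits = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern in line for pattern in ERROR_PATTERNS):
            hits.append((line_number, line.strip()))
    return hits
```

An empty result does not prove the service succeeded; it only rules out these particular messages, so the parsing and API key checks still apply.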

 ## Step 5: Check Test Failures

 Determine if candidates failed behavioral or benchmark tests.

-1. **Behavioral failures**: Compare return values, stdout, pass/fail status between original baseline and candidate
2. `normalize_code()` from `code_utils/deduplicate_code.py` strips comments/whitespace and normalizes the AST
+3. If all candidates normalize to identical code, only the first is tested; the rest copy its results

-**If all duplicates**: The LLM generated the same optimization multiple times. Try higher effort level.
+**Checkpoint**: If all duplicates, the LLM generated the same optimization repeatedly. Try a higher effort level for more diverse candidates.
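
To check by hand whether a set of candidates collapses to one, an AST round-trip gives a reasonable approximation. This sketch uses the standard library's `ast.parse` and `ast.unparse` (Python 3.9+) and is not the `normalize_code()` function from `code_utils/deduplicate_code.py`; it only removes comments and formatting differences, and both helper names are hypothetical.

```python
from __future__ import annotations

import ast


def normalize_source(source: str) -> str:
    """Parse and unparse so comments and formatting differences disappear."""
    return ast.unparse(ast.parse(source))


def group_duplicates(candidates: list[str]) -> dict[str, list[int]]:
    """Map each normalized form to the indices of candidates that share it."""
    groups: dict[str, list[int]] = {}
    for index, source in enumerate(candidates):
        groups.setdefault(normalize_source(source), []).append(index)
    return groups
```

If `group_duplicates` returns a single group, every candidate is textually equivalent after normalization and only one distinct optimization was actually proposed.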

 ## Step 7: Check Repair/Refinement

 If initial candidates failed, check repair and refinement stages.

-1. Repair only runs if fewer than `MIN_CORRECT_CANDIDATES=2` passed
-2. Repair sends `AIServiceCodeRepairRequest` with test diffs
-3. Check `REPAIR_UNMATCHED_PERCENTAGE_LIMIT`; if too many tests failed, repair is skipped
-4. Refinement only runs on top valid candidates
+1. Repair only triggers if fewer than `MIN_CORRECT_CANDIDATES=2` passed behavioral tests
+2. Repair sends `AIServiceCodeRepairRequest` with `TestDiff` objects showing what went wrong
+3. Check `REPAIR_UNMATCHED_PERCENTAGE_LIMIT` (effort-dependent: 0.2/0.3/0.4); if too many tests failed, repair is skipped entirely
+4. Refinement only runs on the top valid candidates (count depends on effort level)

-**If repair also failed**: The optimization approach may not work for this function.
+**Checkpoint**: If repair also fails, the optimization approach likely doesn't work for this function. The function may rely on side effects or external state that the LLM can't safely optimize.
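
The two gating conditions for repair can be written out as a small decision function. This is a sketch under stated assumptions: the mapping of 0.2/0.3/0.4 to low/medium/high in that order, the use of a failed-tests fraction, the `<=` comparison, and the name `should_attempt_repair` are all guesses made for illustration; only `MIN_CORRECT_CANDIDATES=2` and the three limit values come from the source.

```python
MIN_CORRECT_CANDIDATES = 2  # quoted in the step above

# Assumption: the three quoted values map to low/medium/high in this order.
REPAIR_UNMATCHED_PERCENTAGE_LIMIT = {"low": 0.2, "medium": 0.3, "high": 0.4}


def should_attempt_repair(
    correct_candidates: int,
    failed_tests: int,
    total_tests: int,
    effort: str = "medium",
) -> bool:
    """Hypothetical gate: repair runs only when both conditions hold."""
    if correct_candidates >= MIN_CORRECT_CANDIDATES:
        return False  # enough candidates already passed behavioral tests
    if total_tests == 0:
        return False  # nothing to compare against
    unmatched_fraction = failed_tests / total_tests
    return unmatched_fraction <= REPAIR_UNMATCHED_PERCENTAGE_LIMIT[effort]
```

Under these assumptions, a candidate failing 3 of 10 tests would be repaired at high effort but skipped at low effort, which is one reason raising the effort level can change the outcome.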

-## Key Files to Check
+## Key Files Reference

-- `optimization/function_optimizer.py`: Main optimization loop, `determine_best_candidate()`