Commit d578d99

Merge pull request #1494 from codeflash-ai/add-private-tessl-tiles
feat: add private tessl tiles for rules, docs, and skills
2 parents 90601c3 + 869fbe1 commit d578d99

60 files changed

Lines changed: 1892 additions & 0 deletions

.codex/skills/.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
+# Managed by Tessl
+tessl:*

.gemini/skills/.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
+# Managed by Tessl
+tessl:*

CLAUDE.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,3 +33,5 @@ Discovery → Ranking → Context Extraction → Test Gen + Optimization → Bas
 # Agent Rules <!-- tessl-managed -->
 
 @.tessl/RULES.md follow the [instructions](.tessl/RULES.md)
+
+@AGENTS.md

tessl.json

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,18 @@
     },
     "tessl/pypi-filelock": {
       "version": "3.19.0"
+    },
+    "codeflash/codeflash-rules": {
+      "version": "0.1.0"
+    },
+    "codeflash/codeflash-docs": {
+      "version": "0.1.0"
+    },
+    "codeflash/codeflash-skills": {
+      "version": "0.2.0"
+    },
+    "tessl-labs/tessl-skill-eval-scenarios": {
+      "version": "0.0.5"
     }
   }
 }
Lines changed: 108 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,108 @@
1+
# AI Service
2+
3+
How codeflash communicates with the AI optimization backend.
4+
5+
## `AiServiceClient` (`api/aiservice.py`)
6+
7+
The client connects to the AI service at `https://app.codeflash.ai` (or `http://localhost:8000` when `CODEFLASH_AIS_SERVER=local`).
8+
9+
Authentication uses a Bearer token from `get_codeflash_api_key()`. All requests go through `make_ai_service_request()`, which serializes the JSON payload with the Pydantic encoder.
10+
11+
Timeout: 90s for production, 300s for local.
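
As a rough illustration of the endpoint and timeout selection described above, the sketch below picks the base URL and timeout from the environment. The function names `resolve_ai_service` and `auth_headers` are hypothetical and not part of codeflash; only the URLs, the `CODEFLASH_AIS_SERVER` variable, and the timeout values come from the text.

```python
import os

# URLs and timeouts are taken from the description above.
PRODUCTION_URL = "https://app.codeflash.ai"
LOCAL_URL = "http://localhost:8000"


def resolve_ai_service(env=None):
    """Return (base_url, timeout_seconds). Hypothetical helper."""
    env = os.environ if env is None else env
    if env.get("CODEFLASH_AIS_SERVER") == "local":
        return LOCAL_URL, 300
    return PRODUCTION_URL, 90


def auth_headers(api_key):
    """Build a Bearer-token header. In codeflash the key would come
    from get_codeflash_api_key(); here it is passed in directly."""
    return {"Authorization": f"Bearer {api_key}"}
```

Passing the environment as a mapping keeps the helper easy to test without touching `os.environ`.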
12+
13+
## Endpoints
14+
15+
### `/ai/optimize` — Generate Candidates
16+
17+
Method: `optimize_code()`
18+
19+
Sends source code + dependency context to generate optimization candidates.
20+
21+
Payload:
22+
- `source_code` — The read-writable code (markdown format)
23+
- `dependency_code` — Read-only context code
24+
- `trace_id` — Unique trace ID for the optimization run
25+
- `language`: `"python"`, `"javascript"`, or `"typescript"`
26+
- `n_candidates` — Number of candidates to generate (controlled by effort level)
27+
- `is_async` — Whether the function is async
28+
- `is_numerical_code` — Whether the code is numerical (affects optimization strategy)
29+
30+
Returns: `list[OptimizedCandidate]` with `source=OptimizedCandidateSource.OPTIMIZE`
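
To make the payload shape concrete, here is a minimal sketch that assembles the fields listed above into a JSON body. `build_optimize_payload` is a hypothetical helper, not the signature of `optimize_code()`, and the default of 5 candidates simply mirrors the MEDIUM value of `N_OPTIMIZER_CANDIDATES` from the configuration notes.

```python
import json

ALLOWED_LANGUAGES = {"python", "javascript", "typescript"}


def build_optimize_payload(source_code, dependency_code, trace_id, *,
                           language="python", n_candidates=5,
                           is_async=False, is_numerical_code=False):
    """Assemble the documented /ai/optimize fields (hypothetical helper)."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return {
        "source_code": source_code,          # read-writable code
        "dependency_code": dependency_code,  # read-only context
        "trace_id": trace_id,
        "language": language,
        "n_candidates": n_candidates,
        "is_async": is_async,
        "is_numerical_code": is_numerical_code,
    }


payload = build_optimize_payload(
    "def add(a, b):\n    return a + b",  # the real client sends markdown
    "",
    "trace-123",                          # placeholder trace ID
)
body = json.dumps(payload)
```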
31+
32+
### `/ai/optimize_line_profiler` — Line-Profiler-Guided Candidates
33+
34+
Method: `optimize_python_code_line_profiler()`
35+
36+
Like `/ai/optimize`, but includes `line_profiler_results` to guide the LLM toward hot lines.
37+
38+
Returns: candidates with `source=OptimizedCandidateSource.OPTIMIZE_LP`
39+
40+
### `/ai/refine` — Refine Existing Candidate
41+
42+
Method: `refine_code()`
43+
44+
Request type: `AIServiceRefinerRequest`
45+
46+
Sends an existing candidate with runtime data and line profiler results to generate an improved version.
47+
48+
Key fields:
49+
- `original_source_code` / `optimized_source_code` — Before and after
50+
- `original_code_runtime` / `optimized_code_runtime` — Timing data
51+
- `speedup` — Current speedup ratio
52+
- `original_line_profiler_results` / `optimized_line_profiler_results`
53+
54+
Returns: candidates with `source=OptimizedCandidateSource.REFINE` and `parent_id` set to the ID of the candidate that was refined
55+
56+
### `/ai/repair` — Fix Failed Candidate
57+
58+
Method: `repair_code()`
59+
60+
Request type: `AIServiceCodeRepairRequest`
61+
62+
Sends a failed candidate with test diffs showing what went wrong.
63+
64+
Key fields:
65+
- `original_source_code` / `modified_source_code`
66+
- `test_diffs: list[TestDiff]` — Each with `scope` (return_value/stdout/did_pass), original vs candidate values, and test source code
67+
68+
Returns: candidates with `source=OptimizedCandidateSource.REPAIR` and `parent_id` set
69+
70+
### `/ai/adaptive_optimize` — Multi-Candidate Adaptive
71+
72+
Method: `adaptive_optimize()`
73+
74+
Request type: `AIServiceAdaptiveOptimizeRequest`
75+
76+
Sends multiple previous candidates with their speedups for the LLM to learn from and generate better candidates.
77+
78+
Key fields:
79+
- `candidates: list[AdaptiveOptimizedCandidate]` — Previous candidates with source code, explanation, source type, and speedup
80+
81+
Returns: candidates with `source=OptimizedCandidateSource.ADAPTIVE`
82+
83+
### `/ai/rewrite_jit` — JIT Rewrite
84+
85+
Method: `get_jit_rewritten_code()`
86+
87+
Rewrites code to use JIT compilation (e.g., Numba).
88+
89+
Returns: candidates with `source=OptimizedCandidateSource.JIT_REWRITE`
90+
91+
## Candidate Parsing
92+
93+
All endpoints return JSON with an `optimizations` array. Each entry has:
94+
- `source_code` — Markdown-formatted code blocks
95+
- `explanation` — LLM explanation
96+
- `optimization_id` — Unique ID
97+
- `parent_id` — Optional parent reference
98+
- `model` — Which LLM model was used
99+
100+
`_get_valid_candidates()` parses the markdown code via `CodeStringsMarkdown.parse_markdown_code()` and filters out entries with empty code blocks.
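
The filtering step can be sketched as follows. This is not the code of `_get_valid_candidates()` or `CodeStringsMarkdown.parse_markdown_code()`; it is a simplified stand-in that assumes each `source_code` value contains ordinary fenced markdown blocks and drops entries whose blocks are all empty.

```python
import re

FENCE = "`" * 3  # built indirectly so the example can sit inside a fence
BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def has_code(markdown):
    """True if at least one fenced block holds non-whitespace code."""
    return any(body.strip() for body in BLOCK_RE.findall(markdown))


def valid_candidates(response):
    """Keep only entries of `optimizations` with a non-empty code block."""
    return [
        entry
        for entry in response.get("optimizations", [])
        if has_code(entry.get("source_code", ""))
    ]


response = {
    "optimizations": [
        {"optimization_id": "a",
         "source_code": FENCE + "python\ndef f():\n    return 1\n" + FENCE},
        {"optimization_id": "b",
         "source_code": FENCE + "python\n\n" + FENCE},  # empty block
    ]
}
kept = [entry["optimization_id"] for entry in valid_candidates(response)]
```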
101+
102+
## `LocalAiServiceClient`
103+
104+
Used when `CODEFLASH_EXPERIMENT_ID` is set. Mirrors `AiServiceClient` but sends to a separate experimental endpoint for A/B testing optimization strategies.
105+
106+
## LLM Call Sequencing
107+
108+
`AiServiceClient` tracks call order with `llm_call_counter` (an `itertools.count`). Each request includes a `call_sequence` number, which the backend uses to maintain conversation context across multiple calls for the same function.
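
A minimal sketch of this counter pattern is shown below. `SequencedClient` is a hypothetical stand-in for the real client, and starting the count at 1 is an assumption; only `llm_call_counter`, `call_sequence`, and the use of `itertools.count` come from the description.

```python
import itertools


class SequencedClient:
    """Hypothetical stand-in that numbers each outgoing request."""

    def __init__(self):
        # Assumption: numbering starts at 1.
        self.llm_call_counter = itertools.count(1)

    def build_request(self, payload):
        # next() yields a strictly increasing value on every call,
        # so each request carries a unique call_sequence.
        return {**payload, "call_sequence": next(self.llm_call_counter)}


client = SequencedClient()
first = client.build_request({"trace_id": "t"})
second = client.build_request({"trace_id": "t"})
```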
Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
1+
# Configuration
2+
3+
Key configuration constants, effort levels, and thresholds.
4+
5+
## Constants (`code_utils/config_consts.py`)
6+
7+
### Test Execution
8+
9+
| Constant | Value | Description |
|----------|-------|-------------|
| `MAX_TEST_RUN_ITERATIONS` | 5 | Maximum test loop iterations |
| `INDIVIDUAL_TESTCASE_TIMEOUT` | 15 s | Timeout per individual test case |
| `MAX_FUNCTION_TEST_SECONDS` | 60 s | Maximum total time for function testing |
| `MAX_TEST_FUNCTION_RUNS` | 50 | Maximum test function executions |
| `MAX_CUMULATIVE_TEST_RUNTIME_NANOSECONDS` | 100 ms (100,000,000 ns) | Maximum cumulative test runtime |
| `TOTAL_LOOPING_TIME` | 10 s | Candidate benchmarking budget |
| `MIN_TESTCASE_PASSED_THRESHOLD` | 6 | Minimum test cases that must pass |
18+
19+
### Performance Thresholds
20+
21+
| Constant | Value | Description |
|----------|-------|-------------|
| `MIN_IMPROVEMENT_THRESHOLD` | 0.05 (5%) | Minimum speedup to accept a candidate |
| `MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD` | 0.10 (10%) | Minimum async throughput improvement |
| `MIN_CONCURRENCY_IMPROVEMENT_THRESHOLD` | 0.20 (20%) | Minimum concurrency ratio improvement |
| `COVERAGE_THRESHOLD` | 60.0% | Minimum test coverage |
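
As a hedged illustration of how `MIN_IMPROVEMENT_THRESHOLD` might gate a candidate, the sketch below compares a relative gain against the 5% threshold. The gain formula (time saved divided by optimized runtime) and both function names are assumptions for illustration; the source only states the threshold value.

```python
MIN_IMPROVEMENT_THRESHOLD = 0.05  # value from the table above


def performance_gain(original_ns, optimized_ns):
    """Relative gain; this formula is an assumption, not codeflash's."""
    return (original_ns - optimized_ns) / optimized_ns


def is_worth_keeping(original_ns, optimized_ns,
                     threshold=MIN_IMPROVEMENT_THRESHOLD):
    """Accept a candidate only if its gain clears the threshold."""
    if optimized_ns <= 0:
        return False
    return performance_gain(original_ns, optimized_ns) > threshold
```

With these definitions, a candidate that drops runtime from 1000 ns to 900 ns has a gain of about 11% and is kept, while one that only reaches 980 ns (about 2%) is rejected.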
27+
28+
### Stability Thresholds
29+
30+
| Constant | Value | Description |
|----------|-------|-------------|
| `STABILITY_WINDOW_SIZE` | 0.35 | 35% of total iteration window |
| `STABILITY_CENTER_TOLERANCE` | 0.0025 | ±0.25% around median |
| `STABILITY_SPREAD_TOLERANCE` | 0.0025 | 0.25% window spread |
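
One plausible reading of these three constants is sketched below: look at the most recent 35% of iterations, require every sample to sit within ±0.25% of the window median, and require the window's max-to-min spread to stay within 0.25%. This interpretation and the `is_stable` function are assumptions; the source gives only the constant values and their one-line descriptions.

```python
import math
import statistics

STABILITY_WINDOW_SIZE = 0.35
STABILITY_CENTER_TOLERANCE = 0.0025
STABILITY_SPREAD_TOLERANCE = 0.0025


def is_stable(runtimes, total_iterations):
    """Assumed stability check over the trailing window of runtimes.

    Runtimes are expected to be positive numbers (e.g. nanoseconds).
    """
    window = max(1, math.ceil(STABILITY_WINDOW_SIZE * total_iterations))
    if len(runtimes) < window:
        return False  # not enough samples yet
    recent = runtimes[-window:]
    median = statistics.median(recent)
    centered = all(
        abs(r - median) / median <= STABILITY_CENTER_TOLERANCE
        for r in recent
    )
    spread_ok = (
        (max(recent) - min(recent)) / min(recent)
        <= STABILITY_SPREAD_TOLERANCE
    )
    return centered and spread_ok
```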
35+
36+
### Context Limits
37+
38+
| Constant | Value | Description |
|----------|-------|-------------|
| `OPTIMIZATION_CONTEXT_TOKEN_LIMIT` | 16000 | Max tokens for optimization context |
| `TESTGEN_CONTEXT_TOKEN_LIMIT` | 16000 | Max tokens for test generation context |
| `MAX_CONTEXT_LEN_REVIEW` | 1000 | Max context length for optimization review |
43+
44+
### Other
45+
46+
| Constant | Value | Description |
|----------|-------|-------------|
| `MIN_CORRECT_CANDIDATES` | 2 | Min correct candidates before skipping repair |
| `REPEAT_OPTIMIZATION_PROBABILITY` | 0.1 | Probability of re-optimizing a function |
| `DEFAULT_IMPORTANCE_THRESHOLD` | 0.001 | Minimum addressable time to consider a function |
| `CONCURRENCY_FACTOR` | 10 | Number of concurrent executions for concurrency benchmark |
| `REFINED_CANDIDATE_RANKING_WEIGHTS` | (2, 1) | (runtime, diff) weights; runtime counts twice as much as diff |
53+
54+
## Effort Levels
55+
56+
`EffortLevel` enum: `LOW`, `MEDIUM`, `HIGH`
57+
58+
Effort controls the number of candidates, repairs, and refinements:
59+
60+
| Key | LOW | MEDIUM | HIGH |
|-----|-----|--------|------|
| `N_OPTIMIZER_CANDIDATES` | 3 | 5 | 6 |
| `N_OPTIMIZER_LP_CANDIDATES` | 4 | 6 | 7 |
| `N_GENERATED_TESTS` | 2 | 2 | 2 |
| `MAX_CODE_REPAIRS_PER_TRACE` | 2 | 3 | 5 |
| `REPAIR_UNMATCHED_PERCENTAGE_LIMIT` | 0.2 | 0.3 | 0.4 |
| `TOP_VALID_CANDIDATES_FOR_REFINEMENT` | 2 | 3 | 4 |
| `ADAPTIVE_OPTIMIZATION_THRESHOLD` | 0 | 0 | 2 |
| `MAX_ADAPTIVE_OPTIMIZATIONS_PER_TRACE` | 0 | 0 | 4 |
70+
71+
Use `get_effort_value(EffortKeys.KEY, effort_level)` to retrieve values.
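
The lookup can be pictured as a small two-level table. The sketch below reimplements the idea with stand-in enums that reuse the documented names; the enum string values and the table layout are assumptions, and only two keys are included for brevity.

```python
from enum import Enum


class EffortLevel(str, Enum):
    # String values are assumed for illustration.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class EffortKeys(str, Enum):
    N_OPTIMIZER_CANDIDATES = "N_OPTIMIZER_CANDIDATES"
    MAX_CODE_REPAIRS_PER_TRACE = "MAX_CODE_REPAIRS_PER_TRACE"


# Numbers copied from the effort table above.
EFFORT_VALUES = {
    EffortKeys.N_OPTIMIZER_CANDIDATES: {
        EffortLevel.LOW: 3, EffortLevel.MEDIUM: 5, EffortLevel.HIGH: 6,
    },
    EffortKeys.MAX_CODE_REPAIRS_PER_TRACE: {
        EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 5,
    },
}


def get_effort_value(key, effort_level):
    """Stand-in for the real helper: look up a value by key and level."""
    return EFFORT_VALUES[key][effort_level]


n_high = get_effort_value(EffortKeys.N_OPTIMIZER_CANDIDATES, EffortLevel.HIGH)
```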
72+
73+
## Project Configuration
74+
75+
Configuration is read from `pyproject.toml` under `[tool.codeflash]`. Key settings are auto-detected by `setup/detector.py`:
76+
- `module-root` — Root of the module to optimize
77+
- `tests-root` — Root of test files
78+
- `test-framework` — pytest, unittest, jest, etc.
79+
- `formatter-cmds` — Code formatting commands
Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
# Context Extraction
2+
3+
How codeflash extracts and limits code context for optimization and test generation.
4+
5+
## Overview
6+
7+
Context extraction (`context/code_context_extractor.py`) builds a `CodeOptimizationContext` containing all code needed for the LLM to understand and optimize a function, split into:
8+
9+
- **Read-writable code** (`CodeContextType.READ_WRITABLE`): The function being optimized plus its helper functions — code the LLM is allowed to modify
10+
- **Read-only context** (`CodeContextType.READ_ONLY`): Dependency code for reference — imports, type definitions, base classes
11+
- **Testgen context** (`CodeContextType.TESTGEN`): Context for test generation, may include imported class definitions and external base class inits
12+
- **Hashing context** (`CodeContextType.HASHING`): Used for deduplication of optimization runs
13+
14+
## Token Limits
15+
16+
Both optimization and test generation contexts are token-limited:
17+
- `OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000` tokens
18+
- `TESTGEN_CONTEXT_TOKEN_LIMIT = 16000` tokens
19+
20+
Token counting uses `encoded_tokens_len()` from `code_utils/code_utils.py`. Functions whose context exceeds these limits are skipped.
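
The skip rule can be sketched as a simple guard. The whitespace-based `approx_tokens` below is a deliberately crude stand-in for `encoded_tokens_len()`, whose actual tokenizer is not described here, and `fits_limit` is a hypothetical name.

```python
OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000  # value from the text above


def approx_tokens(code):
    """Crude stand-in tokenizer: counts whitespace-separated pieces."""
    return len(code.split())


def fits_limit(code, limit=OPTIMIZATION_CONTEXT_TOKEN_LIMIT,
               count=approx_tokens):
    """False means the function would be skipped for exceeding the limit."""
    return count(code) <= limit
```

Accepting the counter as a parameter makes it easy to swap in a real tokenizer without changing the guard.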
21+
22+
## Context Building Process
23+
24+
### 1. Helper Discovery
25+
26+
For the target function (`FunctionToOptimize`), the extractor finds:
27+
- **Helpers of the function**: Functions/classes in the same file that the target function calls
28+
- **Helpers of helpers**: Transitive dependencies of the helper functions
29+
30+
These are organized as `dict[Path, set[FunctionSource]]` — mapping file paths to the set of helper functions found in each file.
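
A minimal sketch of this two-tier discovery follows, assuming a precomputed call map. The `FunctionSource` dataclass here is a simplified stand-in with only two fields, and `collect_helpers` and `group_by_file` are hypothetical names; the real extractor derives calls from the source code itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FunctionSource:
    # Simplified stand-in; the real class likely carries more fields.
    file_path: Path
    qualified_name: str


def group_by_file(functions):
    """Organize functions as dict[Path, set[FunctionSource]]."""
    grouped = defaultdict(set)
    for fn in functions:
        grouped[fn.file_path].add(fn)
    return dict(grouped)


def collect_helpers(target, calls):
    """Return (helpers, helpers_of_helpers), each grouped by file.

    `calls` maps a function to the functions it calls directly.
    """
    direct = set(calls.get(target, ()))
    seen = direct | {target}
    indirect = set()
    stack = list(direct)
    while stack:  # walk transitive callees of the direct helpers
        fn = stack.pop()
        for callee in calls.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                indirect.add(callee)
                stack.append(callee)
    return group_by_file(direct), group_by_file(indirect)
```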
31+
32+
### 2. Code Extraction
33+
34+
`extract_code_markdown_context_from_files()` builds `CodeStringsMarkdown` from the helper dictionaries. Each file's relevant code is extracted as a `CodeString` with its file path.
35+
36+
### 3. Testgen Context Enrichment
37+
38+
`build_testgen_context()` extends the basic context with:
39+
- Imported class definitions (resolved from imports)
40+
- External base class `__init__` methods
41+
- External class `__init__` methods referenced in the context
42+
43+
### 4. Unused Definition Removal
44+
45+
`detect_unused_helper_functions()` and `remove_unused_definitions_by_function_names()` from `context/unused_definition_remover.py` prune definitions that are not transitively reachable from the target function, reducing token usage.
46+
47+
### 5. Deduplication
48+
49+
The hashing context (`hashing_code_context`) generates a hash (`hashing_code_context_hash`) used to detect when the same function context has already been optimized in a previous run, avoiding redundant work.
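
The deduplication idea can be sketched with a content hash and a set of previously seen digests. SHA-256 and the in-memory set are assumptions chosen for illustration; the source does not say which hash function is used or where earlier hashes are stored, and `should_optimize` is a hypothetical name.

```python
import hashlib


def context_hash(hashing_code_context):
    """Hash the hashing context (SHA-256 is an assumed choice)."""
    return hashlib.sha256(hashing_code_context.encode("utf-8")).hexdigest()


def should_optimize(hashing_code_context, already_optimized):
    """Skip contexts whose hash was recorded by an earlier run."""
    digest = context_hash(hashing_code_context)
    if digest in already_optimized:
        return False
    already_optimized.add(digest)
    return True


seen = set()
first_time = should_optimize("def f():\n    return 1", seen)
second_time = should_optimize("def f():\n    return 1", seen)
```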
50+
51+
## Key Functions
52+
53+
| Function | Location | Purpose |
|----------|----------|---------|
| `build_testgen_context()` | `context/code_context_extractor.py` | Build enriched testgen context |
| `extract_code_markdown_context_from_files()` | `context/code_context_extractor.py` | Convert helper dicts to `CodeStringsMarkdown` |
| `detect_unused_helper_functions()` | `context/unused_definition_remover.py` | Find unused definitions |
| `remove_unused_definitions_by_function_names()` | `context/unused_definition_remover.py` | Remove unused definitions |
| `collect_top_level_defs_with_usages()` | `context/unused_definition_remover.py` | Analyze definition usage |
| `encoded_tokens_len()` | `code_utils/code_utils.py` | Count tokens in code |
