| 1 | +# AI Service |
| 2 | + |
| 3 | +How codeflash communicates with the AI optimization backend. |
| 4 | + |
| 5 | +## `AiServiceClient` (`api/aiservice.py`) |
| 6 | + |
| 7 | +The client connects to the AI service at `https://app.codeflash.ai` (or `http://localhost:8000` when `CODEFLASH_AIS_SERVER=local`). |
| 8 | + |
| 9 | +Authentication uses a Bearer token obtained from `get_codeflash_api_key()`. All requests go through `make_ai_service_request()`, which handles JSON serialization via the Pydantic encoder. |
| 10 | + |
| 11 | +Timeout: 90s for production, 300s for local. |
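The connection settings above can be sketched as follows. This is a minimal illustration, not the code in `api/aiservice.py`: the helper names `resolve_service_settings` and `build_headers` are hypothetical, and the exact-match comparison against `"local"` is an assumption.

```python
import os

PRODUCTION_URL = "https://app.codeflash.ai"
LOCAL_URL = "http://localhost:8000"


def resolve_service_settings(env: dict[str, str]) -> tuple[str, int]:
    """Return (base_url, timeout_seconds) for an environment mapping.

    Hypothetical helper: the real client may structure this differently.
    """
    if env.get("CODEFLASH_AIS_SERVER") == "local":
        return LOCAL_URL, 300  # local server gets the longer timeout
    return PRODUCTION_URL, 90  # production default


def build_headers(api_key: str) -> dict[str, str]:
    """Build the Bearer authorization header from an API key."""
    return {"Authorization": f"Bearer {api_key}"}


base_url, timeout = resolve_service_settings(dict(os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the helper, keeps the selection logic easy to test.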
| 12 | + |
| 13 | +## Endpoints |
| 14 | + |
| 15 | +### `/ai/optimize` — Generate Candidates |
| 16 | + |
| 17 | +Method: `optimize_code()` |
| 18 | + |
| 19 | +Sends source code + dependency context to generate optimization candidates. |
| 20 | + |
| 21 | +Payload: |
| 22 | +- `source_code` — The read-writable code (markdown format) |
| 23 | +- `dependency_code` — Read-only context code |
| 24 | +- `trace_id` — Unique trace ID for the optimization run |
| 25 | +- `language` — `"python"`, `"javascript"`, or `"typescript"` |
| 26 | +- `n_candidates` — Number of candidates to generate (controlled by effort level) |
| 27 | +- `is_async` — Whether the function is async |
| 28 | +- `is_numerical_code` — Whether the code is numerical (affects optimization strategy) |
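As a rough sketch, the payload fields listed above could be assembled like this. The function `build_optimize_payload` and its client-side language check are assumptions for illustration; only the field names come from the list above.

```python
from typing import Any

SUPPORTED_LANGUAGES = {"python", "javascript", "typescript"}


def build_optimize_payload(
    source_code: str,
    dependency_code: str,
    trace_id: str,
    n_candidates: int,
    language: str = "python",
    is_async: bool = False,
    is_numerical_code: bool = False,
) -> dict[str, Any]:
    """Assemble a JSON-serializable payload for /ai/optimize (hypothetical helper)."""
    # Assumed validation: the real client may not check this locally.
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return {
        "source_code": source_code,          # read-writable code, markdown format
        "dependency_code": dependency_code,  # read-only context
        "trace_id": trace_id,
        "language": language,
        "n_candidates": n_candidates,
        "is_async": is_async,
        "is_numerical_code": is_numerical_code,
    }
```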
| 29 | + |
| 30 | +Returns: `list[OptimizedCandidate]` with `source=OptimizedCandidateSource.OPTIMIZE` |
| 31 | + |
| 32 | +### `/ai/optimize_line_profiler` — Line-Profiler-Guided Candidates |
| 33 | + |
| 34 | +Method: `optimize_python_code_line_profiler()` |
| 35 | + |
| 36 | +Like `/ai/optimize`, but includes `line_profiler_results` to guide the LLM toward hot lines. |
| 37 | + |
| 38 | +Returns: candidates with `source=OptimizedCandidateSource.OPTIMIZE_LP` |
| 39 | + |
| 40 | +### `/ai/refine` — Refine Existing Candidate |
| 41 | + |
| 42 | +Method: `refine_code()` |
| 43 | + |
| 44 | +Request type: `AIServiceRefinerRequest` |
| 45 | + |
| 46 | +Sends an existing candidate with runtime data and line profiler results to generate an improved version. |
| 47 | + |
| 48 | +Key fields: |
| 49 | +- `original_source_code` / `optimized_source_code` — Before and after |
| 50 | +- `original_code_runtime` / `optimized_code_runtime` — Timing data |
| 51 | +- `speedup` — Current speedup ratio |
| 52 | +- `original_line_profiler_results` / `optimized_line_profiler_results` |
| 53 | + |
| 54 | +Returns: candidates with `source=OptimizedCandidateSource.REFINE` and `parent_id` set to the ID of the candidate that was refined |
| 55 | + |
| 56 | +### `/ai/repair` — Fix Failed Candidate |
| 57 | + |
| 58 | +Method: `repair_code()` |
| 59 | + |
| 60 | +Request type: `AIServiceCodeRepairRequest` |
| 61 | + |
| 62 | +Sends a failed candidate with test diffs showing what went wrong. |
| 63 | + |
| 64 | +Key fields: |
| 65 | +- `original_source_code` / `modified_source_code` |
| 66 | +- `test_diffs: list[TestDiff]` — Each with `scope` (return_value/stdout/did_pass), original vs candidate values, and test source code |
| 67 | + |
| 68 | +Returns: candidates with `source=OptimizedCandidateSource.REPAIR` and `parent_id` set |
| 69 | + |
| 70 | +### `/ai/adaptive_optimize` — Multi-Candidate Adaptive |
| 71 | + |
| 72 | +Method: `adaptive_optimize()` |
| 73 | + |
| 74 | +Request type: `AIServiceAdaptiveOptimizeRequest` |
| 75 | + |
| 76 | +Sends multiple previous candidates with their speedups for the LLM to learn from and generate better candidates. |
| 77 | + |
| 78 | +Key fields: |
| 79 | +- `candidates: list[AdaptiveOptimizedCandidate]` — Previous candidates with source code, explanation, source type, and speedup |
| 80 | + |
| 81 | +Returns: candidates with `source=OptimizedCandidateSource.ADAPTIVE` |
| 82 | + |
| 83 | +### `/ai/rewrite_jit` — JIT Rewrite |
| 84 | + |
| 85 | +Method: `get_jit_rewritten_code()` |
| 86 | + |
| 87 | +Rewrites code to use JIT compilation (e.g., Numba). |
| 88 | + |
| 89 | +Returns: candidates with `source=OptimizedCandidateSource.JIT_REWRITE` |
| 90 | + |
| 91 | +## Candidate Parsing |
| 92 | + |
| 93 | +All endpoints return JSON with an `optimizations` array. Each entry has: |
| 94 | +- `source_code` — Markdown-formatted code blocks |
| 95 | +- `explanation` — LLM explanation |
| 96 | +- `optimization_id` — Unique ID |
| 97 | +- `parent_id` — Optional parent reference |
| 98 | +- `model` — Which LLM model was used |
| 99 | + |
| 100 | +`_get_valid_candidates()` parses the markdown code via `CodeStringsMarkdown.parse_markdown_code()` and filters out entries with empty code blocks. |
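The filtering step can be illustrated with a simplified sketch. It does not reproduce `CodeStringsMarkdown.parse_markdown_code()`; a plain regular expression stands in for the markdown parser, and `get_valid_candidates_sketch` is a hypothetical name. It also assumes an entry is dropped when it contains no non-empty code block.

```python
import re
from typing import Any

FENCE = "`" * 3  # built programmatically to keep this example fence-safe
CODE_BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(markdown: str) -> list[str]:
    """Return the stripped body of every fenced code block in the text."""
    return [body.strip() for body in CODE_BLOCK_RE.findall(markdown)]


def get_valid_candidates_sketch(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep only entries that contain at least one non-empty code block."""
    valid = []
    for entry in response.get("optimizations", []):
        blocks = [b for b in extract_code_blocks(entry.get("source_code", "")) if b]
        if not blocks:
            continue  # skip entries whose code blocks are all empty
        valid.append(
            {
                "code_blocks": blocks,
                "explanation": entry.get("explanation", ""),
                "optimization_id": entry["optimization_id"],
                "parent_id": entry.get("parent_id"),
                "model": entry.get("model"),
            }
        )
    return valid
```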
| 101 | + |
| 102 | +## `LocalAiServiceClient` |
| 103 | + |
| 104 | +Used when `CODEFLASH_EXPERIMENT_ID` is set. Mirrors `AiServiceClient` but sends to a separate experimental endpoint for A/B testing optimization strategies. |
| 105 | + |
| 106 | +## LLM Call Sequencing |
| 107 | + |
| 108 | +`AiServiceClient` tracks the call sequence with `llm_call_counter` (an `itertools.count`). Each request includes a `call_sequence` number, which the backend uses to maintain conversation context across multiple calls for the same function. |
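The counter pattern can be sketched as below. The class `SequencedClientSketch` and method `with_call_sequence` are hypothetical, and starting the count at 1 is an assumption; the source only states that `llm_call_counter` is an `itertools.count`.

```python
import itertools
from typing import Any


class SequencedClientSketch:
    """Illustrative client that stamps each payload with a sequence number."""

    def __init__(self) -> None:
        # Assumed start value; the real client may begin at a different number.
        self.llm_call_counter = itertools.count(1)

    def with_call_sequence(self, payload: dict[str, Any]) -> dict[str, Any]:
        """Return a copy of the payload with the next call_sequence attached."""
        return {**payload, "call_sequence": next(self.llm_call_counter)}
```

Because the counter lives on the client instance, every request made through that instance receives a strictly increasing number regardless of which endpoint it targets.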