refactor: move context extraction to languages/python/context/ by KRRT7 · Pull Request #1498 · codeflash-ai/codeflash

KRRT7 · 2026-02-16T19:51:51Z

Summary

Move code_context_extractor.py and unused_definition_remover.py from codeflash/context/ to codeflash/languages/python/context/
Update all import sites across production code and tests
Remove the now-empty codeflash/context/ package

These modules are Python-specific (Jedi, ast, libcst) and belong under languages/python/. Duplicate code in languages/python/ tried to replicate this functionality with incomplete or no feature parity; this move consolidates on the canonical implementation.

Test plan

pytest tests/test_code_context_extractor.py tests/test_get_read_writable_code.py tests/test_get_read_only_code.py tests/test_get_testgen_code.py tests/test_remove_unused_definitions.py tests/test_unused_helper_revert.py tests/test_get_helper_code.py — 173 passed
pytest tests/test_languages/ — 731 passed
pytest tests/test_code_replacement.py tests/test_function_dependencies.py — 66 passed

Consolidate three enricher functions (get_imported_class_definitions, get_external_base_class_inits, get_external_class_inits) into a single enrich_testgen_context that parses code context once. Extract shared helpers, unify prune_cst variants, deduplicate loop bodies, and remove dead UsedNameCollector class.
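The "parse once" idea behind this consolidation can be sketched as follows. This is a minimal illustration, not the actual `enrich_testgen_context`: the helper names `imported_names`, `base_class_names`, and `enrich_once` are hypothetical, and the real enrichers presumably do considerably more work per step.

```python
from __future__ import annotations

import ast


def imported_names(tree: ast.Module) -> set[str]:
    """Names bound by `from X import Y` statements (hypothetical helper)."""
    return {
        alias.asname or alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom)
        for alias in node.names
    }


def base_class_names(tree: ast.Module) -> set[str]:
    """Simple-name base classes of every class in the tree (hypothetical helper)."""
    return {
        base.id
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
        for base in node.bases
        if isinstance(base, ast.Name)
    }


def enrich_once(source: str) -> dict[str, set[str]]:
    """Parse the source a single time and share the tree between both steps."""
    tree = ast.parse(source)
    return {"imports": imported_names(tree), "bases": base_class_names(tree)}
```

The point of the design is that each enrichment step takes an already-parsed tree, so adding a step does not add another `ast.parse` call.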


claude · 2026-02-16T20:09:53Z

PR Review Summary

Prek Checks

✅ Passed — One formatting issue in unused_definition_remover.py was auto-fixed and pushed (commit 633acce4).

Mypy: 191 errors across 9 files, but these are pre-existing issues (missing generic type parameters, complex union-attr patterns in function_optimizer.py). No new mypy errors were introduced by this PR.

Code Review

This is a re-review after new commits (707703ca, 633acce4). The progressive fallback issue from the prior review has been fixed.

Remaining questions (not blockers):

MAX_TRANSITIVE_DEPTH increased from 2 → 5 (code_context_extractor.py:813): The BFS for transitive type dependencies now traverses 5 levels deep. Was this motivated by specific cases where depth=2 was insufficient? (existing comment still open)
Token limits tripled (config_consts.py): OPTIMIZATION_CONTEXT_TOKEN_LIMIT and TESTGEN_CONTEXT_TOKEN_LIMIT changed from 16,000 → 48,000. This is a 3x increase in context sent to the AI service. Was this benchmarked for quality/cost tradeoffs?
Default language changed (current.py:34): _current_language changed from None to Language.PYTHON. This removes the ability to detect when language hasn't been set yet. JS/TS pipelines should call set_current_language() before context extraction runs.
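To make the first question concrete, here is a minimal sketch of a depth-limited breadth-first search, assuming the dependencies are available as a plain adjacency mapping. The function `transitive_deps` and the `graph` shape are hypothetical; only the constant name and its values (2 and 5) come from the review.

```python
from collections import deque

MAX_TRANSITIVE_DEPTH = 5  # value quoted in the review; previously 2


def transitive_deps(start, graph, max_depth=MAX_TRANSITIVE_DEPTH):
    """Return every name reachable from `start` within `max_depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # nodes at the limit are kept but not expanded
        for dep in graph.get(name, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, depth + 1))
    seen.discard(start)
    return seen
```

On a linear chain `A -> B -> C -> D -> E -> F -> G`, a limit of 2 stops at `C`, while a limit of 5 reaches `F`. In a graph with branching, the number of visited types can grow much faster than the depth, which is why the review asks for the motivating cases.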

No critical bugs, security issues, or breaking API changes found. All callers of the refactored build_testgen_context() already use keyword arguments. The file moves and import updates are clean.

Test Coverage

Overall: 78.6% (PR) vs 78.7% (main) — no significant regression

| File | PR | Main | Delta |
|---|---|---|---|
| `config_consts.py` | 87.9% | 87.9% | +0.0% |
| `base.py` | 99.1% | 99.1% | +0.0% |
| `current.py` | 94.7% | 94.7% | +0.0% |
| `javascript/support.py` | 73.9% | 73.9% | +0.1% |
| `python/context/__init__.py` | 100.0% | 100.0% | +0.0% |
| `code_context_extractor.py` | 91.3% | 92.4% | -1.1% |
| `unused_definition_remover.py` | 94.1% | 91.0% | +3.1% |
| `python/support.py` | 51.4% | 54.2% | -2.7% |
| `function_optimizer.py` | 18.4% | 18.4% | +0.0% |

unused_definition_remover.py improved by +3.1%
code_context_extractor.py dropped slightly by -1.1% (some moved code paths not fully exercised)
python/support.py dropped by -2.7% (likely due to deduplication moving code paths)
Pre-existing test failures in test_tracer.py (8 failures) — unrelated to this PR, present on main as well

Last updated: 2026-02-16T22:20Z

Re-add graceful degradation when context exceeds token limits instead of raising ValueError immediately. Read-only context falls back to removing docstrings then removing entirely. Testgen context falls back to removing docstrings then removing enrichment before raising.
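The fallback order described here can be sketched as a "first candidate that fits" loop. This is an illustrative sketch only: `count_tokens` is a crude whitespace stand-in for a real tokenizer, and `pick_within_limit` is a hypothetical helper, not a function from the codebase.

```python
def count_tokens(text: str) -> int:
    """Crude whitespace token count; a stand-in for a real tokenizer."""
    return len(text.split())


def pick_within_limit(candidates, limit, *, required):
    """Return the first candidate that fits, ordered from richest to leanest.

    If nothing fits, optional context is dropped (empty string) and
    required context raises ValueError.
    """
    for text in candidates:
        if count_tokens(text) <= limit:
            return text
    if required:
        raise ValueError("context exceeds token limit even after fallbacks")
    return ""
```

Under these assumptions, read-only context would pass `[full, without_docstrings]` with `required=False`, and testgen context would pass `[full, without_docstrings, without_enrichment]` with `required=True`, so the `ValueError` is raised only after every leaner variant has been tried.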

The optimization achieves a **68% runtime improvement** (23.5ms → 14.0ms) by replacing the expensive `ast.walk()` traversal with a targeted recursive collection strategy.

**Key Performance Improvement:**

The original code uses `ast.walk(tree)`, which visits **every single node** in the AST tree (12,947 hits shown in line profiler), consuming 71.7% of total runtime. This includes unnecessary nodes like expressions, literals, and operators that can never contain `ImportFrom` statements.

The optimized version implements a custom `collect_imports()` function that:

1. **Only traverses module body and control flow structures** where imports can legally appear (function/class definitions, if/while/for blocks, try/except)
2. **Skips irrelevant AST nodes** like expressions, literals, and operators entirely
3. **Recursively processes nested bodies** (body, orelse, finalbody, handlers) in a depth-first manner

**Why This Works:**

In Python, `from X import Y` statements can only appear:

- At module level
- Inside function/class definitions
- Within control flow blocks (if/while/for/try)

By checking `isinstance()` for only these container node types and recursively descending into their body attributes, we avoid traversing the entire AST subtree for each construct. This dramatically reduces the number of nodes visited while maintaining correctness.

**Test Case Performance:**

The optimization excels across all scales:

- **Small imports** (single statements): 60-77% faster
- **Large import lists** (100-500 items): 74-104% faster
- **Many code blocks** (500-1000 lines): 70-77% faster
- **Mixed code/imports** at scale: 70% faster

The performance gain is particularly pronounced when the AST contains large amounts of non-import code (functions, classes, expressions), as shown by the `test_mixed_imports_and_code_large_scale` case improving from 9.31ms to 5.45ms (70.8% faster).

**Impact on Workloads:**

Given the function_references show this is used in code context extraction benchmarks, this optimization will significantly speed up any workflow that analyzes Python imports from large codebases or performs repeated import analysis during development workflows.
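A minimal sketch of the targeted traversal described above, for illustration only. It is not the code from the dependent PR: it looks up statement-list attributes by name rather than using the `isinstance()` checks the description mentions, and it deliberately leaves out `match`/`case`, which the timeline notes was added in a follow-up commit.

```python
from __future__ import annotations

import ast


def collect_imports(source: str) -> list[ast.ImportFrom]:
    """Collect `from X import Y` nodes without descending into expressions."""
    found: list[ast.ImportFrom] = []

    def visit(stmts: list[ast.stmt]) -> None:
        for node in stmts:
            if isinstance(node, ast.ImportFrom):
                found.append(node)
                continue
            # Compound statements keep nested statements in these list fields.
            for field in ("body", "orelse", "finalbody"):
                child = getattr(node, field, None)
                if isinstance(child, list):
                    visit(child)
            # try/except handlers each carry their own statement list.
            for handler in getattr(node, "handlers", []):
                visit(handler.body)

    visit(ast.parse(source).body)
    return found
```

Because only statement lists are followed, an assignment such as `x = [i for i in range(1000)]` costs one visit instead of one visit per sub-expression, which is where the saving over `ast.walk()` comes from.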

codeflash-ai · 2026-02-16T20:49:42Z

⚡️ Codeflash found optimizations for this PR

📄 69% (0.69x) speedup for `_parse_and_collect_imports` in `codeflash/languages/python/context/code_context_extractor.py`

⏱️ Runtime : 23.5 milliseconds → 14.0 milliseconds (best of 30 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _parse_and_collect_imports by 69% in PR #1498 (cf-simplify-context-extraction) #1499

If you approve, it will be merged into this PR (branch cf-simplify-context-extraction).

The optimized code achieves a **350% speedup** (2.36ms → 523μs) by replacing the generic `ast.walk()` traversal with a targeted stack-based iteration that only visits nodes where class definitions can appear.

**Key Performance Improvement:**

The original implementation uses `ast.walk(tree)`, which performs an exhaustive depth-first traversal of **every single node** in the AST, including expressions, literals, operators, and other leaf nodes that can never contain class definitions. For a typical Python module, this means checking thousands of irrelevant nodes.

The optimized version uses a stack-based approach that only descends into structural nodes (ClassDef, FunctionDef, If, For, While, With, Try blocks) where classes can actually be defined. This dramatically reduces the number of nodes visited and `isinstance()` checks performed.

**Why This Matters:**

From the test results, we see consistent 200-700% speedups across all scenarios:

- Empty modules: 579% faster (5.37μs → 791ns) - minimal traversal overhead
- Simple cases: 200-400% faster - fewer nodes to check
- Complex nested structures: 405% faster (37.2μs → 7.37μs) - targeted descent pays off
- Large modules (500 classes): 280% faster (869μs → 228μs) - scales better
- Mixed workloads: 558% faster (799μs → 121μs) - avoids non-class nodes

**Impact on Workloads:**

Based on the function references showing this is called from `build_testgen_context`, this optimization benefits test generation workflows that analyze Python code structure. Since class extraction is likely performed repeatedly during code analysis, the 4x speedup directly improves overall test generation throughput. The optimization is particularly effective for large codebases with many classes and complex nesting patterns, as demonstrated by the benchmark results.
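The stack-based approach described above might look roughly like the sketch below. The function name `collect_class_names` and the `CONTAINERS` tuple are illustrative assumptions; the actual `collect_existing_class_names` in the PR may differ in signature and in which node types it descends into.

```python
import ast

# Statement types whose bodies may contain a class definition.
CONTAINERS = (
    ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef,
    ast.If, ast.For, ast.While, ast.With, ast.Try,
)


def collect_class_names(source):
    """Collect class names using an explicit stack instead of ast.walk()."""
    names = set()
    stack = list(ast.parse(source).body)
    while stack:
        node = stack.pop()
        if isinstance(node, ast.ClassDef):
            names.add(node.name)
        if isinstance(node, CONTAINERS):
            # Push nested statement lists; expressions are never visited.
            stack.extend(node.body)
            stack.extend(getattr(node, "orelse", []))
            stack.extend(getattr(node, "finalbody", []))
            for handler in getattr(node, "handlers", []):
                stack.extend(handler.body)
    return names
```

An explicit stack also avoids Python's recursion limit on deeply nested code, at the cost of visiting nodes in a less predictable order, which does not matter when the result is a set.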

codeflash-ai · 2026-02-16T20:53:49Z

⚡️ Codeflash found optimizations for this PR

📄 351% (3.51x) speedup for `collect_existing_class_names` in `codeflash/languages/python/context/code_context_extractor.py`

⏱️ Runtime : 2.36 milliseconds → 523 microseconds (best of 17 runs)
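The percentages in these comments are consistent with speedup being computed as time saved relative to the optimized runtime. The sketch below assumes that formula; it is inferred from the reported numbers, not quoted from Codeflash documentation.

```python
def speedup(original, optimized):
    """Relative speedup: time saved divided by the optimized runtime."""
    return (original - optimized) / optimized


# 2.36 ms -> 523 us, both expressed in microseconds
class_names_speedup = speedup(2360, 523)

# 23.5 ms -> 14.0 ms
imports_speedup = speedup(23.5, 14.0)
```

With the rounded runtimes shown on this page, the first value rounds to 3.51 (351%) and the second to 0.68 (68%). The 69% in the headline presumably comes from unrounded measurements, which would also explain why the description says 68% while the title says 69%.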

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function collect_existing_class_names by 351% in PR #1498 (cf-simplify-context-extraction) #1500

If you approve, it will be merged into this PR (branch cf-simplify-context-extraction).

codeflash-ai · 2026-02-16T20:59:34Z

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

⚡️ Speed up function collect_existing_class_names by 351% in PR #1498 (cf-simplify-context-extraction) #1500

codeflash-ai · 2026-02-16T21:03:24Z

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

⚡️ Speed up function _parse_and_collect_imports by 69% in PR #1498 (cf-simplify-context-extraction) #1499

Extract shared helpers and remove dead code across the language support area:

- Extract `is_assignment_used()` and move `recurse_sections` to unused_definition_remover.py, replacing duplicated logic in both context files
- Extract `function_sources_to_helpers()` in support.py to unify identical HelperFunction construction
- Remove dead `get_comment_prefix()` method from protocol and all implementations (comment_prefix property serves all callers)

KRRT7 added 3 commits February 16, 2026 13:34

refactor: move context extraction modules to languages/python/context/

547c02e

Move code_context_extractor.py and unused_definition_remover.py from codeflash/context/ to codeflash/languages/python/context/ and update all import sites.

refactor: delegate PythonSupport context methods to canonical pipeline

b1ec824

Replace duplicate implementations in extract_code_context() and find_helper_functions() with calls to get_code_optimization_context() and get_function_sources_from_jedi() from the canonical context module.

claude Bot reviewed Feb 16, 2026

Comment thread codeflash/languages/python/context/code_context_extractor.py

claude Bot reviewed Feb 16, 2026

Comment thread codeflash/languages/python/context/code_context_extractor.py

KRRT7 and others added 3 commits February 16, 2026 15:10

fix: update mypy allowlist paths and fix BaseSuite type narrowing

8566cf0

Update stale context/ paths in mypy_allowlist.txt to match the languages/python/context/ move. Add assert to narrow BaseSuite to IndentedBlock in prune_cst for mypy.
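The type-narrowing technique mentioned in this commit can be illustrated without libcst installed. The classes below are simplified stand-ins that only borrow the libcst names `BaseSuite`, `IndentedBlock`, and `SimpleStatementSuite`; the function `statement_count` is hypothetical and is not the project's `prune_cst`.

```python
from dataclasses import dataclass, field


class BaseSuite:
    """Stand-in base type: declares no `body` attribute of its own."""


@dataclass
class IndentedBlock(BaseSuite):
    body: list = field(default_factory=list)


@dataclass
class SimpleStatementSuite(BaseSuite):
    statements: list = field(default_factory=list)


def statement_count(suite: BaseSuite) -> int:
    # Without this assert, a type checker rejects `suite.body`, because
    # the declared type BaseSuite has no such attribute in this sketch.
    assert isinstance(suite, IndentedBlock)
    return len(suite.body)
```

Note that `assert` statements are stripped when Python runs with `-O`, so this pattern documents an invariant for the type checker rather than enforcing it at runtime in optimized mode.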

codeflash-ai Bot mentioned this pull request Feb 16, 2026

⚡️ Speed up function _parse_and_collect_imports by 69% in PR #1498 (cf-simplify-context-extraction) #1499

Merged

github-actions Bot and others added 3 commits February 16, 2026 20:51

style: auto-fix linting issues

73e71d0

fix: resolve mypy type errors in collect_imports

29c0a66

codeflash-ai Bot mentioned this pull request Feb 16, 2026

⚡️ Speed up function collect_existing_class_names by 351% in PR #1498 (cf-simplify-context-extraction) #1500

Merged

github-actions Bot and others added 3 commits February 16, 2026 20:55

style: auto-fix linting issues

69d3268

Update codeflash/languages/python/context/code_context_extractor.py

ea14b2f

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

Merge pull request #1500 from codeflash-ai/codeflash/optimize-pr1498-…

cc77394

…2026-02-16T20.53.40 ⚡️ Speed up function `collect_existing_class_names` by 351% in PR #1498 (`cf-simplify-context-extraction`)

github-actions Bot and others added 2 commits February 16, 2026 21:02

fix: handle ast.Match (Python 3.10+) in collect_imports traversal

bfa55cb

The optimized collect_imports missed match/case statements where imports can legally appear. Add hasattr-guarded handling for ast.Match nodes. Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
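A rough sketch of what hasattr-guarded `ast.Match` handling can look like. The helper `imports_in_match` is hypothetical and only inspects the top-level statements of each case; the real fix presumably recurses into case bodies through the same traversal as the other containers.

```python
from __future__ import annotations

import ast


def imports_in_match(node: ast.stmt) -> list[ast.ImportFrom]:
    """Return `from X import Y` statements found directly inside match cases."""
    # ast.Match only exists on Python 3.10+, so guard the attribute access.
    if not hasattr(ast, "Match") or not isinstance(node, ast.Match):
        return []
    return [
        stmt
        for case in node.cases  # each case is an ast.match_case
        for stmt in case.body   # statements nested under `case ...:`
        if isinstance(stmt, ast.ImportFrom)
    ]
```

The guard matters because a `Match` node stores its nested statements under `cases`, not `body`, so a traversal that only follows `body`, `orelse`, `finalbody`, and `handlers` silently skips them.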

Merge pull request #1499 from codeflash-ai/codeflash/optimize-pr1498-…

82b4002

…2026-02-16T20.49.33 ⚡️ Speed up function `_parse_and_collect_imports` by 69% in PR #1498 (`cf-simplify-context-extraction`)

KRRT7 and others added 2 commits February 16, 2026 16:55

style: auto-fix linting issues

633acce

KRRT7 merged commit 805d946 into main Feb 17, 2026
25 of 27 checks passed

KRRT7 deleted the cf-simplify-context-extraction branch February 17, 2026 03:41

KRRT7 mentioned this pull request Feb 20, 2026

chore: sync main into omni-java (batch 3/4) #1558

Merged

KRRT7 merged 16 commits into main from cf-simplify-context-extraction
