⚡️ Speed up method `InjectPerfOnly.visit_FunctionDef` by 24% in PR #363 (`part-1-windows-fixes`) #368
Closed
codeflash-ai[bot] wants to merge 1 commit into
… (`part-1-windows-fixes`)

Here's an optimized rewrite of **your original code**, focusing on critical hotspots from the profiler data.

**Optimization summary:**

- Inline the `node_in_call_position` logic directly into **find_and_update_line_node** to avoid repeated function call overhead for every AST node, because this inner loop is extremely hot.
- Pre-split `self.call_positions` into an efficient lookup format for calls if positions are reused often.
- Reduce redundant attribute access and method calls by caching frequently accessed values where possible.
- Move the branch for the most frequent path (`ast.Name`) up, and short-circuit to avoid unnecessary checks.
- Add a fast path for the common case, `ast.Name`, that skips `.unparse` and unnecessary packing/mapping.
- Avoid repeated `ast.Name(id="codeflash_loop_index", ctx=ast.Load())` construction by storing the node as a field (`self.ast_codeflash_loop_index` etc.). These nodes are needed many times during a single method walk, so they are re-used.
- Stop walking after the first relevant call in the node; don't continue iterating once a replacement has been performed.

Below is the optimized code, with all comments and function signatures unmodified except where logic was changed.

**Key performance wins:**

- The hot inner loop now inlines the call position check, caches common constants, and breaks early.
- Repeated AST node creation for names and constants is avoided; where possible, nodes are re-used or built up front.
- Redundant access to `self` fields or function attributes is limited, happening only at the top of `find_and_update_line_node`.
- The fast path (`ast.Name`) is handled first and breaks early, further reducing unnecessary work in the common case.

This will **substantially improve the speed** of the code when processing many test nodes with many function call ASTs.
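The node re-use and `ast.Name` fast path described above can be sketched as follows. This is a minimal illustration, not the actual `InjectPerfOnly` code: the class `LoopIndexInjector`, the callee name `target`, and the choice to append the loop index as an extra argument are all assumptions made for the example. Only the cached field name `ast_codeflash_loop_index` comes from the summary.

```python
import ast


class LoopIndexInjector(ast.NodeTransformer):
    """Hypothetical transformer used only to illustrate node re-use."""

    def __init__(self) -> None:
        # Built once in the constructor and shared by every rewritten call,
        # instead of constructing a fresh ast.Name per call site.
        self.ast_codeflash_loop_index = ast.Name(
            id="codeflash_loop_index", ctx=ast.Load()
        )

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        # Fast path: the callee is a plain name, so compare the identifier
        # directly rather than unparsing the expression.
        if isinstance(node.func, ast.Name) and node.func.id == "target":
            node.args.append(self.ast_codeflash_loop_index)
        return node


source = "def test_it():\n    target(1)\n    target(2)\n"
tree = LoopIndexInjector().visit(ast.parse(source))
rewritten = ast.unparse(tree)  # requires Python 3.9+
print(rewritten)
```

Sharing one node object between several places in the tree is fine for read-only consumers such as `ast.unparse`, but any later pass that mutates the shared node will affect every call site at once, which is the trade-off of this kind of caching.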
⚡️ This pull request contains optimizations for PR #363

If you approve this dependent PR, these changes will be merged into the original PR branch `part-1-windows-fixes`.

📄 24% (0.24x) speedup for `InjectPerfOnly.visit_FunctionDef` in `codeflash/code_utils/instrument_existing_tests.py`

⏱️ Runtime: 5.76 milliseconds → 4.65 milliseconds (best of 191 runs)

📝 Explanation and details
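Here's an optimized rewrite of your original code, focusing on critical hotspots from the profiler data.

**Optimization summary:**

- Inline the `node_in_call_position` logic directly into `find_and_update_line_node` to avoid repeated function call overhead for every AST node, because this inner loop is extremely hot.
- Pre-split `self.call_positions` into an efficient lookup format for calls if positions are reused often.
- Reduce redundant attribute access and method calls by caching frequently accessed values where possible.
- Move the branch for the most frequent path (`ast.Name`) up, and short-circuit to avoid unnecessary checks.
- Add a fast path for the common case, `ast.Name`, that skips `.unparse` and unnecessary packing/mapping.
- Avoid repeated `ast.Name(id="codeflash_loop_index", ctx=ast.Load())` construction by storing the node as a field (`self.ast_codeflash_loop_index` etc.). These nodes are needed many times during a single method walk, so they are re-used.
- Stop walking after the first relevant call in the node; don't continue iterating once a replacement has been performed.

Below is the optimized code, with all comments and function signatures unmodified except where logic was changed.

**Key performance wins:**

- The hot inner loop now inlines the call position check, caches common constants, and breaks early.
- Repeated AST node creation for names and constants is avoided; where possible, nodes are re-used or built up front.
- Redundant access to `self` fields or function attributes is limited, happening only at the top of `find_and_update_line_node`.
- The fast path (`ast.Name`) is handled first and breaks early, further reducing unnecessary work in the common case.

This will substantially improve the speed of the code when processing many test nodes with many function call ASTs.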
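To make the inlined position check and early exit concrete, here is a rough sketch under stated assumptions. The PR text does not show how `node_in_call_position` compares positions, so this example assumes a call is identified by its `(lineno, col_offset)` pair; the helper `find_first_call_at` and the sample source are hypothetical.

```python
import ast
from typing import FrozenSet, Optional, Tuple

Position = Tuple[int, int]  # assumed format: (lineno, col_offset)


def find_first_call_at(
    stmt: ast.AST, call_positions: FrozenSet[Position]
) -> Optional[ast.Call]:
    """Return the first ast.Call whose start position is in call_positions."""
    for node in ast.walk(stmt):
        # The position check is written inline (no helper call per node),
        # and membership in a frozenset is a constant-time lookup on average.
        if (
            isinstance(node, ast.Call)
            and (node.lineno, node.col_offset) in call_positions
        ):
            return node  # early exit: stop walking after the first match
    return None


source = "result = helper(target(1), target(2))\n"
stmt = ast.parse(source).body[0]

# Built once up front; (1, 27) is where the second target(...) call starts.
positions = frozenset({(1, 27)})
found = find_first_call_at(stmt, positions)
```

Building the lookup set once and returning at the first hit means each statement costs at most one pass over its nodes, with no further iteration after a replacement site is found.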
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-pr363-2025-06-22T23.07.59` and push.