fix(git-integration): optimize commit range to exclude previously processed commits [CM-724] #3524
Conversation
Pull Request Overview
Optimizes commit processing by introducing logic to skip already-processed commits based on last_processed_commit. The changes add an optimization method and extend the git log execution path to use it.
- Added _get_optimized_commit_range to compute a potentially shorter commit range.
- Updated _execute_git_log to accept last_processed_commit and use the optimization.
- Passed repository.last_processed_commit into the batch processing flow.
| if last_processed_commit and last_processed_commit != edge_commit: | ||
| try: | ||
| self.logger.info("Checking last processed commit existance in current batch") |
Corrected spelling of 'existance' to 'existence'.
| self.logger.info("Checking last processed commit existance in current batch") | |
| self.logger.info("Checking last processed commit existence in current batch") |
| Returns: | ||
| Git commit range string (e.g., "commit_a..commit_b") | ||
| """ | ||
|
|
If prev_batch_edge_commit is None, this constructs an invalid range like 'abc123..None'. Add a guard that returns early (e.g., an empty string) or raises before building the range when prev_batch_edge_commit is falsy.
| if not prev_batch_edge_commit: | |
| return "" |
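To illustrate the failure mode described above, here is a minimal sketch of a guarded range builder. The function name `build_commit_range` is hypothetical and not part of this PR; only the parameter names are taken from the diff. Returning an empty string follows the suggestion above, and raising would be an equally reasonable choice.

```python
from typing import Optional


def build_commit_range(edge_commit: str, prev_batch_edge_commit: Optional[str]) -> str:
    """Hypothetical helper: build "a..b", refusing to format a missing endpoint."""
    if not prev_batch_edge_commit:
        # Without this guard the f-string below would yield e.g. "abc123..None",
        # which git would try to resolve as a revision literally named "None".
        return ""
    return f"{edge_commit}..{prev_batch_edge_commit}"


# The unguarded f-string silently stringifies None:
unguarded = f"{'abc123'}..{None}"
```

A caller would then have to treat the empty string as "no range available" rather than pass it on to git log.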
| self.logger.info("Checking last processed commit existance in current batch") | ||
| await run_shell_command( | ||
| ["git", "cat-file", "-e", last_processed_commit], cwd=repo_path | ||
| ) | ||
| self.logger.info("Found! using optimized range") | ||
| default_commit_range = f"{last_processed_commit}..{prev_batch_edge_commit}" |
git cat-file -e only verifies the object exists anywhere in the repo, not that it lies within the intended batch range. This can incorrectly skip commits if last_processed_commit is outside edge_commit..prev_batch_edge_commit. Replace the existence check with a membership test such as: first verify ancestry (git merge-base --is-ancestor last_processed_commit prev_batch_edge_commit) and ensure last_processed_commit is not reachable from edge_commit (e.g., git merge-base --is-ancestor last_processed_commit edge_commit and invert) or enumerate the range with git rev-list edge_commit..prev_batch_edge_commit and search for last_processed_commit.
| self.logger.info("Checking last processed commit existance in current batch") | |
| await run_shell_command( | |
| ["git", "cat-file", "-e", last_processed_commit], cwd=repo_path | |
| ) | |
| self.logger.info("Found! using optimized range") | |
| default_commit_range = f"{last_processed_commit}..{prev_batch_edge_commit}" | |
| self.logger.info("Checking if last processed commit is within current batch range") | |
| # Check if last_processed_commit is ancestor of prev_batch_edge_commit | |
| await run_shell_command( | |
| ["git", "merge-base", "--is-ancestor", last_processed_commit, prev_batch_edge_commit], cwd=repo_path | |
| ) | |
| # Check that last_processed_commit is NOT ancestor of edge_commit | |
| try: | |
| await run_shell_command( | |
| ["git", "merge-base", "--is-ancestor", last_processed_commit, edge_commit], cwd=repo_path | |
| ) | |
| # If this succeeds, last_processed_commit is reachable from edge_commit, so not in range | |
| self.logger.info("last processed commit is reachable from edge_commit; not using optimized range") | |
| except Exception: | |
| # If this fails, last_processed_commit is NOT ancestor of edge_commit, so it's in the range | |
| self.logger.info("Found! using optimized range") | |
| default_commit_range = f"{last_processed_commit}..{prev_batch_edge_commit}" |
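The two ancestry checks in the suggestion above can also be written against exit codes instead of exceptions. `git merge-base --is-ancestor` exits with 0 when the first commit is an ancestor of the second, 1 when it is not, and another status on error (for example, an unknown object). The sketch below is a synchronous illustration, not the PR's implementation: `Runner`, `make_git_runner`, `is_ancestor`, and `pick_commit_range` are hypothetical names, and the default range `edge_commit..prev_batch_edge_commit` is assumed from the review comment rather than confirmed by the diff.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Hypothetical runner type: takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def make_git_runner(repo_path: str) -> Runner:
    """Build a runner that executes commands inside repo_path."""
    def run(argv: Sequence[str]) -> int:
        return subprocess.run(list(argv), cwd=repo_path, capture_output=True).returncode
    return run


def is_ancestor(run: Runner, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant`."""
    code = run(["git", "merge-base", "--is-ancestor", ancestor, descendant])
    if code == 0:
        return True
    if code == 1:
        return False
    # Any other status is a real error (e.g. unknown commit), not a "no".
    raise RuntimeError(f"git merge-base failed with exit code {code}")


def pick_commit_range(
    run: Runner,
    edge_commit: str,
    prev_batch_edge_commit: str,
    last_processed_commit: Optional[str],
) -> str:
    """Use the shorter range only if last_processed_commit lies inside the batch."""
    default_range = f"{edge_commit}..{prev_batch_edge_commit}"  # assumed default
    if not last_processed_commit or last_processed_commit == edge_commit:
        return default_range
    try:
        in_batch = is_ancestor(
            run, last_processed_commit, prev_batch_edge_commit
        ) and not is_ancestor(run, last_processed_commit, edge_commit)
    except RuntimeError:
        # On errors, fall back to the full batch instead of skipping commits.
        return default_range
    if in_batch:
        return f"{last_processed_commit}..{prev_batch_edge_commit}"
    return default_range
```

Taking the runner as a parameter keeps the decision logic testable without a real repository. In production code, `make_git_runner(repo_path)` would supply it. Treating exit code 1 separately from other failures avoids reading a git error as "not an ancestor" and then selecting the optimized range by mistake.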
This pull request enhances the commit processing logic by optimizing how commit ranges are determined when processing batches. The main improvement is the introduction of a mechanism to skip already-processed commits by leveraging the last successfully processed commit, which helps avoid redundant work and improves efficiency.
Key improvements to commit range calculation:
- Added _get_optimized_commit_range to dynamically determine the commit range, using last_processed_commit to skip previously processed commits when possible.
- Updated _execute_git_log to use the optimized commit range by calling _get_optimized_commit_range, and updated its signature to accept last_processed_commit as an argument.
- Updated process_single_batch_commits to pass repository.last_processed_commit to the relevant methods, enabling the optimization.