
fix(git-integration): optimize commit range to exclude previously processed commits [CM-724]#3524

Merged
mbani01 merged 2 commits into main from fix/optimize_commit_range_for_batches
Oct 16, 2025

@mbani01
Contributor

@mbani01 commented on Oct 16, 2025

This pull request changes how the commit range is determined when commits are processed in batches. When the last successfully processed commit is known, the range now starts from that commit, so commits already handled in an earlier run are skipped instead of being processed again.

Key improvements to commit range calculation:

  • Added a new method _get_optimized_commit_range to dynamically determine the commit range, using last_processed_commit to skip previously processed commits when possible.
  • Modified _execute_git_log to use the optimized commit range by calling _get_optimized_commit_range, and updated its signature to accept last_processed_commit as an argument.
  • Updated process_single_batch_commits to pass repository.last_processed_commit to the relevant methods, enabling the optimization.
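A minimal Python sketch of the range selection described above. It assumes the unoptimized range is edge_commit..prev_batch_edge_commit (as the review discussion implies) and hides the commit check behind a caller-supplied predicate; select_commit_range and commit_is_usable are hypothetical names, not the actual signature of _get_optimized_commit_range. In the merged change the check is git cat-file -e.

```python
from typing import Callable, Optional


def select_commit_range(
    edge_commit: str,
    prev_batch_edge_commit: str,
    last_processed_commit: Optional[str],
    commit_is_usable: Callable[[str], bool],
) -> str:
    """Hypothetical sketch: choose the git log range for one batch.

    Assumes the default range is edge_commit..prev_batch_edge_commit and
    narrows it to start at last_processed_commit when the caller's check
    accepts that commit.
    """
    # Assumed default: commits reachable from the previous batch edge
    # but not from this batch's edge commit.
    commit_range = f"{edge_commit}..{prev_batch_edge_commit}"
    if last_processed_commit and last_processed_commit != edge_commit:
        if commit_is_usable(last_processed_commit):
            # Start after the last processed commit to skip finished work.
            commit_range = f"{last_processed_commit}..{prev_batch_edge_commit}"
    return commit_range
```

For example, select_commit_range("edge", "tip", "last", lambda c: True) returns "last..tip", while a failing check falls back to "edge..tip".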

@mbani01 self-assigned this Oct 16, 2025
@mbani01 requested a review from Copilot October 16, 2025 17:26
@github-actions
Contributor

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.
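A sketch of how a title check like this could work, assuming a key is one of the project prefixes listed above followed by a hyphen and digits. The bot's real implementation is not part of this PR; JIRA_KEY and has_jira_key are hypothetical.

```python
import re

# Hypothetical pattern: a listed project prefix (CM or IN), a hyphen,
# then one or more digits, e.g. "CM-724". \b stops matches inside
# longer words such as "XCM-12".
JIRA_KEY = re.compile(r"\b(?:CM|IN)-\d+\b")


def has_jira_key(title: str) -> bool:
    """Return True if the PR title contains something shaped like a Jira key."""
    return JIRA_KEY.search(title) is not None
```

With this pattern, the original title fails the check, and the title after "[CM-724]" was appended passes.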


Copilot AI left a comment


Pull Request Overview

Optimizes commit processing by introducing logic to skip already-processed commits based on last_processed_commit. The changes add an optimization method and extend the git log execution path to use it.

  • Added _get_optimized_commit_range to compute a potentially shorter commit range.
  • Updated _execute_git_log to accept last_processed_commit and use the optimization.
  • Passed repository.last_processed_commit into the batch processing flow.



if last_processed_commit and last_processed_commit != edge_commit:
    try:
        self.logger.info("Checking last processed commit existance in current batch")

Copilot AI Oct 16, 2025


Corrected spelling of 'existance' to 'existence'.

Suggested change
- self.logger.info("Checking last processed commit existance in current batch")
+ self.logger.info("Checking last processed commit existence in current batch")

Copilot uses AI. Check for mistakes.
Returns:
Git commit range string (e.g., "commit_a..commit_b")
"""


Copilot AI Oct 16, 2025


If prev_batch_edge_commit is None, this builds an invalid range such as 'abc123..None'. Add a guard that returns early (e.g., with an empty string) or raises before the range is built when prev_batch_edge_commit is falsy.

Suggested change
+ if not prev_batch_edge_commit:
+     return ""

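A sketch of the suggested guard in isolation. build_commit_range and git_log_argv are hypothetical helpers; the point is that an f-string renders None as the text "None", so the check has to happen before the range string is built.

```python
from typing import List, Optional


def build_commit_range(base_commit: str, prev_batch_edge_commit: Optional[str]) -> str:
    """Hypothetical helper: return "base..tip", or "" when there is no tip."""
    # Without this guard, f"{base_commit}..{None}" yields "abc123..None",
    # which git would normally reject as an unknown revision.
    if not prev_batch_edge_commit:
        return ""
    return f"{base_commit}..{prev_batch_edge_commit}"


def git_log_argv(commit_range: str) -> Optional[List[str]]:
    """Hypothetical caller: skip git log entirely when the range is empty."""
    if not commit_range:
        return None
    return ["git", "log", commit_range]
```

Returning an empty string keeps the return type a plain str; raising instead would make a missing edge commit impossible to ignore silently.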
Comment on lines +281 to +286
self.logger.info("Checking last processed commit existance in current batch")
await run_shell_command(
    ["git", "cat-file", "-e", last_processed_commit], cwd=repo_path
)
self.logger.info("Found! using optimized range")
default_commit_range = f"{last_processed_commit}..{prev_batch_edge_commit}"

Copilot AI Oct 16, 2025


git cat-file -e only verifies that the object exists somewhere in the repository, not that it lies within the intended batch range. If last_processed_commit is outside edge_commit..prev_batch_edge_commit, the optimized range can incorrectly skip commits. Replace the existence check with a membership test. For example, first verify ancestry with git merge-base --is-ancestor last_processed_commit prev_batch_edge_commit, then ensure last_processed_commit is not reachable from edge_commit (run git merge-base --is-ancestor last_processed_commit edge_commit and invert the result). Alternatively, enumerate the range with git rev-list edge_commit..prev_batch_edge_commit and search it for last_processed_commit.

Suggested change
- self.logger.info("Checking last processed commit existance in current batch")
- await run_shell_command(
-     ["git", "cat-file", "-e", last_processed_commit], cwd=repo_path
- )
- self.logger.info("Found! using optimized range")
- default_commit_range = f"{last_processed_commit}..{prev_batch_edge_commit}"
+ self.logger.info("Checking if last processed commit is within current batch range")
+ # Check if last_processed_commit is ancestor of prev_batch_edge_commit
+ await run_shell_command(
+     ["git", "merge-base", "--is-ancestor", last_processed_commit, prev_batch_edge_commit], cwd=repo_path
+ )
+ # Check that last_processed_commit is NOT ancestor of edge_commit
+ try:
+     await run_shell_command(
+         ["git", "merge-base", "--is-ancestor", last_processed_commit, edge_commit], cwd=repo_path
+     )
+     # If this succeeds, last_processed_commit is reachable from edge_commit, so not in range
+     self.logger.info("last processed commit is reachable from edge_commit; not using optimized range")
+ except Exception:
+     # If this fails, last_processed_commit is NOT ancestor of edge_commit, so it's in the range
+     self.logger.info("Found! using optimized range")
+     default_commit_range = f"{last_processed_commit}..{prev_batch_edge_commit}"

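A synchronous sketch of the membership test the reviewer proposes, using subprocess instead of the PR's async run_shell_command. is_ancestor, commit_in_batch, and the injectable run parameter are hypothetical. It relies on the documented exit codes of git merge-base --is-ancestor (0 when the first commit is an ancestor of the second, 1 when it is not, any other non-zero status on error), so, unlike the broad except Exception in the suggestion, it can tell "not an ancestor" apart from a git error.

```python
import subprocess
from typing import Callable

# Any callable with subprocess.run's calling convention; injectable for tests.
Runner = Callable[..., subprocess.CompletedProcess]


def is_ancestor(repo_path: str, ancestor: str, descendant: str,
                run: Runner = subprocess.run) -> bool:
    """Return True if `ancestor` is reachable from `descendant`."""
    result = run(
        ["git", "merge-base", "--is-ancestor", ancestor, descendant],
        cwd=repo_path,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other status means git failed (e.g. an unknown revision).
    raise RuntimeError(f"git merge-base failed: {result.stderr.strip()}")


def commit_in_batch(repo_path: str, commit: str, edge_commit: str,
                    prev_batch_edge_commit: str,
                    run: Runner = subprocess.run) -> bool:
    """True if `commit` lies in edge_commit..prev_batch_edge_commit.

    The commit must be reachable from the batch tip but not from the
    batch edge, which is exactly what the two-dot range selects.
    """
    return (is_ancestor(repo_path, commit, prev_batch_edge_commit, run)
            and not is_ancestor(repo_path, commit, edge_commit, run))
```

A caller would switch to the optimized range only when commit_in_batch(repo_path, last_processed_commit, edge_commit, prev_batch_edge_commit) returns True.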
@mbani01 changed the title from "fix(git-integration): optimize commit range to exclude previously processed commits" to "fix(git-integration): optimize commit range to exclude previously processed commits [CM-724]" Oct 16, 2025
@mbani01 merged commit 607d460 into main Oct 16, 2025
13 checks passed
@mbani01 deleted the fix/optimize_commit_range_for_batches branch October 16, 2025 17:32
