Reduce PR Review API calls with GraphQL#654
Conversation
|
👋 Hello @glenn-jocher, thank you for submitting a
For more guidance, please refer to our Contributing Guide. Don't hesitate to leave a comment if you have any questions. Thank you for contributing to Ultralytics! 🚀 |
UltralyticsAssistant
left a comment
There was a problem hiding this comment.
🔍 PR Review
Made with ❤️ by Ultralytics Actions
Overall this PR is a solid refactor that reduces REST calls by leveraging GraphQL for PR details, reviews, comments, and issue labeling. The main opportunities for improvement are around robustness and clarity: avoid potential GraphQL limits in batch_delete_review_comments, consider how the 100-label GraphQL limit might affect large repositories, reduce duplicated get_username usage in the review dismissal path, tighten the GraphQL query construction for multiple issues, and clean up a minor unused loop variable in label_fixed_issues.
💬 Posted 5 inline comments
…w-graphql # Conflicts: # actions/summarize_pr.py # actions/utils/github_utils.py
UltralyticsAssistant
left a comment
There was a problem hiding this comment.
🔍 PR Review
Made with ❤️ by Ultralytics Actions
Overall the PR is well-aligned with the goal of reducing REST calls by introducing targeted GraphQL helpers and reusing existing payload data. The main functional concern is that label fetching at the end of first_interaction.main now runs for all event types, including PRs, which both contradicts the comment and reintroduces an extra labels REST call on PR-open events. The rest of the changes are primarily structural and performance-oriented; I’ve noted a few minor clarity and robustness improvements around duplicate username lookups, silent failures in batch deletion, and the update_pr_description signature change potentially impacting external call sites.
💬 Posted 5 inline comments
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
UltralyticsAssistant
left a comment
There was a problem hiding this comment.
🔍 PR Review 3
Made with ❤️ by Ultralytics Actions
Overall the refactor to use GraphQL for PR details, reviews, and issue labeling looks solid and should reduce REST chatter while keeping behavior consistent. The main improvement to address is ensuring update_pr_description still refreshes the PR body when the cached current_body is effectively missing (empty string), so you don’t accidentally drop existing descriptions. The remaining comments are minor optimizations around duplicate lookups, batching behavior, and avoiding unnecessary label fetches.
💬 Posted 5 inline comments
UltralyticsAssistant
left a comment
There was a problem hiding this comment.
🔍 PR Review
Made with ❤️ by Ultralytics Actions
Overall the refactor to use GraphQL for PR details, reviews, and issue labeling looks solid and aligns with the goal of reducing REST calls. The main functional issue is that get_pr_reviews_and_comments reads from a comments field that doesn’t exist in the GraphQL response, so inline review comments are never deleted. Additionally, batch_delete_review_comments only handles the first 50 comments, which may leave older bot comments behind on large PRs. There’s also a minor efficiency concern around redundant get_username calls between dismiss_previous_reviews and get_pr_reviews_and_comments. Addressing these points will restore expected behavior while keeping the new performance benefits.
💬 Posted 3 inline comments
UltralyticsAssistant
left a comment
There was a problem hiding this comment.
🔍 PR Review 5
Made with ❤️ by Ultralytics Actions
Overall the refactor meaningfully reduces REST calls by introducing GraphQL helpers for PR details, reviews/comments, and issue labeling, and the call-site changes in first_interaction.py and summarize_pr.py look consistent with the new APIs. The main areas to improve are: (1) hardening update_pr_description against failed/non-JSON GET responses, (2) making batch_delete_review_comments handle more than 50 comments or at least log when truncation occurs, (3) removing the now-unused GITHUB_API_URL import from summarize_pr.py, and (4) simplifying dismiss_previous_reviews to avoid redundant username lookups and filtering logic. The new GraphQL helpers themselves are generally well-structured and parameterized safely.
💬 Posted 4 inline comments
UltralyticsAssistant
left a comment
There was a problem hiding this comment.
🔍 PR Review 6
Made with ❤️ by Ultralytics Actions
Overall the refactor to use GraphQL for PR metadata, reviews, and issue labeling looks solid and should reduce API calls. The main areas to tighten are around robustness and behavior parity with the old REST flow: ensure failures in the new GraphQL-based dismissal path are surfaced (or have a fallback), and restore tolerant handling of 404s when deleting old review comments. A bit more defensive error handling in update_pr_description and clear bounds for the batched issue ID lookup will make the new code more resilient.
💬 Posted 5 inline comments
UltralyticsAssistant
left a comment
There was a problem hiding this comment.
🔍 PR Review 6
Made with ❤️ by Ultralytics Actions
Overall, the GraphQL batching changes are well-aligned with the goal of reducing REST calls and look consistent across PR review, summary, and issue labeling flows. The main functional concern is the robustness of the GRAPHQL_PR_REVIEWS_COMMENTS query: if the author filter is not supported as written, dismissing previous reviews will quietly stop working. Additionally, the signature change to update_pr_description may break external callers, and there is a small opportunity to avoid duplicate username resolution in the review dismissal path and to centralize label-fetching logic between PR and non-PR flows.
💬 Posted 5 inline comments
UltralyticsAssistant
left a comment
There was a problem hiding this comment.
🔍 PR Review 8
Made with ❤️ by Ultralytics Actions
Overall the refactor to use GraphQL for PR details, reviews, and issue labeling looks solid and aligns with the goal of reducing API calls. The main functional concern is the update_pr_description signature change, which could break any positional third-argument callers that previously relied on max_retries. There are also a couple of minor inefficiencies where get_username() is called twice for the same operation, and some opportunities to clarify retry behavior and bounds in the new helper methods.
💬 Posted 5 inline comments
|
|
||
| def update_pr_description(self, number: int, new_summary: str, max_retries: int = 2): | ||
| """Updates PR description with summary, retrying if description is None.""" | ||
| def update_pr_description(self, number: int, new_summary: str, current_body: str | None = None): |
There was a problem hiding this comment.
💡 MEDIUM: update_pr_description changed signature from (number, new_summary, max_retries=2) to (number, new_summary, current_body=None) and hard-codes 3 attempts. Any existing callers that passed a positional third argument expecting max_retries will now treat that value as the PR body, which could cause incorrect description updates. Also, the maximum retries is now implicit and no longer configurable. It would be safer either to preserve the old parameter (e.g. add max_retries as a keyword-only argument) or to ensure all internal callers use keyword arguments so this change cannot break external usages.
| def dismiss_previous_reviews(event: Action) -> int: | ||
| """Dismiss previous bot reviews and delete inline comments, returns count for numbering.""" | ||
| if not (pr_number := event.pr.get("number")) or not (bot_username := event.get_username()): | ||
| if not (pr_number := event.pr.get("number")): |
There was a problem hiding this comment.
📝 LOW: dismiss_previous_reviews now calls event.get_username() directly and then indirectly again via event.get_pr_reviews_and_comments(). This double lookup is minor but unnecessary overhead on every run, and also duplicates the None handling logic in two places. Consider passing the already-resolved bot_username into get_pr_reviews_and_comments (or having it accept an optional precomputed login) to avoid redundant API calls and keep the responsibility for username resolution in one place.
| if i < max_retries: | ||
| print("No current PR description found, retrying...") | ||
| time.sleep(1) | ||
| if current_body is None: |
There was a problem hiding this comment.
📝 LOW: Inside update_pr_description, the retry loop always assigns current_body = self.get(url).json().get("body") or "", so by the time you reach the summary replacement/appending logic current_body is guaranteed to be a string. You can simplify and clarify the control flow by removing the for i in range(3) magic number and extracting the "fetch-with-retries" into a small helper or a clearly named constant, which makes the behavior easier to reason about and maintain.
| result = self.graphql_request(GRAPHQL_PR_DETAILS, variables=variables) | ||
| return result.get("data", {}).get("repository", {}) | ||
|
|
||
| def get_pr_reviews_and_comments(self, pr_number: int) -> tuple[list, list]: |
There was a problem hiding this comment.
📝 LOW: get_pr_reviews_and_comments calls self.get_username() internally, but dismiss_previous_reviews has already resolved and validated bot_username before calling this helper. This leads to redundant API calls and slightly divergent behavior between the helper and caller. Consider accepting bot_login as an optional argument and using the passed-in value when available to avoid duplicate lookups and keep the logic for handling a missing bot username centralized.
| for comment_id in comment_ids: | ||
| self.delete(f"{GITHUB_API_URL}/repos/{self.repository}/pulls/comments/{comment_id}") | ||
|
|
||
| def get_multiple_issue_node_ids(self, issue_numbers: list[int]) -> dict[int, str]: |
There was a problem hiding this comment.
📝 LOW: get_multiple_issue_node_ids dynamically builds a GraphQL query with one variable per issue. This is safe given the use of variables but can grow the query size linearly with issue_numbers. Since closingIssuesReferences is already capped upstream, this is probably fine, but it may be worth documenting or enforcing a reasonable upper bound on issue_numbers here to avoid accidentally constructing very large queries if the method is reused elsewhere.
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Optimize GitHub PR workflows by batching multiple REST calls into fewer GraphQL requests for PR summaries, reviews, and issue labeling 🚀
📊 Key Changes
pull_requestpayload data instead of performing an extra REST call when handling new PR events.get_pr_details_and_labelsto fetch PR body and repository labels in a single GraphQL query for PR-open handling.dismiss_previous_reviewsto useget_pr_reviews_and_comments(GraphQL) andbatch_delete_review_commentsto fetch and delete bot reviews/comments more efficiently.GRAPHQL_LABEL_AND_COMMENT_ISSUEand related helpers (get_multiple_issue_node_ids,get_label_idsusage) to batch-label and comment on closing issues via GraphQL instead of multiple REST calls.update_pr_descriptionto accept a cached PR body, reducing redundant fetches while still safely handling missing descriptions.actions.utilsand wire them intosummarize_pr.pyand other call sites.🎯 Purpose & Impact