
perf(worker): Batch testrun fetching and updating for flake processing#869

Open
sentry[bot] wants to merge 1 commit into main from seer/perf/batch-flake-testruns-fAXlBc

Conversation


sentry bot (Contributor) commented Apr 20, 2026

Fixes WORKER-Y93. The issue: iterating through uploads and querying testruns for each one individually caused an N+1 query problem.

  • Refactored get_testruns to get_testruns_for_uploads to accept a list of upload IDs.
  • Modified process_single_upload to receive testruns directly, removing its internal query and bulk update.
  • Updated process_flakes_for_commit to fetch all relevant testruns for all uploads in a single batched query.
  • Consolidated the Testrun bulk update operation to occur once after processing all uploads, improving database efficiency.
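The fetch-and-group pattern described above can be sketched without the actual worker code. The snippet below is an illustrative, framework-free version: fetch all testruns for a batch of upload IDs in one query (simulated here as a flat list of rows), then bucket them by `upload_id` in memory. Function and field names are hypothetical, not the real `ta_process_flakes.py` API.

```python
from collections import defaultdict

def group_testruns_by_upload(testruns):
    """Group a flat list of testrun rows (one batched query) by upload_id."""
    grouped = defaultdict(list)
    for run in testruns:
        grouped[run["upload_id"]].append(run)
    return dict(grouped)

# Rows as they might come back from a single batched query
# covering every upload in the commit.
rows = [
    {"id": 1, "upload_id": 10, "outcome": "pass"},
    {"id": 2, "upload_id": 10, "outcome": "fail"},
    {"id": 3, "upload_id": 11, "outcome": "pass"},
]
by_upload = group_testruns_by_upload(rows)
# by_upload[10] holds two runs, by_upload[11] holds one
```

With this shape, `process_single_upload` can be handed `by_upload[upload.id]` directly instead of issuing its own query per upload.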

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13568247

Not quite right? Click here to continue debugging with Seer.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


Note

Medium Risk
Reduces DB load by changing flake processing to batch-fetch and bulk-update Testrun rows across uploads; risk is moderate due to altered query/update sequencing that could affect which runs get processed or updated.

Overview
Improves flake processing performance by eliminating per-upload Testrun queries (N+1) and instead fetching all recent testruns for a commit’s uploads in one batched query, grouping them by upload_id for processing.

Moves Testrun outcome persistence from per-upload updates to a single bulk_update after all uploads are processed, while keeping flake creation/upsert behavior the same.
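The consolidated write can likewise be sketched in plain Python, under the assumption that each upload's processing only mutates testrun objects in memory and returns the changed ones; a single flush then persists the whole batch (in the real code, `Testrun.objects.bulk_update(all_testruns, ["outcome"])`). All names here are illustrative.

```python
def process_single_upload(testruns):
    """Flip failing runs to 'flaky_fail' in memory; return the changed runs."""
    changed = []
    for run in testruns:
        if run["outcome"] == "fail":
            run["outcome"] = "flaky_fail"
            changed.append(run)
    return changed

def process_flakes(testruns_by_upload):
    """Accumulate changed runs across all uploads, then write once."""
    all_changed = []
    for runs in testruns_by_upload.values():
        all_changed.extend(process_single_upload(runs))
    # One write for the whole batch instead of one per upload.
    return all_changed

batch = {
    10: [{"id": 1, "outcome": "fail"}, {"id": 2, "outcome": "pass"}],
    11: [{"id": 3, "outcome": "fail"}],
}
changed = process_flakes(batch)
# changed contains runs 1 and 3, now marked "flaky_fail"
```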

Reviewed by Cursor Bugbot for commit 35e31ba. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment on lines 128 to +134
log.info(
"process_flakes_for_commit: processed upload",
extra={"upload": upload.id},
)

# Bulk-update all testruns whose outcome may have been changed to "flaky_fail"
Testrun.objects.bulk_update(all_testruns, ["outcome"])

Bug: An exception during the upload processing loop will cause all previously processed data from that batch to be lost, as database updates now only occur after the entire loop finishes.
Severity: MEDIUM

Suggested Fix

To restore the previous fault-tolerant behavior, move the Testrun.objects.bulk_update and Flake.objects.bulk_create calls back inside the for loop that iterates through uploads. This could be done by collecting testruns and flakes per-upload and saving them at the end of each loop iteration, or by wrapping each iteration in a transaction.atomic() block to ensure atomicity per upload.
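The fault-tolerance concern can be demonstrated with a small framework-free simulation of the suggested fix: persist each upload's results at the end of its own iteration, so an exception on a later upload does not discard earlier uploads' work. `fake_db` stands in for the real Testrun/Flake tables, and the error-handling policy (skip vs. re-raise) is an assumption for illustration; in Django each iteration could instead be wrapped in `transaction.atomic()`.

```python
fake_db = []  # stand-in for persisted Testrun/Flake rows

def process_upload(upload_id, should_fail=False):
    if should_fail:
        raise RuntimeError(f"upload {upload_id} failed")
    return {"upload_id": upload_id, "outcome": "flaky_fail"}

def process_batch(uploads):
    for upload_id, should_fail in uploads:
        try:
            result = process_upload(upload_id, should_fail)
        except RuntimeError:
            continue  # or log and re-raise, depending on desired semantics
        # Persist inside the loop, mirroring the pre-refactor behavior
        # (per-upload bulk_update rather than one batch-wide write at the end).
        fake_db.append(result)

process_batch([(1, False), (2, True), (3, False)])
# fake_db retains uploads 1 and 3 even though upload 2 raised
```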

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: apps/worker/services/test_analytics/ta_process_flakes.py#L128-L134

Potential issue: The refactoring moves the `Testrun.objects.bulk_update` and
`Flake.objects.bulk_create` calls from inside the per-upload processing loop to after
the loop completes. In the original code, if an exception occurred while processing one
upload, the results from previously completed uploads in the same batch were already
persisted. In the new code, if any exception occurs at any point within the loop over
uploads, the entire operation is aborted, and all in-memory changes to `Testrun` objects
and newly created `Flake` objects from preceding, successfully processed uploads are
discarded and never written to the database. While the likelihood of an exception in
`process_single_upload` is low, this change represents a regression in fault tolerance.

Did we get this right? 👍 / 👎 to inform future reviews.


sentry Bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.25%. Comparing base (0ad8a0c) to head (35e31ba).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #869   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files        1307     1307           
  Lines       48017    48021    +4     
  Branches     1636     1636           
=======================================
+ Hits        44299    44303    +4     
  Misses       3407     3407           
  Partials      311      311           
Flag Coverage Δ
workerintegration 58.53% <10.00%> (-0.02%) ⬇️
workerunit 90.39% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


codecov-notifications Bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!
