Skip to content

perf(worker): Optimize flake processing with bulk testrun fetching#866

Open
sentry[bot] wants to merge 1 commit intomainfrom
seer/perf/bulk-flake-testruns
Open

perf(worker): Optimize flake processing with bulk testrun fetching#866
sentry[bot] wants to merge 1 commit intomainfrom
seer/perf/bulk-flake-testruns

Conversation

@sentry
Copy link
Copy Markdown
Contributor

@sentry sentry Bot commented Apr 20, 2026

Fixes WORKER-Y90. The issue was that: Iterating ReportSession objects and fetching Testrun objects individually causes N+1 queries to ta_timeseries database.

  • Refactored flake processing to fetch all relevant testruns for a commit's uploads in a single query.
  • Grouped fetched testruns by upload ID to avoid N+1 queries when iterating through uploads.
  • Modified process_single_upload to accept pre-fetched testruns.
  • Moved Testrun bulk update to occur once at the end of process_flakes_for_commit for all modified testruns.
  • Renamed get_testruns to get_all_testruns to reflect its new functionality of fetching for multiple uploads.

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13565747

Not quite right? Click here to continue debugging with Seer.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


Note

Low Risk
Performance-focused refactor that changes query batching and when Testrun.outcome updates are persisted; low risk but could affect which testruns get updated if upload filtering/order assumptions are wrong.

Overview
Optimizes flake processing to eliminate N+1 DB queries by fetching all Testruns for a commit’s relevant uploads in one query and grouping them by upload_id before per-upload processing.

Refactors process_single_upload to accept pre-fetched testruns, renames get_testruns to get_all_testruns, and defers Testrun.objects.bulk_update(..., ["outcome"]) to run once per commit over the full fetched set.

Reviewed by Cursor Bugbot for commit 36d0691. Bugbot is set up for automated code reviews on this repo. Configure here.

update_fields=["end_date", "count", "recent_passes_count", "fail_count"],
)

Testrun.objects.bulk_update(all_testruns, ["outcome"])
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The bulk_update on Testrun objects uses timestamp as the key. Since multiple testruns can share the same timestamp, this can lead to updating incorrect rows.
Severity: HIGH

Suggested Fix

To prevent incorrect updates, ensure a unique identifier is used for the bulk_update operation. Consider adding a composite unique constraint to the Testrun model, for example on timestamp and upload_id, to uniquely identify each row. If bulk_update cannot use a composite key, consider iterating and updating each object individually to ensure correctness.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: apps/worker/services/test_analytics/ta_process_flakes.py#L150

Potential issue: The `Testrun` model uses `timestamp` as its primary key, but a database
migration drops the underlying primary key constraint to create a TimescaleDB
hypertable. The code calls `Testrun.objects.bulk_update` to update the `outcome` field.
Because multiple `Testrun` instances from different uploads can be created with
identical timestamps, Django's `bulk_update` will generate an `UPDATE` statement that
incorrectly modifies multiple rows that share the same `timestamp`. This leads to data
corruption where the outcome of the wrong testrun is updated. This is confirmed by test
data patterns where multiple testruns are created with `timezone.now()`.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 36d0691. Configure here.

@sentry_sdk.trace
def process_single_upload(
upload: ReportSession, curr_flakes: dict[bytes, Flake], repo_id: int
upload: ReportSession,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused upload parameter in process_single_upload

Low Severity

The upload parameter in process_single_upload is never used within the function body. It was previously needed to call get_testruns(upload), but now that testruns are passed directly as a parameter, upload serves no purpose. This dead parameter adds confusion about the function's interface and could mislead future maintainers into thinking the function depends on the upload object.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 36d0691. Configure here.

@sentry
Copy link
Copy Markdown
Contributor Author

sentry Bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.25%. Comparing base (0ad8a0c) to head (36d0691).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #866   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files        1307     1307           
  Lines       48017    48021    +4     
  Branches     1636     1636           
=======================================
+ Hits        44299    44303    +4     
  Misses       3407     3407           
  Partials      311      311           
Flag Coverage Δ
workerintegration 58.53% <10.00%> (-0.02%) ⬇️
workerunit 90.39% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@codecov-notifications
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants