perf(worker): Optimize flake processing with bulk testrun fetching #866
sentry[bot] wants to merge 1 commit into main
Conversation
```python
    update_fields=["end_date", "count", "recent_passes_count", "fail_count"],
)

Testrun.objects.bulk_update(all_testruns, ["outcome"])
```
Bug: The bulk_update on Testrun objects uses timestamp as the key. Since multiple testruns can share the same timestamp, this can lead to updating incorrect rows.
Severity: HIGH
Suggested Fix
To prevent incorrect updates, ensure a unique identifier is used for the bulk_update operation. Consider adding a composite unique constraint to the Testrun model, for example on timestamp and upload_id, to uniquely identify each row. If bulk_update cannot use a composite key, consider iterating and updating each object individually to ensure correctness.
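The failure mode can be shown without Django at all. The sketch below simulates the two strategies with plain Python objects (the `Row` class and field names are illustrative, not the real `Testrun` schema): keying the update on the non-unique `timestamp` overwrites every row sharing that timestamp, while a composite `(timestamp, upload_id)` key touches only the intended row.

```python
from dataclasses import dataclass

@dataclass
class Row:
    # Simulated ta_timeseries row; names are illustrative, not the real schema.
    timestamp: int
    upload_id: int
    outcome: str

def bulk_update_by_timestamp(table: list[Row], updates: list[Row]) -> None:
    """Mimics an UPDATE keyed only on the non-unique timestamp column."""
    for u in updates:
        for row in table:
            if row.timestamp == u.timestamp:  # matches EVERY row with this timestamp
                row.outcome = u.outcome

def bulk_update_by_composite_key(table: list[Row], updates: list[Row]) -> None:
    """Keys on (timestamp, upload_id), which uniquely identifies a row here."""
    for u in updates:
        for row in table:
            if (row.timestamp, row.upload_id) == (u.timestamp, u.upload_id):
                row.outcome = u.outcome

# Two testruns from different uploads created in the same instant.
table = [Row(100, 1, "pass"), Row(100, 2, "pass")]

# We only intend to flip upload 1's testrun, but both rows get overwritten.
bulk_update_by_timestamp(table, [Row(100, 1, "flaky")])
print([r.outcome for r in table])  # → ['flaky', 'flaky']
```

With the composite key, the same update leaves upload 2's row untouched, which is the behavior the suggested fix aims for.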
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: apps/worker/services/test_analytics/ta_process_flakes.py#L150
Potential issue: The `Testrun` model uses `timestamp` as its primary key, but a database
migration drops the underlying primary key constraint to create a TimescaleDB
hypertable. The code calls `Testrun.objects.bulk_update` to update the `outcome` field.
Because multiple `Testrun` instances from different uploads can be created with
identical timestamps, Django's `bulk_update` will generate an `UPDATE` statement that
incorrectly modifies multiple rows that share the same `timestamp`. This leads to data
corruption where the outcome of the wrong testrun is updated. This is confirmed by test
data patterns where multiple testruns are created with `timezone.now()`.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 36d0691. Configure here.
```diff
 @sentry_sdk.trace
 def process_single_upload(
-    upload: ReportSession, curr_flakes: dict[bytes, Flake], repo_id: int
+    upload: ReportSession,
```
Unused upload parameter in process_single_upload
Low Severity
The upload parameter in process_single_upload is never used within the function body. It was previously needed to call get_testruns(upload), but now that testruns are passed directly as a parameter, upload serves no purpose. This dead parameter adds confusion about the function's interface and could mislead future maintainers into thinking the function depends on the upload object.
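A minimal sketch of the cleanup, using a hypothetical simplified signature (the real function takes Django model objects and more parameters): once testruns are passed in directly, `upload` can be dropped so the interface states only its real dependencies.

```python
# Hypothetical stand-in for the flake-marking step; the names and outcome
# strings are illustrative, not the production implementation.
def process_single_upload(testruns: list[dict], curr_flakes: set[str]) -> list[dict]:
    """Marks failed testruns whose test is a known flake.

    Note the signature no longer mentions `upload` at all: the function
    operates purely on the pre-fetched testruns it is given.
    """
    modified = []
    for t in testruns:
        if t["test_id"] in curr_flakes and t["outcome"] == "failure":
            t["outcome"] = "flaky_failure"
            modified.append(t)
    return modified

runs = [
    {"test_id": "a", "outcome": "failure"},
    {"test_id": "b", "outcome": "failure"},
]
print(process_single_upload(runs, {"a"}))  # only test "a" is remarked
```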
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #866   +/-   ##
=======================================
  Coverage   92.25%   92.25%
=======================================
  Files        1307     1307
  Lines       48017    48021       +4
  Branches     1636     1636
=======================================
+ Hits        44299    44303       +4
  Misses       3407     3407
  Partials      311      311
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Fixes WORKER-Y90. The issue was that iterating `ReportSession` objects and fetching `Testrun` objects individually causes N+1 queries to the `ta_timeseries` database.

The fix:

- Refactors `process_single_upload` to accept pre-fetched testruns.
- Moves the `Testrun` bulk update to occur once at the end of `process_flakes_for_commit` for all modified testruns.
- Renames `get_testruns` to `get_all_testruns` to reflect its new functionality of fetching for multiple uploads.

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13565747
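The batching idea behind the fix can be sketched in plain Python (dict-based testruns and the outcome strings are hypothetical stand-ins for the real `Testrun` model): fetch once, group by `upload_id` in memory, process each upload from the grouped map, and persist all modifications in a single bulk update at the end.

```python
from collections import defaultdict

def get_all_testruns(fetched: list[dict]) -> dict[int, list[dict]]:
    """Stands in for the single bulk query: group one result set by upload_id."""
    grouped: dict[int, list[dict]] = defaultdict(list)
    for t in fetched:
        grouped[t["upload_id"]].append(t)
    return grouped

def process_flakes_for_commit(
    upload_ids: list[int], fetched: list[dict], flaky: set[str]
) -> list[dict]:
    grouped = get_all_testruns(fetched)
    modified: list[dict] = []
    for upload_id in upload_ids:              # one in-memory lookup per upload,
        for t in grouped.get(upload_id, []):  # instead of one DB query per upload
            if t["test_id"] in flaky and t["outcome"] == "failure":
                t["outcome"] = "flaky_failure"
                modified.append(t)
    # In the real code, a single Testrun.objects.bulk_update(modified, ["outcome"])
    # would run here, once per commit, instead of once per upload.
    return modified
```

This replaces N per-upload queries with one fetch plus O(n) grouping, which is the N+1 elimination the PR describes.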
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
Note

Low Risk

Performance-focused refactor that changes query batching and when `Testrun.outcome` updates are persisted; low risk, but it could affect which testruns get updated if upload filtering/order assumptions are wrong.

Overview

Optimizes flake processing to eliminate N+1 DB queries by fetching all `Testrun`s for a commit's relevant uploads in one query and grouping them by `upload_id` before per-upload processing. Refactors `process_single_upload` to accept pre-fetched testruns, renames `get_testruns` to `get_all_testruns`, and defers `Testrun.objects.bulk_update(..., ["outcome"])` to run once per commit over the full fetched set.

Reviewed by Cursor Bugbot for commit 36d0691. Bugbot is set up for automated code reviews on this repo.