Commit 03cb7cd

Ignore rows which are missing stats during daily aggregation (#5287)
There were a couple of old versions of CF running on the clusterfuzz-candidate2 bots. This PR ignores the stats that were generated by those bots. It's unfortunate that the stats are missing these, but IMO it's better to ignore these rows completely than to write partial data. There were 10 of these instances running, which resulted in some fuzzers having a fuzzing session unaccounted for.

Testing: Ran this SQL against the BigQuery table and verified that it correctly filters out the rows and results in `testcases_generated == testcases_executed`.

### Rollout

We will have to rerun the cron job for the past month after this deploys to properly ignore those rows.
1 parent 305841a commit 03cb7cd

1 file changed

Lines changed: 3 additions & 0 deletions

src/clusterfuzz/_internal/cron/aggregate_fuzzer_stats.py

@@ -149,6 +149,8 @@ def _query_fuzzer_stats(fuzzer_name, project_id, target_date_str):
   dataset_id = fuzzer_stats.dataset_name(fuzzer_name)
   table_id = 'JobRun'
 
+  # We ignore rows where testcases_generated is NULL because those are rows
+  # created by an old revision of Clusterfuzz which are missing the durations.
   query = f"""
     SELECT
       '{fuzzer_name}' as fuzzer_name,
@@ -170,6 +172,7 @@ def _query_fuzzer_stats(fuzzer_name, project_id, target_date_str):
     FROM `{project_id}.{dataset_id}.{table_id}`
     WHERE
       DATE(TIMESTAMP_SECONDS(CAST(timestamp AS INT64))) = '{target_date_str}'
+      AND testcases_generated IS NOT NULL
     GROUP BY
       date
   """

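The effect of the added `AND testcases_generated IS NOT NULL` predicate can be illustrated with a small, self-contained sketch. It is not the ClusterFuzz query: it uses an in-memory SQLite table instead of BigQuery, and the table name `job_run`, the `date` column, the `aggregate` helper, and the sample values are all hypothetical. It also assumes that the rows from the old revision still carry `testcases_executed`, which the commit message implies but does not state.

```python
import sqlite3


def aggregate(conn, target_date, skip_incomplete):
  """Returns (row count, SUM(generated), SUM(executed)) for one day.

  Hypothetical helper for illustration only. When skip_incomplete is True,
  rows whose testcases_generated is NULL are excluded, mirroring the
  predicate added in this commit.
  """
  where = "date = ?"
  if skip_incomplete:
    where += " AND testcases_generated IS NOT NULL"
  return conn.execute(
      "SELECT COUNT(*), SUM(testcases_generated), SUM(testcases_executed) "
      f"FROM job_run WHERE {where}", (target_date,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_run ("
             "date TEXT, testcases_generated INTEGER, "
             "testcases_executed INTEGER)")
conn.executemany(
    "INSERT INTO job_run VALUES (?, ?, ?)",
    [
        ("2024-01-01", 10, 10),
        ("2024-01-01", 5, 5),
        # Assumed shape of a row written by an old revision: partial stats.
        ("2024-01-01", None, 7),
    ])

unfiltered = aggregate(conn, "2024-01-01", skip_incomplete=False)
filtered = aggregate(conn, "2024-01-01", skip_incomplete=True)
print(unfiltered)  # (3, 15, 22): the sums disagree because of the partial row
print(filtered)    # (2, 15, 15): generated == executed once the row is skipped
```

Because `SUM` skips NULL values, the partial row contributes to one total but not the other unless the whole row is filtered out, which is the mismatch the testing note in the commit message checks for.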