scripts/optimization/README.md
60 additions & 6 deletions
@@ -336,14 +336,13 @@ and the pricing for each region found [here](https://cloud.google.com/bigquery/p
336
336
337
337
## Queries grouped by hash
338
338
339
-
The [queries_grouped_by_hash.sql](queries_grouped_by_hash.sql) script creates a
339
+
The [queries_grouped_by_hash_project.sql](queries_grouped_by_hash_project.sql) script creates a
340
340
table named,
341
-
`queries_grouped_by_hash`. This table groups queries by their normalized query
341
+
`queries_grouped_by_hash_project`. This table groups queries by their normalized query
342
342
pattern, which ignores
343
343
comments, parameter values, UDFs, and literals in the query text.
344
344
This allows us to group queries that are logically the same, but
345
-
have different literals. The `queries_grouped_by_hash` table does not expose the
346
-
raw SQL text of the queries.
345
+
have different literals.
347
346
348
347
The [viewable_queries_grouped_by_hash.sql](viewable_queries_grouped_by_hash.sql)
349
348
script creates a table named,
@@ -355,6 +354,11 @@ in execution than the `queries_grouped_by_hash.sql` script because it has to
355
354
loop over all projects and for each
356
355
project query the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view.
357
356
357
+
Both `queries_grouped_by_hash` tables (org level and project level) include duration percentiles (`median_time_ms`, `p75_time_ms`, `p90_time_ms`, etc.) measured from `creation_time`. These metrics help assess how stable a query's performance is:
358
+
- **Median**: A high median means the query typically takes a long time to complete. Prioritize optimizing these queries (for example, filter earlier and check joins).
359
+
- **Median vs p99**: A large gap indicates unstable performance (e.g., occasional slot contention or data skew).
360
+
- **p95/p99**: Useful for tracking SLA violations and "worst-case" user experience.
361
+
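As a rough illustration of the "Median vs p99" check, the query below ranks query patterns by how far the tail duration sits from the typical duration. It is only a sketch: it assumes a `p99_time_ms` column exists alongside `median_time_ms` (only `median_time_ms`, `p75_time_ms`, and `p90_time_ms` are named explicitly above), and the aliases `p99_median_gap_ms` and `p99_to_median_ratio` are made up for this example.

```sql
-- Sketch: surface query patterns whose slowest runs are far slower than typical runs.
-- ASSUMPTION: a p99_time_ms column exists; adjust to the percentile columns in your table.
SELECT
  *,
  p99_time_ms - median_time_ms AS p99_median_gap_ms,
  -- SAFE_DIVIDE returns NULL instead of failing when median_time_ms is 0.
  SAFE_DIVIDE(p99_time_ms, median_time_ms) AS p99_to_median_ratio
FROM optimization_workshop.queries_grouped_by_hash_project
ORDER BY p99_to_median_ratio DESC
LIMIT 100
```

A ratio close to 1 suggests consistent run times, while a large ratio points at patterns worth checking for slot contention or data skew.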
358
362
For example, the following queries would be grouped together because the date
359
363
literal filters are ignored:
360
364
@@ -372,16 +376,25 @@ Running the `run_anti_pattern_tool.sh` bash script will build and run the Anti-P
372
376
373
377
```sql
374
378
SELECT *
375
-
FROM optimization_workshop.queries_grouped_by_hash
379
+
FROM optimization_workshop.queries_grouped_by_hash_org
376
380
ORDER BY total_gigabytes_processed DESC
377
381
LIMIT 100
378
382
```
379
383
384
+
* Top 200 queries with the highest total slot hours
385
+
386
+
```sql
387
+
SELECT *
388
+
FROM optimization_workshop.queries_grouped_by_hash_project
389
+
ORDER BY total_slot_hours DESC
390
+
LIMIT 200
391
+
```
392
+
380
393
* Top 100 recurring queries with the highest slot hours consumed
381
394
382
395
```sql
383
396
SELECT *
384
-
FROM optimization_workshop.queries_grouped_by_hash
397
+
FROM optimization_workshop.queries_grouped_by_hash_org
385
398
ORDER BY total_slot_hours * days_active * job_count DESC
386
399
LIMIT 100
387
400
```
@@ -487,6 +500,45 @@ generated for them in the past 30 days.
487
500
488
501
</details>
489
502
503
+
<details><summary><b>🔍 BI Engine Mode Duration</b></summary>
504
+
505
+
## BI Engine Mode Duration
506
+
507
+
The [bi_engine_mode_duration.sql](bi_engine_mode_duration.sql)
508
+
script creates a table named, `bi_engine_mode_duration`. This table
509
+
groups queries by their BI Engine mode and shows, for each day, how long queries took in each mode.
510
+
511
+
### Examples of querying script results
512
+
513
+
* Order by day and BI Engine mode
514
+
515
+
```sql
516
+
SELECT *
517
+
FROM optimization_workshop.bi_engine_mode_duration
518
+
ORDER BY day, bi_engine_mode ASC
519
+
```
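
To focus on days where queries were not fully accelerated, a filter along the following lines may help. This is a sketch that assumes `bi_engine_mode` is stored as a string holding the mode names BigQuery reports in its job statistics (`FULL`, `PARTIAL`, `DISABLED`); check the table's actual schema before relying on it.

```sql
-- Sketch: show only partially accelerated or non-accelerated queries, newest days first.
-- ASSUMPTION: bi_engine_mode is a STRING column with values such as 'PARTIAL' and 'DISABLED'.
SELECT *
FROM optimization_workshop.bi_engine_mode_duration
WHERE bi_engine_mode IN ('PARTIAL', 'DISABLED')
ORDER BY day DESC, bi_engine_mode ASC
```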
520
+
521
+
</details>
522
+
523
+
<details><summary><b>🔍 BI Engine Disabled Reasons</b></summary>
524
+
525
+
## BI Engine Disabled Reasons
526
+
527
+
The [bi_engine_disabled_reasons.sql](bi_engine_disabled_reasons.sql)
528
+
script creates a table named, `bi_engine_disabled_reasons`. This table groups queries by the reason BI Engine was disabled and counts the queries for each reason.
529
+
530
+
### Examples of querying script results
531
+
532
+
* Order by reasons count descending
533
+
534
+
```sql
535
+
SELECT *
536
+
FROM optimization_workshop.bi_engine_disabled_reasons
537
+
ORDER BY count DESC
538
+
```
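
To see how much each reason contributes overall, the counts can be turned into shares of the total. The query below is a sketch under the assumption that the table has one row per reason with a numeric `count` column (the only column name confirmed by the example above); `share_of_total` is an alias introduced here for illustration.

```sql
-- Sketch: add each reason's share of all counted queries.
-- ASSUMPTION: one row per disabled reason, with a numeric `count` column.
SELECT
  r.*,
  -- The empty OVER () window sums `count` across every row in the table.
  SAFE_DIVIDE(r.count, SUM(r.count) OVER ()) AS share_of_total
FROM optimization_workshop.bi_engine_disabled_reasons AS r
ORDER BY r.count DESC
```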
539
+
540
+
</details>
541
+
490
542
# Workload Analysis
491
543
492
544
<details><summary><b>🔍 Hourly slot consumption by query hash</b></summary>
@@ -534,3 +586,5 @@ of that hour's slots each grouping of labels consumed.