| 1 | +# Dashboard Unification Plan |
| 2 | + |
| 3 | +This document outlines proposed improvements to unify panel titles, units, and groupings across all PostgresAI Grafana dashboards. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Executive Summary |
| 8 | + |
| 9 | +After reviewing all 14 dashboards (22,700+ lines of JSON configuration), I identified several inconsistencies that impact user experience and maintainability: |
| 10 | + |
| 11 | +1. **Panel titles** use inconsistent naming conventions |
| 12 | +2. **Units** vary for similar metrics across dashboards |
| 13 | +3. **Row groupings** differ in order and naming between related dashboards (e.g., "Tuple stats" in Table Stats vs "Index usage stats" in Index Health) |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## 1. Panel Title Unification |
| 18 | + |
| 19 | +### 1.1 Current Issues |
| 20 | + |
| 21 | +#### Inconsistent "Top N" Panel Title Patterns |
| 22 | + |
| 23 | +| Dashboard | Current Title | Issue | |
| 24 | +|-----------|---------------|-------| |
| 25 | +| Dashboard 8 (Table Stats) | `Top $top_n tables by total size` | Uses "total size" | |
| 26 | +| Dashboard 10 (Index Health) | `Top $top_n indexes by size` | Uses just "size" | |
| 27 | +| Dashboard 8 (Table Stats) | `Top $top_n tables by size change rate (absolute)` | "(absolute)" suffix | |
| 28 | +| Dashboard 9 (Single Table) | `Table logical size distribution changes (absolute)` | Different pattern | |
| 29 | + |
| 30 | +#### Inconsistent Bloat Panel Naming |
| 31 | + |
| 32 | +| Dashboard | Current Title | |
| 33 | +|-----------|---------------| |
| 34 | +| Dashboard 9 (Single Table) | `Estimated bloat %` | |
| 35 | +| Dashboard 10 (Index Health) | `Top $top_n indexes by estimated bloat %` | |
| 36 | +| Dashboard 11 (Single Index) | No explicit bloat panel title (uses Boguk ratio) | |
| 37 | + |
| 38 | +#### Inconsistent IO Panel Naming |
| 39 | + |
| 40 | +| Dashboard | Current Title | |
| 41 | +|-----------|---------------| |
| 42 | +| Dashboard 10 (Index Health) | `Top $top_n indexes by reads (bytes)` | |
| 43 | +| Dashboard 10 (Index Health) | `Top $top_n indexes by hits (bytes)` | |
| 44 | +| Dashboard 11 (Single Index) | `Index reads and hits (bytes)` | |
| 45 | +| Dashboard 9 (Single Table) | `Shared block hits` / `Shared block reads` | |
| 46 | + |
| 47 | +### 1.2 Proposed Unified Title Patterns |
| 48 | + |
| 49 | +#### Pattern 1: Aggregated Dashboards (Multiple Objects) |
| 50 | +``` |
| 51 | +Top $top_n {objects} by {metric} |
| 52 | +``` |
| 53 | +Examples: |
| 54 | +- `Top $top_n tables by total size` |
| 55 | +- `Top $top_n indexes by size` |
| 56 | +- `Top $top_n tables by tuple inserts` |
| 57 | +- `Top $top_n indexes by block reads` |
| 58 | + |
| 59 | +#### Pattern 2: Single Object Dashboards |
| 60 | +``` |
| 61 | +{Metric name} |
| 62 | +``` |
| 63 | +Examples: |
| 64 | +- `Size distribution` |
| 65 | +- `Tuple operations` |
| 66 | +- `Block reads and hits` |
| 67 | +- `Estimated bloat` |
| 68 | + |
| 69 | +#### Pattern 3: Rate Metrics |
| 70 | +Use a `/s` suffix for rates instead of "(absolute)": |
| 71 | +- `Size change rate` → `Size growth /s` |
| 72 | +- `Tuple operations` → `Tuple operations /s` (when the panel shows a rate) |
| 73 | +- Remove "(absolute)" from all titles |
| 74 | + |
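The patterns above could be checked mechanically before any renaming is done. The following is a minimal sketch, not part of the existing tooling: `lint_title` and `TOP_N_RE` are hypothetical names, and the rules cover only Pattern 1 and the "(absolute)" rule from Pattern 3.

```python
import re

# Hypothetical rule for Pattern 1: "Top $top_n {objects} by {metric}".
TOP_N_RE = re.compile(r"^Top \$top_n (tables|indexes) by \S.*$")


def lint_title(title):
    """Return a list of problems found in a single panel title."""
    problems = []
    # Pattern 3: rates should use a "/s" suffix, never "(absolute)".
    if "(absolute)" in title:
        problems.append('contains "(absolute)"; use a "/s" suffix for rates')
    # Pattern 1: any "Top ..." title must follow the unified shape.
    if title.startswith("Top") and not TOP_N_RE.match(title):
        problems.append('does not match "Top $top_n {objects} by {metric}"')
    return problems


for title in (
    "Top $top_n tables by size growth /s",
    "Top $top_n tables by size change rate (absolute)",
):
    print(title, "->", lint_title(title))
```

A check like this only flags titles; deciding which panels actually show rates still requires looking at each panel's query.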
| 75 | +### 1.3 Specific Title Changes |
| 76 | + |
| 77 | +#### Dashboard 8 (Table Stats) - Aggregated |
| 78 | +| Current | Proposed | |
| 79 | +|---------|----------| |
| 80 | +| `Top $top_n tables by total size` | `Top $top_n tables by total size` (keep) | |
| 81 | +| `Top $top_n tables by size change rate (absolute)` | `Top $top_n tables by size growth /s` | |
| 82 | +| `Top $top_n tables by table (w/o TOAST) size` | `Top $top_n tables by heap size (excl. TOAST)` | |
| 83 | +| `Top $top_n tables by table (w/o TOAST) size change rate (absolute)` | `Top $top_n tables by heap size growth /s` | |
| 84 | +| `Top $top_n tables by TOAST size` | `Top $top_n tables by TOAST size` (keep) | |
| 85 | +| `Top $top_n tables by index size` | `Top $top_n tables by indexes size` | |
| 86 | +| `Top $top_n tables by tuple inserts (including TOAST)` | `Top $top_n tables by tuple inserts` | |
| 87 | +| `Top $top_n tables by tuple updates (HOT + non-HOT) (including TOAST)` | `Top $top_n tables by tuple updates` | |
| 88 | +| `Top $top_n tables by tuple deletes (including TOAST)` | `Top $top_n tables by tuple deletes` | |
| 89 | +| `Top $top_n tables by blk_reads in bytes` | `Top $top_n tables by block reads` | |
| 90 | +| `Top $top_n tables by blk_hits in bytes` | `Top $top_n tables by block hits` | |
| 91 | + |
| 92 | +#### Dashboard 9 (Single Table Analysis) |
| 93 | +| Current | Proposed | |
| 94 | +|---------|----------| |
| 95 | +| `Table logical size distribution` | `Size distribution` | |
| 96 | +| `Table logical size distribution changes (absolute)` | `Size growth /s` | |
| 97 | +| `Estimated bloat %` | `Bloat percentage (estimated)` | |
| 98 | +| `Estimated bloat size` | `Bloat size (estimated)` | |
| 99 | +| `Tuple operations` | `Tuple operations /s` | |
| 100 | +| `Tuple operations (%)` | `Tuple operations distribution (%)` | |
| 101 | +| `Live tuples fetch distribution` | `Tuple fetch methods /s` | |
| 102 | +| `Shared block hits` | `Block cache hits /s` | |
| 103 | +| `Shared block reads` | `Block disk reads /s` | |
| 104 | +| `Shared block hit ratio` | `Block cache hit ratio` | |
| 105 | + |
| 106 | +#### Dashboard 10 (Index Health) - Aggregated |
| 107 | +| Current | Proposed | |
| 108 | +|---------|----------| |
| 109 | +| `Top $top_n indexes by size` | `Top $top_n indexes by size` (keep) | |
| 110 | +| `Top $top_n indexes by tuples read` | `Top $top_n indexes by tuples read /s` | |
| 111 | +| `Top $top_n indexes by tuples fetched` | `Top $top_n indexes by tuples fetched /s` | |
| 112 | +| `Top $top_n indexes by reads (bytes)` | `Top $top_n indexes by block reads` | |
| 113 | +| `Top $top_n indexes by hits (bytes)` | `Top $top_n indexes by block hits` | |
| 114 | +| `Top $top_n indexes by estimated bloat %` | `Top $top_n indexes by bloat %` | |
| 115 | +| `Top $top_n indexes by estimated bloat size` | `Top $top_n indexes by bloat size` | |
| 116 | + |
| 117 | +#### Dashboard 11 (Single Index Analysis) |
| 118 | +| Current | Proposed | |
| 119 | +|---------|----------| |
| 120 | +| `Index size` | `Size` | |
| 121 | +| `Index scans` | `Scans /s` | |
| 122 | +| `Tuples read and fetched` | `Tuples read and fetched /s` | |
| 123 | +| `Index reads and hits (bytes)` | `Block reads and hits` | |
| 124 | +| (Boguk ratio panel has no title) | `Bloat analysis (Boguk ratio)` | |
| 125 | + |
| 126 | +#### Dashboard 1 (Node Performance Overview) |
| 127 | +| Current | Proposed | |
| 128 | +|---------|----------| |
| 129 | +| `TPS` | `Transactions /s` | |
| 130 | +| `QPS (pg_stat_statements)` | `Queries /s` | |
| 131 | +| `Query total time (pg_stat_statements)` | `Query execution time /s` | |
| 132 | +| `Query time per call (latency) (pg_stat_statements)` | `Query latency per call` | |
| 133 | +| `Tuples fetched and tuples returned per second` | `Tuple read operations /s` | |
| 134 | +| `Tuples operations per second` | `Tuple write operations /s` | |
| 135 | +| `blk_reads and blk_hits per second (bytes)` | `Block reads and hits /s` | |
| 136 | +| `blk_read_time and blk_write_time (s/s)` | `Block I/O wait time /s` | |
| 137 | + |
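The title changes in this section could be applied with a small script rather than by hand. The sketch below assumes the usual Grafana dashboard JSON layout, where panels live in a top-level `panels` list and collapsed rows carry their children in a nested `panels` list. The helper names (`iter_panels`, `rename_titles`) are hypothetical, and `TITLE_RENAMES` is only an excerpt of the Dashboard 9 table.

```python
# Excerpt of the Dashboard 9 mapping from the tables above (not exhaustive).
TITLE_RENAMES = {
    "Table logical size distribution": "Size distribution",
    "Shared block hits": "Block cache hits /s",
    "Shared block reads": "Block disk reads /s",
}


def iter_panels(panels):
    """Yield every panel, descending into panels nested under collapsed rows."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def rename_titles(dashboard, renames):
    """Rename panel titles in place; return the number of titles changed."""
    changed = 0
    for panel in iter_panels(dashboard.get("panels", [])):
        new_title = renames.get(panel.get("title"))
        if new_title is not None:
            panel["title"] = new_title
            changed += 1
    return changed
```

Because the mapping is keyed on exact current titles, each dashboard would need its own mapping: the same current title may map to different proposals on different dashboards.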
| 138 | +--- |
| 139 | + |
| 140 | +## 2. Unit Unification |
| 141 | + |
| 142 | +### 2.1 Current Unit Inconsistencies |
| 143 | + |
| 144 | +| Metric Type | Units Found | Dashboards | |
| 145 | +|-------------|-------------|------------| |
| 146 | +| Byte rates | `binBps`, `Bps` | Mixed across all | |
| 147 | +| Percentages | `percent`, `percentunit` | Mixed across all | |
| 148 | +| Operations | `ops`, `ops/s`, `short` | Mixed | |
| 149 | +| Calls | `calls/s`, `ops/s` | Dashboard 2 | |
| 150 | +| Time | `s`, `ms`, `s/s`, `ms/s` | Mixed | |
| 151 | + |
| 152 | +### 2.2 Proposed Unit Standards |
| 153 | + |
| 154 | +| Metric Type | Standard Unit | Grafana Unit ID | |
| 155 | +|-------------|---------------|-----------------| |
| 156 | +| Byte sizes (absolute) | Bytes (IEC) | `bytes` | |
| 157 | +| Byte rates (throughput) | Bytes/sec (IEC) | `binBps` | |
| 158 | +| Percentages (0-100 scale) | Percent (0-100) | `percent` | |
| 159 | +| Percentages (0-1 scale) | Percent (0.0-1.0) | `percentunit` | |
| 160 | +| Operations per second | ops/sec | `ops` (with /s in title) | |
| 161 | +| Time durations | Seconds | `s` | |
| 162 | +| Time rates | Seconds per second | `s` (with ratio in title) | |
| 163 | +| Latencies | Milliseconds | `ms` | |
| 164 | +| Row counts | Short | `short` | |
| 165 | + |
| 166 | +### 2.3 Specific Unit Changes |
| 167 | + |
| 168 | +#### Index Dashboards (10, 11) |
| 169 | +| Panel | Current Unit | Proposed Unit | |
| 170 | +|-------|--------------|---------------| |
| 171 | +| Top indexes by reads (bytes) | `Bps` | `binBps` | |
| 172 | +| Top indexes by hits (bytes) | `Bps` | `binBps` | |
| 173 | +| Index reads and hits (bytes) | `Bps` | `binBps` | |
| 174 | + |
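As a rough illustration of the `Bps` → `binBps` change, the sketch below swaps the unit in a single panel. It assumes the panels store units under `fieldConfig.defaults.unit` and in `fieldConfig.overrides[].properties[]` entries with `id` set to `unit`; legacy graph panels keep units elsewhere and are not handled. `replace_unit` is a hypothetical helper name.

```python
def replace_unit(panel, old="Bps", new="binBps"):
    """Swap a unit ID in a panel's defaults and overrides; return change count."""
    changed = 0
    field_config = panel.get("fieldConfig", {})

    # Panel-wide default unit.
    defaults = field_config.get("defaults", {})
    if defaults.get("unit") == old:
        defaults["unit"] = new
        changed += 1

    # Per-field overrides may set their own unit.
    for override in field_config.get("overrides", []):
        for prop in override.get("properties", []):
            if prop.get("id") == "unit" and prop.get("value") == old:
                prop["value"] = new
                changed += 1
    return changed
```

Returning a change count makes it easy to confirm that exactly the three panels listed above were touched.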
| 175 | +#### Table Dashboards (8, 9) |
| 176 | +Already mostly consistent with `binBps` for rates and `bytes` for absolute sizes. |
| 177 | + |
| 178 | +#### Query Dashboards (2, 3) |
| 179 | +| Panel | Current Unit | Proposed Unit | |
| 180 | +|-------|--------------|---------------| |
| 181 | +| Calls per second | `calls/s` | `calls/s` (keep; `ops` was considered) | |
| 182 | + |
| 183 | +**Recommendation**: Keep `calls/s`, as it is semantically clearer for queries than the generic `ops`. |
| 184 | + |
| 185 | +--- |
| 186 | + |
| 187 | +## 3. Row Grouping and Ordering Unification |
| 188 | + |
| 189 | +### 3.1 Current Row Groupings |
| 190 | + |
| 191 | +#### Dashboard 8 (Table Stats) |
| 192 | +1. Detailed table view (aggregated table statistics) |
| 193 | +2. Size stats |
| 194 | +3. Tuple stats |
| 195 | +4. IO stats |
| 196 | +5. Estimated bloat stats |
| 197 | + |
| 198 | +#### Dashboard 9 (Single Table Analysis) |
| 199 | +1. Size stats |
| 200 | +2. Estimated bloat stats |
| 201 | +3. Tuple stats |
| 202 | +4. IO stats |
| 203 | + |
| 204 | +#### Dashboard 10 (Index Health) |
| 205 | +1. Detailed index view |
| 206 | +2. Size stats |
| 207 | +3. Index usage stats |
| 208 | +4. IO stats |
| 209 | +5. Estimated bloat stats |
| 210 | + |
| 211 | +#### Dashboard 11 (Single Index Analysis) |
| 212 | +1. Size stats |
| 213 | +2. Index usage stats |
| 214 | +3. IO stats |
| 215 | +4. Estimated bloat stats |
| 216 | + |
| 217 | +### 3.2 Identified Inconsistencies |
| 218 | + |
| 219 | +1. **Bloat stats position**: In Dashboard 9 (Single Table), bloat comes right after Size stats. In Dashboards 8, 10, and 11, bloat comes at the end. |
| 220 | + |
| 221 | +2. **IO stats (no change needed)**: Dashboards 9 and 11 both have an "IO stats" section, so the single-object dashboards are already consistent here. |
| 222 | + |
| 223 | +3. **Naming inconsistency**: "Index usage stats" vs "Tuple stats" - these cover similar concepts (how objects are being used). |
| 224 | + |
| 225 | +### 3.3 Proposed Unified Row Order |
| 226 | + |
| 227 | +For **Aggregated Dashboards** (8, 10): |
| 228 | +1. `Detailed view` (collapsed table with all metrics) |
| 229 | +2. `Size stats` (size-related time series) |
| 230 | +3. `Activity stats` (operations, scans, tuple activity) |
| 231 | +4. `IO stats` (block reads, hits, cache performance) |
| 232 | +5. `Bloat stats` (estimated bloat metrics) |
| 233 | + |
| 234 | +For **Single Object Dashboards** (9, 11): |
| 235 | +1. `Size stats` |
| 236 | +2. `Activity stats` |
| 237 | +3. `IO stats` |
| 238 | +4. `Bloat stats` |
| 239 | + |
| 240 | +### 3.4 Specific Changes |
| 241 | + |
| 242 | +#### Dashboard 9 (Single Table Analysis) |
| 243 | +Move "Estimated bloat stats" section from position 2 to position 4 (after IO stats): |
| 244 | + |
| 245 | +**Current Order:** |
| 246 | +1. Size stats |
| 247 | +2. Estimated bloat stats ← Move to end |
| 248 | +3. Tuple stats |
| 249 | +4. IO stats |
| 250 | + |
| 251 | +**Proposed Order:** |
| 252 | +1. Size stats |
| 253 | +2. Activity stats (renamed from "Tuple stats") |
| 254 | +3. IO stats |
| 255 | +4. Bloat stats (renamed from "Estimated bloat stats") |
| 256 | + |
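Moving a section is more than reordering a list, because each panel's `gridPos.y` has to be recomputed. The sketch below shows one way this might be done, under several assumptions: rows are expanded (children sit in the same flat `panels` list as the row), every non-row panel belongs to the row above it, and the footer text panel has been set aside beforehand so it does not travel with the last section. `split_sections` and `reorder_rows` are hypothetical names.

```python
def split_sections(panels):
    """Group a flat panel list into (row, children) sections by layout order."""
    ordered = sorted(panels, key=lambda p: (p["gridPos"]["y"], p["gridPos"]["x"]))
    sections = []
    for panel in ordered:
        if panel.get("type") == "row":
            sections.append((panel, []))
        elif sections:
            sections[-1][1].append(panel)
        else:
            raise ValueError("panel found before the first row")
    return sections


def reorder_rows(panels, desired_titles):
    """Return panels with sections in the desired order and gridPos.y recomputed."""
    rank = {title: i for i, title in enumerate(desired_titles)}
    sections = split_sections(panels)
    # Stable sort: sections with unknown titles keep their order at the end.
    sections.sort(key=lambda s: rank.get(s[0]["title"], len(rank)))

    result = []
    cursor = 0  # next free grid row
    for row, children in sections:
        old_top = row["gridPos"]["y"] + row["gridPos"]["h"]
        row["gridPos"]["y"] = cursor
        cursor += row["gridPos"]["h"]
        height = 0
        for child in children:
            # Preserve each child's vertical offset within its section.
            offset = child["gridPos"]["y"] - old_top
            height = max(height, offset + child["gridPos"]["h"])
            child["gridPos"]["y"] = cursor + offset
        cursor += height
        result.append(row)
        result.extend(children)
    return result
```

For Dashboard 9, `desired_titles` would list the current row titles in the proposed order; the row renames can then be applied as a separate step.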
| 257 | +#### Dashboard 8 (Table Stats) |
| 258 | +Rename "Tuple stats" to "Activity stats" for consistency with proposed naming. |
| 259 | + |
| 260 | +#### Dashboard 10 (Index Health) |
| 261 | +Rename "Index usage stats" to "Activity stats" for consistency. |
| 262 | + |
| 263 | +#### Dashboard 11 (Single Index Analysis) |
| 264 | +Rename "Index usage stats" to "Activity stats" for consistency. |
| 265 | + |
| 266 | +--- |
| 267 | + |
| 268 | +## 4. Additional Recommendations |
| 269 | + |
| 270 | +### 4.1 Panel Height Consistency |
| 271 | + |
| 272 | +Current panel heights vary significantly; most time series panels range from h=8 to h=13. |
| 273 | + |
| 274 | +Recommendation: Standardize on h=10 for regular panels and h=13 for wide/important panels. |
| 275 | + |
| 276 | +### 4.2 Legend Configuration |
| 277 | + |
| 278 | +Some dashboards show different legend calculations: |
| 279 | +- Dashboard 9: `["min", "max", "mean"]` - good |
| 280 | +- Dashboard 10: `["min", "max", "mean"]` - good |
| 281 | +- Some panels only show `["last"]` |
| 282 | + |
| 283 | +Recommendation: Use `["min", "max", "mean"]` for all time series panels. |
| 284 | + |
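The legend recommendation could be applied along these lines. This is a sketch that assumes time series panels have `type` set to `timeseries` and keep their legend settings under `options.legend.calcs`; `standardize_legend` and `STANDARD_CALCS` are hypothetical names.

```python
STANDARD_CALCS = ["min", "max", "mean"]


def standardize_legend(panel, calcs=STANDARD_CALCS):
    """Set legend calcs on a time series panel; return True if the panel changed."""
    if panel.get("type") != "timeseries":
        return False
    # Create the nested dicts if missing, leaving other legend options untouched.
    legend = panel.setdefault("options", {}).setdefault("legend", {})
    if legend.get("calcs") == list(calcs):
        return False
    legend["calcs"] = list(calcs)
    return True
```

Other legend options, such as display mode and placement, are deliberately left as they are.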
| 285 | +### 4.3 Footer Text Panel |
| 286 | + |
| 287 | +All dashboards end with the PostgresAI branding text panel. This is consistent and should be maintained. |
| 288 | + |
| 289 | +--- |
| 290 | + |
| 291 | +## 5. Implementation Priority |
| 292 | + |
| 293 | +### Priority 1: High Impact, Low Risk |
| 294 | +1. Rename "Index usage stats" → "Activity stats" (Dashboards 10, 11) |
| 295 | +2. Rename "Tuple stats" → "Activity stats" (Dashboards 8, 9) |
| 296 | +3. Fix unit inconsistency: `Bps` → `binBps` in Index dashboards |
| 297 | + |
| 298 | +### Priority 2: Medium Impact |
| 299 | +1. Standardize "Top N" panel title patterns |
| 300 | +2. Reorder Dashboard 9 sections (move bloat to end) |
| 301 | +3. Add `/s` suffix to rate panel titles |
| 302 | + |
| 303 | +### Priority 3: Polish |
| 304 | +1. Standardize panel heights |
| 305 | +2. Unify legend configurations |
| 306 | +3. Review and standardize panel descriptions |
| 307 | + |
| 308 | +--- |
| 309 | + |
| 310 | +## 6. Implementation Checklist |
| 311 | + |
| 312 | +### Dashboard 8 (Table Stats) |
| 313 | +- [ ] Rename "Tuple stats" row to "Activity stats" |
| 314 | +- [ ] Simplify panel titles (remove "(including TOAST)", "(absolute)") |
| 315 | +- [ ] Add `/s` to rate panel titles |
| 316 | + |
| 317 | +### Dashboard 9 (Single Table Analysis) |
| 318 | +- [ ] Rename "Tuple stats" row to "Activity stats" |
| 319 | +- [ ] Move "Estimated bloat stats" section to end (after IO stats) |
| 320 | +- [ ] Simplify panel titles |
| 321 | + |
| 322 | +### Dashboard 10 (Index Health) |
| 323 | +- [ ] Rename "Index usage stats" row to "Activity stats" |
| 324 | +- [ ] Change units from `Bps` to `binBps` for reads/hits panels |
| 325 | +- [ ] Add `/s` to rate panel titles |
| 326 | + |
| 327 | +### Dashboard 11 (Single Index Analysis) |
| 328 | +- [ ] Rename "Index usage stats" row to "Activity stats" |
| 329 | +- [ ] Change units from `Bps` to `binBps` for reads/hits panel |
| 330 | +- [ ] Add title to Boguk ratio panel: "Bloat analysis (Boguk ratio)" |
| 331 | + |
| 332 | +### Dashboard 1 (Node Performance Overview) |
| 333 | +- [ ] Simplify panel titles (remove "pg_stat_statements" suffixes where redundant) |
| 334 | +- [ ] Consider renaming sections for consistency |
| 335 | + |
| 336 | +--- |
| 337 | + |
| 338 | +## 7. Summary of Changes |
| 339 | + |
| 340 | +| Category | Changes Count | Risk Level | |
| 341 | +|----------|---------------|------------| |
| 342 | +| Row renames | 4 | Low | |
| 343 | +| Row reordering | 1 | Low | |
| 344 | +| Panel title updates | 38 | Low | |
| 345 | +| Unit fixes | 3-4 | Low | |
| 346 | +| Panel config (heights, legends) | ~20 | Very Low | |
| 347 | + |
| 348 | +Total estimated changes: ~65 modifications across 5 dashboards (1, 8, 9, 10, 11). |
| 349 | + |
| 350 | +--- |
| 351 | + |
| 352 | +*Document prepared for dashboard unification review.* |