Commit e6e7a67
feat: add roofline analysis (RooflineAnalyzer, measure_peaks)
Adds measurement primitives for compute-vs-memory-bound layer analysis:
- measure_peaks(): empirical probe of peak FLOPs/s (matmul) and streaming
  bandwidth (cache-defeating memcpy). Pins TF32 off by default for honest
  fp32 peaks; caches per (device, dtype, sizes).
- HardwarePeaks: dataclass with peak_flops, peak_bandwidth, ridge_point,
  and the flags under which they were measured.
- RooflineAnalyzer: per-layer profiler with .profile() / .summary() /
  .plot(). Single-pass hooks measure FLOPs (analytical for Conv and
  Linear), bytes (weights + input + output, Williams 2009), and time.
  Classifies each layer as memory-bound or compute-bound; layers outside
  Conv/Linear land in an "undefined" bucket with a warning.
- Plotly log-log roofline with transparent background and the project
  teal palette.
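For reviewers unfamiliar with the roofline model, the classification described above can be sketched roughly as follows. This is an illustration only, not the fasterbench implementation: `Peaks`, `conv2d_cost`, `linear_cost`, `classify`, and `attainable_flops` are hypothetical names, and the sketch assumes 2 FLOPs per multiply-accumulate, bias-free layers, fp32 elements, and synthetic peak numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Peaks:
    """Hypothetical stand-in for HardwarePeaks (not the fasterbench class)."""
    peak_flops: float      # FLOPs/s
    peak_bandwidth: float  # bytes/s

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOPs/byte) at which the two roofs intersect.
        return self.peak_flops / self.peak_bandwidth


def conv2d_cost(c_in, c_out, k, h, w, stride=1, padding=0, batch=1, elem_bytes=4):
    """Return (flops, bytes) for a bias-free, square-kernel Conv2d."""
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    macs = batch * c_out * h_out * w_out * c_in * k * k
    weights = c_out * c_in * k * k
    inputs = batch * c_in * h * w
    outputs = batch * c_out * h_out * w_out
    # Assumption: 2 FLOPs per MAC; bytes = weights + input + output.
    return 2 * macs, (weights + inputs + outputs) * elem_bytes


def linear_cost(n_in, n_out, batch=1, elem_bytes=4):
    """Return (flops, bytes) for a bias-free Linear layer."""
    macs = batch * n_in * n_out
    elems = n_in * n_out + batch * n_in + batch * n_out
    return 2 * macs, elems * elem_bytes


def classify(flops: Optional[int], nbytes: int, peaks: Peaks) -> str:
    """Bucket a layer by comparing its arithmetic intensity to the ridge point."""
    if flops is None:  # no analytical FLOP count for this layer type
        return "undefined"
    intensity = flops / nbytes
    return "compute-bound" if intensity >= peaks.ridge_point else "memory-bound"


def attainable_flops(flops, nbytes, peaks):
    """Roofline ceiling: min(compute roof, intensity * bandwidth roof)."""
    return min(peaks.peak_flops, (flops / nbytes) * peaks.peak_bandwidth)


# Synthetic peaks (not measured): 1 TFLOP/s and 100 GB/s, so the ridge is 10 FLOPs/byte.
peaks = Peaks(peak_flops=1e12, peak_bandwidth=1e11)
conv = conv2d_cost(3, 64, 3, 32, 32, padding=1)
fc = linear_cost(512, 1000)
print("conv:", conv, classify(*conv, peaks))
print("fc:  ", fc, classify(*fc, peaks))
```

With these synthetic peaks the small convolution sits just right of the ridge and the batch-1 Linear layer sits far left of it, which is the usual pattern for fully connected layers at small batch sizes.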
Per the measurement-only contract, fasterbench exposes numbers and the
plot; compression decisions belong in fasterrecipes.
Includes:
- API notebook nbs/analysis/roofline.ipynb with inline unit tests
  (hand-computed Conv2d flops/bytes, cache test, Linear stack) and
  #|slow integration tests (ResNet-18 CPU with synthetic peaks,
  CUDA smoke test guarded by is_available()).
- Tutorial nbs/tutorials/roofline.ipynb showing hardware peaks,
  ResNet-18 profiling, and AI shift across input resolutions.
- Sidebar + index.ipynb re-exports.
1 parent 4c06d71 commit e6e7a67
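The "AI shift across input resolutions" mentioned for the tutorial can be illustrated with a back-of-the-envelope sweep. This is a sketch under stated assumptions, not code from the notebooks: `conv2d_intensity` and `RIDGE` are hypothetical, the layer is a bias-free stride-1 512-to-512 3x3 convolution in fp32, and the feature-map size stands in for the input resolution that drives it. The weight bytes stay fixed while FLOPs and activation bytes grow with the spatial area, so arithmetic intensity (AI) rises with resolution.

```python
def conv2d_intensity(c_in, c_out, k, size, padding=0, elem_bytes=4):
    """Arithmetic intensity (FLOPs/byte) of a bias-free, stride-1 Conv2d
    applied to a square size x size feature map at batch size 1."""
    out = size + 2 * padding - k + 1
    flops = 2 * c_out * out * out * c_in * k * k   # assumption: 2 FLOPs per MAC
    elems = (c_out * c_in * k * k                  # weights
             + c_in * size * size                  # input activations
             + c_out * out * out)                  # output activations
    return flops / (elems * elem_bytes)


RIDGE = 50.0  # synthetic ridge point in FLOPs/byte, not a measured value

sweep = {size: conv2d_intensity(512, 512, 3, size, padding=1)
         for size in (7, 14, 28, 56)}

for size, ai in sweep.items():
    side = "compute-bound" if ai >= RIDGE else "memory-bound"
    print(f"{size:>3}x{size:<3} AI={ai:8.2f} FLOPs/byte  {side}")
```

Under these assumptions the same layer crosses from memory-bound to compute-bound between the 7x7 and 14x14 feature maps, which is the kind of shift the tutorial plots.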
10 files changed
Lines changed: 1182 additions & 44 deletions
File tree
- fasterbench
- nbs
  - analysis
  - metrics
  - tutorials