Commit eb2fa8e
fix(launcher): recurse subdirectories when injecting monitoring: into recipes
The previous glob `$SRT_RECIPE_DST/*.yaml` only matched top-level YAMLs,
but recipes live under workload subdirectories (e.g. 8k1k/*.yaml). The
loop iterated zero times, no recipe got the monitoring: block, perfmon
never spawned, no perf_samples_*.csv were written, aggregate_power
silently skipped patching the agg JSON, and the dashboard had no power
data.
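
A minimal sketch of the mismatch described above, using a throwaway directory rather than the real `$SRT_RECIPE_DST`. The `demo_dst` path, the recipe file name, and the `-f` guard are assumptions for illustration; the launcher's actual loop is not shown in this commit message. Note that in a default shell an unmatched glob stays literal, so it is the guard (or an equivalent setting such as bash's `nullglob`) that makes the loop body run zero times.

```shell
#!/bin/sh
# Hypothetical layout: one recipe nested under a workload directory (8k1k/).
demo_dst="$(mktemp -d)"
mkdir -p "$demo_dst/8k1k"
printf 'name: demo\n' > "$demo_dst/8k1k/recipe.yaml"

# Top-level glob: nothing matches, so the pattern is left as a literal
# string and the -f guard skips it. The loop body never runs.
top_level=0
for f in "$demo_dst"/*.yaml; do
  [ -f "$f" ] || continue
  top_level=$((top_level + 1))
done

# Recursive search: the nested recipe is found.
nested=$(find "$demo_dst" -type f -name '*.yaml' | wc -l | tr -d ' ')

echo "top-level matches: $top_level, recursive matches: $nested"
rm -rf "$demo_dst"
```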
Sweep #26548110246 burned hours of GB300 time and shipped "success" with
zero power keys in every agg artifact — exactly the silent-failure chain
we should have caught earlier.
Fix: recurse via `find -type f -name '*.yaml'`. Add a loud WARNING when
zero recipes get the injection so future regressions surface immediately
instead of waiting for missing dashboard data to be noticed.

1 parent a9339df commit eb2fa8e
1 file changed
Lines changed: 14 additions & 3 deletions
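
A rough sketch of the approach the commit message describes (recurse with `find`, count injections, warn loudly on zero). This is not the actual diff: `inject_monitoring`, the placeholder `monitoring:` contents, and the `recipe_dst` demo tree are hypothetical stand-ins, and the real launcher would operate on `$SRT_RECIPE_DST`.

```shell
#!/bin/sh
# Hypothetical stand-in for the launcher's real injection step.
inject_monitoring() {
  # Append a placeholder monitoring: block unless the recipe already has one.
  grep -q '^monitoring:' "$1" || printf 'monitoring:\n  enabled: true\n' >> "$1"
}

# Demo tree; in the launcher this would be $SRT_RECIPE_DST.
recipe_dst="$(mktemp -d)"
trap 'rm -rf "$recipe_dst"' EXIT
mkdir -p "$recipe_dst/8k1k"
printf 'name: demo\n' > "$recipe_dst/8k1k/recipe.yaml"

# Collect recipes at any depth. Writing the list to a file keeps the
# counter in the current shell (a pipe into while would use a subshell).
list="$recipe_dst/.recipes.list"
find "$recipe_dst" -type f -name '*.yaml' > "$list"

injected=0
while IFS= read -r recipe; do
  inject_monitoring "$recipe"
  injected=$((injected + 1))
done < "$list"

# Fail loudly instead of silently producing runs with no power data.
if [ "$injected" -eq 0 ]; then
  echo "WARNING: no recipes under $recipe_dst received a monitoring: block" >&2
fi
echo "injected monitoring: into $injected recipe(s)"
```

Counting injections and warning on zero is what turns the earlier silent no-op into a visible signal in the launcher log; whether the real script also exits non-zero in that case is not stated in the commit message.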