Commit e6468fc
Consolidate MFU tracking into perf_logger, address PR review feedback
Reworks MFU tracking per reviewer feedback on #1548:
- Delete per-recipe flops.py, test_flops.py, and the CLI entirely
- Inline ~30-line FLOPs helper into each recipe's existing perf_logger.py
- MFU metrics (train/tflops_per_gpu, train/mfu_pct) flow through the existing
torchmetrics -> WANDB path, respecting logging_frequency
- Drop comm-overhead estimation; will be a separate future PR
The new formula is per_token_flops(seq_len) * num_unpadded_tokens_on_rank.
The unpadded-tokens counter (already used by tokens_per_second_per_gpu) is
per-rank after DP/CP sharding and accumulated across grad-acc micro-batches,
so the formula works uniformly across DDP/FSDP2/FSDP2+CP/DDP+CP/mFSDP and
across BSHD and THD (sequence packing) with no per-strategy factors.
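
As a rough illustration of the formula above, here is a minimal sketch. It is not the helper from this commit: the body of `per_token_flops` (the common 6*N dense estimate plus a 12*L*H*S attention term), the `mfu_metrics` function, and all parameter names are assumptions made for the example. Only the metric names and the `per_token_flops(seq_len) * num_unpadded_tokens_on_rank` structure come from the commit message.

```python
def per_token_flops(seq_len: int, num_params: int, num_layers: int, hidden_size: int) -> float:
    """Approximate training FLOPs per token (forward + backward).

    Assumption: 6 * N for the dense matmuls plus 12 * L * H * S for attention.
    The helper inlined into perf_logger.py may count FLOPs differently.
    """
    dense = 6 * num_params
    attention = 12 * num_layers * hidden_size * seq_len
    return float(dense + attention)


def mfu_metrics(
    num_unpadded_tokens_on_rank: int,
    step_time_s: float,
    peak_tflops_per_gpu: float,
    seq_len: int,
    num_params: int,
    num_layers: int,
    hidden_size: int,
) -> dict[str, float]:
    """Hypothetical wrapper: turn a per-rank token count into the two MFU metrics.

    The token count is assumed to be per-rank (after DP/CP sharding) and already
    summed over gradient-accumulation micro-batches, so no per-strategy factor
    is applied here.
    """
    flops_on_rank = (
        per_token_flops(seq_len, num_params, num_layers, hidden_size)
        * num_unpadded_tokens_on_rank
    )
    tflops_per_gpu = flops_on_rank / step_time_s / 1e12
    mfu_pct = 100.0 * tflops_per_gpu / peak_tflops_per_gpu
    return {"train/tflops_per_gpu": tflops_per_gpu, "train/mfu_pct": mfu_pct}
```

Because the token count is already per-rank and excludes padding, the same call would apply whether the batch is BSHD or THD-packed; only the counter feeding it changes.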
Net: -3000 / +300 lines. Training scripts lose all MFU scaffolding; the
only change per script is one extra kwarg on the PerfLogger constructor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
1 parent 27255a1 commit e6468fc
29 files changed
Lines changed: 601 additions & 4458 deletions
File tree
- bionemo-recipes/recipes
- codonfm_native_te
- tests
- esm2_native_te
- tests
- llama3_native_te
- tests
- opengenome2_llama_native_te
- tests
- ci/scripts