4. Optimize training performance: torch.compile, AMP, persistent workers #9

Open
musicalplatypus wants to merge 2 commits into TexasInstruments:main from musicalplatypus:pr/performance-optimizations

@musicalplatypus

Summary

Adds performance optimization flags to the training pipeline, improving training throughput on both GPU and CPU.

Changes

  1. torch.compile support — Wraps models with torch.compile() when available (PyTorch 2.0+), enabling fused kernels and graph optimizations
  2. Native AMP (Automatic Mixed Precision) — Adds --native-amp flag for bfloat16/float16 autocast with GradScaler on CUDA
  3. Persistent DataLoader workers — Enables persistent_workers=True to avoid worker process restart overhead between epochs
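  4. Eval efficiency — Streamlined evaluation loops: predictions are collected in a list instead of being re-concatenated with torch.cat on every batch, and F1/confusion-matrix computation moves from per-batch to epoch end
  5. Config pipeline threading — compile_model and native_amp flags threaded through the modelmaker YAML config → params → training argv pipeline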
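
For readers who want a concrete picture of items 1 and 2, the following is a minimal sketch, not the code in this PR. The helper names `maybe_compile` and `train_step` are hypothetical, and the choice of `inductor` for every non-MPS device is an assumption (the commit message only names `aot_eager` for MPS and `inductor` for CUDA).

```python
import torch
from torch import nn


def maybe_compile(model: nn.Module, enabled: bool, device: torch.device) -> nn.Module:
    """Return a compiled model when requested and available, else the model unchanged."""
    if not enabled or not hasattr(torch, "compile"):
        return model  # flag off, or PyTorch older than 2.0
    # Assumption: aot_eager on MPS (per the commit message), inductor otherwise.
    backend = "aot_eager" if device.type == "mps" else "inductor"
    return torch.compile(model, backend=backend)


def train_step(model, inputs, targets, optimizer, loss_fn, device,
               use_amp=False, amp_dtype=torch.bfloat16, scaler=None):
    """Run one optimizer step, optionally under autocast (hypothetical helper)."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
        loss = loss_fn(model(inputs), targets)
    if scaler is not None:
        # float16 path: scale the loss so small gradients do not underflow.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    else:
        # float32 or bfloat16 path: no loss scaling needed.
        loss.backward()
        optimizer.step()
    return loss.item()
```

With both flags off, `maybe_compile` returns the original module and `train_step` reduces to a plain float32 step, which matches the "default behavior unchanged" claim. On CUDA with float16, a caller would pass a `GradScaler` instance as `scaler`; with bfloat16 the scaler can stay `None`.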

Files Changed (9 files)

  • tinyml-tinyverse/tinyml_tinyverse/common/utils/utils.py — core optimizations
  • tinyml-tinyverse/tinyml_tinyverse/references/common/train_base.py — shared training base
  • tinyml-tinyverse/tinyml_tinyverse/references/*/train.py — per-task integration
  • tinyml-modelmaker/.../timeseries/params.py — config parameter definitions
  • tinyml-modelmaker/.../timeseries_base.py — argv construction

Testing

  • Benchmarked on classification and forecasting tasks
  • torch.compile provides ~15-30% speedup on supported hardware
  • No regression when flags are disabled (default behavior unchanged)

t5fkg8d44d-beep and others added 2 commits April 7, 2026 07:21
…, eval efficiency

- Wire up dead --compile-model arg to torch.compile (aot_eager for MPS, inductor for CUDA)
- Add --native-amp flag for PyTorch native AMP (torch.amp.autocast) on CUDA and MPS
- Add persistent_workers=True to DataLoaders (avoids respawning on macOS spawn)
- Fix pin_memory logic to use torch.cuda.is_available() instead of broken gpu>0 check
- Use optimizer.zero_grad(set_to_none=True) across all training loops
- Fix O(n²) torch.cat accumulation in evaluate_classification with list-based collection
- Move per-batch f1/confusion_matrix to epoch-end in classification eval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
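
The DataLoader and evaluation changes listed in this commit can be illustrated with a short sketch. It is written under assumptions and is not the code from `utils.py`: `make_loader`, `collect_predictions`, and `confusion_matrix` are hypothetical names, and the confusion matrix is computed with plain `torch.bincount` rather than whatever metric library the repository uses.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size, num_workers, shuffle=False):
    """Build a DataLoader with the settings described in the commit message."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        # Pin host memory only when a CUDA device is actually present.
        pin_memory=torch.cuda.is_available(),
        # DataLoader rejects persistent_workers=True when num_workers == 0.
        persistent_workers=num_workers > 0,
    )


@torch.no_grad()
def collect_predictions(model, loader, device):
    """Gather per-batch results in lists and concatenate once at the end."""
    model.eval()
    preds, targets = [], []
    for inputs, labels in loader:
        logits = model(inputs.to(device))
        preds.append(logits.argmax(dim=1).cpu())
        targets.append(labels)
    # One torch.cat per epoch instead of one per batch avoids repeated copying.
    return torch.cat(preds), torch.cat(targets)


def confusion_matrix(preds, targets, num_classes):
    """Rows are true classes, columns are predicted classes."""
    idx = targets * num_classes + preds
    counts = torch.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)
```

Calling `confusion_matrix` once on the concatenated tensors is the epoch-end pattern; calling `torch.cat` on a growing tensor inside the loop would copy all earlier predictions again on every batch, which is where the quadratic cost comes from.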
…er config pipeline

Wire torch.compile and native AMP opt-in flags from modelmaker params through
timeseries_base.py argv construction to tinyml-tinyverse training scripts.
Update ARCHITECTURE.md with Training Performance Optimizations section and
PORTING_ASSESSMENT.md with post-porting development history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
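
As a rough illustration of the params → argv wiring described in this commit, here is a minimal sketch. Only the flag names `--compile-model` and `--native-amp` and the parameter names `compile_model` and `native_amp` come from this PR; the nested `training` dictionary, the function names, and the assumption that both flags are boolean switches are hypothetical and may not match `timeseries_base.py` or the training scripts.

```python
import argparse


def build_train_argv(params: dict) -> list:
    """Translate opt-in performance params into training-script arguments."""
    training = params.get("training", {})  # hypothetical params layout
    argv = []
    # Both options are opt-in, so nothing is emitted when they are absent or false.
    if training.get("compile_model", False):
        argv.append("--compile-model")
    if training.get("native_amp", False):
        argv.append("--native-amp")
    return argv


def parse_train_args(argv):
    """Stand-in for the training script's parser (assumed store_true flags)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--compile-model", action="store_true")
    parser.add_argument("--native-amp", action="store_true")
    return parser.parse_args(argv)
```

Because an empty params dictionary produces an empty argv, a config that never mentions these options runs the training script exactly as before.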
musicalplatypus changed the title from "Optimize training performance: torch.compile, AMP, persistent workers" to "4. Optimize training performance: torch.compile, AMP, persistent workers" on Apr 7, 2026
Adithya-Thonse pushed a commit that referenced this pull request Jun 12, 2026
Merge in TINYML-ALGO/tinyml-agent-skills from 2026/pranav_a to main

* commit '31e9eb19ffc4c48d40779e87aad15649e542b5db':
  correcting npu devices list
Adithya-Thonse added a commit that referenced this pull request Jun 12, 2026
de8af16d Pull request #45: https://jira.itg.ti.com/browse/TINYML_ALGO-698
REVERT: e48ef1a Pull request #14: TINYML_ALGO-711: fixing readme
REVERT: 16fc6a6 TINYML_ALGO-711: fixing readme
REVERT: e3639d2 Pull request #13: removing pycache
REVERT: f8bb3b7 removing pycache
REVERT: dd38428 Pull request #12: restructuring agent skill
REVERT: ff02a0e restructuring agent skill
REVERT: d26c6a5 Pull request #11: fixing tiny ml name
REVERT: 640ffd3 fixing tiny ml name
REVERT: 4ee3a19 Pull request #10: 2026/pranav a
REVERT: be83fc6 minor fixes
REVERT: e3a5700 removed assets, included autoMP quant
REVERT: 1af575a Pull request #9: correcting npu devices list
REVERT: 31e9eb1 correcting npu devices list
REVERT: 59b209b Pull request #8: improving readme
REVERT: 8c3260b improving readme
REVERT: 668916f Pull request #7: improving readme
REVERT: 68686b3 improving readme
REVERT: 814316e Pull request #6: fixes to readme and marketplace json
REVERT: e4bc0b4 fixes to readme and marketplace json
REVERT: 6a64208 Pull request #5: fixes to readme
REVERT: 0f9c868 fixes to readme
REVERT: 52f95ff Pull request #4: 2026/pranav a
REVERT: 443295d fixes to readme
REVERT: 1881112 fixes to readme and marketplace json
REVERT: 229ab57 Pull request #3: 2026/pranav a
REVERT: 6519104 minor readme fix
REVERT: 38e9f9f minor readme fix
REVERT: db81f81 Pull request #2: minor readme fix
REVERT: 1c0737a minor readme fix
REVERT: 0a0c02d Pull request #1: minor readme fix
REVERT: b682335 minor readme fix
REVERT: 062eb39 Initial Commit

git-subtree-dir: tinyml-agent-skills
git-subtree-split: de8af16d9e23de3e9bda3d811a0ebdece1178260
Adithya-Thonse pushed a commit that referenced this pull request Jun 12, 2026
Merge in TINYML-ALGO/tinyml-tensorlab from 2026/pranav to main

* commit '33d6ea6eb8cbc71e8c8c392961d9cf3bd941f579':
  minor link fix
  fixing ccs nomenclature