4. Optimize training performance: torch.compile, AMP, persistent workers by musicalplatypus · Pull Request #9 · TexasInstruments/tinyml-tensorlab

musicalplatypus · 2026-04-07T12:32:27Z

Summary

Adds performance optimization flags to the training pipeline, improving training throughput on both GPU and CPU.

Changes

torch.compile support — Wraps models with torch.compile() when available (PyTorch 2.0+), enabling fused kernels and graph optimizations
Native AMP (Automatic Mixed Precision) — Adds --native-amp flag for bfloat16/float16 autocast with GradScaler on CUDA
Persistent DataLoader workers — Enables persistent_workers=True to avoid worker process restart overhead between epochs
Eval efficiency — Streamlined evaluation loops to reduce redundant computation
Config pipeline threading — compile_model and native_amp flags threaded through the modelmaker YAML config → params → training argv pipeline
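For reviewers, a minimal sketch of how the AMP and DataLoader changes above might fit together in one training epoch. This is not the code in utils.py or train_base.py: `make_loader` and `train_one_epoch` are hypothetical names, and enabling autocast only on CUDA with float16 is an assumption made to keep the example small.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size, num_workers):
    """Hypothetical helper: build a DataLoader with the options listed above."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        # DataLoader rejects persistent_workers=True when num_workers == 0.
        persistent_workers=num_workers > 0,
        # Pin host memory only when a CUDA device is actually present.
        pin_memory=torch.cuda.is_available(),
    )


def train_one_epoch(model, loader, optimizer, loss_fn, scaler, use_amp):
    """Hypothetical helper: one epoch with optional autocast and gradient scaling."""
    model.train()
    device = next(model.parameters()).device
    # Assumption: float16 on CUDA; bfloat16 is only a placeholder elsewhere,
    # since use_amp is expected to be False off-CUDA in this sketch.
    amp_dtype = torch.float16 if device.type == "cuda" else torch.bfloat16
    total_loss, seen = 0.0, 0
    for inputs, targets in loader:
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        # set_to_none=True drops gradient tensors instead of zero-filling them.
        optimizer.zero_grad(set_to_none=True)
        with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
            loss = loss_fn(model(inputs), targets)
        # With a disabled scaler these calls reduce to backward() and step().
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        total_loss += loss.item() * inputs.size(0)
        seen += inputs.size(0)
    return total_loss / seen


if __name__ == "__main__":
    torch.manual_seed(0)
    data = TensorDataset(torch.randn(32, 8), torch.randint(0, 3, (32,)))
    net = nn.Linear(8, 3)
    use_amp = torch.cuda.is_available() and next(net.parameters()).is_cuda
    # Newer PyTorch releases also expose this as torch.amp.GradScaler("cuda", ...).
    grad_scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    sgd = torch.optim.SGD(net.parameters(), lr=0.1)
    print(train_one_epoch(net, make_loader(data, 8, 0), sgd, nn.CrossEntropyLoss(), grad_scaler, use_amp))
```

The scaler is created once outside the epoch function so its scale factor carries over between epochs.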

Files Changed (9 files)

tinyml-tinyverse/tinyml_tinyverse/common/utils/utils.py — core optimizations
tinyml-tinyverse/tinyml_tinyverse/references/common/train_base.py — shared training base
tinyml-tinyverse/tinyml_tinyverse/references/*/train.py — per-task integration
tinyml-modelmaker/.../timeseries/params.py — config parameter definitions
tinyml-modelmaker/.../timeseries_base.py — argv construction

Testing

Benchmarked on classification and forecasting tasks
torch.compile provides ~15-30% speedup on supported hardware
No regression when flags are disabled (default behavior unchanged)
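
The opt-in behavior can be sketched roughly as follows. The function names are hypothetical. The MPS and CUDA backends are the ones named in the commit message; falling back to "inductor" (torch.compile's default backend) on other devices, and returning the eager model if torch.compile raises, are assumptions rather than what the PR necessarily does.

```python
import torch
from torch import nn

# Backends named in the commit message for this PR.
BACKEND_BY_DEVICE = {"cuda": "inductor", "mps": "aot_eager"}


def pick_backend(device_type):
    # Assumption: any other device (e.g. CPU) uses "inductor", torch.compile's default.
    return BACKEND_BY_DEVICE.get(device_type, "inductor")


def maybe_compile(model, compile_model, device_type):
    """Return the model unchanged unless compilation is requested and available."""
    if not compile_model or not hasattr(torch, "compile"):
        # Flag off, or PyTorch older than 2.0: default behavior is untouched.
        return model
    try:
        return torch.compile(model, backend=pick_backend(device_type))
    except RuntimeError:
        # Some platform/Python combinations reject torch.compile; stay in eager mode.
        return model


if __name__ == "__main__":
    net = nn.Linear(4, 2)
    print(maybe_compile(net, False, "cpu") is net)
```

With the flag disabled the caller gets back the same module object, which is what "default behavior unchanged" amounts to in this sketch.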

…, eval efficiency

- Wire up dead --compile-model arg to torch.compile (aot_eager for MPS, inductor for CUDA)
- Add --native-amp flag for PyTorch native AMP (torch.amp.autocast) on CUDA and MPS
- Add persistent_workers=True to DataLoaders (avoids respawning on macOS spawn)
- Fix pin_memory logic to use torch.cuda.is_available() instead of broken gpu>0 check
- Use optimizer.zero_grad(set_to_none=True) across all training loops
- Fix O(n²) torch.cat accumulation in evaluate_classification with list-based collection
- Move per-batch f1/confusion_matrix to epoch-end in classification eval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
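
To illustrate the classification eval change in this commit message: calling torch.cat on a growing tensor every batch recopies everything collected so far, whereas appending to a list and concatenating once copies each element once. The sketch below is not the actual evaluate_classification code; `collect_predictions` and `confusion_matrix` are hypothetical helpers, and the metric shown is just a confusion matrix computed once at epoch end.

```python
import torch
from torch import nn


@torch.no_grad()
def collect_predictions(model, loader):
    """Gather per-batch predictions in lists, then concatenate once."""
    model.eval()
    preds, targets = [], []
    for inputs, labels in loader:
        logits = model(inputs)
        preds.append(logits.argmax(dim=1).cpu())
        targets.append(labels.cpu())
    # Single concatenation at epoch end instead of one torch.cat per batch.
    return torch.cat(preds), torch.cat(targets)


def confusion_matrix(preds, targets, num_classes):
    """Rows are true classes, columns are predicted classes."""
    flat_index = targets * num_classes + preds
    counts = torch.bincount(flat_index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


if __name__ == "__main__":
    # Two tiny batches of precomputed logits; nn.Identity stands in for a model.
    batches = [
        (torch.tensor([[0.9, 0.1], [0.2, 0.8]]), torch.tensor([0, 1])),
        (torch.tensor([[0.3, 0.7]]), torch.tensor([0])),
    ]
    p, t = collect_predictions(nn.Identity(), batches)
    print(confusion_matrix(p, t, num_classes=2))
```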

…er config pipeline

Wire torch.compile and native AMP opt-in flags from modelmaker params through timeseries_base.py argv construction to tinyml-tinyverse training scripts. Update ARCHITECTURE.md with Training Performance Optimizations section and PORTING_ASSESSMENT.md with post-porting development history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
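
A rough sketch of the params-to-argv wiring described in this commit message. The flag names --compile-model and --native-amp and the keys compile_model and native_amp come from this PR; the function names, the flat dict shape of the params, and the use of store_true flags on the training-script side are assumptions for illustration only.

```python
import argparse


def build_training_argv(training_params):
    """Hypothetical helper: turn opt-in params into training-script arguments."""
    argv = []
    # Both flags default to off, so existing configs produce the same argv as before.
    if training_params.get("compile_model", False):
        argv.append("--compile-model")
    if training_params.get("native_amp", False):
        argv.append("--native-amp")
    return argv


def parse_training_args(argv):
    """Hypothetical stand-in for the argument parser in a train.py script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--compile-model", action="store_true")
    parser.add_argument("--native-amp", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_training_args(build_training_argv({"native_amp": True}))
    print(args.compile_model, args.native_amp)
```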

Merge in TINYML-ALGO/tinyml-agent-skills from 2026/pranav_a to main * commit '31e9eb19ffc4c48d40779e87aad15649e542b5db': correcting npu devices list

de8af16d Pull request #45: https://jira.itg.ti.com/browse/TINYML_ALGO-698

REVERT: e48ef1a Pull request #14: TINYML_ALGO-711: fixing readme
REVERT: 16fc6a6 TINYML_ALGO-711: fixing readme
REVERT: e3639d2 Pull request #13: removing pycache
REVERT: f8bb3b7 removing pycache
REVERT: dd38428 Pull request #12: restructuring agent skill
REVERT: ff02a0e restructuring agent skill
REVERT: d26c6a5 Pull request #11: fixing tiny ml name
REVERT: 640ffd3 fixing tiny ml name
REVERT: 4ee3a19 Pull request #10: 2026/pranav a
REVERT: be83fc6 minor fixes
REVERT: e3a5700 removed assets, included autoMP quant
REVERT: 1af575a Pull request #9: correcting npu devices list
REVERT: 31e9eb1 correcting npu devices list
REVERT: 59b209b Pull request #8: improving readme
REVERT: 8c3260b improving readme
REVERT: 668916f Pull request #7: improving readme
REVERT: 68686b3 improving readme
REVERT: 814316e Pull request #6: fixes to readme and marketplace json
REVERT: e4bc0b4 fixes to readme and marketplace json
REVERT: 6a64208 Pull request #5: fixes to readme
REVERT: 0f9c868 fixes to readme
REVERT: 52f95ff Pull request #4: 2026/pranav a
REVERT: 443295d fixes to readme
REVERT: 1881112 fixes to readme and marketplace json
REVERT: 229ab57 Pull request #3: 2026/pranav a
REVERT: 6519104 minor readme fix
REVERT: 38e9f9f minor readme fix
REVERT: db81f81 Pull request #2: minor readme fix
REVERT: 1c0737a minor readme fix
REVERT: 0a0c02d Pull request #1: minor readme fix
REVERT: b682335 minor readme fix
REVERT: 062eb39 Initial Commit

git-subtree-dir: tinyml-agent-skills
git-subtree-split: de8af16d9e23de3e9bda3d811a0ebdece1178260

Merge in TINYML-ALGO/tinyml-tensorlab from 2026/pranav to main * commit '33d6ea6eb8cbc71e8c8c392961d9cf3bd941f579': minor link fix fixing ccs nomenclature

t5fkg8d44d-beep and others added 2 commits April 7, 2026 07:21

musicalplatypus changed the title ~~Optimize training performance: torch.compile, AMP, persistent workers~~ 4. Optimize training performance: torch.compile, AMP, persistent workers Apr 7, 2026

Adithya-Thonse pushed a commit that referenced this pull request Jun 12, 2026

Pull request #9: correcting npu devices list

1af575a


Adithya-Thonse pushed a commit that referenced this pull request Jun 12, 2026

Pull request #9: fixing ccs nomenclature

7336428


musicalplatypus wants to merge 2 commits into TexasInstruments:main from musicalplatypus:pr/performance-optimizations

2 participants
