4. Optimize training performance: torch.compile, AMP, persistent workers #9
Open
musicalplatypus wants to merge 2 commits into
…, eval efficiency

- Wire up dead --compile-model arg to torch.compile (aot_eager for MPS, inductor for CUDA)
- Add --native-amp flag for PyTorch native AMP (torch.amp.autocast) on CUDA and MPS
- Add persistent_workers=True to DataLoaders (avoids respawning on macOS spawn)
- Fix pin_memory logic to use torch.cuda.is_available() instead of broken gpu>0 check
- Use optimizer.zero_grad(set_to_none=True) across all training loops
- Fix O(n²) torch.cat accumulation in evaluate_classification with list-based collection
- Move per-batch f1/confusion_matrix to epoch-end in classification eval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
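The backend choice described in this commit message (aot_eager for MPS, inductor for CUDA) could look roughly like the sketch below. The helper names `select_compile_backend` and `maybe_compile` are hypothetical and not taken from the PR, and skipping compilation on other device types is an assumption, since the commit message does not say what happens on CPU. The compile function is passed in as a parameter so the logic can be exercised without PyTorch installed; in real use it would be `torch.compile`.

```python
from typing import Any, Callable, Optional


def select_compile_backend(device_type: str) -> Optional[str]:
    """Map a device type to a torch.compile backend name (hypothetical helper)."""
    if device_type == "cuda":
        return "inductor"
    if device_type == "mps":
        return "aot_eager"
    # Assumption: leave the model uncompiled on any other device type.
    return None


def maybe_compile(model: Any, device_type: str, compile_fn: Callable[..., Any]) -> Any:
    """Return compile_fn(model, backend=...) or the untouched model."""
    backend = select_compile_backend(device_type)
    if backend is None:
        return model
    return compile_fn(model, backend=backend)


# Real use, assuming PyTorch 2.0+ is installed:
#   import torch
#   model = maybe_compile(model, device.type, torch.compile)
```

Keeping the device-to-backend mapping in one small function makes it easy to adjust if a backend turns out to be unstable on a given platform.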
…er config pipeline

Wire torch.compile and native AMP opt-in flags from modelmaker params through timeseries_base.py argv construction to tinyml-tinyverse training scripts. Update ARCHITECTURE.md with Training Performance Optimizations section and PORTING_ASSESSMENT.md with post-porting development history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
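As a rough illustration of threading opt-in flags from params into training argv, the sketch below converts two boolean settings into command-line flags and parses them on the receiving side. The function names, the `training` key, and the choice of `store_true` flags are all assumptions made for illustration; the actual layout of params.py and timeseries_base.py is not shown in this PR text. Only the flag names `--compile-model` and `--native-amp` come from the commit messages.

```python
import argparse


def build_train_argv(params: dict) -> list[str]:
    """Hypothetical helper: emit a flag only when its param is truthy."""
    training = params.get("training", {})  # assumed params layout
    argv = []
    if training.get("compile_model", False):
        argv.append("--compile-model")
    if training.get("native_amp", False):
        argv.append("--native-amp")
    return argv


def make_train_parser() -> argparse.ArgumentParser:
    """Hypothetical training-side parser; both flags default to off."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--compile-model", action="store_true")
    parser.add_argument("--native-amp", action="store_true")
    return parser


params = {"training": {"compile_model": True, "native_amp": False}}
argv = build_train_argv(params)
args = make_train_parser().parse_args(argv)
```

Because both flags default to off, existing configs that do not mention them would keep their current behavior under this scheme.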
Adithya-Thonse pushed a commit that referenced this pull request Jun 12, 2026
Merge in TINYML-ALGO/tinyml-agent-skills from 2026/pranav_a to main * commit '31e9eb19ffc4c48d40779e87aad15649e542b5db': correcting npu devices list
Adithya-Thonse added a commit that referenced this pull request Jun 12, 2026
de8af16d Pull request #45: https://jira.itg.ti.com/browse/TINYML_ALGO-698 REVERT: e48ef1a Pull request #14: TINYML_ALGO-711: fixing readme REVERT: 16fc6a6 TINYML_ALGO-711: fixing readme REVERT: e3639d2 Pull request #13: removing pycache REVERT: f8bb3b7 removing pycache REVERT: dd38428 Pull request #12: restructuring agent skill REVERT: ff02a0e restructuring agent skill REVERT: d26c6a5 Pull request #11: fixing tiny ml name REVERT: 640ffd3 fixing tiny ml name REVERT: 4ee3a19 Pull request #10: 2026/pranav a REVERT: be83fc6 minor fixes REVERT: e3a5700 removed assets, included autoMP quant REVERT: 1af575a Pull request #9: correcting npu devices list REVERT: 31e9eb1 correcting npu devices list REVERT: 59b209b Pull request #8: improving readme REVERT: 8c3260b improving readme REVERT: 668916f Pull request #7: improving readme REVERT: 68686b3 improving readme REVERT: 814316e Pull request #6: fixes to readme and marketplace json REVERT: e4bc0b4 fixes to readme and marketplace json REVERT: 6a64208 Pull request #5: fixes to readme REVERT: 0f9c868 fixes to readme REVERT: 52f95ff Pull request #4: 2026/pranav a REVERT: 443295d fixes to readme REVERT: 1881112 fixes to readme and marketplace json REVERT: 229ab57 Pull request #3: 2026/pranav a REVERT: 6519104 minor readme fix REVERT: 38e9f9f minor readme fix REVERT: db81f81 Pull request #2: minor readme fix REVERT: 1c0737a minor readme fix REVERT: 0a0c02d Pull request #1: minor readme fix REVERT: b682335 minor readme fix REVERT: 062eb39 Initial Commit git-subtree-dir: tinyml-agent-skills git-subtree-split: de8af16d9e23de3e9bda3d811a0ebdece1178260
Adithya-Thonse pushed a commit that referenced this pull request Jun 12, 2026
Merge in TINYML-ALGO/tinyml-tensorlab from 2026/pranav to main * commit '33d6ea6eb8cbc71e8c8c392961d9cf3bd941f579': minor link fix fixing ccs nomenclature
Summary
Adds performance optimization flags to the training pipeline, improving training throughput on both GPU and CPU.
Changes
- torch.compile support: wraps models with torch.compile() when available (PyTorch 2.0+), enabling fused kernels and graph optimizations
- --native-amp flag for bfloat16/float16 autocast with GradScaler on CUDA
- persistent_workers=True to avoid worker process restart overhead between epochs
- compile_model and native_amp flags threaded through the modelmaker YAML config → params → training argv pipeline
Files Changed (9 files)
- tinyml-tinyverse/tinyml_tinyverse/common/utils/utils.py: core optimizations
- tinyml-tinyverse/tinyml_tinyverse/references/common/train_base.py: shared training base
- tinyml-tinyverse/tinyml_tinyverse/references/*/train.py: per-task integration
- tinyml-modelmaker/.../timeseries/params.py: config parameter definitions
- tinyml-modelmaker/.../timeseries_base.py: argv construction
Testing
- torch.compile provides ~15-30% speedup on supported hardware
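The DataLoader changes in this PR (persistent workers, plus the pin_memory fix named in the first commit message) can be sketched as a small keyword-argument builder. The helper name `dataloader_kwargs` is hypothetical. One detail worth noting: PyTorch's DataLoader raises a ValueError when persistent_workers=True is combined with num_workers=0, so the sketch only sets it when workers are actually used. CUDA availability is passed in as a plain boolean here; in real code it would come from torch.cuda.is_available().

```python
def dataloader_kwargs(num_workers: int, cuda_available: bool) -> dict:
    """Hypothetical helper: build DataLoader keyword arguments."""
    kwargs = {
        "num_workers": num_workers,
        # Pinned host memory only speeds up host-to-CUDA copies.
        "pin_memory": cuda_available,
    }
    if num_workers > 0:
        # Keep worker processes alive between epochs; invalid with 0 workers.
        kwargs["persistent_workers"] = True
    return kwargs


# Real use, assuming PyTorch is installed and `dataset` exists:
#   import torch
#   loader = torch.utils.data.DataLoader(
#       dataset, batch_size=32,
#       **dataloader_kwargs(4, torch.cuda.is_available()))
```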