power collection by runwangdl · Pull Request #37 · runwangdl/TrainDeeploy

runwangdl · 2026-06-19T01:38:19Z

Stores the flashable board build artifacts + GPIO power-measurement trigger for three GAP9 training experiments, for migration to the power-measurement platform (Nordic PPK2).

What

DeeployTest/PowerCollection/<Model>/build_master/ — full build dir incl. flashable ELF + board_workdir flash image
DeeployTest/PowerCollection/<Model>/hex/ — L3 weight payload (readfs) required to flash
DeeployTest/PowerCollection/flash_power.sh — gapy flash/run helper (uses test platform GAP_SDK_HOME; --power drives gapy PPK2 capture synced to the pin-89 GPIO window)
deeploytraintest.c — ports the inference power GPIO pulse (pin 89) into the training harness, held high across the full training loop (POWER_MEASUREMENT was inference-only before)

Configs (CI test_gap9_tiled_training_l3_singlebuffer, -s board, -DPOWER_MEASUREMENT=ON)

Model	l1	cc_stack	CHW	board (errors, train cycles)
SleepConViT	122000	4096	no	0/4, ~131.6M cyc
MCUNet	116000	8192	yes	0/4, ~717.6M cyc
TSDR	122000	4096	yes	0/4, ~904.9M cyc

Note: also carries the feat/zo-sleepconvit-training training commits (branch base).

…onViT BP

Lets onnx4deeploy-generated FP32 training graphs that use Concat + Slice (e.g. SleepConViT: ConvStem branch-concat + cls-token slice) run backprop on Siracusa and tiled GAP9. Validated: SleepConViT BP 0/4 errors on both (GAP9 tiled l1=122000/cc_stack=4096 -> train 134.6M cyc; Siracusa untiled -> losses match PyTorch autograd to ~1e-5).

- Generic/Parsers.py: Slice default `steps` int64 (np.ones() was float64 -> poisoned int-only tiling math).
- PULPOpen/Bindings.py, GAP9/Bindings.py: float Concat bindings (template is byte-wise memcpy, dtype-agnostic; only integer bindings existed).
- PULPOpen/TileConstraints/SliceConstraint.py: int() the slice step (OR-tools IntExpr*float unsupported).
- TilingExtension/MemoryConstraintFlows.py: kill-set skips folded constant inputs (Slice starts/ends/axes/steps) instead of asserting.
- TilingExtension/CodeTransformationPasses/TilingHoistingMixIn.py: coerce numpy ints in hoisted value tables.
- GAP9/Tiler.py + GAP9/Platform.py: GAP9 Slice tiling-ready binding via the GAP9 transformer/mchan (the PULP one emits unlinkable PULP mchan calls on GAP9).
- Tests/Models/Training/SleepConViT: onnx4deeploy training graph + optimizer + reference (4 SGD steps).
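The `steps` dtype issue in the first bullet can be reproduced in isolation. The sketch below is illustrative only: `default_slice_steps` is a hypothetical helper name, not the actual code in Generic/Parsers.py. It shows that `np.ones()` returns float64 unless a dtype is passed, which is the behaviour the commit message describes.

```python
import numpy as np


def default_slice_steps(num_axes: int) -> np.ndarray:
    """Hypothetical helper: all-ones default for Slice `steps`, forced to int64."""
    return np.ones(num_axes, dtype=np.int64)


# Without an explicit dtype, np.ones() produces float64 values.
float_steps = np.ones(2)
# With dtype=np.int64 the steps stay integral for integer-only tiling math.
int_steps = default_slice_steps(2)

print(float_steps.dtype)  # float64
print(int_steps.dtype)    # int64
```

Passing the dtype at creation time keeps the values integral from the start, so later stages do not each need their own cast.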

… cc_stack=4096, tol=5e-3) The existing gap9-training-tiled-l3-singlebuffer CI job is parametrized over L3_SINGLEBUFFER_TRAINING_MODELS, so registering SleepConViT here makes CI run its backprop test. Validated locally: 0/4 errors (train ~134.6M cyc).

…adW Cout-full

Enable end-to-end backprop training for the TSDR spectrogram transformer on GAP9 (tiled, L3 single-buffer). Runs Errors 0/4 vs ORT reference @ tol 5e-3 (train_cycles ~940M, opt ~12.3M).

ConvGradConstraint (CoutHWSliceStrategy): make coutHWSlice_force_cout_full conditional on dW size. Forcing Cout-full pins the whole dW in L1, which is infeasible for large-Cout regular convs (TSDR patch-embed dW ~123KB > L1). Now:

- dW <= 64KB -> keep forced Cout-full (validated Siracusa drift-fix path, numerics unchanged; re-verified ResNet8 0/4 max-diff 1.1e-5, MobileNetV1 0/4 max-diff 3.1e-5).
- dW > 64KB -> no Cout/HW restriction; let the tiler tile Cout freely so the conv is feasible (drift caught by the numerical reference).

Register TSDR in the GAP9 tiled L3 training CI (l1=122000, cc_stack=4096, tol=5e-3, num_data_inputs=1, conv_channels_first).
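The size-conditional rule above amounts to a single threshold check. The following is a minimal sketch, not the ConvGradConstraint implementation: the function name and constant are hypothetical, and reading "64KB" as 64 * 1024 bytes is an assumption, since the commit message does not give the exact byte value.

```python
# Assumption: "64KB" is read as 64 KiB; the real threshold value may differ.
DW_FORCE_COUT_FULL_MAX_BYTES = 64 * 1024


def should_force_cout_full(dw_bytes: int,
                           threshold: int = DW_FORCE_COUT_FULL_MAX_BYTES) -> bool:
    """Hypothetical predicate: keep dW pinned Cout-full only while it is small."""
    return dw_bytes <= threshold


# A small dW keeps the forced Cout-full path.
print(should_force_cout_full(32 * 1024))    # True
# A dW of roughly 123KB (the TSDR patch-embed case) leaves Cout free to tile.
print(should_force_cout_full(123 * 1000))   # False
```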

Enable end-to-end backprop training for MCUNet (MnasNet-style) on GAP9, tiled into L1=116KB (< the 128KB usable L1). Runs Errors 0/4 vs ORT reference @ tol 5e-3 (train_cycles ~714M, opt ~16M).

Root cause it fixes: PWConvGradWTileConstraint pinned the pointwise ConvGradW's X and dY to FULL spatial (the mixed Cout+HW memset path is unimplemented, so it fell back to Cout-only tiling). For MCUNet's 48x48 pointwise layers that means e.g. conv2d_2 X=[16,48,48]=147KB alone exceeds L1 -> proven-infeasible tiling.

Fix: for small-dW PW convs, set coutHWSlice_force_cout_full and make the full-spatial pin CONDITIONAL on dW>64KB. With Cout pinned full there is no Cout tiling, so allowing H/W to tile is the SAFE "HW-only" memset-once case (mirrors the #34 regular-ConvGradW fix); X then tiles spatially and fits. Large-dW PW (e.g. MobileNetV1 block_11, dW~128KB) keeps the original full-spatial path. Tiling L1 floor for MCUNet drops ~160KB -> ~80KB.

Regression: MobileNetV1 GAP9 re-verified 0/4 (274M cyc, unchanged).

MCUNet uses bias-free convs (drop conv biases) to avoid the 1-D tiled-bias codegen path, same approach as TSDR. Registered in the GAP9 tiled L3 training CI (l1=116000, cc_stack=8192, conv_channels_first).
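The 147KB figure for conv2d_2 follows from the tensor shape and FP32 element size. As a quick check (a sketch only: `tensor_bytes` is a hypothetical helper and the half-height tile is an illustrative choice, not the tiling the solver actually picks):

```python
from math import prod

FP32_BYTES = 4
L1_BUDGET_BYTES = 116_000  # l1=116000 from the MCUNet CI config


def tensor_bytes(shape, elem_bytes=FP32_BYTES):
    """Footprint in bytes of a dense tensor with the given shape."""
    return prod(shape) * elem_bytes


full_x = tensor_bytes((16, 48, 48))    # X pinned to full spatial extent
half_h_x = tensor_bytes((16, 24, 48))  # hypothetical tile: H split in two

print(full_x, full_x > L1_BUDGET_BYTES)       # 147456 True
print(half_h_x, half_h_x <= L1_BUDGET_BYTES)  # 73728 True
```

This only checks X on its own; a feasible tiling also has to fit dY, dW and any other buffers in the same L1 budget.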

… tol=5e-3)

…Net/TSDR training

Adds a GPIO power-measurement trigger to the GAP9 training harness and stores the flashable build artifacts for three training experiments, built with `-s board` + `-DPOWER_MEASUREMENT=ON` at their CI configs (test_gap9_tiled_training_l3_singlebuffer).

Per experiment (DeeployTest/PowerCollection/<Model>/):
- build_master/: full build dir incl. the flashable ELF + board_workdir flash image
- hex/: L3 weight payload written to external flash (readfs); required to flash

flash_power.sh: gapy flash/run helper for the power-measurement platform. Flashes a migrated build_master via the test platform's own GAP_SDK_HOME (SSBL/openocd come from there, not the build machine; keep SDK versions aligned). `--power` runs gapy's PPK2 capture, synchronized to the pin-89 GPIO window.

Harness (deeploytraintest.c): port the inference power-measurement GPIO pulse (pin 89) into the training harness, held high across the full training loop (fwd/bwd + optimizer) for external power capture (PPK2). Previously POWER_MEASUREMENT was inference-only.

All validated 0/4 errors on board:
- SleepConViT l1=122000 cc_stack=4096 train ~131.6M cyc
- MCUNet l1=116000 cc_stack=8192 CHW train ~717.6M cyc
- TSDR l1=122000 cc_stack=4096 CHW train ~904.9M cyc

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

runwangdl and others added 6 commits June 18, 2026 22:15

ci(ZO): MCUNet GAP9 training override (channels-first, cc_stack=8192,…

717bd1e

… tol=5e-3)

runwangdl wants to merge 6 commits into devel from power-collection
