
Commit bf99548

danmcleran and claude committed
Strip Phase N references from GitHub Pages docs
The Pages site should read standalone without requiring the reader to cross-reference QUANTIZATION.md plan milestones. Replace "Phase 9/10/11/12/13/14/15/16 ships X" lead-ins with feature-descriptive language, drop "(Phase N)" suffixes from section headings, and remove "Phase N." prefixes from table descriptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 67fc8a1 commit bf99548

11 files changed

Lines changed: 59 additions & 61 deletions

docs/architectures.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -18,5 +18,5 @@ TinyMind provides a range of neural network architectures, all as header-only C+
1818
| [FFT Layer]({{ site.baseurl }}/architectures/fft) | 768 bytes (64-pt Q8.8) | Frequency-domain feature extraction |
1919
| [Quantized Networks]({{ site.baseurl }}/architectures/quantized-networks) | 128 bytes (packed binary) | 32-64x weight compression |
2020
| [Int8 Affine Quantization]({{ site.baseurl }}/architectures/int8-quantization) | int8 weights + int32 accum | TFLite/CMSIS-NN style post-training int8 across dense/conv/pool/BN/LN/softmax/RNN/attention/FFT |
21-
| [Mixed Precision]({{ site.baseurl }}/architectures/mixed-precision) | int8 + fp16 + bf16 bridges | Phase 9 qbridge converters between int8 affine / `QValue` Q-format / float / fp16 / bf16 |
22-
| [SIMD Backends]({{ site.baseurl }}/architectures/simd-backends) | n/a (perf, not capacity) | Phase 14 ISA-capability gates: NEON / SVE / Helium / AVX2 / AVX-512, byte-identical to scalar |
21+
| [Mixed Precision]({{ site.baseurl }}/architectures/mixed-precision) | int8 + fp16 + bf16 bridges | `qbridge` converters between int8 affine / `QValue` Q-format / float / fp16 / bf16 |
22+
| [SIMD Backends]({{ site.baseurl }}/architectures/simd-backends) | n/a (perf, not capacity) | ISA-capability gates: NEON / SVE / Helium / AVX2 / AVX-512, byte-identical to scalar |

docs/architectures/fft.md

Lines changed: 1 addition & 1 deletion
@@ -285,4 +285,4 @@ These estimates include twiddle multiplication (2 multiplies + 2 adds per butter
285285

286286
## Int8 Quantized Counterpart
287287

288-
Phase 13 ships `QFFT1D<N>`, a radix-2 DIT FFT on int16 buffers with Q1.15 twiddle factors. Twiddles are caller-owned, built host-side by `buildQFFTTwiddles(n, cos_out, sin_out)`. Scaled butterflies (right-shift by 1 per stage; total scaling 1/N) keep the int16 working register bounded. `magnitudeSquared` emits int32; the int8 boundary on either side is expressed as an ordinary `Requantizer`. Inverse via the conjugate trick. See [Int8 Affine Quantization]({{ site.baseurl }}/architectures/int8-quantization) for the surrounding integer pipeline.
288+
TinyMind ships `QFFT1D<N>`, a radix-2 DIT FFT on int16 buffers with Q1.15 twiddle factors. Twiddles are caller-owned, built host-side by `buildQFFTTwiddles(n, cos_out, sin_out)`. Scaled butterflies (right-shift by 1 per stage; total scaling 1/N) keep the int16 working register bounded. `magnitudeSquared` emits int32; the int8 boundary on either side is expressed as an ordinary `Requantizer`. Inverse via the conjugate trick. See [Int8 Affine Quantization]({{ site.baseurl }}/architectures/int8-quantization) for the surrounding integer pipeline.
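
The scaled butterfly described in the `QFFT1D<N>` paragraph can be illustrated with a minimal sketch. The `Cplx16` type and the `mulQ15` / `scaledButterfly` names are hypothetical and are not taken from TinyMind's headers. The sketch only shows the general shape of a radix-2 DIT butterfly with a Q1.15 twiddle and a right-shift by 1 per stage, which is where the overall 1/N scaling comes from.

```cpp
#include <cstdint>

// Hypothetical complex sample type; not TinyMind's actual buffer layout.
struct Cplx16 {
    int16_t re;
    int16_t im;
};

// Q1.15 multiply: widen to int32, then drop the 15 fractional bits.
// Assumes >> on a negative value is an arithmetic shift
// (guaranteed since C++20, universal in practice before that).
inline int32_t mulQ15(int16_t a, int16_t b) {
    return (static_cast<int32_t>(a) * static_cast<int32_t>(b)) >> 15;
}

// One scaled radix-2 DIT butterfly:
//   a' = (a + w*b) / 2,  b' = (a - w*b) / 2
// The halving is the "right-shift by 1 per stage" from the text.
inline void scaledButterfly(Cplx16& a, Cplx16& b, Cplx16 w) {
    const int32_t tRe = mulQ15(b.re, w.re) - mulQ15(b.im, w.im);
    const int32_t tIm = mulQ15(b.re, w.im) + mulQ15(b.im, w.re);
    const int32_t aRe = a.re;
    const int32_t aIm = a.im;
    a.re = static_cast<int16_t>((aRe + tRe) >> 1);
    a.im = static_cast<int16_t>((aIm + tIm) >> 1);
    b.re = static_cast<int16_t>((aRe - tRe) >> 1);
    b.im = static_cast<int16_t>((aIm - tIm) >> 1);
}
```

Applying this once per stage over log2(N) stages divides the result by N in total, so a caller of a transform built this way would need to account for that factor when interpreting magnitudes.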

docs/architectures/int8-quantization.md

Lines changed: 30 additions & 30 deletions
Large diffs are not rendered by default.

docs/architectures/lstm-gru.md

Lines changed: 1 addition & 1 deletion
@@ -251,7 +251,7 @@ See the [Weight Import Export and PyTorch Interoperability]({{ site.baseurl }}/t
251251
252252
# Int8 Quantized Counterparts
253253
254-
For inference-only deployment that does not need the trainable Q-format pipeline at all, Phase 12 ships pure-integer int8 cells alongside `LstmNeuralNetwork` / `GruNeuralNetwork`:
254+
For inference-only deployment that does not need the trainable Q-format pipeline at all, TinyMind ships pure-integer int8 cells alongside `LstmNeuralNetwork` / `GruNeuralNetwork`:
255255
256256
- `QLSTMCell` — four gates (i, f, g, o) in TFLite ordering. Two rescalers per gate (input-MAC + recurrent-MAC) into a shared sigmoid / tanh LUT input scale; cell update via two `multiplyByQuantizedMultiplier` calls. Cell-state storage `int8_t` (default) or `int16_t` for long unroll horizons (gate `TINYMIND_ENABLE_INT16_ACCUM=1`).
257257
- `QGRUCell` — three gates (r, z, n) in canonical ordering. Reset-before-multiply formulation, `(1 - z_t)` computed exactly in the sigmoid grid as `-z_t`.

docs/architectures/mixed-precision.md

Lines changed: 7 additions & 9 deletions
@@ -7,19 +7,17 @@ nav_order: 8
77

88
# Mixed Precision
99

10-
Phase 9 adds composability between the previously orphaned numeric pipelines and a software half-precision storage tier. The result is a small set of **pointwise converters** that live at layer boundaries, so a single network can run an int8 affine CNN frontend, hand off to a Q-format LSTM head, hand off again to an fp16 attention block, and project back to int8 for the classifier — every layer keeps the runtime cost of its own grid, the bridges only run once per tensor crossing.
10+
TinyMind composes its three numeric pipelines through a small set of **pointwise converters** that live at layer boundaries, plus a software half-precision storage tier. A single network can run an int8 affine CNN frontend, hand off to a Q-format LSTM head, hand off again to an fp16 attention block, and project back to int8 for the classifier — every layer keeps the runtime cost of its own grid, the bridges only run once per tensor crossing.
1111

1212
## Three pipelines, one model
1313

14-
Pre-Phase-9, TinyMind shipped three numeric pipelines that did not talk to each other:
15-
1614
| Pipeline | Storage | Where it lives | When it wins |
1715
|---|---|---|---|
1816
| `QValue` Q-format | int8 / int16 / int32 / int64 with a compile-time binary point | `cpp/qformat.hpp` + `cpp/neuralnet.hpp` | Trainable on-MCU, single global grid, no per-tensor metadata |
1917
| Float | `float` / `double` | Same templates, different `ValueType` | Host development, training |
20-
| Int8 affine | int8 weights + int8 activations + per-tensor `(scale, zero_point)` | `cpp/q*.hpp` family (Phase 1–8) | TFLite-shape inference, multi-grid (each tensor picks its own range) |
18+
| Int8 affine | int8 weights + int8 activations + per-tensor `(scale, zero_point)` | `cpp/q*.hpp` family | TFLite-shape inference, multi-grid (each tensor picks its own range) |
2119

22-
Phase 9 wires the three together. Phase 14's `simd_neon_fp16.hpp` later added vector specializations for fp16 storage; this page covers the storage tier and the converters.
20+
The qbridge converters tie the three together. The `simd_neon_fp16.hpp` backend adds vector specializations for fp16 storage on Arm hardware that supports it; this page covers the storage tier and the converters.
2321

2422
## qbridge.hpp — pointwise converters
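
To make the idea of a pointwise converter at a layer boundary concrete, here is a minimal sketch of an int8 affine to float bridge and its inverse. The `AffineParams` struct and both function names are hypothetical and are not the actual `qbridge.hpp` API. The only thing assumed is the standard affine relation `real = scale * (q - zero_point)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-tensor affine parameters; illustrative names only.
struct AffineParams {
    float scale;
    int32_t zeroPoint;
};

// int8 affine -> float: real = scale * (q - zero_point).
inline float affineI8ToFloat(int8_t q, AffineParams p) {
    return p.scale * static_cast<float>(static_cast<int32_t>(q) - p.zeroPoint);
}

// float -> int8 affine: round onto the grid, shift by zero_point, saturate.
inline int8_t floatToAffineI8(float x, AffineParams p) {
    assert(p.scale > 0.0f);
    const int32_t q =
        static_cast<int32_t>(std::lround(x / p.scale)) + p.zeroPoint;
    const int32_t lo = -128;
    const int32_t hi = 127;
    return static_cast<int8_t>(std::min(hi, std::max(lo, q)));
}
```

Each function touches one element at a time, which matches the description of the bridges as pointwise: the cost is one pass per tensor crossing, independent of what the layers on either side do.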
2523

@@ -74,20 +72,20 @@ The `unit_test/embedded/Makefile` exercises this corner as `fp16_freestanding` (
7472

7573
## Mixed-precision exemplar — `mixed_precision_kws`
7674

77-
[`examples/mixed_precision_kws/`](https://github.com/danmcleran/tinymind/tree/master/examples/mixed_precision_kws) (Phase 16) wires the qbridge converters in production shape:
75+
[`examples/mixed_precision_kws/`](https://github.com/danmcleran/tinymind/tree/master/examples/mixed_precision_kws) wires the qbridge converters in production shape:
7876

7977
```
8078
input [S=8][E=8] float
8179
----[ int8 frontend ]----------------------------
8280
QDense E -> E (one call per sequence step)
8381
qrelu -> [S][E] int8
84-
----[ Phase 9 bridge: affineI8 -> fp16 ]---------
82+
----[ qbridge: affineI8 -> fp16 ]----------------
8583
-> [S][E] fp16
8684
----[ fp16 attention head ]----------------------
8785
Linear (ReLU-kernel) self-attention with residual
8886
skip from the post-relu feature buffer, then
8987
mean-pool over S -> [E] fp16
90-
----[ Phase 9 bridge: fp16 -> affineI8 ]---------
88+
----[ qbridge: fp16 -> affineI8 ]----------------
9189
-> [E] int8
9290
----[ int8 classifier ]--------------------------
9391
QDense E -> NUM_CLASSES -> [NUM_CLASSES] int8 logits
@@ -105,7 +103,7 @@ The precision-tier pattern — int8 front + classifier bracketing an fp16 head
105103

106104
- **Not QAT.** Mixed precision is a deployment story, not a training story.
107105
- **Not fp16 arithmetic.** The library treats fp16 as a storage tier; inner arithmetic promotes to float. The vector fp16 ISA gates (`SIMD_NEON_FP16`, AVX-512 fp16) get there on hardware that supports it, but the library does not synthesize fp16 software arithmetic.
108-
- **Not int4.** Storage is int8 / int16 / int32 / fp16 / bf16 / float / double. Sub-byte storage is a non-goal of this phase.
106+
- **Not int4.** Storage is int8 / int16 / int32 / fp16 / bf16 / float / double. Sub-byte storage is out of scope.
109107

110108
## See Also
111109

docs/architectures/self-attention.md

Lines changed: 1 addition & 1 deletion
@@ -274,7 +274,7 @@ The Q8.8 small config is real-time capable at 100-1000 Hz sensor rates on M0. Th
274274

275275
## Int8 Quantized Counterpart
276276

277-
Phase 13 ships pure-integer int8 attention alongside `SelfAttention1D`:
277+
TinyMind ships pure-integer int8 attention alongside `SelfAttention1D`:
278278

279279
- `QAttention1D` — int8 linear (ReLU-kernel) attention. Same shape and math; ReLU on Q'/K' folded into the requantizer by raising `qmin = zero_point`. Caller-owned weight, bias, and scratch buffers.
280280
- `QAttentionSoftmax1D` — standard softmax attention. Score requantizer folds the `1 / sqrt(d_k)` factor via `qAttentionInvSqrt(P)`; softmax uses the same 256-entry int32 exp LUT as `QSoftmax1D`.
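
The `QAttention1D` bullet mentions folding ReLU into the requantizer by raising `qmin` to `zero_point`. A minimal sketch of why that works follows. The function names are hypothetical, and a float multiplier stands in for the integer multiplier-and-shift that a pure-integer pipeline would use.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Requantize an int32 accumulator onto an int8 grid, clamping to [qmin, qmax].
// The float multiplier is a simplification for illustration only.
inline int8_t requantize(int32_t acc, float multiplier, int32_t zeroPoint,
                         int32_t qmin, int32_t qmax) {
    const int32_t q =
        static_cast<int32_t>(std::lround(acc * multiplier)) + zeroPoint;
    return static_cast<int8_t>(std::min(qmax, std::max(qmin, q)));
}

// ReLU folded into the clamp: any q below zero_point decodes to a negative
// real value, so clamping at qmin = zero_point maps those to real 0.
inline int8_t requantizeWithRelu(int32_t acc, float multiplier,
                                 int32_t zeroPoint) {
    return requantize(acc, multiplier, zeroPoint, zeroPoint, 127);
}
```

Because the clamp already exists in the requantizer, the activation costs nothing extra at run time under this formulation.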

docs/architectures/simd-backends.md

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@ nav_order: 7
77

88
# SIMD Backends
99

10-
Phase 14 wires ISA-capability-gated SIMD specializations into the inner reduction loop of the int8 affine layer family (`QDense`, `QConv2D`, `QConv2DPerChannel`). The library never sniffs the CPU. Every backend lives behind a `TINYMIND_ENABLE_SIMD_*` preprocessor gate, every gate defaults to `0`, and with all gates off the layer bodies fall back to a scalar dispatch that emits **byte-identical** output to the pre-Phase-14 build.
10+
TinyMind ships ISA-capability-gated SIMD specializations in the inner reduction loop of the int8 affine layer family (`QDense`, `QConv2D`, `QConv2DPerChannel`). The library never sniffs the CPU. Every backend lives behind a `TINYMIND_ENABLE_SIMD_*` preprocessor gate, every gate defaults to `0`, and with all gates off the layer bodies fall back to a scalar dispatch that emits **byte-identical** output to the scalar reference.
1111

1212
## Design rules
1313

@@ -59,7 +59,7 @@ The public entry point is `tinymind::simd::int8DotWithZeroPoint` in [`cpp/includ
5959

6060
## Bit-exactness invariant — why it matters
6161

62-
The integer SIMD backends produce byte-identical output to the scalar reference for any input. The Phase 16 integration suite (`unit_test/integration/`) leans on this: each exemplar's `make golden` mode emits an int8 byte stream, and the integration test asserts that stream matches a baked-in expected string. Because the inference path is deterministic and the SIMD backends are bit-exact, the same expected string passes regardless of which gate combination the example binary was built with. Any silent drift in `qaffine.hpp`, `qcalibration.hpp`, or any SIMD specialization that claims bit-exactness trips the test.
62+
The integer SIMD backends produce byte-identical output to the scalar reference for any input. The integration suite (`unit_test/integration/`) leans on this: each exemplar's `make golden` mode emits an int8 byte stream, and the integration test asserts that stream matches a baked-in expected string. Because the inference path is deterministic and the SIMD backends are bit-exact, the same expected string passes regardless of which gate combination the example binary was built with. Any silent drift in `qaffine.hpp`, `qcalibration.hpp`, or any SIMD specialization that claims bit-exactness trips the test.
6363

6464
The AVX2 backend deliberately avoids `PMADDUBSW`: that instruction saturates on the pair-sum step, which would break the bit-exactness guarantee on pathological inputs. AVX-VNNI and AVX-512-VNNI use the canonical uint8-shift trick so `VPDPBUSD` reduces a uint8 / int8 product exactly.
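
The saturation concern with `PMADDUBSW` and the idea behind a uint8-shift can both be modeled in scalar code. This is a sketch under assumptions: the function names are hypothetical, and the shift shown (adding 128 to one int8 operand, then subtracting `128 * sum(b)`) is one common form of the trick. The diff does not say which operand TinyMind shifts.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of one PMADDUBSW output lane: two u8*s8 products are summed
// and the sum is saturated to the int16 range.
inline int16_t pmaddubswLane(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1) {
    const int32_t sum = static_cast<int32_t>(a0) * b0 +
                        static_cast<int32_t>(a1) * b1;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(sum);
}

// Reference int8 x int8 dot product with an int32 accumulator.
inline int32_t exactDot(const int8_t* a, const int8_t* b, std::size_t n) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return acc;
}

// uint8-shift form: (a + 128) fits in uint8, so a u8*s8 instruction can be
// used, and sum(a*b) = sum((a + 128) * b) - 128 * sum(b) recovers the result.
inline int32_t shiftedDot(const int8_t* a, const int8_t* b, std::size_t n) {
    int32_t acc = 0;
    int32_t sumB = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const uint8_t au = static_cast<uint8_t>(static_cast<int32_t>(a[i]) + 128);
        acc += static_cast<int32_t>(au) * static_cast<int32_t>(b[i]);
        sumB += b[i];
    }
    return acc - 128 * sumB;
}
```

With inputs `(255, 127, 255, 127)` the true pair sum is 64770, but the modeled lane returns 32767, which is the kind of pathological input the paragraph refers to. The shifted form accumulates in int32, so it matches the reference for every input.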
6565

@@ -96,11 +96,11 @@ Run the resulting binary on the target hardware (or under `qemu-aarch64` for cor
9696

9797
## What about non-int8 layers?
9898

99-
Phase 14 specializes the int8 affine layer family because that is where the integer dot product wins big. The Q-format pipeline (`QValue<Q, F, signed>`) and float pipeline rely on compiler auto-vectorization with `-O3 -march=native` — no library-side specialization. The `SIMD_NEON_FP16` and `SIMD_HELIUM_MVE_F` float gates land via `cpp/include/simd/simd_neon_fp16.hpp`, used by the mixed-precision exemplar.
99+
TinyMind specializes the int8 affine layer family because that is where the integer dot product wins big. The Q-format pipeline (`QValue<Q, F, signed>`) and float pipeline rely on compiler auto-vectorization with `-O3 -march=native` — no library-side specialization. The `SIMD_NEON_FP16` and `SIMD_HELIUM_MVE_F` float gates land via `cpp/include/simd/simd_neon_fp16.hpp`, used by the mixed-precision exemplar.
100100

101101
## See Also
102102

103103
- [Int8 Affine Quantization]({{ site.baseurl }}/architectures/int8-quantization) — the layer family these backends accelerate.
104-
- [Mixed Precision]({{ site.baseurl }}/architectures/mixed-precision) — Phase 9 qbridge + fp16 storage, the consumer of the float vector gates.
104+
- [Mixed Precision]({{ site.baseurl }}/architectures/mixed-precision) — qbridge + fp16 storage, the consumer of the float vector gates.
105105
- [`examples/perf_matrix/`](https://github.com/danmcleran/tinymind/tree/master/examples/perf_matrix) — bench source.
106106
- [`cpp/include/simd/`](https://github.com/danmcleran/tinymind/tree/master/cpp/include/simd) — backend headers (one per capability).

docs/getting-started.md

Lines changed: 2 additions & 2 deletions
@@ -17,6 +17,6 @@ These tutorials walk through complete, working examples that demonstrate TinyMin
1717
| [Keyword Spotting CNN on a Cortex-M]({{ site.baseurl }}/getting-started/keyword-spotting-cnn) | Depthwise-separable 2D CNN, bench harness, MCU porting | ~19 KB static |
1818
| [Predictive Maintenance on AI4I 2020]({{ site.baseurl }}/getting-started/predictive-maintenance) | Q16.16 MLP, imbalanced binary classification, confusion matrix | ~35 KB static |
1919
| [PyTorch -> TinyMind int8 (XOR)]({{ site.baseurl }}/getting-started/pytorch-quant-xor) | End-to-end post-training int8 quantization: PyTorch float training, per-tensor calibration, pure-integer C++ inference | Tiny |
20-
| [PyTorch -> TinyMind int8 (importer)]({{ site.baseurl }}/getting-started/pytorch-importer) | Phase 15 production flow: `torch.state_dict` -> `tinymind_import.py` -> `weights.hpp`. `PercentileObserver` / `KLDivergenceObserver` / cross-layer equalization | Tiny |
20+
| [PyTorch -> TinyMind int8 (importer)]({{ site.baseurl }}/getting-started/pytorch-importer) | Production flow: `torch.state_dict` -> `tinymind_import.py` -> `weights.hpp`. `PercentileObserver` / `KLDivergenceObserver` / cross-layer equalization | Tiny |
2121
| [Keyword Spotting CNN (int8)]({{ site.baseurl }}/getting-started/keyword-spotting-int8) | int8 quantized depthwise-separable CNN, per-channel depthwise, CSV cycle/byte report vs float | ~5 KB static |
22-
| [MobileNetV2-shaped int8]({{ site.baseurl }}/getting-started/mobilenetv2-int8) | Phase 16 exemplar: stride-2 stem + inverted-residual blocks + GAP + dense, linear-bottleneck convention, golden-byte regression | Compact |
22+
| [MobileNetV2-shaped int8]({{ site.baseurl }}/getting-started/mobilenetv2-int8) | int8 exemplar: stride-2 stem + inverted-residual blocks + GAP + dense, linear-bottleneck convention, golden-byte regression | Compact |

docs/getting-started/mobilenetv2-int8.md

Lines changed: 3 additions & 3 deletions
@@ -7,9 +7,9 @@ nav_order: 8
77

88
# MobileNetV2-shaped int8
99

10-
This tutorial walks the Phase 16 [`examples/mobilenetv2_int8/`](https://github.com/danmcleran/tinymind/tree/master/examples/mobilenetv2_int8) exemplar — a deterministic int8 MobileNetV2-shaped pipeline that exercises the inverted-residual block, linear bottlenecks, residual skips through `QAdd`, and the GAP + dense head. The build pattern in this file scales linearly to a full MobileNetV2-1.0 model (same block, 17× with the channel and stride schedule from the spec).
10+
This tutorial walks the [`examples/mobilenetv2_int8/`](https://github.com/danmcleran/tinymind/tree/master/examples/mobilenetv2_int8) exemplar — a deterministic int8 MobileNetV2-shaped pipeline that exercises the inverted-residual block, linear bottlenecks, residual skips through `QAdd`, and the GAP + dense head. The build pattern in this file scales linearly to a full MobileNetV2-1.0 model (same block, 17× with the channel and stride schedule from the spec).
1111

12-
It is also the first exemplar that ships a `make golden` mode — the int8 logit byte stream is locked by the `unit_test/integration/` Boost.Test suite, regardless of which Phase 14 SIMD backend the build resolves to.
12+
The exemplar ships a `make golden` mode — the int8 logit byte stream is locked by the `unit_test/integration/` Boost.Test suite, regardless of which SIMD backend the build resolves to.
1313

1414
## Pipeline (NHWC)
1515

@@ -89,7 +89,7 @@ make golden # int8 logits for the bundled 4-sample test set
8989

9090
`make run` prints per-tensor affine params and the worst max-abs error vs the float reference; the bundled dataset passes within 50% of the logits range.
9191

92-
`make golden` writes a stable text dump of the int8 logit bytes that the integration suite asserts byte-for-byte. Because Phase 14's bit-exactness guarantee holds for every enabled SIMD backend, the same expected string passes regardless of which gate combination the example binary was built with.
92+
`make golden` writes a stable text dump of the int8 logit bytes that the integration suite asserts byte-for-byte. Because the SIMD backends' bit-exactness guarantee holds for every enabled backend, the same expected string passes regardless of which gate combination the example binary was built with.
9393

9494
## What the integration suite catches
9595

docs/getting-started/pytorch-importer.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ nav_order: 7
77

88
# PyTorch → TinyMind int8 (production importer)
99

10-
This tutorial walks the **Phase 15 importer flow**: take a trained PyTorch model, pull weights from `torch.state_dict`, run per-layer calibration with any of `MinMaxObserver` / `PercentileObserver` / `KLDivergenceObserver`, optionally apply Cross-Layer Equalization to recover accuracy on imbalanced layers, and emit a TinyMind-format `weights.hpp` that snaps straight into the int8 `Q*` layer family.
10+
This tutorial walks the **production PyTorch importer flow**: take a trained PyTorch model, pull weights from `torch.state_dict`, run per-layer calibration with any of `MinMaxObserver` / `PercentileObserver` / `KLDivergenceObserver`, optionally apply Cross-Layer Equalization to recover accuracy on imbalanced layers, and emit a TinyMind-format `weights.hpp` that snaps straight into the int8 `Q*` layer family.
1111

1212
It's the heavier-lift counterpart to [PyTorch → TinyMind int8 (XOR)]({{ site.baseurl }}/getting-started/pytorch-quant-xor). Same destination — pure-integer C++ inference — but instead of hand-rolling the calibration loop for one tiny network, you describe each layer once and the importer handles range estimation, Conv+BN fusion, weight quantization, and header emission.
1313

@@ -88,7 +88,7 @@ The three observers cover different activation shapes:
8888
| `PercentileObserver(lo, hi)` | Heavy-tail activations (post-conv with large receptive field, pre-softmax logits). `(0.05, 99.95)` clips the worst ~0.1% so the int8 grid is not wasted on a handful of extreme samples |
8989
| `KLDivergenceObserver` | When percentile clipping is too crude. TensorRT-style: fix a 2048-bin histogram width, fill it, sweep threshold T in `[128, 2048]` to minimize KL between the clipped float distribution and its int8-quantized form. Heaviest but highest fidelity |
9090

91-
Match the observer to each tensor's empirical shape; the importer does not try to auto-pick. The Phase 15 [`examples/import_demo/`](https://github.com/danmcleran/tinymind/tree/master/examples/import_demo) C++ binary exercises all three on a deterministic 3-8-4-2 MLP so the calibration math is easy to inspect side by side.
91+
Match the observer to each tensor's empirical shape; the importer does not try to auto-pick. The [`examples/import_demo/`](https://github.com/danmcleran/tinymind/tree/master/examples/import_demo) C++ binary exercises all three on a deterministic 3-8-4-2 MLP so the calibration math is easy to inspect side by side.
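
For a sense of what percentile clipping does to a calibration range, here is a minimal sketch. It is written in C++ for consistency with the rest of the library, although the importer itself is a Python script. The function name is hypothetical, and the nearest-rank percentile definition is an assumption rather than the importer's documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Nearest-rank percentile bounds over calibration samples.
// loPct / hiPct are percentages in [0, 100], e.g. (0.05, 99.95).
// Takes the samples by value because it sorts them.
inline std::pair<float, float> percentileRange(std::vector<float> samples,
                                               double loPct, double hiPct) {
    assert(!samples.empty());
    assert(0.0 <= loPct && loPct <= hiPct && hiPct <= 100.0);
    std::sort(samples.begin(), samples.end());
    const double last = static_cast<double>(samples.size() - 1);
    const std::size_t loIdx =
        static_cast<std::size_t>(std::lround(loPct / 100.0 * last));
    const std::size_t hiIdx =
        static_cast<std::size_t>(std::lround(hiPct / 100.0 * last));
    return {samples[loIdx], samples[hiIdx]};
}
```

The returned pair would then feed whatever scale and zero-point computation the pipeline uses. Samples outside the pair saturate at the int8 limits, so the grid is not stretched to cover them.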
9292

9393
## Cross-Layer Equalization
9494
