|
109 | 109 | "\n", |
110 | 110 | "With `auto` precision:\n", |
111 | 111 | "- `weight` and `bias` default to the global model precision (`ap_fixed<16,6>`).\n", |
112 | | - "- `result` and `accum` are computed to be wide enough to hold the worst-case accumulation without overflow — these are often much wider than 16 bits. While ensuring no accuracy loss, it may come at the expense of resources.\n", |
| 112 | + "- `result` and `accum` are computed to be wide enough to hold the worst-case accumulation without overflow \u2014 these are often much wider than 16 bits. While ensuring no accuracy loss, it may come at the expense of resources.\n", |
113 | 113 | "\n", |
114 | 114 | "This configuration is a useful starting point for manual tuning -- you can inspect the profiling plots below to decide which layers can safely use narrower types and accordingly update the config." |
115 | 115 | ] |
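
To get a feel for why `auto` accumulators end up much wider than 16 bits, here is a rough back-of-the-envelope sketch. It assumes a dense layer whose inputs and weights both use `ap_fixed<16,6>` and applies the textbook bit-growth rule: a product needs the sum of the operand widths, and summing `n` products needs `ceil(log2(n))` extra bits. The helper name `worst_case_accum` and the fan-in of 16 are hypothetical, and hls4ml's own inference may arrive at different (for example tighter) widths.

```python
import math


def worst_case_accum(in_width, in_int, w_width, w_int, n_inputs):
    """Return (total_bits, integer_bits) able to hold n_inputs summed products."""
    # A product of two fixed-point numbers needs the sum of the operand widths.
    prod_width = in_width + w_width
    prod_int = in_int + w_int
    # Summing n terms can grow the magnitude by up to a factor of n,
    # which costs ceil(log2(n)) additional integer bits.
    growth = math.ceil(math.log2(n_inputs)) if n_inputs > 1 else 0
    return prod_width + growth, prod_int + growth


# Assumed example: 16 inputs, ap_fixed<16,6> for both inputs and weights.
width, integer = worst_case_accum(16, 6, 16, 6, n_inputs=16)
print(f"ap_fixed<{width},{integer}>")
```

Under these assumptions the accumulator comes out as `ap_fixed<36,16>`, more than twice the width of the 16-bit operands, which is where the extra resource cost comes from.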
|
141 | 141 | "The `hls4ml.model.profiling.numerical` function plots the distribution of weights and biases as a box-and-whisker chart. The **grey boxes** show the range representable with the data types set in the hls4ml config.\n", |
142 | 142 | "\n", |
143 | 143 | "The rule of thumb:\n", |
144 | | - "- The grey box should cover the full whisker **to the right** (large values) — otherwise weights saturate or wrap around.\n", |
| 144 | + "- The grey box should cover the full whisker **to the right** (large values) \u2014 otherwise weights saturate or wrap around.\n", |
145 | 145 | "- It is acceptable for the box not to reach the left whisker (small values): those weights are simply rounded to zero, which is *often* harmless.\n", |
146 | 146 | "\n", |
147 | 147 | "Providing data (here the first 1000 test samples for speed) also shows the same distributions at the **output of each layer**, which reveals whether the activation dynamic range is well-matched to the fixed-point type.\n", |
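
As a rough aid for reading the grey boxes, the sketch below computes the range of a signed `ap_fixed<W,I>` type: the largest value bounds what fits without saturating or wrapping, and the step is the smallest non-zero magnitude before a value rounds to zero. The helper name `fixed_range` is hypothetical and not part of hls4ml.

```python
def fixed_range(width, integer):
    """Return (min, max, step) of a signed ap_fixed<width,integer>."""
    frac_bits = width - integer
    step = 2.0 ** -frac_bits           # smallest non-zero magnitude
    lo = -(2.0 ** (integer - 1))       # most negative representable value
    hi = 2.0 ** (integer - 1) - step   # most positive representable value
    return lo, hi, step


for w, i in [(16, 6), (8, 2)]:
    lo, hi, step = fixed_range(w, i)
    print(f"ap_fixed<{w},{i}>: range [{lo}, {hi}], step {step}")
```

For `ap_fixed<16,6>` this gives a range of about ±32 with a step of 2⁻¹⁰, while `ap_fixed<8,2>` covers only about ±2 with a step of 2⁻⁶, so weights larger than 2 in magnitude would not fit the narrower type.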
|
182 | 182 | "source": [ |
183 | 183 | "## Customise precision\n", |
184 | 184 | "\n", |
185 | | - "After inspecting the profiling plot, let's try narrowing the weight precision of `fc1` from 16 bits to 8 bits (`ap_fixed<8,2>` — 8 total bits, 2 integer bits). This reduces the multiplier width and can save significant LUT and DSP resources.\n", |
| 185 | + "After inspecting the profiling plot, let's try narrowing the weight precision of `fc1` from 16 bits to 8 bits (`ap_fixed<8,2>` \u2014 8 total bits, 2 integer bits). This reduces the multiplier width and can save significant LUT and DSP resources.\n", |
186 | 186 | "\n", |
187 | | - "**Note on the output layer:** Using `auto` precision can produce an accumulator at the output of the last fully-connected layer that is wider than the softmax look-up tables can handle. We therefore manually cap it with `fixed<16,6,RND,SAT>`, which also enables rounding and saturation — important when narrowing any type that feeds into a non-linear function." |
| 187 | + "**Note on the output layer:** Using `auto` precision can produce an accumulator at the output of the last fully-connected layer that is wider than the softmax look-up tables can handle. We therefore manually cap it with `fixed<16,6,RND,SAT>`, which also enables rounding and saturation \u2014 important when narrowing any type that feeds into a non-linear function." |
188 | 188 | ] |
189 | 189 | }, |
190 | 190 | { |
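
To illustrate why the `RND` and `SAT` flags matter, here is a minimal pure-Python emulation of a signed fixed-point cast. It is a sketch, not hls4ml or HLS library code: the function name `quantize` is hypothetical, `TRN` is modelled as truncation toward minus infinity, `RND` as round-half-up, `WRAP` as two's-complement wrap-around, and `SAT` as clamping to the range limits. `TRN` and `WRAP` are the defaults of `ap_fixed` when no modes are given.

```python
import math


def quantize(x, width, integer, rounding="TRN", overflow="WRAP"):
    """Emulate casting the float x to a signed fixed<width,integer> value."""
    frac_bits = width - integer
    scaled = x * 2 ** frac_bits
    # TRN drops the extra fractional bits; RND rounds to nearest (ties go up).
    q = math.floor(scaled + 0.5) if rounding == "RND" else math.floor(scaled)
    lo, hi = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    if overflow == "SAT":
        q = max(lo, min(hi, q))            # clamp to the representable range
    else:
        q = (q - lo) % (2 ** width) + lo   # two's-complement wrap-around
    return q / 2 ** frac_bits


for value in (0.30, 2.50, -3.00):
    wrapped = quantize(value, 8, 2)
    saturated = quantize(value, 8, 2, "RND", "SAT")
    print(f"{value:+.2f} -> TRN/WRAP {wrapped:+.6f} | RND/SAT {saturated:+.6f}")
```

With `fixed<8,2>`, the out-of-range input 2.5 wraps to -1.5 under the default modes but saturates to 1.984375 with `SAT`. A sign flip like that is far more damaging in front of a softmax table than a clipped value.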
|
254 | 254 | "source": [ |
255 | 255 | "## Compile, trace, predict\n", |
256 | 256 | "\n", |
257 | | - "Compile the hls4ml model and call `hls_model.trace` instead of `hls_model.predict`. This returns both the final predictions **and** a dictionary of intermediate layer outputs — one array per layer, keyed by layer name.\n", |
| 257 | + "Compile the hls4ml model and call `hls_model.trace` instead of `hls_model.predict`. This returns both the final predictions **and** a dictionary of intermediate layer outputs \u2014 one array per layer, keyed by layer name.\n", |
258 | 258 | "\n", |
259 | 259 | "We collect the same dictionary from the original model for comparison. We only trace the first 1000 samples since tracing is slower than a plain forward pass." |
260 | 260 | ] |
|
303 | 303 | "source": [ |
304 | 304 | "## Inspect\n", |
305 | 305 | "\n", |
306 | | - "We can now print, plot, or otherwise compare the output of each layer between the original model and the hls4ml fixed-point emulation. This makes it easy to spot which layer first deviates — a sign that the precision there is too narrow.\n", |
| 306 | + "We can now print, plot, or otherwise compare the output of each layer between the original model and the hls4ml fixed-point emulation. This makes it easy to spot which layer first deviates \u2014 a sign that the precision there is too narrow.\n", |
307 | 307 | "\n", |
308 | 308 | "Let's print the first-layer output for the very first test sample." |
309 | 309 | ] |
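
One way to automate the "which layer deviates first" check is sketched below. It assumes both traces are dictionaries keyed by layer name, in layer order, with arrays of matching shape. The function name `first_deviating_layer`, the toy arrays, and the 0.05 tolerance are hypothetical and should be adapted to the model at hand.

```python
import numpy as np


def first_deviating_layer(ref_trace, hls_trace, tol=0.05):
    """Return (layer, error) for the first layer whose mean abs error exceeds tol."""
    for name, ref in ref_trace.items():  # dicts preserve insertion (layer) order
        if name not in hls_trace:
            continue  # layer was not traced on the hls4ml side
        diff = np.asarray(ref) - np.asarray(hls_trace[name])
        err = float(np.mean(np.abs(diff)))
        if err > tol:
            return name, err
    return None, 0.0


# Toy stand-ins for the two trace dictionaries (hypothetical values).
ref = {"fc1": np.array([[0.50, -0.25]]), "fc2": np.array([[1.00, 2.00]])}
hls = {"fc1": np.array([[0.50, -0.25]]), "fc2": np.array([[1.00, 1.50]])}
print(first_deviating_layer(ref, hls))
```

In this toy example `fc1` matches exactly and `fc2` is reported with a mean absolute error of 0.25. With real traces, the reported layer is the first candidate for a wider type.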
|
353 | 353 | "leg = Legend(ax, lines, labels=[MODEL_TYPE, 'hls4ml (8-bit fc1)'], loc='lower right', frameon=False)\n", |
354 | 354 | "ax.add_artist(leg)" |
355 | 355 | ] |
| 356 | + }, |
| 357 | + { |
| 358 | + "cell_type": "markdown", |
| 359 | + "id": "6afa959e", |
| 360 | + "metadata": {}, |
| 361 | + "source": [ |
| 362 | + "## Further reading\n", |
| 363 | + "\n", |
| 364 | + "For more details, see: Schulte, Ramhorst, Sun et al., \"hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware\", ACM Trans. Reconfigurable Technol. Syst. (2026), [doi:10.1145/3801979](https://dl.acm.org/doi/abs/10.1145/3801979)" |
| 365 | + ] |
356 | 366 | } |
357 | 367 | ], |
358 | 368 | "metadata": { |
|