"source": "import os\nos.environ['KERAS_BACKEND'] = 'tensorflow'\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport sys\nsys.path.append('..')\nimport plotting\nimport hls4ml\nfrom sklearn.metrics import accuracy_score\n\n%matplotlib inline\n\n# Load the data\nX_test = np.ascontiguousarray(np.load('../data/jet-tagging/X_test.npy'), dtype=np.float32)\ny_test = np.load('../data/jet-tagging/y_test.npy')\nclasses = np.load('../data/jet-tagging/classes.npy', allow_pickle=True)"
},
{
"cell_type": "markdown",
@@ -103,19 +85,7 @@
"cell_type": "markdown",
"id": "cell-5",
"metadata": {},
-"source": [
-"## What is the ReuseFactor?\n",
-"\n",
-"In the default (`ReuseFactor = 1`) configuration, hls4ml instantiates one multiplier for every weight in the network. All multiplications for a given layer happen in a single clock cycle, giving the minimum possible latency — but using the most multipliers.\n",
-"\n",
-"Setting `ReuseFactor = N` tells hls4ml to time-multiplex the same multiplier hardware across `N` weight-input pairs. This means the layer takes `N` clock cycles to compute instead of one, but uses roughly `1/N` as many multipliers.\n",
-"\n",
-"\n",
-"\n",
-"The reuse factor must evenly divide the number of weights in each layer. For example, the first layer has `16 × 64 = 1024` weights, so valid reuse factors include 1, 2, 4, 8, 16, 32, 64, etc.\n",
-"\n",
-"Changing the reuse factor does **not** change the model accuracy — the same arithmetic is performed, just spread over more clock cycles. We will verify this below."
-]
+"source": "## What is the ReuseFactor?\n\nIn the default (`ReuseFactor = 1`) configuration, hls4ml instantiates one multiplier for every weight in the network. All multiplications for a given layer happen in a single clock cycle, giving the minimum possible latency — but using the most multipliers.\n\nSetting `ReuseFactor = N` tells hls4ml to time-multiplex the same multiplier hardware across `N` weight-input pairs. This means the layer takes `N` clock cycles to compute instead of one, but uses roughly `1/N` as many multipliers.\n\n\n\nThe reuse factor must evenly divide the number of weights in each layer. For example, the first layer has `16 × 64 = 1024` weights, so valid reuse factors include 1, 2, 4, 8, 16, 32, 64, etc.\n\nChanging the reuse factor does **not** change the model accuracy — the same arithmetic is performed, just spread over more clock cycles. We will verify this below."
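The divisibility rule and the multiplier/latency trade-off described in the ReuseFactor cell can be sketched in plain Python. This is a simplified model with these assumptions:

- a dense layer has `n_in × n_out` weights;
- the only validity rule is that the reuse factor divides the weight count;
- the layer needs about `n_weights / ReuseFactor` multipliers and roughly `ReuseFactor` clock cycles.

hls4ml's real checks and the HLS reports are more detailed, and both function names are hypothetical.

```python
def valid_reuse_factors(n_in: int, n_out: int) -> list[int]:
    """Hypothetical helper: every reuse factor that evenly divides the weight count.

    Uses only the simplified divisibility rule from the text; hls4ml applies
    further checks of its own.
    """
    n_weights = n_in * n_out
    return [rf for rf in range(1, n_weights + 1) if n_weights % rf == 0]


def estimate_cost(n_in: int, n_out: int, reuse_factor: int) -> tuple[int, int]:
    """Hypothetical helper: rough (multipliers, cycles) for one dense layer.

    Assumes each multiplier is time-multiplexed over `reuse_factor` products,
    so the layer needs n_weights / reuse_factor multipliers and about
    `reuse_factor` cycles (pipeline overhead ignored).
    """
    n_weights = n_in * n_out
    if n_weights % reuse_factor != 0:
        raise ValueError(f"ReuseFactor {reuse_factor} does not divide {n_weights}")
    return n_weights // reuse_factor, reuse_factor


if __name__ == "__main__":
    # First layer from the text: 16 inputs x 64 outputs = 1024 weights.
    print(valid_reuse_factors(16, 64))
    for rf in (1, 4, 16, 64):
        mults, cycles = estimate_cost(16, 64, rf)
        print(f"ReuseFactor={rf:3d}: ~{mults:4d} multipliers, ~{cycles} cycles")
```

For the 16 × 64 layer, the first call lists every power of two from 1 to 1024, since 1024 = 2^10. The layer computes 1024 products at every setting. Only the number of multipliers sharing that work changes, which is why accuracy is unaffected.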
4_advanced_models/4a_qkeras_cnn_svhn.ipynb
Lines changed: 3 additions & 26 deletions
@@ -4,17 +4,7 @@
"cell_type": "markdown",
"id": "4a-0",
"metadata": {},
-"source": [
-"# Part 4a: Convolutional Neural Networks with QKeras on the SVHN dataset\n",
-"\n",
-"In this notebook we train a quantized convolutional neural network (CNN) on the [Street View House Numbers (SVHN)](http://ufldl.stanford.edu/housenumbers/) dataset and deploy it with hls4ml.\n",
-"\n",
-"The SVHN dataset consists of real-world images of house numbers extracted from Google Street View, cropped to 32×32 RGB pixels. Unlike MNIST it is a harder, more realistic problem: images can contain more than one digit, and the centre digit defines the label. Each image belongs to one of 10 classes (digits 0–9).\n",
-"\n",
-"\n",
-"\n",
-"The dataset has 73,257 training images and 26,032 test images."
-]
+"source": "# Part 4a: Convolutional Neural Networks with QKeras on the SVHN dataset\n\nIn this notebook we train a quantized convolutional neural network (CNN) on the [Street View House Numbers (SVHN)](http://ufldl.stanford.edu/housenumbers/) dataset and deploy it with hls4ml.\n\nThe SVHN dataset consists of real-world images of house numbers extracted from Google Street View, cropped to 32×32 RGB pixels. Unlike MNIST it is a harder, more realistic problem: images can contain more than one digit, and the centre digit defines the label. Each image belongs to one of 10 classes (digits 0–9).\n\n\n\nThe dataset has 73,257 training images and 26,032 test images."
},
{
"cell_type": "code",
@@ -276,20 +266,7 @@
"cell_type": "markdown",
"id": "4a-13",
"metadata": {},
-"source": [
-"## Convolutions in hls4ml\n",
-"\n",
-"hls4ml supports two I/O modes for neural networks:\n",
-"\n",
-"- **`io_parallel`**: All inputs arrive simultaneously. Suitable for small models, when all activations fit into registers.\n",
-"- **`io_stream`**: Data flows through the network one element at a time via FIFO buffers. Required for larger CNNs, when the full feature maps are too large to hold in registers. Shift registers maintain a sliding window of `kernel_height − 1` rows, feeding the convolution kernel one pixel at a time.\n",
-"\n",
-"See the [hls4ml documentation](https://fastmachinelearning.org/hls4ml/concepts.html) for more details.\n",
-"**Note on softmax precision:** using `auto` precision for the output of the last dense layer can produce accumulators wider than the softmax look-up tables can handle. We cap this manually with `fixed<16,6,RND,SAT>`."
-]
+"source": "## Convolutions in hls4ml\n\nhls4ml supports two I/O modes for neural networks:\n\n- **`io_parallel`**: All inputs arrive simultaneously. Suitable for small models, when all activations fit into registers.\n- **`io_stream`**: Data flows through the network one element at a time via FIFO buffers. Required for larger CNNs, when the full feature maps are too large to hold in registers. Shift registers maintain a sliding window of `kernel_height − 1` rows, feeding the convolution kernel one pixel at a time.\n\nSee the [hls4ml documentation](https://fastmachinelearning.org/hls4ml/concepts.html) for more details.\n\n\n\n**Note on softmax precision:** using `auto` precision for the output of the last dense layer can produce accumulators wider than the softmax look-up tables can handle. We cap this manually with `fixed<16,6,RND,SAT>`."
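The line-buffer scheme described for `io_stream` can be modelled in software. The sketch below is an illustration, not the HLS code hls4ml generates. It makes three assumptions:

- the image has a single channel;
- the convolution uses stride 1 and no padding;
- pixels arrive one at a time in row-major order.

Only the previous `kernel_height − 1` complete rows are kept, in a hypothetical `line_buffer`, and an output is produced whenever a full window is available. The function names are hypothetical. The streamed result is checked against a direct convolution.

```python
from collections import deque

import numpy as np


def stream_conv2d(pixels, width, kernel):
    """Hypothetical streaming 'valid' 2D convolution over a row-major pixel stream.

    Computes cross-correlation (no kernel flip), as Keras Conv2D does.
    Only the previous kernel_height - 1 rows are buffered, plus the row
    that is currently arriving.
    """
    kh, kw = kernel.shape
    line_buffer = deque(maxlen=kh - 1)  # oldest row drops out automatically
    current_row = []
    outputs = []
    for i, value in enumerate(pixels):
        row, col = divmod(i, width)
        current_row.append(value)
        if row >= kh - 1 and col >= kw - 1:
            # Assemble the kh x kw window from buffered rows plus the current row.
            rows = list(line_buffer) + [current_row]
            window = np.array([r[col - kw + 1 : col + 1] for r in rows])
            outputs.append(float(np.sum(window * kernel)))
        if col == width - 1:
            # Row complete: move it into the line buffer.
            line_buffer.append(current_row)
            current_row = []
    return outputs


def direct_conv2d(image, kernel):
    """Reference: the same 'valid' cross-correlation with the whole image in memory."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r : r + kh, c : c + kw] * kernel)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(6, 8))
    kernel = rng.normal(size=(3, 3))
    streamed = np.array(stream_conv2d(image.ravel(), image.shape[1], kernel))
    expected = direct_conv2d(image, kernel)
    print(np.allclose(streamed.reshape(expected.shape), expected))  # True
```

The streaming version stores `(kernel_height − 1) × width` values plus the row in flight, instead of the whole feature map. That saving is the reason streaming suits large images.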
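The sketch below shows what capping the softmax input at `fixed<16,6,RND,SAT>` does. It is a small software model of a signed 16-bit fixed-point value with 6 integer bits (sign included), contrasted with the default truncate-and-wrap behaviour of `ap_fixed`. It assumes that hls4ml's `fixed<16,6,RND,SAT>` corresponds to Vivado/Vitis HLS `ap_fixed<16,6,AP_RND,AP_SAT>`. The function names are hypothetical, and this is not hls4ml code.

```python
import math


def quantize_fixed_sat(x, width=16, integer=6):
    """Hypothetical model of signed ap_fixed<width, integer, AP_RND, AP_SAT>.

    `integer` includes the sign bit. AP_RND rounds to the nearest step
    (ties toward +infinity); AP_SAT clamps out-of-range values to the
    largest/smallest representable number.
    """
    frac_bits = width - integer
    scale = 2 ** frac_bits
    lo = -(2 ** (integer - 1))
    hi = 2 ** (integer - 1) - 1 / scale
    rounded = math.floor(x * scale + 0.5) / scale
    return float(min(max(rounded, lo), hi))


def quantize_fixed_wrap(x, width=16, integer=6):
    """Hypothetical model of the ap_fixed defaults (AP_TRN, AP_WRAP).

    Truncates toward -infinity, then keeps only the low `width` bits
    (two's complement), so overflowing values wrap around.
    """
    frac_bits = width - integer
    raw = math.floor(x * 2 ** frac_bits)
    half_range = 2 ** (width - 1)
    raw = (raw + half_range) % (2 ** width) - half_range
    return raw / 2 ** frac_bits


if __name__ == "__main__":
    for value in (0.1234, 31.9999, 40.0, -50.0):
        print(value, quantize_fixed_sat(value), quantize_fixed_wrap(value))
```

With 6 integer bits the representable range is [-32, 32 - 2^-10]. Under saturation, a wide accumulator value such as 40.0 clamps to the top of that range. Under wrapping, it becomes -24.0, which would push the softmax look-up toward a value of the wrong sign.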
-"Now we run `build` again, running HLS Synthesis, Logic Synthesis and Place & Route, finally producing a bitfile and an archive of files that we'll need to run inference on the pynq-z2 board. \n",
-"\n",
-"**This step takes around 20 minutes.**\n",
-"\n",
-"The floorplan of the bitfile should look something like this, where the individual tree modules are highlighted in different colours:\n",
+"source": "## Build the model\n\nNow we run `build` again, running HLS Synthesis, Logic Synthesis and Place & Route, finally producing a bitfile and an archive of files that we'll need to run inference on the pynq-z2 board. \n\n**This step takes around 20 minutes.**\n\nThe floorplan of the bitfile should look something like this, where the individual tree modules are highlighted in different colours:\n\n<img src=\"../images/part6a_bdt_floorplan.png\" width=\"300\" />"