Commit e0eda43

Move BDT to more models. SR to follow next.
1 parent 154132b commit e0eda43

1 file changed

Lines changed: 98 additions & 73 deletions

@@ -6,17 +6,20 @@
66
"source": [
77
"<img src=\"https://github.com/thesps/conifer/blob/master/conifer_v1.png?raw=true\" width=\"250\" alt=\"conifer\" />\n",
88
"\n",
9-
"In this notebook we will take the first steps with training a BDT with `xgboost`, then translating it to HLS code for FPGA with `conifer`\n",
9+
"In this notebook we will take the first steps with training a boosted decision tree (BDT) with `xgboost`, then translating it to HLS code for FPGA inference with `conifer`.\n",
1010
"\n",
11-
"Key concepts:\n",
12-
"- model training\n",
13-
"- model evaluation\n",
14-
"- `conifer` configuration and conversion\n",
15-
"- model emulation\n",
16-
"- model synthesis\n",
17-
"- accelerator creation\n",
11+
"## What is a Boosted Decision Tree?\n",
1812
"\n",
19-
"For some use cases, the Forest Processing Unit might be an easier entry point as no FPGA synthesis is required for supported boards. Read more about the FPU here: https://ssummers.web.cern.ch/conifer/fpu.html"
13+
"A Boosted Decision Tree (BDT) is an ensemble learning method that builds a strong classifier by combining many shallow decision trees. Each tree is trained to correct the residual errors of the previous ones. `XGBoost` is a particularly efficient and widely used gradient boosting framework that adds regularisation and second-order gradient information to improve generalisation and training speed. BDTs are popular in high-energy physics because they train quickly, are interpretable, and are often competitive with deep neural networks on tabular data. Their tree-structured computation also maps naturally to FPGA hardware: each tree can be evaluated in parallel, making BDTs well-suited for low-latency trigger and online inference applications.\n",
14+
"\n",
15+
"## Key notebook parts\n",
16+
"\n",
17+
"- **Model training**: train a multi-class `XGBClassifier` on the jet tagging dataset and compare its accuracy to the Keras/PyTorch baseline from Part 1\n",
18+
"- **Model evaluation**: measure classification performance using ROC and accuracy\n",
19+
"- **`conifer` configuration and conversion**: configure the `xilinxhls` backend and convert the trained XGBoost model into `conifer`'s intermediate representation, which generates synthesisable HLS C++ code\n",
20+
"- **Model emulation**: compile the generated HLS C++ on the CPU and run bit-accurate predictions to verify conversion correctness and numerical precision before FPGA synthesis\n",
21+
"- **Model synthesis**: run Vitis HLS C Synthesis followed by Vivado RTL synthesis\n",
22+
"- **Accelerator creation**: configure a board-specific deployment target and build a complete bitfile for a `pynq-z2` board, ready for on-device inference\n"
2023
]
2124
},
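Note on the BDT description added in this hunk: the statement that each tree is trained to correct the residual errors of the previous ones can be illustrated with a toy example. The sketch below is plain gradient boosting for squared-error regression with depth-1 trees, written in pure Python. It is not XGBoost's algorithm (there is no regularisation and no second-order gradient term), and the names `fit_stump`, `predict_stump`, `boost`, `predict`, `mse` and the toy data are hypothetical, chosen only for illustration.

```python
# Toy gradient boosting with decision stumps (illustrative only, not XGBoost).
LEARNING_RATE = 0.5


def fit_stump(xs, residuals):
    """Find the split x <= t that best fits the residuals with two constants."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def boost(xs, ys, n_trees):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + LEARNING_RATE * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps


def predict(base, stumps, x):
    return base + LEARNING_RATE * sum(predict_stump(s, x) for s in stumps)


def mse(base, stumps, xs, ys):
    return sum((y - predict(base, stumps, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.0, 0.0, 1.0, 1.0, 4.0, 4.0, 9.0, 9.0]
base, stumps = boost(xs, ys, n_trees=20)
mse_mean_only = mse(base, [], xs, ys)  # prediction is just the mean of ys
mse_boosted = mse(base, stumps, xs, ys)
print(f'MSE mean only: {mse_mean_only:.4f}, MSE after boosting: {mse_boosted:.4f}')
```

Each stump is an independent threshold comparison, and the final prediction is a sum over stumps. This is the structure the description refers to when it says the trees can be evaluated in parallel.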
2225
{
@@ -27,30 +30,38 @@
2730
"source": [
2831
"import xgboost as xgb\n",
2932
"import matplotlib.pyplot as plt\n",
33+
"import sys\n",
34+
"sys.path.append('..')\n",
3035
"import plotting\n",
3136
"import numpy as np\n",
3237
"from scipy.special import softmax\n",
3338
"from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n",
3439
"import conifer\n",
3540
"import json\n",
3641
"import os\n",
37-
"import sys\n",
38-
"\n",
39-
"os.environ['PATH'] = os.environ['XILINX_VITIS'] + '/bin:' + os.environ['PATH']\n",
4042
"\n",
41-
"# enable more output from conifer\n",
43+
"# Enable more outputs from conifer\n",
4244
"import logging\n",
4345
"\n",
4446
"logging.basicConfig(stream=sys.stdout, level=logging.WARNING)\n",
4547
"logger = logging.getLogger('conifer')\n",
4648
"logger.setLevel('DEBUG')\n",
4749
"\n",
48-
"# create a random seed at we use to make the results repeatable\n",
50+
"# Create a random seed at we use to make the results repeatable\n",
4951
"seed = int('hls4ml-tutorial'.encode('utf-8').hex(), 16) % 2**31\n",
5052
"\n",
5153
"print(f'Using conifer version {conifer.__version__}')"
5254
]
5355
},
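Note on the `seed` line in the cell above: the expression turns a fixed string into a reproducible integer below `2**31`. The sketch below spells out the steps and checks them against an equivalent `int.from_bytes` form. The variable names are hypothetical and no particular seed value is asserted.

```python
# Derive a deterministic seed from a string, step by step.
name = 'hls4ml-tutorial'
raw = name.encode('utf-8')        # bytes of the string
as_hex = raw.hex()                # big-endian hex digits of those bytes
seed_from_hex = int(as_hex, 16) % 2**31

# Equivalent formulation: interpret the bytes directly as a big-endian integer.
seed_from_bytes = int.from_bytes(raw, 'big') % 2**31

print(seed_from_hex == seed_from_bytes)
```

Unlike the built-in `hash()` of a string, which Python randomises per process by default, this value is the same on every run.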
56+
{
57+
"cell_type": "code",
58+
"execution_count": null,
59+
"metadata": {},
60+
"outputs": [],
61+
"source": [
62+
"MODEL_TYPE = 'keras' # set to 'pytorch' if you used the PyTorch notebook in Part 1"
63+
]
64+
},
5465
{
5566
"cell_type": "markdown",
5667
"metadata": {},
@@ -59,7 +70,7 @@
5970
"\n",
6071
"Load the jet tagging dataset.\n",
6172
"\n",
62-
"**Note**: you need to run part1 first."
73+
"**Note**: you need to run part 1 first to generate the dataset files."
6374
]
6475
},
6576
{
@@ -68,11 +79,11 @@
6879
"metadata": {},
6980
"outputs": [],
7081
"source": [
71-
"X_train_val = np.load('X_train_val.npy')\n",
72-
"X_test = np.load('X_test.npy')\n",
73-
"y_train_val_one_hot = np.load('y_train_val.npy')\n",
74-
"y_test_one_hot = np.load('y_test.npy')\n",
75-
"classes = np.load('classes.npy', allow_pickle=True)"
82+
"X_train_val = np.load('../data/X_train_val.npy')\n",
83+
"X_test = np.load('../data/X_test.npy')\n",
84+
"y_train_val_one_hot = np.load('../data/y_train_val.npy')\n",
85+
"y_test_one_hot = np.load('../data/y_test.npy')\n",
86+
"classes = np.load('../data/classes.npy', allow_pickle=True)"
7687
]
7788
},
7889
{
@@ -131,32 +142,50 @@
131142
"outputs": [],
132143
"source": [
133144
"from sklearn.metrics import accuracy_score\n",
134-
"from tensorflow.keras.models import load_model\n",
135-
"\n",
136-
"# load the KERAS model from part 1\n",
137-
"model_ref = load_model('model_1/KERAS_check_best_model.h5')\n",
138-
"y_ref = model_ref.predict(X_test)\n",
139145
"\n",
140-
"# compute predictions of the xgboost model\n",
146+
"if MODEL_TYPE == 'keras':\n",
147+
" from tensorflow.keras.models import load_model\n",
148+
" model_ref = load_model('../models/keras_model_part1.h5')\n",
149+
" y_ref = model_ref.predict(X_test)\n",
150+
"\n",
151+
"elif MODEL_TYPE == 'pytorch':\n",
152+
" import torch\n",
153+
" import torch.nn as nn\n",
154+
"\n",
155+
" class JetTagger(nn.Module):\n",
156+
" def __init__(self):\n",
157+
" super().__init__()\n",
158+
" self.fc1 = nn.Linear(16, 64)\n",
159+
" self.fc2 = nn.Linear(64, 32)\n",
160+
" self.fc3 = nn.Linear(32, 32)\n",
161+
" self.output = nn.Linear(32, 5)\n",
162+
"\n",
163+
" def forward(self, x):\n",
164+
" x = torch.relu(self.fc1(x))\n",
165+
" x = torch.relu(self.fc2(x))\n",
166+
" x = torch.relu(self.fc3(x))\n",
167+
" return torch.softmax(self.output(x), dim=1)\n",
168+
"\n",
169+
" model_ref = JetTagger()\n",
170+
" model_ref.load_state_dict(torch.load('../models/pytorch_weights_part1.pt'))\n",
171+
" model_ref.eval()\n",
172+
" with torch.no_grad():\n",
173+
" y_ref = model_ref(torch.FloatTensor(X_test)).numpy()\n",
174+
"\n",
175+
"# Compute predictions of the xgboost model\n",
141176
"y_xgb = clf.predict_proba(X_test)\n",
142-
"print(f'Accuracy baseline: {accuracy_score(np.argmax(y_test_one_hot, axis=1), np.argmax(y_ref, axis=1)):.5f}')\n",
177+
"print(f'Accuracy {MODEL_TYPE}: {accuracy_score(np.argmax(y_test_one_hot, axis=1), np.argmax(y_ref, axis=1)):.5f}')\n",
143178
"print(f'Accuracy xgboost: {accuracy_score(np.argmax(y_test_one_hot, axis=1), np.argmax(y_xgb, axis=1)):.5f}')\n",
144179
"\n",
145180
"fig, ax = plt.subplots(figsize=(9, 9))\n",
146181
"_ = plotting.makeRoc(y_test_one_hot, y_ref, classes, linestyle='--')\n",
147-
"plt.gca().set_prop_cycle(None) # reset the colors\n",
182+
"plt.gca().set_prop_cycle(None)\n",
148183
"_ = plotting.makeRoc(y_test_one_hot, y_xgb, classes, linestyle='-')\n",
149184
"\n",
150-
"# add a legend\n",
151185
"from matplotlib.lines import Line2D\n",
152-
"\n",
153-
"lines = [\n",
154-
" Line2D([0], [0], ls='--'),\n",
155-
" Line2D([0], [0], ls='-'),\n",
156-
"]\n",
157186
"from matplotlib.legend import Legend\n",
158-
"\n",
159-
"leg = Legend(ax, lines, labels=['part1 Keras', 'xgboost'], loc='lower right', frameon=False)\n",
187+
"leg = Legend(ax, [Line2D([0], [0], ls='--'), Line2D([0], [0], ls='-')],\n",
188+
" labels=[f'part1 {MODEL_TYPE}', 'xgboost'], loc='lower right', frameon=False)\n",
160189
"ax.add_artist(leg)"
161190
]
162191
},
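Note on the accuracy lines in this hunk: both the labels and the model outputs are arrays with one column per class, so `np.argmax(..., axis=1)` converts each row to a class index before the comparison. A minimal pure-Python sketch of that computation follows, using made-up scores and hypothetical helper names `argmax` and `accuracy`.

```python
# Accuracy from one-hot labels and per-class scores (toy data, illustrative only).
def argmax(row):
    """Index of the largest entry in a row."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(y_true_one_hot, y_score):
    """Fraction of rows where the highest-scoring class matches the label."""
    hits = sum(argmax(t) == argmax(s) for t, s in zip(y_true_one_hot, y_score))
    return hits / len(y_true_one_hot)


y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_score = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]

acc = accuracy(y_true, y_score)  # the third sample is misclassified
print(f'Accuracy: {acc:.2f}')
```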
@@ -170,7 +199,7 @@
170199
"\n",
171200
"We will print the configuration, modify it, and print it again. The modifications are:\n",
172201
"- set the `OutputDirectory` to something descriptive\n",
173-
"- set the `XilinxPart` to the part number of the FPGA on the Alveo U50"
202+
"- set the `XilinxPart` to the part number of the FPGA on the Alveo U250"
174203
]
175204
},
176205
{
@@ -181,16 +210,16 @@
181210
"source": [
182211
"cfg = conifer.backends.xilinxhls.auto_config()\n",
183212
"\n",
184-
"# print the config\n",
213+
"# Print the config\n",
185214
"print('Default Configuration\\n' + '-' * 50)\n",
186215
"plotting.print_dict(cfg)\n",
187216
"print('-' * 50)\n",
188217
"\n",
189-
"# modify the config\n",
190-
"cfg['OutputDir'] = 'model_5/'\n",
218+
"# Set output directory and target device\n",
219+
"cfg['OutputDir'] = '../hls4ml_prjs/conifer_prj_bdt_part6a'\n",
191220
"cfg['XilinxPart'] = 'xcu250-figd2104-2L-e'\n",
192221
"\n",
193-
"# print the config again\n",
222+
"# Print the config again (to verify change)\n",
194223
"print('Modified Configuration\\n' + '-' * 50)\n",
195224
"plotting.print_dict(cfg)\n",
196225
"print('-' * 50)"
@@ -220,14 +249,17 @@
220249
"metadata": {},
221250
"outputs": [],
222251
"source": [
223-
"# convert the model to the conifer representation\n",
252+
"# Convert the model to the conifer representation\n",
224253
"conifer_model = conifer.converters.convert_from_xgboost(clf, cfg)\n",
225-
"# print the help to see the API on the conifer_model\n",
254+
"\n",
255+
"# Print the help to see the API of the conifer_model\n",
226256
"help(conifer_model)\n",
227-
"# write the project (writing HLS project to disk)\n",
257+
"\n",
258+
"# Write the project (writing HLS project to disk)\n",
228259
"conifer_model.write()\n",
229-
"# save the conifer model - we can load this again later\n",
230-
"clf.save_model('model_5/xgboost_model.json')"
260+
"\n",
261+
"# Save the xgboost model alongside the conifer project\n",
262+
"clf.save_model('../hls4ml_prjs/conifer_prj_bdt_part6a/xgboost_model.json')"
231263
]
232264
},
233265
{
@@ -237,10 +269,10 @@
237269
"## Explore\n",
238270
"Browse the files in the newly created project directory to take a look at the HLS code.\n",
239271
"\n",
240-
"The output of `!tree model_5` is:\n",
272+
"The output of `!tree ../hls4ml_prjs/conifer_prj_bdt_part6a` is:\n",
241273
"\n",
242274
"```\n",
243-
"model_5/\n",
275+
"conifer_prj_bdt_part6a/\n",
244276
"├── bridge.cpp\n",
245277
"├── build_hls.tcl\n",
246278
"├── firmware\n",
@@ -306,29 +338,22 @@
306338
"source": [
307339
"y_hls_proba = softmax(y_hls) # compute class probabilities from the raw predictions\n",
308340
"\n",
309-
"print(f'Accuracy baseline: {accuracy_score(np.argmax(y_test_one_hot, axis=1), np.argmax(y_ref, axis=1)):.5f}')\n",
341+
"print(f'Accuracy {MODEL_TYPE}: {accuracy_score(np.argmax(y_test_one_hot, axis=1), np.argmax(y_ref, axis=1)):.5f}')\n",
310342
"print(f'Accuracy xgboost: {accuracy_score(np.argmax(y_test_one_hot, axis=1), np.argmax(y_xgb, axis=1)):.5f}')\n",
311343
"print(f'Accuracy conifer: {accuracy_score(np.argmax(y_test_one_hot, axis=1), np.argmax(y_hls_proba, axis=1)):.5f}')\n",
312344
"\n",
313-
"\n",
314345
"fig, ax = plt.subplots(figsize=(9, 9))\n",
315346
"_ = plotting.makeRoc(y_test_one_hot, y_ref, classes, linestyle='--')\n",
316-
"plt.gca().set_prop_cycle(None) # reset the colors\n",
347+
"plt.gca().set_prop_cycle(None)\n",
317348
"_ = plotting.makeRoc(y_test_one_hot, y_xgb, classes, linestyle=':')\n",
318-
"plt.gca().set_prop_cycle(None) # reset the colors\n",
349+
"plt.gca().set_prop_cycle(None)\n",
319350
"_ = plotting.makeRoc(y_test_one_hot, y_hls_proba, classes, linestyle='-')\n",
320351
"\n",
321-
"# add a legend\n",
322352
"from matplotlib.lines import Line2D\n",
323-
"\n",
324-
"lines = [\n",
325-
" Line2D([0], [0], ls='--'),\n",
326-
" Line2D([0], [0], ls=':'),\n",
327-
" Line2D([0], [0], ls='-'),\n",
328-
"]\n",
329353
"from matplotlib.legend import Legend\n",
330-
"\n",
331-
"leg = Legend(ax, lines, labels=['part1 Keras', 'xgboost', 'conifer'], loc='lower right', frameon=False)\n",
354+
"leg = Legend(ax,\n",
355+
" [Line2D([0], [0], ls='--'), Line2D([0], [0], ls=':'), Line2D([0], [0], ls='-')],\n",
356+
" labels=[f'part1 {MODEL_TYPE}', 'xgboost', 'conifer'], loc='lower right', frameon=False)\n",
332357
"ax.add_artist(leg)"
333358
]
334359
},
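Note on the softmax step in this hunk: the class probabilities are meant to be normalised per sample, that is, across the class axis of each row. With `scipy.special.softmax` this is controlled by the `axis` argument. Its default, `axis=None`, normalises over the whole array, so row-wise probabilities need `axis=1` on a 2-D array. The argmax of each row is unchanged either way, which is why the accuracy is not affected. The sketch below shows the row-wise version in pure Python, assuming the raw predictions are a list of per-class scores for each sample; `softmax_row` and the toy scores are hypothetical.

```python
import math


def softmax_row(row):
    """Numerically stable softmax over one sample's class scores."""
    shift = max(row)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    return max(range(len(row)), key=lambda i: row[i])


raw_scores = [[2.0, -1.0, 0.5], [-0.3, 0.1, 1.7]]  # toy raw predictions
probs = [softmax_row(r) for r in raw_scores]

for r, p in zip(raw_scores, probs):
    # Each row sums to 1 and keeps the same winning class as the raw scores.
    print(round(sum(p), 6), argmax(r) == argmax(p))
```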
@@ -337,11 +362,11 @@
337362
"metadata": {},
338363
"source": [
339364
"## Build\n",
340-
"Now we'll run the Vitis HLS and Vivado synthesis. HLS C Synthesis compiles our C++ to RTL, performing scheduling and resource mapping. Vivado synthesis synthesizes the RTL from the previous step into a netlist, and produces a more realistic resource estimation. The latency can't change during Vivado synthesis, it's fixed in the RTL description.\n",
365+
"Now we'll run the Vitis HLS and Vivado synthesis. HLS C Synthesis compiles our C++ to RTL, performing scheduling and resource mapping. Vivado synthesis synthesizes the RTL from the previous step into a netlist, and produces a more realistic resource estimation. \n",
341366
"\n",
342367
"After the build completes we can also browse the new log files and reports that are generated.\n",
343368
"\n",
344-
"**Warning**: this step might take around 10 minutes"
369+
"**This step takes around 10 minutes.**"
345370
]
346371
},
347372
{
@@ -397,7 +422,7 @@
397422
"outputs": [],
398423
"source": [
399424
"pynq_model_cfg = conifer.backends.xilinxhls.auto_config()\n",
400-
"pynq_model_cfg['OutputDir'] = 'model_5_pynq' # choose a new project directory\n",
425+
"pynq_model_cfg['OutputDir'] = '../hls4ml_prjs/conifer_prj_bdt_part6a_pynq'\n",
401426
"pynq_model_cfg['ProjectName'] = 'conifer_jettag'\n",
402427
"pynq_model_cfg['AcceleratorConfig'] = {\n",
403428
" 'Board': 'pynq-z2', # choose a pynq-z2 board\n",
@@ -444,7 +469,7 @@
444469
"source": [
445470
"### Load the model\n",
446471
"\n",
447-
"We load the JSON for the conifer model we previously used, applying the new configuration just defined. We'll see that the FPGA part specified by the board overrides the `XilinxPart` specified in the default."
472+
"We load the JSON for the conifer model we previously saved, applying the new configuration just defined. We'll see that the FPGA part specified by the board overrides the `XilinxPart` specified in the default."
448473
]
449474
},
450475
{
@@ -453,7 +478,7 @@
453478
"metadata": {},
454479
"outputs": [],
455480
"source": [
456-
"pynq_model = conifer.model.load_model('model_5/my_prj.json', new_config=pynq_model_cfg)\n",
481+
"pynq_model = conifer.model.load_model('../hls4ml_prjs/conifer_prj_bdt_part6a/my_prj.json', new_config=pynq_model_cfg)\n",
457482
"pynq_model.write()"
458483
]
459484
},
@@ -465,11 +490,11 @@
465490
"\n",
466491
"Now we run `build` again, running HLS Synthesis, Logic Synthesis and Place & Route, finally producing a bitfile and an archive of files that we'll need to run inference on the pynq-z2 board. \n",
467492
"\n",
468-
"**Warning**: this step might take around 20 minutes to complete.\n",
493+
"**This step takes around 20 minutes.**\n",
469494
"\n",
470495
"The floorplan of the bitfile should like something like this, where the individual tree modules are highlighted in different colours:\n",
471496
"\n",
472-
"<img src=\"./images/part5_floorplan.png\" width=\"300\" />"
497+
"<img src=\"../images/part5_floorplan.png\" width=\"300\" />"
473498
]
474499
},
475500
{
@@ -488,9 +513,9 @@
488513
"## Inference on pynq-z2\n",
489514
"\n",
490515
"Running inference on the `pynq-z2` would look like this:\n",
491-
"- download the `model_5/conifer_jettag.zip` archive from this notebook\n",
492-
"- upload `conifer_jettag.zip` to the pynq-z2 device and unzip it\n",
493-
"- start a jupyter notebook on the `pynq-z2` and run the following code:\n",
516+
"- Download the `conifer_bdt_pynq/conifer_jettag.zip` archive from this notebook\n",
517+
"- Upload `conifer_jettag.zip` to the pynq-z2 device and unzip it\n",
518+
"- Start a jupyter notebook on the `pynq-z2` and run the following code:\n",
494519
"\n",
495520
"```\n",
496521
"import conifer\n",
@@ -503,7 +528,7 @@
503528
],
504529
"metadata": {
505530
"kernelspec": {
506-
"display_name": "Python 3 (ipykernel)",
531+
"display_name": "hls4ml-tutorial",
507532
"language": "python",
508533
"name": "python3"
509534
},
@@ -517,7 +542,7 @@
517542
"name": "python",
518543
"nbconvert_exporter": "python",
519544
"pygments_lexer": "ipython3",
520-
"version": "3.10.10"
545+
"version": "3.10.16"
521546
}
522547
},
523548
"nbformat": 4,
