
Commit fa6a8cc

author
makoeppel
committed
create qonnx and HGQ page, update FPGA resources
1 parent 6e7f999 commit fa6a8cc

5 files changed

Lines changed: 118 additions & 172 deletions


content/inference/hls4ml.md

Lines changed: 43 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -1,16 +1,56 @@
11
# Direct inference with hls4ml
22

3-
<p align="center">
4-
<img src="../images/inference/hls4ml_logo.png" alt="drawing" width="50%" />
5-
</p>
63

4+
![image](https://github.com/fastmachinelearning/fastmachinelearning.github.io/raw/master/images/hls4ml_logo.svg)
5+
6+
[![Documentation Status](https://github.com/fastmachinelearning/hls4ml/actions/workflows/build-sphinx.yml/badge.svg)](https://fastmachinelearning.org/hls4ml)
7+
[![PyPI version](https://badge.fury.io/py/hls4ml.svg)](https://badge.fury.io/py/hls4ml)
78

89
[hls4ml](https://fastmachinelearning.org/hls4ml/) is a Python package developed by the [Fast Machine Learning Lab](https://fastmachinelearning.org/). Its primary purpose is to create firmware implementations of machine learning (ML) models to be run on FPGAs. The package interfaces with a high-level synthesis (HLS) backend (e.g. Xilinx Vivado HLS) to transpile the ML model into a hardware description language (HDL). The primary hls4ml documentation, including API reference pages, is located [here](https://fastmachinelearning.org/hls4ml/).
910

1011
<p align="center">
1112
<img src="../images/inference/hls4ml_overview.jpg" alt="drawing" width="75%" />
1213
</p>
1314

15+
### Current tools supported
16+
17+
| ML framework/HLS backend | (Q)Keras | PyTorch | (Q)ONNX | Vivado HLS | Intel HLS | Vitis HLS |
18+
|--------------------------|-----------|---------|----------------|------------|-----------|--------------|
19+
| MLP | supported | limited | in development | supported | supported | experimental |
20+
| CNN | supported | limited | in development | supported | supported | experimental |
21+
| RNN (LSTM) | supported | N/A | in development | supported | supported | N/A |
22+
| GNN (GarNet) | supported | N/A | N/A | N/A | N/A | N/A |
23+
24+
25+
### Compile an example model
26+
27+
```python
import hls4ml

# Fetch a Keras model from our example repository
# This will download our example model to your working directory and return an example configuration file
config = hls4ml.utils.fetch_example_model('KERAS_3layer.json')

# You can print the configuration to see some default parameters
print(config)

# Convert it to an HLS project
hls_model = hls4ml.converters.keras_to_hls(config)

# Print the full list of example models if you want to explore more
hls4ml.utils.fetch_example_list()

# Use Vivado HLS to synthesize the model
# This might take several minutes
hls_model.build()

# Print out the report if you want
hls4ml.report.read_vivado_report('my-hls-test')
```
51+
52+
### More resources
53+
1454
The main hls4ml tutorial code is kept on [GitHub](https://github.com/fastmachinelearning/hls4ml-tutorial). Users are welcome to walk through the notebooks at their own pace. There is also a set of slides linked to the [README](https://github.com/fastmachinelearning/hls4ml-tutorial/blob/master/README.md).
1555

1656
That said, there have been several cases where the hls4ml developers have given live demonstrations and tutorials. Below is a non-exhaustive list of tutorials given in the last few years (newest on top).

content/inference/qonnx.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
# Direct inference with (Q)ONNX Runtime
2+
3+
[![ReadTheDocs](https://readthedocs.org/projects/qonnx/badge/?version=latest&style=plastic)](http://qonnx.readthedocs.io/)
4+
[![PyPI version](https://badge.fury.io/py/qonnx.svg)](https://badge.fury.io/py/qonnx)
5+
[![arxiv](https://img.shields.io/badge/arXiv-2206.07527-b31b1b.svg)](https://arxiv.org/abs/2206.07527)
6+
7+
Text taken and adapted from the QONNX [README.md](https://github.com/fastmachinelearning/qonnx/blob/main/README.md).
8+
9+
<img align="left" src="https://xilinx.github.io/finn/img/TFC_1W2A.onnx.png" alt="QONNX example" style="margin-right: 20px" width="200"/>
10+
11+
QONNX (Quantized ONNX) introduces three new custom operators -- [`Quant`](docs/qonnx-custom-ops/quant_op.md), [`BipolarQuant`](docs/qonnx-custom-ops/bipolar_quant_op.md), and [`Trunc`](docs/qonnx-custom-ops/trunc_op.md) -- in order to represent arbitrary-precision uniform quantization in [ONNX](onnx.md). This enables:
12+
13+
* Representation of binary, ternary, 3-bit, 4-bit, 6-bit or any other quantization.
14+
* Quantization is an operator itself, and can be applied to any parameter or layer input.
15+
* Flexible choices for scaling factor and zero-point granularity.
16+
* Quantized values are carried using standard `float` datatypes to remain [ONNX](onnx.md) protobuf-compatible.
17+
18+
This repository contains a set of Python utilities to work with QONNX models, including but not limited to:
19+
20+
* executing QONNX models for (slow) functional verification
21+
* shape inference, constant folding and other basic optimizations
22+
* summarizing the inference cost of a QONNX model in terms of mixed-precision MACs, parameter and activation volume
23+
* Python infrastructure for writing transformations and defining executable, shape-inferencable custom ops
24+
* (experimental) data layout conversion from standard ONNX NCHW to custom QONNX NHWC ops
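
The uniform quantization that an operator such as `Quant` represents can be illustrated with a small, library-free sketch. This is a simplified scalar version written for illustration only, not the QONNX implementation: the function name `quantize` and its parameters are hypothetical, and it assumes round-half-to-even rounding and a non-narrow integer range. It shows how a value is mapped to an integer grid, clipped to the range allowed by the bit width, and carried back as a `float`.

```python
def quantize(x, scale, zero_point, bitwidth, signed=True):
    """Uniformly quantize x and return the dequantized value as a float.

    Hypothetical helper for illustration; not part of the qonnx package.
    """
    # Integer range representable with the given bit width
    if signed:
        q_min, q_max = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    else:
        q_min, q_max = 0, 2 ** bitwidth - 1
    # Map to the integer grid, round, and clip to the representable range
    q = round(x / scale + zero_point)
    q = max(q_min, min(q_max, q))
    # Map back to a float so the value can live in a standard float tensor
    return (q - zero_point) * scale


for value in (0.3, 10.0, -10.0):
    print(value, "->", quantize(value, scale=0.25, zero_point=0, bitwidth=4))
```

With a 4-bit signed range and a scale of 0.25, values outside [-2.0, 1.75] saturate at the ends of the range, which is the behavior the clipping step is meant to show.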

content/resources/fpga_resources/index.md

Lines changed: 1 addition & 169 deletions
@@ -10,172 +10,4 @@ To programm an FPGA one has two main options. The first one is to use a Hardware
1010

1111
When it comes to vendors, there are two big ones: Xilinx (part of AMD) and Altera (part of Intel). Both vendors provide their own tooling to simulate, synthesize, and debug designs. Xilinx FPGAs are used for CMS; Xilinx provides Vivado for design, simulation, synthesis, and debugging, and Vitis for software development for Xilinx FPGAs and SoCs. Intel FPGAs are programmed using Quartus Prime. For HLS, the vendors provide Vivado HLS (Xilinx) and HLS Compiler (Intel).
1212

13-
To simplify the pipeline from a trained model to an implementation on the FPGA, CMS supports several tools, which are explained in more detail in the following.
14-
15-
## 1. hls4ml
16-
17-
![image](https://github.com/fastmachinelearning/fastmachinelearning.github.io/raw/master/images/hls4ml_logo.svg)
18-
19-
[![Documentation Status](https://github.com/fastmachinelearning/hls4ml/actions/workflows/build-sphinx.yml/badge.svg)](https://fastmachinelearning.org/hls4ml)
20-
[![PyPI version](https://badge.fury.io/py/hls4ml.svg)](https://badge.fury.io/py/hls4ml)
21-
22-
#### Description
23-
hls4ml is a Python library designed to bring machine learning inference to FPGAs by leveraging high-level synthesis (HLS). The idea is to convert trained machine learning models from popular open-source frameworks (such as PyTorch, Tensorflow, Keras etc.) into FPGA-compatible firmware, tailored to specific needs.
24-
25-
As the project is actively evolving, the hls4ml team is always looking for people to try their tools.
26-
27-
#### Current tools supported
28-
29-
| ML framework/HLS backend | (Q)Keras | PyTorch | (Q)ONNX | Vivado HLS | Intel HLS | Vitis HLS |
30-
|--------------------------|-----------|---------|----------------|------------|-----------|--------------|
31-
| MLP | supported | limited | in development | supported | supported | experimental |
32-
| CNN | supported | limited | in development | supported | supported | experimental |
33-
| RNN (LSTM) | supported | N/A | in development | supported | supported | N/A |
34-
| GNN (GarNet) | supported | N/A | N/A | N/A | N/A | N/A |
35-
36-
37-
### Compile an example model
38-
39-
```python
import hls4ml

# Fetch a Keras model from our example repository
# This will download our example model to your working directory and return an example configuration file
config = hls4ml.utils.fetch_example_model('KERAS_3layer.json')

# You can print the configuration to see some default parameters
print(config)

# Convert it to an HLS project
hls_model = hls4ml.converters.keras_to_hls(config)

# Print the full list of example models if you want to explore more
hls4ml.utils.fetch_example_list()

# Use Vivado HLS to synthesize the model
# This might take several minutes
hls_model.build()

# Print out the report if you want
hls4ml.report.read_vivado_report('my-hls-test')
```
63-
64-
## 2. Conifer
65-
66-
<img src="https://github.com/thesps/conifer/raw/master/conifer_v1.png" width="250" alt="conifer">
67-
68-
Conifer converts from popular BDT training frameworks, and can emit code projects in different FPGA languages.
69-
70-
Available converters:
71-
72-
- scikit-learn
73-
- xgboost
74-
- ONNX - giving access to other training libraries such as lightGBM and CatBoost with ONNXMLTools
75-
- TMVA
76-
- Tensorflow Decision Forest (tf_df)
77-
78-
Available backends:
79-
80-
- Xilinx HLS - for best results use latest Vitis HLS, but Vivado HLS is also supported (conifer uses whichever is on your `$PATH`)
81-
- VHDL - a direct-to-VHDL implementation, deeply pipelined for high clock frequencies
82-
- FPU - Forest Processing Unit reusable IP core for flexible BDT inference
83-
- C++ - intended for bit-accurate emulation on CPU with a single include header file
84-
- Python - intended for validation of model conversion and to allow inspection of a model without a configuration
85-
86-
### Usage
87-
88-
```python
import conifer
from sklearn.ensemble import GradientBoostingClassifier

# Train a BDT (X_train, y_train, and X are assumed to be defined elsewhere)
clf = GradientBoostingClassifier().fit(X_train, y_train)

# Create a conifer config dictionary
cfg = conifer.backends.xilinxhls.auto_config()
# Change the bit precision (print the config to see everything modifiable)
cfg['Precision'] = 'ap_fixed<12,4>'

# Convert the sklearn model to a conifer model
model = conifer.converters.convert_from_sklearn(clf, cfg)
# Write the HLS project and compile the C++-Python bridge
model.compile()

# Run bit-accurate prediction on the CPU
y_hls = model.decision_function(X)
y_skl = clf.decision_function(X)

# Synthesize the model for the target FPGA
model.build()
```
110-
111-
## 3. (Q)ONNX
112-
113-
[![ReadTheDocs](https://readthedocs.org/projects/qonnx/badge/?version=latest&style=plastic)](http://qonnx.readthedocs.io/)
114-
[![PyPI version](https://badge.fury.io/py/qonnx.svg)](https://badge.fury.io/py/qonnx)
115-
[![arxiv](https://img.shields.io/badge/arXiv-2206.07527-b31b1b.svg)](https://arxiv.org/abs/2206.07527)
116-
117-
<img align="left" src="https://xilinx.github.io/finn/img/TFC_1W2A.onnx.png" alt="QONNX example" style="margin-right: 20px" width="200"/>
118-
119-
120-
QONNX (Quantized ONNX) introduces three new custom operators -- [`Quant`](docs/qonnx-custom-ops/quant_op.md), [`BipolarQuant`](docs/qonnx-custom-ops/bipolar_quant_op.md), and [`Trunc`](docs/qonnx-custom-ops/trunc_op.md) -- in order to represent arbitrary-precision uniform quantization in ONNX. This enables:
121-
122-
* Representation of binary, ternary, 3-bit, 4-bit, 6-bit or any other quantization.
123-
* Quantization is an operator itself, and can be applied to any parameter or layer input.
124-
* Flexible choices for scaling factor and zero-point granularity.
125-
* Quantized values are carried using standard `float` datatypes to remain ONNX protobuf-compatible.
126-
127-
This repository contains a set of Python utilities to work with QONNX models, including but not limited to:
128-
129-
* executing QONNX models for (slow) functional verification
130-
* shape inference, constant folding and other basic optimizations
131-
* summarizing the inference cost of a QONNX model in terms of mixed-precision MACs, parameter and activation volume
132-
* Python infrastructure for writing transformations and defining executable, shape-inferencable custom ops
133-
* (experimental) data layout conversion from standard ONNX NCHW to custom QONNX NHWC ops
134-
135-
## 4. High Granularity Quantization (HGQ)
136-
137-
[![docu](https://github.com/calad0i/HGQ/actions/workflows/sphinx-build.yml/badge.svg)](https://calad0i.github.io/HGQ/)
138-
[![pypi](https://badge.fury.io/py/hgq.svg)](https://badge.fury.io/py/hgq)
139-
[![arxiv](https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg)](https://arxiv.org/abs/2405.00645)
140-
141-
142-
[High Granularity Quantization (HGQ)](https://github.com/calad0i/HGQ/) is a library that performs gradient-based automatic bitwidth optimization and quantization-aware training for neural networks to be deployed on FPGAs. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.
143-
144-
![image](https://calad0i.github.io/HGQ/_images/overview.svg)
145-
146-
Conversion of models made with the HGQ library is fully supported. The HGQ models are first converted to a proxy model format, which can then be parsed by hls4ml bit-accurately. Below is an example of how to create a model with HGQ and convert it to an hls4ml model.
147-
148-
```python
import keras
from HGQ.layers import HDense, HDenseBatchNorm, HQuantize
from HGQ import ResetMinMax, FreeBOPs

model = keras.models.Sequential([
    HQuantize(beta=1.e-5),
    HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
    HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
    HDense(10, beta=1.e-5),
])

opt = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
callbacks = [ResetMinMax(), FreeBOPs()]

model.fit(..., callbacks=callbacks)

from HGQ import trace_minmax, to_proxy_model
from hls4ml.converters import convert_from_keras_model

trace_minmax(model, x_train, cover_factor=1.0)
proxy = to_proxy_model(model, aggressive=True)

model_hls = convert_from_keras_model(
    proxy,
    backend='vivado',
    output_dir=...,
    part=...
)
```
180-
181-
An interactive example of HGQ can be found in this [Kaggle notebook](https://www.kaggle.com/code/calad0i/small-jet-tagger-with-hgq-1). Full documentation can be found at [calad0i.github.io/HGQ](https://calad0i.github.io/HGQ/).
13+
To simplify the pipeline from a trained model to an implementation on the FPGA, CMS supports several tools, which are explained in the inference section ([hls4ml](../../inference/hls4ml.md), [conifer](../../inference/conifer.md), [qonnx](../../inference/qonnx.md)). Furthermore, tools for quantization-aware training are used (QKeras, [HGQ](../../training/HGQ.md)).

content/training/HGQ.md

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
High Granularity Quantization (HGQ)
2+
3+
[![docu](https://github.com/calad0i/HGQ/actions/workflows/sphinx-build.yml/badge.svg)](https://calad0i.github.io/HGQ/)
4+
[![pypi](https://badge.fury.io/py/hgq.svg)](https://badge.fury.io/py/hgq)
5+
[![arxiv](https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg)](https://arxiv.org/abs/2405.00645)
6+
7+
Text taken and adapted from the HGQ [README.md](https://github.com/calad0i/HGQ/blob/master/README.md).
8+
9+
[High Granularity Quantization (HGQ)](https://github.com/calad0i/HGQ/) is a library that performs gradient-based automatic bitwidth optimization and quantization-aware training for neural networks to be deployed on FPGAs. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.
10+
11+
![image](https://calad0i.github.io/HGQ/_images/overview.svg)
12+
13+
Conversion of models made with the HGQ library is fully supported. The HGQ models are first converted to a proxy model format, which can then be parsed by hls4ml bit-accurately. Below is an example of how to create a model with HGQ and convert it to an hls4ml model.
14+
15+
```python
import keras
from HGQ.layers import HDense, HDenseBatchNorm, HQuantize
from HGQ import ResetMinMax, FreeBOPs

model = keras.models.Sequential([
    HQuantize(beta=1.e-5),
    HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
    HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
    HDense(10, beta=1.e-5),
])

opt = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
callbacks = [ResetMinMax(), FreeBOPs()]

model.fit(..., callbacks=callbacks)

from HGQ import trace_minmax, to_proxy_model
from hls4ml.converters import convert_from_keras_model

trace_minmax(model, x_train, cover_factor=1.0)
proxy = to_proxy_model(model, aggressive=True)

model_hls = convert_from_keras_model(
    proxy,
    backend='vivado',
    output_dir=...,
    part=...
)
```
47+
48+
An interactive example of HGQ can be found in this [Kaggle notebook](https://www.kaggle.com/code/calad0i/small-jet-tagger-with-hgq-1). Full documentation can be found at [calad0i.github.io/HGQ](https://calad0i.github.io/HGQ/).
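
To make the idea of per-weight granularity more concrete, the sketch below contrasts a single shared bitwidth with an individual bitwidth per weight, using plain fixed-point rounding. It is only an illustration: HGQ learns the bitwidths with gradients during training, whereas the helper `to_fixed_point` and the hand-picked numbers of fractional bits here are hypothetical and not part of the HGQ API.

```python
def to_fixed_point(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits.

    Hypothetical helper for illustration; not part of HGQ.
    """
    step = 2.0 ** -frac_bits
    return round(value / step) * step


weights = [0.3, -0.71, 0.05]

# Per-tensor granularity: all weights share the same number of fractional bits
shared = [to_fixed_point(w, 4) for w in weights]

# Per-weight granularity: each weight gets its own number of fractional bits
# (hand-picked here; HGQ would optimize these during training)
frac_bits = [2, 6, 0]
per_weight = [to_fixed_point(w, b) for w, b in zip(weights, frac_bits)]

print("shared:    ", shared)
print("per-weight:", per_weight)
```

In this toy case the third weight is assigned zero fractional bits and rounds to 0.0, which hints at how fine-grained bitwidths can spend precision only where it is needed.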

mkdocs.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -150,6 +150,7 @@ nav:
150150
- PyTorch: inference/pytorch.md
151151
- PyTorch Geometric: inference/pyg.md
152152
- ONNX: inference/onnx.md
153+
- QONNX: inference/qonnx.md
153154
- XGBoost: inference/xgboost.md
154155
- hls4ml: inference/hls4ml.md
155156
- conifer: inference/conifer.md
@@ -168,4 +169,5 @@ nav:
168169
- Training as a Service:
169170
- MLaaS4HEP: training/MLaaS4HEP.md
170171
- Autoencoders: training/autoencoders.md
172+
- HGQ: training/HGQ.md
171173
# - Benchmarking:
