Distributed Arithmetic strategy for Dense, Conv1/2D, and EinsumDense by calad0i · Pull Request #1191 · fastmachinelearning/hls4ml

calad0i · 2025-02-11T02:46:21Z

Description

This PR introduces a new strategy, distributed_arithmetic, for the following layers:

Dense (io parallel / stream)
Conv1/2D (io parallel / stream)
EinsumDense (io parallel)

With this strategy, all matmul-like operations in these layers are decomposed into optimized adder trees. The heavy lifting is offloaded to da4ml, where everything is JIT-compiled with numba. There, the CMVM (constant matrix-vector multiplication) problem is optimized with greedy common subexpression elimination. A LUT reduction of over 30% is frequently seen when WRAP is used as the overflow mode, along with improved latency. DSP consumption will almost always be 0 with this strategy.
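
A toy Python sketch of the idea, assuming integer weights: each constant weight is split into canonical signed-digit (CSD) terms, so every product becomes shifts and additions, and then the most frequent two-term pattern shared across outputs is repeatedly extracted into a new intermediate signal (one adder reused by several outputs). This is a simplified greedy CSE written for illustration only; it is not da4ml's algorithm or API, and csd, cmvm_cse, evaluate, and adder_count are hypothetical names.

```python
from collections import Counter


def csd(w: int):
    """Canonical signed-digit decomposition: [(shift, sign), ...] with w == sum(sign * 2**shift)."""
    digits, shift = [], 0
    while w != 0:
        if w & 1:
            d = 2 - (w & 3)  # +1 if w % 4 == 1, -1 if w % 4 == 3
            digits.append((shift, d))
            w -= d
        w >>= 1
        shift += 1
    return digits


def _pattern(t1, t2):
    """Shift-invariant key for the sum of two terms, each (signal, shift, sign)."""
    (s1, h1, g1), (s2, h2, g2) = sorted([t1, t2], key=lambda t: (t[1], t[0]))
    return (s1, s2, h2 - h1, g1 * g2)


def _substitute(terms, pat, new):
    """Replace non-overlapping occurrences of `pat` in one output with the new signal."""
    used, out = set(), []
    for p in range(len(terms)):
        for q in range(p + 1, len(terms)):
            if p in used or q in used or _pattern(terms[p], terms[q]) != pat:
                continue
            used.update((p, q))
            # g1*(s1 << h1) + g2*(s2 << h2) == g1 * ((s1 + g1*g2*(s2 << (h2 - h1))) << h1)
            (_, h, g), _ = sorted([terms[p], terms[q]], key=lambda t: (t[1], t[0]))
            out.append((new, h, g))
    return out + [t for k, t in enumerate(terms) if k not in used]


def cmvm_cse(W):
    """Toy greedy two-term CSE for y[j] = sum_i W[j][i] * x[i] with integer weights W."""
    n_in = len(W[0])
    outputs = [[(i, sh, sg) for i, w in enumerate(row) for sh, sg in csd(w)] for row in W]
    defs = []  # signal n_in + k is a + rel * (b << d), where defs[k] == (a, b, d, rel)
    while True:
        counts = Counter(
            _pattern(terms[p], terms[q])
            for terms in outputs
            for p in range(len(terms))
            for q in range(p + 1, len(terms))
        )
        if not counts or counts.most_common(1)[0][1] < 2:
            break  # nothing is shared, so extracting a pattern would not save an adder
        pat = counts.most_common(1)[0][0]
        defs.append(pat)
        outputs = [_substitute(terms, pat, n_in + len(defs) - 1) for terms in outputs]
    return outputs, defs


def evaluate(outputs, defs, x):
    """Shift-and-add evaluation: each entry of defs is one adder whose result is shared."""
    vals = list(x)
    for a, b, d, rel in defs:
        vals.append(vals[a] + rel * (vals[b] << d))
    return [sum(g * (vals[s] << h) for s, h, g in terms) for terms in outputs]


def adder_count(outputs, defs):
    return len(defs) + sum(max(len(terms) - 1, 0) for terms in outputs)


if __name__ == '__main__':
    W = [[3, 5, 7], [5, 3, 6], [7, 6, 3]]
    naive = [[(i, sh, sg) for i, w in enumerate(row) for sh, sg in csd(w)] for row in W]
    outputs, defs = cmvm_cse(W)
    print('adders without CSE:', adder_count(naive, []), 'with CSE:', adder_count(outputs, defs))
    print(evaluate(outputs, defs, [1, 2, 3]))
```

Because every extracted pattern replaces at least one pair of terms, the greedy loop never increases the adder count and always terminates; the actual savings depend on the weights and on how a real implementation costs each adder.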

This PR depends on the s-quark-pr and includes all changes made there (QEinsumDense is not available otherwise).

Type of change

New feature (non-breaking change which adds functionality)

Tests

Tests were added to test_hgq_layers.py and test_einsum_dense.py. The EinsumDense test will NOT be triggered in the current configuration due to its Keras v3 dependency.

Checklist

No docs for now.


JanFSchulte · 2025-06-27T13:56:02Z

pre-commit.ci autofix

JanFSchulte · 2025-06-27T14:00:37Z

pre-commit.ci autofix

JanFSchulte

Just leaving a few preliminary comments, there are still some parts that I need to look at in a bit more detail.

JanFSchulte · 2025-06-30T19:14:20Z

+    } else if (CONFIG_T::strategy == nnet::resource || CONFIG_T::strategy == nnet::resource_unrolled) {
        conv_2d_resource_cl<data_T, res_T, CONFIG_T>(data, res, weights, biases);
+    } else {
+        assert(false && "Invalid strategy for conv_2d_cl");


Shouldn't this be caught earlier and not at runtime of the HLS code?
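
A minimal Python sketch of catching this at conversion time instead, assuming the layer class, layer name, and configured strategy are all known when the model is converted. SUPPORTED_STRATEGIES and validate_strategy are hypothetical names, and the strategy set listed for Conv2D is an illustrative assumption, not hls4ml's actual support matrix.

```python
# Hypothetical support table; the entries are illustrative assumptions only.
SUPPORTED_STRATEGIES = {
    'Conv2D': {'latency', 'resource', 'resource_unrolled', 'distributed_arithmetic'},
}


def validate_strategy(layer_class: str, layer_name: str, strategy: str) -> None:
    """Fail during conversion instead of reaching assert(false) in the generated HLS code."""
    allowed = SUPPORTED_STRATEGIES.get(layer_class)
    if allowed is None:
        return  # no restriction recorded for this layer type
    if strategy.lower() not in allowed:
        raise ValueError(
            f'Strategy {strategy!r} is not supported for {layer_class} layer {layer_name!r}; '
            f'expected one of {sorted(allowed)}'
        )
```

Raising in Python gives the user a readable error before any C++ is generated, whereas the HLS-side assert only fires in C simulation and says nothing about which layer or setting caused it.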

JanFSchulte · 2025-06-30T19:36:08Z

 void maximum(input1_T data1[CONFIG_T::n_elem], input2_T data2[CONFIG_T::n_elem], res_T res[CONFIG_T::n_elem]) {
    for (int ii = 0; ii < CONFIG_T::n_elem; ii++) {
        res[ii] = (data1[ii] > data2[ii]) ? static_cast<res_T>(data1[ii]) : static_cast<res_T>(data2[ii]);
+        res[ii] = (data1[ii] > data2[ii]) ? static_cast<res_T>(data1[ii]) : static_cast<res_T>(data2[ii]);


Why is this line duplicated?

JanFSchulte · 2025-06-30T19:53:14Z

-        self.add_weights_variable(name='table', var_name='table{index}', precision=table_t, data=self.table)
+        self.table = self.attributes['table_data']
+        self.attributes['table_size'] = len(self.table)
+        self.table_size = len(self.table)


Why do we need to store table_size both in the attributes dict and as a standalone attribute?
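
One hypothetical way to avoid keeping two copies in sync, sketched in Python: derive the size from the table on access, so it can never disagree with the data. TableLayer is an illustrative stand-in, not hls4ml's Layer class, and this assumes nothing else needs table_size written into the attributes dict.

```python
class TableLayer:
    """Hypothetical stand-in for a layer that carries a lookup table."""

    def __init__(self, attributes: dict):
        self.attributes = attributes

    @property
    def table(self):
        # Single source of truth: the table lives only in the attributes dict.
        return self.attributes['table_data']

    @property
    def table_size(self) -> int:
        # Computed on access, so overriding the table cannot leave a stale size behind.
        return len(self.table)
```

If a code template does read table_size from the attributes dict, the same idea applies in reverse: set it only there and read it back from there.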

JanFSchulte · 2025-06-30T20:49:12Z

+            node.model.config.layer_name_precision[node.name] = str(new_type)
+            return False
+
+        if not all(isinstance(l, FixedPointQuantizer) for l in out_layers):


Isn't this case covered by the assert above?

JanFSchulte · 2025-07-01T12:41:28Z

+        Ks, Is, Fs = Ks[0], Is[0], Fs[0]  # remove batch dimension
+    else:
+        Ks = np.ones(inp_shape, dtype=np.int8)
+        Is = Fs = np.full(inp_shape, 126, dtype=np.int8)


Out of curiosity, why 8 bit in this fallback option?

I used 8 bits for bitwidths everywhere in the bit_exact.py file.

JanFSchulte · 2025-07-01T12:59:07Z

+
+
+def _get_input_kif(node: Layer):
+    """Get the input k, i, f to a layer.


I think for future maintainability of this code it would be good to document somewhere (here or elsewhere) what k, i, f, and b stand for.

Documented in bit_exact.py
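
A small Python sketch of what these fields mean, assuming the convention used by HGQ-style quantizers: k is 1 for a signed type and 0 otherwise, i is the number of integer bits excluding the sign, f is the number of fractional bits, and b = k + i + f is the total bitwidth. It also illustrates the WRAP overflow mode mentioned in the description, using truncation toward negative infinity for rounding. kif_range and quantize_wrap are hypothetical helpers, not functions from bit_exact.py.

```python
import math


def kif_range(k: int, i: int, f: int):
    """(min, max, step) of a fixed-point type; assumed convention: k sign bit, i integer bits, f fractional bits."""
    step = 2.0 ** -f
    lo = -(2.0 ** i) if k else 0.0
    hi = 2.0 ** i - step
    return lo, hi, step


def quantize_wrap(x: float, k: int, i: int, f: int) -> float:
    """Truncate x onto the 2**-f grid, then wrap on overflow like b-bit two's-complement arithmetic."""
    b = k + i + f
    q = math.floor(x * 2 ** f)  # integer code, rounded toward -inf
    q %= 2 ** b  # keep only the lowest b bits
    if k and q >= 2 ** (b - 1):  # reinterpret the top bit as the sign
        q -= 2 ** b
    return q * 2.0 ** -f
```

For example, with k=1, i=2, f=1 the representable range is [-4.0, 3.5] in steps of 0.5, and an input of 4.0 wraps around to -4.0 instead of saturating.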

JanFSchulte · 2025-07-01T13:48:51Z


    layer_config = None
    if model_arch['class_name'] == 'Sequential':
-        print('Interpreting Sequential')


As a general comment on this PR: I personally like that hls4ml currently gives the user feedback on what it is doing at any given time, especially when running multiple steps in sequence. Printing out the layer structure it creates is also a helpful sanity check. So I would prefer not to remove all of these printouts.

vloncar

LGTM. We looked at it three times, and it seems to be in good shape. There may be follow-ups to address any bugs that emerge in tests. If no one objects, we'll merge this by Friday.

JanFSchulte · 2025-07-02T15:03:33Z

The latest set of pytests showed some actual issues, so I want to re-run them one more time. Otherwise I'm ok with merging, my comments are mostly nitpicky.

vloncar · 2025-07-02T15:07:52Z

Apart from the issue with test_merge that we're investigating, the others seem to be stability issues (not actual issues, but do need to be addressed too).

JanFSchulte · 2025-07-02T17:31:17Z

There was also an issue with da4ml, but I see that Chang bumped the version of that and it is gone. So all good from my side.


vloncar · 2025-07-03T18:06:20Z

The test_merge failure and the other precision test failure will be addressed in a subsequent PR. Merging this. Huge thanks to Chang for the monumental effort of building all of it.

calad0i force-pushed the da4ml-v2 branch from ed624c0 to fb75fbf, March 7, 2025 23:24

calad0i added the "please test" label (trigger testing by creating a local PR branch), Mar 7, 2025

calad0i force-pushed the da4ml-v2 branch from fb75fbf to 892d526, March 8, 2025 01:19

calad0i re-applied the "please test" label, Mar 8, 2025

calad0i force-pushed the da4ml-v2 branch from 892d526 to ae68da2, March 8, 2025 01:24

calad0i re-applied the "please test" label, Mar 8, 2025

calad0i force-pushed the da4ml-v2 branch from ae68da2 to 2dfbcb9, March 9, 2025 02:57

calad0i re-applied the "please test" label, Mar 9, 2025

calad0i force-pushed the da4ml-v2 branch from a545da9 to 99093ca, March 9, 2025 04:41

calad0i re-applied the "please test" label, Mar 9, 2025

calad0i added 14 commits May 28, 2025 06:22

skip oneapi test if icpx doesnt exist

c640c46

bit-exact-possible multidim softmax

e95aabe

softmax fix

0393f54

softmax fix fix softmax parsing issue ckpt softmax fix fix table size after overriding inv_inp_t

Merge branch 'skip_oneapi_test_if_icpx_doesnt_exist' into vivado-bit-exact-softmax

249206c

move softmax attr to fpga backend, post rebase fix

a950636

add keras v3 object parser

904433d

add keras v3 layer handlers

a14e02e

einsumdense and einsum

a1bcb66

add einsum templates

95b3e92

bit-exact-possible multidim softmax

3b7f1d6

symbolic bitwidth infer util

7037ea4

add qinterval test

a21e428

keras v2-v3 reshape fn compability patch

f8a07ae

hgq2 layer handlers

3ecc9ab

[pre-commit.ci] auto fixes from pre-commit hooks

c5ad390

JanFSchulte re-applied the "please test" label, Jun 27, 2025

JanFSchulte reviewed Jul 1, 2025

calad0i force-pushed the da4ml-v2 branch from d462c48 to 31b551d, July 2, 2025 09:03

calad0i re-applied the "please test" label, Jul 2, 2025

vloncar approved these changes Jul 2, 2025

JanFSchulte re-applied the "please test" label, Jul 2, 2025

JanFSchulte approved these changes Jul 2, 2025

calad0i and others added 7 commits July 2, 2025 13:57

model opt pass fix and avg pool fix

d6c0e16

squashed cosmetic and minor changes

c595c4f

multi graph dimname fix

7fdbd4d

bump da4ml version

8fa0398

Merge branch 'main' into da4ml-v2

d089059

bit-exact algorithm minor change

81a9be1

calad0i force-pushed the da4ml-v2 branch from 2105f0e to 81a9be1, July 3, 2025 14:35

vloncar re-applied the "please test" label, Jul 3, 2025

vloncar merged commit 46b7a88 into fastmachinelearning:main Jul 3, 2025
6 of 9 checks passed


