Skip to content

Distributed Arithmetic strategy for Dense, Conv1/2D, and EinsumDense#1191

Merged
vloncar merged 84 commits into
fastmachinelearning:mainfrom
calad0i:da4ml-v2
Jul 3, 2025
Merged

Distributed Arithmetic strategy for Dense, Conv1/2D, and EinsumDense#1191
vloncar merged 84 commits into
fastmachinelearning:mainfrom
calad0i:da4ml-v2

Conversation

@calad0i

@calad0i calad0i commented Feb 11, 2025

Copy link
Copy Markdown
Contributor

Description

This PR introduces a new strategy, distributed_arithmetic for

  • Dense (io parallel / stream)
  • Conv1/2D (io parallel / stream)
  • EinsumDense (io parallel)

With this strategy, all matmul like operations in there layers are decomposed into optimized adder trees. Heavy lifting tasks are offloaded to da4ml, where everything is jitted with numba. There, CMVM problem is optimized with greedy common subexpression elimination. A reduction of LUT consumption of over 30% is frequently seen when WRAP is used as overflow mode with improved latency. DSP consumption will almost always be 0 with this strategy.

This PR depends on the s-quark-pr and includes all changes made there. (QEinsumDense not available otherwise)

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

Tests added to test_hgq_layers.py and test_einsum_dense.py. EinsumDense test will NOT be triggered in the current configuration due to keras v3 dependency.

Checklist

  • No docs for now.

@calad0i calad0i added the please test Trigger testing by creating local PR branch label Mar 7, 2025
@calad0i calad0i added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Mar 8, 2025
@calad0i calad0i added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Mar 8, 2025
@calad0i calad0i added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Mar 9, 2025
@calad0i calad0i added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Mar 9, 2025
@JanFSchulte

Copy link
Copy Markdown
Contributor

pre-commit.ci autofix

@JanFSchulte

Copy link
Copy Markdown
Contributor

pre-commit.ci autofix

@JanFSchulte JanFSchulte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jun 27, 2025

@JanFSchulte JanFSchulte left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving a few preliminary comments, there are still some parts that I need to look at in a bit more detail.

} else if (CONFIG_T::strategy == nnet::resource || CONFIG_T::strategy == nnet::resource_unrolled) {
conv_2d_resource_cl<data_T, res_T, CONFIG_T>(data, res, weights, biases);
} else {
assert(false && "Invalid strategy for conv_2d_cl");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be caught earlier and not at runtime of the HLS code?

void maximum(input1_T data1[CONFIG_T::n_elem], input2_T data2[CONFIG_T::n_elem], res_T res[CONFIG_T::n_elem]) {
for (int ii = 0; ii < CONFIG_T::n_elem; ii++) {
res[ii] = (data1[ii] > data2[ii]) ? static_cast<res_T>(data1[ii]) : static_cast<res_T>(data2[ii]);
res[ii] = (data1[ii] > data2[ii]) ? static_cast<res_T>(data1[ii]) : static_cast<res_T>(data2[ii]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this line duplicated?

self.add_weights_variable(name='table', var_name='table{index}', precision=table_t, data=self.table)
self.table = self.attributes['table_data']
self.attributes['table_size'] = len(self.table)
self.table_size = len(self.table)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to table_size both in the attributes dict and as a standalone attribute?

node.model.config.layer_name_precision[node.name] = str(new_type)
return False

if not all(isinstance(l, FixedPointQuantizer) for l in out_layers):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this case covered by the assert above?

Ks, Is, Fs = Ks[0], Is[0], Fs[0] # remove batch dimension
else:
Ks = np.ones(inp_shape, dtype=np.int8)
Is = Fs = np.full(inp_shape, 126, dtype=np.int8)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why 8 bit in this fallback option?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used 8 bit for bw everywhere for the bit_exact.py file.



def _get_input_kif(node: Layer):
"""Get the input k, i, f to a layer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for future maintainability of this code it would be good to document somewhere (here or elsewhere, what k, i, f, and b stand for.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in bit_exact.py


layer_config = None
if model_arch['class_name'] == 'Sequential':
print('Interpreting Sequential')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment for this PR: I personally like that currently hls4ml is giving the user some feedback on what it is actually doing at a given time, especially when running multiple steps in sequence. And printing out the layer structure it creates is also helpful as a sanity check. So I would prefer not to remove all these print outs.

@calad0i calad0i added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jul 2, 2025

@vloncar vloncar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We looked at it, thrice, and seems to be in good shape. There may be follow-ups to address any emerging bugs in tests. If no one objects, we'll merge this by Friday.

@JanFSchulte

Copy link
Copy Markdown
Contributor

The latest set of pytests showed some actual issues, so I want to re-run them one more time. Otherwise I'm ok with merging, my comments are mostly nitpicky.

@JanFSchulte JanFSchulte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jul 2, 2025
@vloncar

vloncar commented Jul 2, 2025

Copy link
Copy Markdown
Contributor

Apart from the issue with test_merge that we're investigating, the others seem to be stability issues (not actual issues, but do need to be addressed too).

@JanFSchulte

Copy link
Copy Markdown
Contributor

There was also an issue with da4ml, but I see that Chang bumped the version of that and it is gone. So all good from my side.

calad0i and others added 7 commits July 2, 2025 13:57
* added Cropping1D and Cropping2D keras layers support

* removed .bak templates files

* added cropping layers tests for vivado and vitis

* [pre-commit.ci] auto fixes from pre-commit hooks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@vloncar vloncar added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jul 3, 2025
@vloncar

vloncar commented Jul 3, 2025

Copy link
Copy Markdown
Contributor

The test_merge failure and the other precision test failure will be addressed in a subsequent PR. Merging this. Huge thanks to Chang for the monumental effort of building all of it.

@vloncar vloncar merged commit 46b7a88 into fastmachinelearning:main Jul 3, 2025
6 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

please test Trigger testing by creating local PR branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants