Skip to content

Better handle types and overflows in softmax (especially lagacy and stable)#1476

Draft
jmitrevs wants to merge 6 commits into
fastmachinelearning:mainfrom
jmitrevs:sofmax_fix
Draft

Better handle types and overflows in softmax (especially lagacy and stable)#1476
jmitrevs wants to merge 6 commits into
fastmachinelearning:mainfrom
jmitrevs:sofmax_fix

Conversation

@jmitrevs

@jmitrevs jmitrevs commented May 20, 2026

Copy link
Copy Markdown
Contributor

Description

The legacy softmax was prone to overflow, so this PR made it use an accumulator type. Also, for stable, the default types are better chosen. There are some changes in latency, as well, to avoid overflow, but my feeling is that latency can often behave badly, so this doesn't fix that.

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change which adds functionality)

Tests

The standard softmax tests should still pass. One could consider adding specific overflow tests, like we have in the current version of the brevitas tutorial.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@jmitrevs jmitrevs added the please test Trigger testing by creating local PR branch label May 20, 2026
@jmitrevs jmitrevs requested a review from Copilot June 6, 2026 20:31

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on making the Vivado softmax implementations more robust against overflow by improving internal type handling (notably for legacy and latency variants), while also improving default type choices for stable softmax and adding auto-precision inference support for Softmax accumulators.

Changes:

  • Updated Vivado softmax templates to use accumulator types for summation / inversion-table addressing and to split exp/inv table types (legacy).
  • Updated Vivado stream softmax templates accordingly (latency/stable/legacy).
  • Added Softmax-specific accum_t inference in InferPrecisionTypes and adjusted FPGA backend Softmax default type attributes (unsigned table defaults, new inp_norm_t attribute).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
hls4ml/templates/vivado/nnet_utils/nnet_activation.h Aligns softmax (latency/stable/legacy) internal types to reduce overflow risk and clarifies exp/inv table type usage.
hls4ml/templates/vivado/nnet_utils/nnet_activation_stream.h Mirrors softmax type changes for streaming implementations (but currently has inconsistencies that need fixing).
hls4ml/model/optimizer/passes/infer_precision.py Adds Softmax-specific accumulator type inference when precision is set to auto.
hls4ml/backends/fpga/fpga_backend.py Updates Softmax type attribute defaults (unsigned tables, adds inp_norm_t, adjusts accumulator handling).
Comments suppressed due to low confidence (1)

hls4ml/templates/vivado/nnet_utils/nnet_activation_stream.h:295

  • This legacy stream implementation still uses CONFIG_T::table_size to compute/clamp index, but exp_table is now allocated with CONFIG_T::exp_table_size. If these differ (and they can, since softmax exposes separate exp/inv table sizes), index can go out of bounds or the scaling will be wrong. Use exp_table_size consistently for exp table addressing here.
                    auto data_round = (data_cache[j] - data_cache[i]) * CONFIG_T::table_size / 16;
                    int index = data_round + 8 * CONFIG_T::table_size / 16;
                    if (index < 0)
                        index = 0;
                    if (index > CONFIG_T::table_size - 1)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread hls4ml/templates/vivado/nnet_utils/nnet_activation_stream.h
Comment thread hls4ml/templates/vivado/nnet_utils/nnet_activation_stream.h Outdated
Comment thread hls4ml/model/optimizer/passes/infer_precision.py Outdated
Comment thread hls4ml/model/optimizer/passes/infer_precision.py
@jmitrevs jmitrevs marked this pull request as draft June 8, 2026 21:48
@jmitrevs

jmitrevs commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

I should also update the oneAPI backend. Truthfully that is a bit more involved. I will look into it this month, but have to work some hardware tests the next two weeks, so this for the moment will be lower priority.

@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jun 9, 2026
@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jun 9, 2026
@jmitrevs

jmitrevs commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

On the oneAPI (and Quartus) side there are already some softmax changes in PR #1432 so we need to coordinate.

@jmitrevs jmitrevs mentioned this pull request Jun 25, 2026
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

please test Trigger testing by creating local PR branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants