
[JAX] Use const scale in MHA after softmax #2466

Merged
anko-intel merged 4 commits into master from dev/anko/know_scale on May 12, 2026

Conversation

@anko-intel (Contributor)

Type of Change

feature

Description

For a tensor quantized inside MultiHeadAttention after softmax, a constant range of values can be assumed, since softmax outputs are probabilities in [0, 1]. This way, calibration (or min/max finding for dynamic quantization) is not required for this tensor.
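As a sketch of the idea (the helper name and signature below are illustrative, not the PR's actual code): because softmax outputs lie in [0, 1] by construction, the affine quantization scale and zero-point can be computed once from that constant range, with no observer or calibration pass.

```python
# Illustrative sketch, not Neural Compressor's actual API: derive an int8
# affine quantization scale/zero-point from a known constant value range.
def scale_from_fixed_range(fixed_range, qmin=-128, qmax=127):
    """Compute scale and zero-point for a tensor with a known fixed range."""
    rmin, rmax = fixed_range
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = qmin - round(rmin / scale)
    return scale, zero_point

# Softmax outputs are probabilities, so the range is [0.0, 1.0] by construction;
# no per-batch min/max pass is needed.
scale, zp = scale_from_fixed_range((0.0, 1.0))
```

With the range fixed at [0, 1], the scale is simply 1/255 for signed int8, and it never changes between batches.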

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

For a tensor quantized after softmax, a constant range of values can be
assumed.

Signed-off-by: Andrzej Kotłowski <andrzej.kotlowski@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR updates JAX quantization support for MultiHeadAttention by treating the post-softmax attention probabilities as having a known fixed range, avoiding calibration/min-max collection for that tensor.

Changes:

  • Added a calibration-status helper (MinMaxObserver.is_calibrated()) to detect whether observer stats were populated.
  • Introduced fixed_range support in both static and dynamic QDQ helper layers, enabling scale computation without observers/per-batch min/max.
  • Applied fixed_range=(0.0, 1.0) to the post-softmax attention tensor (a_qdq) in both static and dynamic MultiHeadAttention quantization paths.
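The bullet points above can be sketched as follows. This is a hypothetical illustration of the `fixed_range` idea in a QDQ (quantize-dequantize) helper; the function name and parameters are assumptions for illustration, not the PR's exact code.

```python
# Hypothetical sketch of fixed_range support in a QDQ helper: when a constant
# range is known, skip min/max collection entirely; otherwise fall back to the
# dynamic per-batch min/max path that fixed_range is meant to avoid.
import jax.numpy as jnp

def fake_quantize(x, fixed_range=None, qmin=-128, qmax=127):
    if fixed_range is not None:
        # Known constant range: scale/zero-point can be precomputed.
        rmin, rmax = fixed_range
    else:
        # Dynamic path: per-batch min/max (extra work fixed_range removes).
        rmin, rmax = jnp.min(x), jnp.max(x)
    scale = (rmax - rmin) / (qmax - qmin)
    zp = qmin - jnp.round(rmin / scale)
    q = jnp.clip(jnp.round(x / scale) + zp, qmin, qmax)
    return (q - zp) * scale  # dequantize back to float

# Post-softmax attention probabilities always lie in [0, 1].
probs = jnp.array([0.1, 0.7, 0.2])
out = fake_quantize(probs, fixed_range=(0.0, 1.0))
```

For the static path the same fixed range would bypass the observer, and for the dynamic path it would replace the per-batch reduction; in both cases the quantization error stays bounded by half a quantization step (about 1/510 for int8 over [0, 1]).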

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • neural_compressor/jax/quantization/layers_static.py: Adds fixed_range support to the static QDQ layer, introduces is_calibrated(), and uses a fixed [0, 1] range for MHA post-softmax attention probabilities.
  • neural_compressor/jax/quantization/layers_dynamic.py: Adds fixed_range support to the dynamic QDQ layer by precomputing the scale/zero-point, and uses a fixed [0, 1] range for MHA post-softmax attention probabilities.
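The `is_calibrated()` helper mentioned for the static path can be sketched as below. This is an assumed minimal shape of a min/max observer, not the actual `MinMaxObserver` implementation in `neural_compressor/jax`.

```python
# Minimal sketch (assumed, not the real implementation) of a min/max observer
# with a calibration-status check: stats exist only after at least one
# calibration batch has been observed.
class MinMaxObserver:
    def __init__(self):
        self.min_val = None
        self.max_val = None

    def observe(self, x_min, x_max):
        # Accumulate the running min/max across calibration batches.
        self.min_val = x_min if self.min_val is None else min(self.min_val, x_min)
        self.max_val = x_max if self.max_val is None else max(self.max_val, x_max)

    def is_calibrated(self):
        # True once observer stats were populated by calibration data.
        return self.min_val is not None and self.max_val is not None

obs = MinMaxObserver()
assert not obs.is_calibrated()   # no calibration data seen yet
obs.observe(0.0, 1.0)
assert obs.is_calibrated()       # stats populated after one batch
```

A tensor with `fixed_range` set would never need this check, which is exactly the point of the PR for the post-softmax attention tensor.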

@anko-intel requested a review from bkowalskiINTEL on May 12, 2026 at 14:40
Contributor

@bkowalskiINTEL left a comment


LGTM

@anko-intel merged commit ec84358 into master on May 12, 2026
14 checks passed
@anko-intel deleted the dev/anko/know_scale branch on May 12, 2026 at 14:58