Enable asymmetric quantization for all MultiHeadAttention qdq layers #2468

Merged: wpietka merged 1 commit into master from dev/wpietkax/always-asymmetric-qdq-for-int8 on May 13, 2026
@wpietka wpietka commented May 12, 2026

Contributor

Type of Change

Improvement

Description

Currently, QDynamicMultiHeadAttention creates four separate qdq layers. Two of them may be asymmetric, depending on the activation dtype, and the other two are always symmetric. This causes two problems. First, the symmetric/asymmetric policy differs from the Static version, which allows asymmetric computation for all qdq layers. Second, the dynamic version does not need four separate qdq layers: because the scale is computed at runtime and is not stored in the layer itself, a single qdq layer can be reused for queries, keys, and values. The attention qdq layer stays separate because it uses a fixed range.
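To illustrate the reasoning above, here is a minimal pure-Python sketch, not the code from this PR. The names `dynamic_qdq`, `fixed_range_qdq`, and `max_error` are hypothetical, and the assumption that the attention qdq covers the range [0, 1] is mine. It shows that a dynamic qdq holds no state, so one instance can serve queries, keys, and values, and that asymmetric quantization can reduce error on skewed inputs.

```python
from typing import List


def dynamic_qdq(values: List[float], asymmetric: bool, bits: int = 8) -> List[float]:
    """Quantize then dequantize, deriving the scale from the input at call time.

    Nothing is stored between calls, so the same function can be applied to
    queries, keys, and values alike. (Hypothetical helper, not the PR's API.)
    """
    lo, hi = min(values), max(values)
    if asymmetric:
        # Asymmetric: map [lo, hi], extended to contain 0, onto [0, 2^bits - 1].
        lo, hi = min(lo, 0.0), max(hi, 0.0)
        qmin, qmax = 0, 2 ** bits - 1
        scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
        zero_point = round(qmin - lo / scale)
    else:
        # Symmetric: map [-absmax, absmax] onto [-(2^(bits-1) - 1), 2^(bits-1) - 1].
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax
        scale = max(abs(lo), abs(hi)) / qmax or 1.0
        zero_point = 0
    out = []
    for v in values:
        q = min(max(round(v / scale) + zero_point, qmin), qmax)  # quantize and clip
        out.append((q - zero_point) * scale)                      # dequantize
    return out


def fixed_range_qdq(values: List[float], bits: int = 8) -> List[float]:
    """Qdq with a constant scale, assuming inputs lie in [0, 1] (an assumption)."""
    qmax = 2 ** bits - 1
    return [round(min(max(v, 0.0), 1.0) * qmax) / qmax for v in values]


def max_error(a: List[float], b: List[float]) -> float:
    """Largest absolute element-wise difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# A skewed input: mostly positive with a small negative tail.
x = [-0.1, 0.0, 0.5, 1.0, 2.0, 4.0]
sym_err = max_error(x, dynamic_qdq(x, asymmetric=False))
asym_err = max_error(x, dynamic_qdq(x, asymmetric=True))
print(f"symmetric max error:  {sym_err:.6f}")
print(f"asymmetric max error: {asym_err:.6f}")
```

On this skewed input the asymmetric variant spends all 256 levels on the observed range instead of mirroring it around zero, so its step size is roughly half that of the symmetric variant. Whether this carries over to a real model depends on the activation distributions involved.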

Expected Behavior & Potential Risk

Slightly higher accuracy for dynamic layers.

How has this PR been tested?

The ViT benchmark was run with different configurations, and the results show better accuracy with asymmetric layers.

Dependency Change?

No dependency changes

@wpietka wpietka force-pushed the dev/wpietkax/always-asymmetric-qdq-for-int8 branch from e07f851 to 5a2d00b Compare May 13, 2026 07:23
Signed-off-by: Wojciech Piętka <wojciechx.pietka@intel.com>

@bkowalskiINTEL bkowalskiINTEL left a comment

Contributor

LGTM

@wpietka wpietka merged commit 03f79f3 into master May 13, 2026
14 checks passed
@wpietka wpietka deleted the dev/wpietkax/always-asymmetric-qdq-for-int8 branch May 13, 2026 12:14
2 participants