docs/ContribOperators.md
Lines changed: 7 additions & 1 deletion
@@ -2624,6 +2624,8 @@ This version of the operator has been available since version 1 of the 'com.micr
 <dd>left_window_size for local attention (like Mistral). Default value is -1 meaning unused.</dd>
 <dt><tt>num_heads</tt> : int (required)</dt>
 <dd>Number of attention heads for q</dd>
+<dt><tt>qk_norm_epsilon</tt> : float</dt>
+<dd>Epsilon used by the per-head RMS norm applied to Q and K when q_norm_weight and k_norm_weight inputs are provided. Default value is 1e-6.</dd>
 <dt><tt>qk_output</tt> : int</dt>
 <dd>Output values of QK matrix multiplication before (1) or after (2) softmax normalization. Default value is 0 (don't output).</dd>
 <dt><tt>rotary_interleaved</tt> : int</dt>
@@ -2638,7 +2640,7 @@ This version of the operator has been available since version 1 of the 'com.micr
 <dd>Quantization type for V cache. One of 'NONE', 'PER_TENSOR', 'PER_CHANNEL'.</dd>
 </dl>
 
-#### Inputs (7 - 14)
+#### Inputs (7 - 16)
 
 <dl>
 <dt><tt>query</tt> : T</dt>
@@ -2669,6 +2671,10 @@ This version of the operator has been available since version 1 of the 'com.micr
 <dd>Scale tensor for past_key.</dd>
 <dt><tt>v_scale</tt> (optional) : T_KV_SCALE</dt>
 <dd>Scale tensor for past_value.</dd>
+<dt><tt>q_norm_weight</tt> (optional) : T</dt>
+<dd>Optional 1D tensor of shape (head_size). When provided together with k_norm_weight, the kernel applies a per-head RMS normalization to Q (and K) before any rotary embedding. Used by Qwen3-style models that wrap their Q/K projections in a Reshape -> SimplifiedLayerNormalization -> Reshape stack; downstream graph fusion folds that pattern into this input. Currently honored by the native WebGPU execution provider only; JSEP WebGPU/JS and other EPs must reject the node when this input is set.</dd>
+<dt><tt>k_norm_weight</tt> (optional) : T</dt>
+<dd>Optional 1D tensor of shape (head_size). See q_norm_weight. Must be provided together with q_norm_weight.</dd>
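
For readers of this change, the per-head RMS normalization described for q_norm_weight and k_norm_weight can be sketched in plain Python. This is an illustrative reference under stated assumptions, not the WebGPU kernel: the helper names `rms_norm` and `apply_qk_norm` are hypothetical, each head is modelled as a flat list of `head_size` floats for a single token, and the formula is the usual RMS norm (`x / sqrt(mean(x^2) + epsilon) * weight`) that SimplifiedLayerNormalization computes.

```python
import math


def rms_norm(vec, weight, epsilon=1e-6):
    """RMS-normalize one head vector and scale it elementwise by weight.

    vec and weight are both sequences of length head_size (assumed layout).
    """
    if len(vec) != len(weight):
        raise ValueError("weight must have shape (head_size)")
    mean_sq = sum(v * v for v in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + epsilon)
    return [v * inv_rms * w for v, w in zip(vec, weight)]


def apply_qk_norm(q_heads, k_heads, q_norm_weight=None, k_norm_weight=None,
                  epsilon=1e-6):
    """Apply the optional Q/K norm to every head, before any rotary embedding.

    q_heads / k_heads: lists of per-head vectors for one token (hypothetical
    layout chosen for illustration only).
    """
    # The two weights must be supplied together, as the input docs require.
    if (q_norm_weight is None) != (k_norm_weight is None):
        raise ValueError(
            "q_norm_weight and k_norm_weight must be provided together")
    if q_norm_weight is None:
        return q_heads, k_heads  # norm disabled: pass Q and K through
    q_out = [rms_norm(h, q_norm_weight, epsilon) for h in q_heads]
    k_out = [rms_norm(h, k_norm_weight, epsilon) for h in k_heads]
    return q_out, k_out
```

The same weight vector is reused for every head, which is why its shape is (head_size) rather than (num_heads, head_size). The `epsilon` default mirrors the documented qk_norm_epsilon default of 1e-6.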