Fix stack buffer overflows in ynnpack channelwise quantized tensor and reduce by mohammadmseet-hue · Pull Request #9843 · google/XNNPACK

mohammadmseet-hue · 2026-04-01T06:32:45Z

Summary

Two stack buffer overflows in ynnpack:

Bug 1: Stack buffer overflow in `xnn_define_channelwise_quantized_tensor_value_v3`

File: ynnpack/xnnpack/tensor.cc:146

std::copy_n(dims, channel_dim + 1, quantization_dims) copies channel_dim + 1 elements into quantization_dims[YNN_MAX_TENSOR_RANK] (size 8) without checking channel_dim < num_dims or channel_dim < YNN_MAX_TENSOR_RANK. Additionally, xnn_validate_channelwise_quantized_tensor (line 112-116) is empty (return xnn_status_success).

With channel_dim >= 8, this writes past the end of the stack buffer — 88 bytes of controlled overflow confirmed.

ASAN output:

==ERROR: AddressSanitizer: stack-buffer-overflow
WRITE of size 88
    #0 memmove
    #8 xnn_define_channelwise_quantized_tensor_value_v3
[96, 160) 'quantization_dims' (line 145) <== Memory access at offset 160 overflows this variable

Reachable from public API: xnn_define_channelwise_quantized_tensor_value(), xnn_define_channelwise_quantized_tensor_value_v2(), xnn_define_channelwise_quantized_tensor_value_v3().

Fix: Add channel_dim >= num_dims || num_dims > YNN_MAX_TENSOR_RANK check before the std::copy_n.

Bug 2: Stack buffer overflow in `get_reduce_identity_value`

File: ynnpack/subgraph/reduce.cc:243

For ynn_reduce_min_max with keep_dims=true on a rank-8 tensor, output.extents.push_back(2) (line 345) increases the output rank to 9. Then get_reduce_identity_value computes rank = output.rank() = 9 and accesses dims[rank - 1] = dims[8], writing one element past the size_t dims[YNN_MAX_TENSOR_RANK] (size 8) stack array.

Fix: Add rank bounds check in the ynn_reduce_min_max case before accessing dims[rank - 1].

…d reduce Bug 1: xnn_define_channelwise_quantized_tensor_value_v3 (tensor.cc:146) std::copy_n(dims, channel_dim + 1, quantization_dims) copies channel_dim + 1 elements into quantization_dims[YNN_MAX_TENSOR_RANK] (size 8) without checking channel_dim < num_dims or channel_dim < YNN_MAX_TENSOR_RANK. With channel_dim >= 8, this writes past the stack buffer. ASAN trace: ==ERROR: AddressSanitizer: stack-buffer-overflow WRITE of size 88 google#8 xnn_define_channelwise_quantized_tensor_value_v3 [96, 160) 'quantization_dims' (line 145) <== overflows this variable Fix: Add channel_dim >= num_dims and num_dims > YNN_MAX_TENSOR_RANK checks. Bug 2: get_reduce_identity_value (reduce.cc:243) For ynn_reduce_min_max with keep_dims=true on a rank-8 tensor, output.extents.push_back(2) increases rank to 9. Then dims[rank - 1] = dims[8] writes one element past the size-8 stack array. Fix: Add rank bounds check before array access.

- tensor.cc: Split combined check into separate num_dims and channel_dim validations with YNN_LOG_ERROR messages. Replace asserts with proper error returns for channelwise_zero_point. Remove assert(data) per reviewer (XNNPACK limitation, not YNNPACK). - reduce.cc: Change define_reduce to return ynn_status. Add output rank validation after min_max dimension push. Keep rank >= 1 as assert (internal invariant). Propagate error via YNN_RETURN_IF_ERROR at call site.

mohammadmseet-hue · 2026-04-08T19:23:54Z

ASan proof for the `reduce.cc` rank-9 stack-buffer-overflow in this PR

Re-verified on today's master (HEAD 37d77e566). The recent refactor 6c008cf68 Optimize the subgraph glue code for reduce kernels did not add the missing validate_rank in ynn_define_reduce, so the bug is still live.

Trigger chain (public C API only)

ynn_create_subgraph
ynn_define_tensor_value with num_dims = YNN_MAX_TENSOR_RANK = 8
ynn_define_reduce(subgraph, ynn_reduce_min_max, /*num_axes=*/0, nullptr, input_id, YNN_INVALID_VALUE_ID, &output_id, YNN_NODE_FLAG_KEEP_DIMS)
ynn_define_reduce (reduce.cc:413) skips validate_rank — unlike stack, static_expand_dims, split_dim, stencil_copy, static_transpose which all validate.
In define_reduce, the keep-dims loop (reduce.cc:310-318) leaves output.extents.size() == 8, then reduce.cc:322 unconditionally runs output.extents.push_back(2) → size becomes 9.
get_reduce_identity_value(subgraph, output, ynn_reduce_min_max) then executes rank = output.rank() == 9; dims[rank - 1] = 2; → 8-byte OOB write on the 8-element size_t dims[YNN_MAX_TENSOR_RANK] stack array. std::reverse(dims, dims + 9) immediately after compounds the OOB.

Also reachable from the TFLite-delegate dynamic-quant path via ynnpack/xnnpack/dynamic_quantization.cc:214.

Build

g++ -std=c++17 -fsanitize=address,undefined -fno-omit-frame-pointer -g -O0 poc_reduce_minmax_rank9.cc -o poc
./poc

Output (clang/g++ ASan + UBSan)

poc_reduce_minmax_rank9.cc:70:20: runtime error: index 8 out of bounds for type 'long unsigned int [8]'
poc_reduce_minmax_rank9.cc:70:22: runtime error: store to address 0x7f...6a0 with insufficient space for an object of type 'size_t'
=================================================================
==1131527==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f...6a0
WRITE of size 8 at 0x7f...6a0 thread T0
    #0 get_reduce_identity_value(ynn_value const&, ynn_reduce_operator)
       poc_reduce_minmax_rank9.cc:70
    #1 main poc_reduce_minmax_rank9.cc:105

Address 0x7f...6a0 is located in stack of thread T0 at offset 160 in frame
    #0 get_reduce_identity_value(ynn_value const&, ynn_reduce_operator) poc_reduce_minmax_rank9.cc:50

  This frame has 3 object(s):
    [48, 52) '<unknown>'
    [64, 72) 'value_f32' (line 51)
    [96, 160) 'dims' (line 53) <== Memory access at offset 160 overflows this variable

SUMMARY: AddressSanitizer: stack-buffer-overflow in get_reduce_identity_value

The PoC file is a standalone C++ file that copies the vulnerable get_reduce_identity_value and the define_reduce shape-propagation prelude verbatim from subgraph/reduce.cc, so ASan acts on the real function body.

Suggested regression test for the merge: calling ynn_define_reduce with a rank-8 tensor + ynn_reduce_min_max + YNN_NODE_FLAG_KEEP_DIMS should return xnn_status_unsupported_parameter.

dsharlet reviewed Apr 1, 2026

View reviewed changes

Comment thread ynnpack/subgraph/reduce.cc Outdated

Comment thread ynnpack/xnnpack/tensor.cc

mohammadmseet-hue added 2 commits April 1, 2026 10:16

mohammadmseet-hue force-pushed the fix/channelwise-quantized-channel-dim-check branch from be986be to f781ad1 Compare April 1, 2026 08:16

dsharlet approved these changes Apr 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix stack buffer overflows in ynnpack channelwise quantized tensor and reduce#9843

Fix stack buffer overflows in ynnpack channelwise quantized tensor and reduce#9843
mohammadmseet-hue wants to merge 2 commits intogoogle:masterfrom
mohammadmseet-hue:fix/channelwise-quantized-channel-dim-check

mohammadmseet-hue commented Apr 1, 2026

Uh oh!

Uh oh!

Uh oh!

mohammadmseet-hue commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mohammadmseet-hue commented Apr 1, 2026

Summary

Bug 1: Stack buffer overflow in xnn_define_channelwise_quantized_tensor_value_v3

Bug 2: Stack buffer overflow in get_reduce_identity_value

Uh oh!

Uh oh!

Uh oh!

mohammadmseet-hue commented Apr 8, 2026

ASan proof for the reduce.cc rank-9 stack-buffer-overflow in this PR

Trigger chain (public C API only)

Build

Output (clang/g++ ASan + UBSan)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Bug 1: Stack buffer overflow in `xnn_define_channelwise_quantized_tensor_value_v3`

Bug 2: Stack buffer overflow in `get_reduce_identity_value`

ASan proof for the `reduce.cc` rank-9 stack-buffer-overflow in this PR