Skip to content

Handle int overflow in rnn#28003

Merged
tianleiwu merged 7 commits intomainfrom
tlwu/20260407/rnn_fix
Apr 13, 2026
Merged

Handle int overflow in rnn#28003
tianleiwu merged 7 commits intomainfrom
tlwu/20260407/rnn_fix

Conversation

@tianleiwu
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu commented Apr 7, 2026

Description

Fixes two overflow/underflow bugs in the CPU RNN kernel (rnn.cc):

  • SafeInt for GEMM M-dimension: seq_length * batch_size was computed as a raw int64_t multiply before narrow<int>(), meaning an overflow would be UB before the check could fire. Replaced with SafeInt<int64_t>(seq_length) * batch_size for a checked multiply.

  • seq_length == 0 guard in Assign_Y_h: For the forward direction, last_time_step = seq_length - 1 underflows to -1 when seq_length == 0, producing a negative y_offset and out-of-bounds read. Added an early-exit that zero-fills Y_h for the direction and returns. Also handles sequence_lens[batch] == 0 (same underflow path), zeroing the affected batch slot and skipping via continue.

Motivation and Context

Silent UB from integer overflow/underflow in shape-derived index arithmetic can corrupt memory or produce incorrect results without any diagnostic signal. These cases are legal per the ONNX spec (empty sequences, per-batch zero-length sequences) and must be handled explicitly.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the CPU RNN implementation to use onnxruntime::narrow<int>() instead of unchecked static_cast<int>() when passing shape-derived values into APIs that take int, aiming to prevent silent truncation/overflow.

Changes:

  • Add an explicit include for core/common/narrow.h in rnn.cc.
  • Replace several static_cast<int>(...) conversions with onnxruntime::narrow<int>(...) in CopyVector and Gemm calls.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/rnn/rnn.cc Outdated
Comment thread onnxruntime/core/providers/cpu/rnn/rnn.cc Outdated
@tianleiwu
Copy link
Copy Markdown
Contributor Author

@copilot apply changes based on the comments in this thread

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 7, 2026

@copilot apply changes based on the comments in this thread

Applied both changes from the review thread in commit 88cff57:

  1. SafeInt for multiplication overflow: Changed onnxruntime::narrow<int>(seq_length * batch_size) to onnxruntime::narrow<int>(SafeInt<int64_t>(seq_length) * batch_size) so the multiplication itself is overflow-checked before narrowing.

  2. seq_length == 0 handling in Assign_Y_h: Added an early-exit guard at the top of Assign_Y_h — if seq_length == 0, Y_h for the current direction is zeroed out and the function returns immediately. Also added handling for the case where sequence_lens[batch] == 0, which would also produce last_time_step = -1 — those batches are now zero-filled and skipped via continue.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/rnn/rnn.cc Outdated
Comment thread onnxruntime/core/providers/cpu/rnn/rnn.cc
Comment thread onnxruntime/core/common/safeint.h Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/test/providers/cpu/rnn/rnn_op_test.cc
hariharans29
hariharans29 previously approved these changes Apr 11, 2026
@tianleiwu tianleiwu requested a review from hariharans29 April 13, 2026 22:26
@tianleiwu tianleiwu merged commit ffbc5e8 into main Apr 13, 2026
119 of 127 checks passed
@tianleiwu tianleiwu deleted the tlwu/20260407/rnn_fix branch April 13, 2026 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants