Skip to content

[BugFix] Align tir.round to ties-to-even across all backends#19368

Merged
tlopex merged 3 commits into
apache:mainfrom
swjng:fix-tir-round-ties-to-even
Apr 8, 2026
Merged

[BugFix] Align tir.round to ties-to-even across all backends#19368
tlopex merged 3 commits into
apache:mainfrom
swjng:fix-tir-round-ties-to-even

Conversation

@swjng
Copy link
Copy Markdown
Contributor

@swjng swjng commented Apr 8, 2026

Problem

tir.round constant-folds using std::nearbyint (IEEE 754 ties-to-even), but all backends lower it to platform round() which uses ties-away-from-zero. This means compiled code can produce different results from constant-folded code for midpoint values:

Input Constant-fold (ties-to-even) Compiled (ties-away)
0.5 0.0 1.0
2.5 2.0 3.0
-0.5 0.0 -1.0

This was identified as a follow-up to #19367 — see this comment.

Fix

Align all backends to use ties-to-even intrinsics, matching the constant-folding behavior:

Backend Before After
LLVM/ROCm/Hexagon llvm::Intrinsic::round llvm::Intrinsic::nearbyint
NVPTX __nv_round[f] __nv_nearbyint[f]
CUDA round/roundf nearbyint/nearbyintf (f16/bf16 already used hrint)
Metal/OpenCL round rint
Vulkan/SPIR-V GLSLstd450Round GLSLstd450RoundEven

Also fixes OpenCL codegen where tir.nearbyint was incorrectly mapped to OpenCL round() instead of rint().

Updates op.h documentation to explicitly state ties-to-even semantics for both round() and nearbyint().

Testing

python -m pytest tests/python/tirx-base/test_tir_intrin.py -xvs

New test_round_ties_to_even verifies midpoint inputs [0.5, 1.5, 2.5, 3.5, -0.5, -1.5, -2.5, -3.5] produce ties-to-even results on the LLVM backend. All 12 tests pass (10 passed, 2 skipped for CUDA).

@swjng swjng changed the title [BugFix] Align tir.round to ties-to-even across all backends [BugFix] Align tir.round to ties-to-even across all backends Apr 8, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request standardizes the behavior of tirx.round and tirx.nearbyint across multiple backends (including LLVM, NVPTX, CUDA, Metal, and OpenCL) to consistently use ties-to-even (banker's rounding) semantics, aligning them with constant-folding behavior. A new test case was added to verify these semantics. Review feedback identifies missing namespace prefixes for StringImm in the Metal and OpenCL lowering rules which would cause compilation errors, and suggests adding descriptive error messages to FFI checks in the NVPTX implementation for better diagnostics.

Comment thread src/target/source/intrin_rule_metal.cc Outdated
Comment thread src/target/source/intrin_rule_opencl.cc Outdated
Comment thread src/target/llvm/intrin_rule_nvptx.cc Outdated
tir.round constant-folds using std::nearbyint (ties-to-even), but
backends lowered it to platform round() (ties-away-from-zero). This
inconsistency meant compiled code could produce different results from
constant-folded code for midpoint values like 0.5, 2.5, etc.

Fix all backends to use ties-to-even intrinsics:
- LLVM/ROCm/Hexagon: llvm::Intrinsic::round -> nearbyint
- NVPTX: __nv_round -> __nv_nearbyint
- CUDA: round/roundf -> nearbyint/nearbyintf (f16/bf16 already used hrint)
- Metal/OpenCL: round -> rint
- Vulkan/SPIR-V: GLSLstd450Round -> GLSLstd450RoundEven
- OpenCL codegen: fix nearbyint mapping from round() to rint()

Also update op.h documentation to explicitly state ties-to-even semantics
and add test_round_ties_to_even regression test.

Follow-up to apache#19367.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@swjng swjng force-pushed the fix-tir-round-ties-to-even branch from 3da7c03 to 21229ea Compare April 8, 2026 06:10
swjng added 2 commits April 8, 2026 15:16
Redirect tirx.round to tirx.nearbyint op and let
DispatchPureExternLibDevice generate __nv_nearbyint[f], instead of
duplicating the name construction and arg copying in a lambda.
tir.round semantics changed to ties-to-even (nearbyint) in this branch,
but ONNX runtime uses std::round (ties-away-from-zero) for MaxRoiPool
coordinate mapping. Use floor(x + 0.5) explicitly to match ONNX runtime
behavior and be independent of tir.round semantics.

Also fix roi_pool_python.py reference which had the same issue: Python's
built-in round() is ties-to-even and was already inconsistent with ONNX
runtime.
Copy link
Copy Markdown
Member

@tlopex tlopex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@tlopex tlopex merged commit 2e6ee08 into apache:main Apr 8, 2026
8 checks passed
@LeiWang1999
Copy link
Copy Markdown
Contributor

just curious and I came aross an issue for this change, if user want to codegen roundf in cuda, is there currently no suitable method for that?"

@tlopex
Copy link
Copy Markdown
Member

tlopex commented May 11, 2026

@LeiWang1999 What issue did you meet?
also cc @swjng

@swjng
Copy link
Copy Markdown
Contributor Author

swjng commented May 11, 2026

@LeiWang1999 @tlopex Implementation shouldn't be hard. The lowerings this PR replaced map almost 1:1 onto a new intrinsic. Adding e.g. tir.round_ties_away (→ llvm.round, __nv_round[f], CUDA roundf, OpenCL/Metal round, SPIR-V GLSLstd450Round, std::round for constant folding) should cover it.

@LeiWang1999
Copy link
Copy Markdown
Contributor

After thinking about it a bit more deeply, I agree with aligning the default rounding behavior with Python instead of cuda/metal. Thanks for your response and no further issues from my side!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants