[BugFix] Align tir.round to ties-to-even across all backends#19368
Conversation
tir.round to ties-to-even across all backends
There was a problem hiding this comment.
Code Review
This pull request standardizes the behavior of tirx.round and tirx.nearbyint across multiple backends (including LLVM, NVPTX, CUDA, Metal, and OpenCL) to consistently use ties-to-even (banker's rounding) semantics, aligning them with constant-folding behavior. A new test case was added to verify these semantics. Review feedback identifies missing namespace prefixes for StringImm in the Metal and OpenCL lowering rules which would cause compilation errors, and suggests adding descriptive error messages to FFI checks in the NVPTX implementation for better diagnostics.
tir.round constant-folds using std::nearbyint (ties-to-even), but backends lowered it to platform round() (ties-away-from-zero). This inconsistency meant compiled code could produce different results from constant-folded code for midpoint values like 0.5, 2.5, etc. Fix all backends to use ties-to-even intrinsics: - LLVM/ROCm/Hexagon: llvm::Intrinsic::round -> nearbyint - NVPTX: __nv_round -> __nv_nearbyint - CUDA: round/roundf -> nearbyint/nearbyintf (f16/bf16 already used hrint) - Metal/OpenCL: round -> rint - Vulkan/SPIR-V: GLSLstd450Round -> GLSLstd450RoundEven - OpenCL codegen: fix nearbyint mapping from round() to rint() Also update op.h documentation to explicitly state ties-to-even semantics and add test_round_ties_to_even regression test. Follow-up to apache#19367. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3da7c03 to
21229ea
Compare
Redirect tirx.round to tirx.nearbyint op and let DispatchPureExternLibDevice generate __nv_nearbyint[f], instead of duplicating the name construction and arg copying in a lambda.
tir.round semantics changed to ties-to-even (nearbyint) in this branch, but ONNX runtime uses std::round (ties-away-from-zero) for MaxRoiPool coordinate mapping. Use floor(x + 0.5) explicitly to match ONNX runtime behavior and be independent of tir.round semantics. Also fix roi_pool_python.py reference which had the same issue: Python's built-in round() is ties-to-even and was already inconsistent with ONNX runtime.
|
just curious and I came aross an issue for this change, if user want to codegen |
|
@LeiWang1999 What issue did you meet? |
|
@LeiWang1999 @tlopex Implementation shouldn't be hard. The lowerings this PR replaced map almost 1:1 onto a new intrinsic. Adding e.g. |
|
After thinking about it a bit more deeply, I agree with aligning the default rounding behavior with Python instead of cuda/metal. Thanks for your response and no further issues from my side! |
Problem
tir.roundconstant-folds usingstd::nearbyint(IEEE 754 ties-to-even), but all backends lower it to platformround()which uses ties-away-from-zero. This means compiled code can produce different results from constant-folded code for midpoint values:This was identified as a follow-up to #19367 — see this comment.
Fix
Align all backends to use ties-to-even intrinsics, matching the constant-folding behavior:
llvm::Intrinsic::roundllvm::Intrinsic::nearbyint__nv_round[f]__nv_nearbyint[f]round/roundfnearbyint/nearbyintf(f16/bf16 already usedhrint)roundrintGLSLstd450RoundGLSLstd450RoundEvenAlso fixes OpenCL codegen where
tir.nearbyintwas incorrectly mapped to OpenCLround()instead ofrint().Updates
op.hdocumentation to explicitly state ties-to-even semantics for bothround()andnearbyint().Testing
New
test_round_ties_to_evenverifies midpoint inputs[0.5, 1.5, 2.5, 3.5, -0.5, -1.5, -2.5, -3.5]produce ties-to-even results on the LLVM backend. All 12 tests pass (10 passed, 2 skipped for CUDA).