Currently, some standard TIR math intrinsics on CUDA lower to CUDA fast-math device functions by default.
For example, `tir.exp` / `tirx.exp` on float32 lowers to `__expf(x)` instead of the precise CUDA math function `expf(x)`. This happens even when `--use_fast_math` is not passed to NVCC.
## Why this is a problem
`__expf`, `__logf`, `__sinf`, etc. are CUDA fast-math intrinsics. They trade accuracy for performance and can introduce visible precision loss in numerically sensitive kernels.

Users generally expect standard math intrinsics such as `T.exp`, `T.log`, `T.sin`, and `T.cos` to preserve normal CUDA math semantics unless fast math is explicitly requested.

Fast-math behavior should ideally be opt-in, for example through a target option, compiler flag, or an explicit fast-math intrinsic.
Standard TIR math intrinsics should lower to precise CUDA math functions by default:
| TIR op | Expected CUDA |
| --- | --- |
| `tirx.exp` | `expf` |
| `tirx.exp10` | `exp10f` |
| `tirx.log` | `logf` |
| `tirx.log2` | `log2f` |
| `tirx.log10` | `log10f` |
| `tirx.sin` | `sinf` |
| `tirx.cos` | `cosf` |
| `tirx.tan` | `tanf` |
Fast-math variants such as `__expf`, `__logf`, `__sinf`, and `__cosf` should only be emitted when fast math is explicitly enabled.
Suggested fix: use `CUDAMath` instead of `CUDAFastMath` for standard CUDA math intrinsic lowering:

```cpp
TVM_REGISTER_OP("tirx.exp")
    .set_attr<FLowerIntrinsic>("cuda.FLowerIntrinsic", DispatchPureExtern<CUDAMath>);
```
If fast-math lowering is desired, it would be better to gate it behind an explicit fast-math option rather than making it the default behavior for standard math intrinsics.
cc @Hzfengsy @junrushao @quic-sanirudh @shingjan