Commit c56fa24

Loosen tf32 matmul sample tolerance for Ampere
Signed-off-by: Jay Gu <jagu@nvidia.com>
1 parent 2df0fc2 commit c56fa24

2 files changed

Lines changed: 2 additions & 2 deletions

samples/MatMul.py

Lines changed: 1 addition & 1 deletion
@@ -296,7 +296,7 @@ def cutile_matmul(A: torch.Tensor, B: torch.Tensor, persistent: bool = False) ->

 if torch.cuda.get_device_capability()[0] <= 8:
     # Ampere tfloat32 numerics is loose
-    atol, rtol = 5e-3, 5e-3
+    atol, rtol = 1e-2, 1e-2
 else:
     atol, rtol = 1e-4, 1e-3

samples/templates/MatMul.py

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ def cutile_matmul(A: torch.Tensor, B: torch.Tensor, persistent: bool = False) ->

 if torch.cuda.get_device_capability()[0] <= 8:
     # Ampere tfloat32 numerics is loose
-    atol, rtol = 5e-3, 5e-3
+    atol, rtol = 1e-2, 1e-2
 else:
     atol, rtol = 1e-4, 1e-3
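
For reviewers, a minimal sketch of what the looser bound means in practice. It assumes the samples compare the kernel output against a reference with an elementwise check of the form |actual - expected| <= atol + rtol * |expected|, which is the criterion used by torch.testing.assert_close; the diff does not show the actual comparison call. The helper names pick_tolerances and within_tolerance and the numeric values are hypothetical, chosen only to illustrate the change. The sketch is pure Python so it runs without a GPU.

```python
def pick_tolerances(compute_capability_major):
    """Mirror the branch in the diff: looser bounds when the major version is <= 8."""
    if compute_capability_major <= 8:
        return 1e-2, 1e-2  # (atol, rtol) after this commit; previously 5e-3, 5e-3
    return 1e-4, 1e-3


def within_tolerance(actual, expected, atol, rtol):
    """Assumed closeness test: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


expected = 1.0
actual = 1.015  # hypothetical result with an absolute error of about 1.5e-2

# Old Ampere bound: 5e-3 + 5e-3 * 1.0 = 1e-2, so this error is rejected.
old_ok = within_tolerance(actual, expected, 5e-3, 5e-3)
# New Ampere bound: 1e-2 + 1e-2 * 1.0 = 2e-2, so the same error is accepted.
new_ok = within_tolerance(actual, expected, *pick_tolerances(8))

print(old_ok, new_ok)  # False True
```

Under that assumption, the combined bound for an element of magnitude 1.0 doubles from 1e-2 to 2e-2 on devices taking the first branch, while the bound for newer devices is unchanged.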
