Commit 19198da

Changelog for fixing vec-add sample
Signed-off-by: Jay Gu <jagu@nvidia.com>
1 parent a0b64c6 commit 19198da

2 files changed

Lines changed: 4 additions & 7 deletions

changelog.d/fix-vec-add.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+<!--- SPDX-FileCopyrightText: Copyright (c) <2025> NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!--- SPDX-License-Identifier: Apache-2.0 -->
+
+Fix redundant check in vectoradd sample (by @NikeNano)

samples/templates/VectorAddition.py

Lines changed: 0 additions & 7 deletions
@@ -80,13 +80,6 @@ def vec_add(a: torch.Tensor, b: torch.Tensor, use_gather: bool = False) -> torch
     # (TILE_X * TILE_Y) around 1024 (a common block size limit for threads).
     TILE_X = max(1, 1024 // TILE_Y)
 
-    # Further adjustment to ensure TILE_X * TILE_Y is not excessively large
-    # if N (and thus TILE_Y) is small, or to prevent TILE_X from becoming zero.
-    if TILE_X * TILE_Y > 1024 and TILE_X > 1:
-        TILE_X = 1024 // TILE_Y
-    if TILE_X == 0:
-        TILE_X = 1  # Ensure TILE_X is at least 1
-
     # Calculate the 2D grid dimensions for launching the kernel.
     # `math.ceil(M / TILE_X)` blocks along rows, `math.ceil(N / TILE_Y)` blocks along columns.
     grid = (math.ceil(M / TILE_X), math.ceil(N / TILE_Y), 1)
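
The removed lines look redundant because `TILE_X = max(1, 1024 // TILE_Y)` already guarantees both properties they tried to enforce. `TILE_X` is never below 1, and whenever `TILE_X > 1` it equals `1024 // TILE_Y`, so `TILE_X * TILE_Y` cannot exceed 1024. The following is a minimal sketch that checks this exhaustively over a range of tile heights. It assumes `TILE_Y` is a positive integer (the diff does not show how it is derived), and the helper names are hypothetical, not part of the sample.

```python
def tile_x_after_fix(tile_y: int) -> int:
    """TILE_X as computed after this commit: a single clamped expression."""
    return max(1, 1024 // tile_y)


def tile_x_before_fix(tile_y: int) -> int:
    """TILE_X as computed before this commit, including the removed checks."""
    tile_x = max(1, 1024 // tile_y)
    # Removed check 1: can only fire if tile_x > 1, but then
    # tile_x == 1024 // tile_y, so tile_x * tile_y <= 1024 already holds.
    if tile_x * tile_y > 1024 and tile_x > 1:
        tile_x = 1024 // tile_y
    # Removed check 2: max(1, ...) above already rules out zero.
    if tile_x == 0:
        tile_x = 1
    return tile_x


def removed_checks_are_noops(max_tile_y: int = 4096) -> bool:
    """Compare both versions for every TILE_Y in [1, max_tile_y] (assumed range)."""
    return all(
        tile_x_before_fix(t) == tile_x_after_fix(t)
        for t in range(1, max_tile_y + 1)
    )


if __name__ == "__main__":
    print(removed_checks_are_noops())
```

Note that when `TILE_Y` exceeds 1024, `TILE_X` is clamped to 1 and the product still exceeds 1024 in both versions; the removed checks never changed that case either, since the first one required `TILE_X > 1`.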
