Commit e51a8b9

Correct stride pseudocode in p27 (#240)
1 parent aed4a84 commit e51a8b9
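
The pseudocode needed correcting because `range(64, 0, -1)` counts down by one instead of halving the stride. As a rough illustration in plain Python (not the Mojo used in the book; the helper name `halving_strides` is hypothetical), the two loops visit very different stride sequences:

```python
def halving_strides(start):
    """Yield start, start // 2, ..., 1, mirroring the corrected while loop."""
    stride = start
    while stride > 0:
        yield stride
        stride //= 2


old_strides = list(range(64, 0, -1))     # loop removed by this commit
new_strides = list(halving_strides(64))  # loop added by this commit

print(len(old_strides), old_strides[:4])  # 64 [64, 63, 62, 61]
print(len(new_strides), new_strides)      # 7 [64, 32, 16, 8, 4, 2, 1]
```

A tree reduction over 128 values needs only the seven halving steps; the old form would run 64 steps with overlapping partial sums.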

2 files changed

Lines changed: 12 additions & 4 deletions

book/i18n/ko/src/puzzle_27/puzzle_27.md

Lines changed: 6 additions & 2 deletions
@@ -48,10 +48,12 @@ GPU 스레드 블록 (128 스레드, 4개 또는 2개 워프, 하드웨어 조
 # 복잡한 블록 전체 리덕션 (기존 방식 - Puzzle 12에서):
 shared_memory[local_i] = my_value
 barrier()
-for stride in range(64, 0, -1):
+stride = 64
+while stride > 0:
     if local_i < stride:
         shared_memory[local_i] += shared_memory[local_i + stride]
     barrier()
+    stride //= 2
 if local_i == 0:
     output[block_idx.x] = shared_memory[0]

@@ -81,10 +83,12 @@ if local_i == 0:
 shared_memory[local_i] = my_value
 barrier()
 # 스트라이드 기반 인덱싱을 사용한 트리 리덕션...
-for stride in range(64, 0, -1):
+stride = 64
+while stride > 0:
     if local_i < stride:
         shared_memory[local_i] += shared_memory[local_i + stride]
     barrier()
+    stride //= 2
 ```

 ### **중간 단계: 워프 프로그래밍 (Puzzle 24)**

book/src/puzzle_27/puzzle_27.md

Lines changed: 6 additions & 2 deletions
@@ -46,10 +46,12 @@ Learn the complete parallel programming toolkit from `gpu.primitives.block`:
 # Complex block-wide reduction (traditional approach - from Puzzle 12):
 shared_memory[local_i] = my_value
 barrier()
-for stride in range(64, 0, -1):
+stride = 64
+while stride > 0:
     if local_i < stride:
         shared_memory[local_i] += shared_memory[local_i + stride]
     barrier()
+    stride //= 2
 if local_i == 0:
     output[block_idx.x] = shared_memory[0]

@@ -79,10 +81,12 @@ Complex but educational - explicit shared memory, barriers, and tree reduction:
 shared_memory[local_i] = my_value
 barrier()
 # Tree reduction with stride-based indexing...
-for stride in range(64, 0, -1):
+stride = 64
+while stride > 0:
     if local_i < stride:
         shared_memory[local_i] += shared_memory[local_i + stride]
     barrier()
+    stride //= 2
 ```

 ### **The intermediate step: Warp programming (Puzzle 24)**
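
To see the effect of the corrected loop on the result, the reduction can be emulated sequentially. This is only a sketch in plain Python, not GPU code: the function name `simulate_block_reduce` is hypothetical, a 128-thread block is assumed (matching the block size mentioned in the hunk context), and each `barrier()` is modelled by reading from a snapshot taken before the step.

```python
def simulate_block_reduce(values, strides):
    """Sequentially emulate the shared-memory reduction for one block.

    Each stride is one barrier-separated step; every emulated thread reads
    from a snapshot of shared memory taken before the step begins.
    """
    shared = list(values)
    for stride in strides:
        snapshot = list(shared)  # state visible after the previous barrier
        for local_i in range(len(shared)):
            if local_i < stride:
                shared[local_i] = snapshot[local_i] + snapshot[local_i + stride]
    return shared[0]


values = [1] * 128  # assumed block of 128 threads, one value each

fixed = simulate_block_reduce(values, [64, 32, 16, 8, 4, 2, 1])
buggy = simulate_block_reduce(values, range(64, 0, -1))

print(fixed)                 # 128
print(buggy == sum(values))  # False
```

With halving strides, `shared[0]` ends up holding the sum of all 128 inputs. With the decrement-by-one strides from the old pseudocode, partial sums are added repeatedly, so the result overshoots the true total.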
