Skip to content

Commit d470881

Browse files
authored
Remove stale reference to LayoutTensor code (#245)
1 parent ff53970 commit d470881

1 file changed

Lines changed: 30 additions & 30 deletions

File tree

book/src/puzzle_09/second_case.md

Lines changed: 30 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -153,20 +153,20 @@ Each position should sum its neighbors: [left + center + right]
153153
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
154154

155155
CUDA thread hit application kernel entry function breakpoint, p09_process_sliding_window_...
156-
<<<(1,1,1),(4,1,1)>>> (output=..., input=...)
157-
at /home/ubuntu/workspace/mojo-gpu-puzzles/problems/p09/p09.mojo:30
158-
30 input: TileTensor[mut=False, dtype, vector_layout],
156+
<<<(1,1,1),(4,1,1)>>> (output=..., a=...)
157+
at /home/ubuntu/workspace/mojo-gpu-puzzles/problems/p09/p09.mojo:36
158+
36 a: TileTensor[mut=False, dtype, VectorLayout, ImmutAnyOrigin],
159159
```
160160

161161
#### Step 4: Navigate to the main logic
162162

163163
```bash
164164
(cuda-gdb) n
165-
29 output: TileTensor[mut=True, dtype, vector_layout],
165+
35 output: TileTensor[mut=True, dtype, VectorLayout, MutAnyOrigin],
166166
(cuda-gdb) n
167-
32 thread_id = thread_idx.x
167+
38 var thread_id = thread_idx.x
168168
(cuda-gdb) n
169-
38 for offset in range(ITER):
169+
44 for offset in range(ITER):
170170
```
171171
172172
#### Step 5: Test variable accessibility - crucial discovery
@@ -206,15 +206,15 @@ $3 = {{0}, {1}, {2}, {3}}
206206
#### Step 6: Set up loop monitoring
207207
208208
```bash
209-
(cuda-gdb) b 42
210-
Breakpoint 1 at 0x7fffd326ffd0: file problems/p09/p09.mojo, line 42.
209+
(cuda-gdb) b 45
210+
Breakpoint 1 at 0x7fffd326ffd0: file problems/p09/p09.mojo, line 45.
211211
(cuda-gdb) c
212212
Continuing.
213213
214214
CUDA thread hit Breakpoint 1, p09_process_sliding_window_...
215-
<<<(1,1,1),(4,1,1)>>> (output=..., input=...)
216-
at /home/ubuntu/workspace/mojo-gpu-puzzles/problems/p09/p09.mojo:42
217-
42 idx = thread_id + offset - 1
215+
<<<(1,1,1),(4,1,1)>>> (output=..., a=...)
216+
at /home/ubuntu/workspace/mojo-gpu-puzzles/problems/p09/p09.mojo:45
217+
45 var idx = Int(thread_id) + offset - 1
218218
```
219219
220220
**🔍 We're now inside the loop body. Let's count iterations manually.**
@@ -223,50 +223,50 @@ CUDA thread hit Breakpoint 1, p09_process_sliding_window_...
223223
224224
```bash
225225
(cuda-gdb) n
226-
43 if 0 <= idx < SIZE:
226+
46 if 0 <= idx < SIZE:
227227
(cuda-gdb) n
228-
41 for offset in range(ITER):
228+
44 for offset in range(ITER):
229229
```
230230
231-
**First iteration complete**: Loop went from line 4243 → back to 41. The loop continues.
231+
**First iteration complete**: Loop went from line 4546 → back to 44. The loop continues.
232232
233233
#### Step 8: Second loop iteration (offset = 1)
234234
235235
```bash
236236
(cuda-gdb) n
237237
238238
CUDA thread hit Breakpoint 1, p09_process_sliding_window_...
239-
42 idx = thread_id + offset - 1
239+
45 var idx = Int(thread_id) + offset - 1
240240
(cuda-gdb) n
241-
43 if 0 <= idx < SIZE:
241+
46 if 0 <= idx < SIZE:
242242
(cuda-gdb) n
243-
44 value = rebind[Scalar[dtype]](input[idx])
243+
47 var value = rebind[Scalar[dtype]](a[idx])
244244
(cuda-gdb) n
245-
45 window_sum += value
245+
48 window_sum += value
246246
(cuda-gdb) n
247-
43 if 0 <= idx < SIZE:
247+
46 if 0 <= idx < SIZE:
248248
(cuda-gdb) n
249-
41 for offset in range(ITER):
249+
44 for offset in range(ITER):
250250
```
251251
252-
**Second iteration complete**: This time it went through the if-block (lines 44-45).
252+
**Second iteration complete**: This time it went through the if-block (lines 47-48).
253253
254254
#### Step 9: testing for third iteration
255255
256256
```bash
257257
(cuda-gdb) n
258-
47 output[thread_id] = window_sum
258+
50 output[thread_id] = window_sum
259259
```
260260
261-
**CRITICAL DISCOVERY**: The loop exited after only 2 iterations! It went directly to line 47 instead of hitting our breakpoint at line 42 again.
261+
**CRITICAL DISCOVERY**: The loop exited after only 2 iterations! It went directly to line 50 instead of hitting our breakpoint at line 45 again.
262262
263263
**Conclusion**: The loop ran exactly **2 iterations** and then exited.
264264
265265
#### Step 10: Complete kernel execution and context loss
266266
267267
```bash
268268
(cuda-gdb) n
269-
31 fn process_sliding_window(
269+
34 def process_sliding_window(
270270
(cuda-gdb) n
271271
[Switching to Thread 0x7ffff7cc0e00 (LWP 110927)]
272272
0x00007ffff064f84a in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
@@ -290,10 +290,10 @@ From our debugging session, we observed:
290290
291291
Looking at what each thread should compute:
292292
293-
- **Thread 0**: window_sum = input[-1] + input[0] + input[1] = (boundary) + 0 + 1 = 1.0
294-
- **Thread 1**: window_sum = input[0] + input[1] + input[2] = 0 + 1 + 2 = 3.0
295-
- **Thread 2**: window_sum = input[1] + input[2] + input[3] = 1 + 2 + 3 = 6.0
296-
- **Thread 3**: window_sum = input[2] + input[3] + input[4] = 2 + 3 + (boundary) = 5.0
293+
- **Thread 0**: window_sum = a[-1] + a[0] + a[1] = (boundary) + 0 + 1 = 1.0
294+
- **Thread 1**: window_sum = a[0] + a[1] + a[2] = 0 + 1 + 2 = 3.0
295+
- **Thread 2**: window_sum = a[1] + a[2] + a[3] = 1 + 2 + 3 = 6.0
296+
- **Thread 3**: window_sum = a[2] + a[3] + a[4] = 2 + 3 + (boundary) = 5.0
297297
298298
#### Step 12: Trace the actual execution for thread 0
299299
@@ -309,13 +309,13 @@ With only 2 iterations (offset = 0, 1):
309309
310310
- `idx = thread_id + offset - 1 = 0 + 1 - 1 = 0`
311311
- `if 0 <= idx < SIZE:` → `if 0 <= 0 < 4:` → **True**
312-
- `window_sum += input[0]` → `window_sum += 0`
312+
- `window_sum += a[0]` → `window_sum += 0`
313313
314314
**Missing Iteration 3 (offset = 2)**:
315315
316316
- `idx = thread_id + offset - 1 = 0 + 2 - 1 = 1`
317317
- `if 0 <= idx < SIZE:` → `if 0 <= 1 < 4:` → **True**
318-
- `window_sum += input[1]` → `window_sum += 1` ← **THIS NEVER HAPPENS**
318+
- `window_sum += a[1]` → `window_sum += 1` ← **THIS NEVER HAPPENS**
319319
320320
**Result**: Thread 0 gets `window_sum = 0` instead of `window_sum = 0 + 1 = 1`
321321

0 commit comments

Comments
 (0)