Commit b7e87e1

Updated the docs
1 parent 1cc5575 commit b7e87e1

19 files changed

Lines changed: 138 additions & 106 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -217,7 +217,7 @@ cd gpu-programming-101
 
 # Inside container: verify GPU access and start learning
 /workspace/test-gpu.sh
-cd modules/module1 && make && ./01_vector_addition_cuda
+cd modules/module1 && make && ./build/01_vector_addition_cuda
 ```
 
 ### Option 2: Native Installation

docker/README.md

Lines changed: 2 additions & 2 deletions
@@ -73,7 +73,7 @@ docker/
 
 ### ROCm Development Container
 **Image**: `gpu-programming-101:rocm`
-**Base**: `rocm/dev-ubuntu-22.04:7.0`
+**Base**: `rocm/dev-ubuntu-22.04:7.0-complete`
 
 **Features**:
 - ROCm 7.0 with HIP development environment
@@ -298,7 +298,7 @@ sudo apt update && sudo apt upgrade docker-ce docker-compose
 
 # Check base image availability
 docker pull nvidia/cuda:12.9.1-devel-ubuntu22.04
-docker pull rocm/dev-ubuntu-22.04:7.0
+docker pull rocm/dev-ubuntu-22.04:7.0-complete
 ```
 
 **"Permission denied errors"**

modules/module1/README.md

Lines changed: 23 additions & 9 deletions
@@ -20,25 +20,34 @@ After completing this module, you will be able to:
 
 ### Prerequisites
 - NVIDIA GPU with CUDA support OR AMD GPU with ROCm support
-- CUDA Toolkit 11.0+ or ROCm 4.0+
+- CUDA Toolkit 12.0+ or ROCm 6.0+ (Docker images provide CUDA 12.9.1 and ROCm 7.0)
 - C/C++ compiler (GCC, Clang, or MSVC)
 
+Tip: You can skip native installs by using our Docker environment (recommended):
+```
+./docker/scripts/run.sh --auto
+```
+
 ### Running Examples
 
 Navigate to the examples directory:
 ```bash
 cd examples/
 ```
 
-Build and run examples:
+Build and run examples (binaries are written to `build/`):
 ```bash
-# Build all examples
+# Build all examples for your detected GPU
 make
 
-# Run specific examples
-./01_vector_addition_cuda
-./04_device_info_cuda
-./05_performance_comparison
+# Run specific examples (CUDA)
+./build/01_vector_addition_cuda
+./build/04_device_info_cuda
+./build/05_performance_comparison_cuda || ./build/05_performance_comparison
+
+# Or HIP versions (cross-platform)
+./build/02_vector_addition_hip
+./build/04_device_info_hip
 ```
 
 ## Examples Overview
@@ -48,9 +57,14 @@ make
 | `01_vector_addition_cuda.cu` | Basic CUDA vector addition | Kernels, memory management, error handling |
 | `02_vector_addition_hip.cpp` | Cross-platform HIP version | HIP API, portability |
 | `03_matrix_addition_cuda.cu` | 2D matrix operations | 2D threading, indexing |
+| `03_matrix_addition_hip.cpp` | HIP 2D matrix operations | HIP indexing, portability |
 | `04_device_info_cuda.cu` | GPU properties and capabilities | Device queries, system info |
-| `05_performance_comparison.cu` | CPU vs GPU benchmarking | Performance analysis, timing |
-| `06_debug_example.cu` | Debugging and optimization | Error checking, occupancy |
+| `04_device_info_hip.cpp` | HIP device and platform info | HIP device queries |
+| `05_performance_comparison_cuda.cu` | CPU vs GPU benchmarking (CUDA) | Performance analysis, timing |
+| `05_performance_comparison_hip.cpp` | Benchmarking (HIP) | HIP performance, memory bandwidth |
+| `06_debug_example_cuda.cu` | Debugging and optimization (CUDA) | Error checking, occupancy |
+| `06_debug_example_hip.cpp` | Debugging and optimization (HIP) | HIP debugging |
+| `07_cross_platform_comparison.cpp` | AMD vs NVIDIA comparison | Portability, tuning |
 
 ## Topics Covered
 

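Note on the fallback in the updated run commands: `a || b` also runs `b` when `a` exists but exits with a non-zero status, which may not be the intent. One possible alternative is to pick the first binary that actually exists. This is a minimal sketch in POSIX shell; the helper name `first_executable` is hypothetical and not part of the repository.

```bash
# Hypothetical helper (not from the repository): print the first argument
# that is an existing, executable regular file; return 1 if none qualifies.
first_executable() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage: run whichever performance-comparison binary the build produced.
# bin=$(first_executable ./build/05_performance_comparison_cuda \
#                        ./build/05_performance_comparison) && "$bin"
```

With this approach a failing benchmark reports its own exit status instead of silently triggering the second binary.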
modules/module1/content.md

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # Module 1: Foundations of GPU Programming with CUDA and HIP
 *Heterogeneous Data Parallel Computing*
 
+> Environment note: Examples are validated in containers using CUDA 12.9.1 (Ubuntu 22.04) and ROCm 7.0 (rocm/dev-ubuntu-22.04:7.0-complete). Using Docker is recommended for a consistent setup.
+
 ## Learning Objectives
 After completing this module, you will be able to:
 - Understand the fundamental differences between CPU and GPU architectures

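For native installs, the raised minimums (CUDA Toolkit 12.0+, ROCm 6.0+) can be checked with a small version comparison. The sketch below assumes a `sort` that supports `-V` (GNU coreutils does); `version_ge` is a hypothetical helper name, not something the repository provides.

```bash
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2. Relies on `sort -V` (version sort), assumed to be available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage, assuming `nvcc --version` reports a line containing "release X.Y,":
# cuda_ver=$(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
# version_ge "$cuda_ver" 12.0 || echo "CUDA Toolkit 12.0+ required" >&2
```

The comparison treats equal versions as satisfying the minimum, which matches the "12.0+" wording in the prerequisites.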
modules/module1/examples/README.md

Lines changed: 39 additions & 37 deletions
@@ -6,7 +6,7 @@ This directory contains practical examples that accompany Module 1 of the GPU Pr
 
 ### CUDA Examples (NVIDIA)
 | File | Description | Key Concepts |
-|------|-------------||--------------|
+|------|-------------|--------------|
 | `01_vector_addition_cuda.cu` | Basic CUDA vector addition with error handling | Kernels, memory management, error checking |
 | `03_matrix_addition_cuda.cu` | 2D matrix addition with thread indexing | 2D threading, grid configuration |
 | `04_device_info_cuda.cu` | Query and display GPU properties | Device queries, capability checking |
@@ -15,7 +15,7 @@ This directory contains practical examples that accompany Module 1 of the GPU Pr
 
 ### HIP Examples (AMD/NVIDIA Cross-Platform)
 | File | Description | Key Concepts |
-|------|-------------||--------------|
+|------|-------------|--------------|
 | `02_vector_addition_hip.cpp` | Cross-platform vector addition using HIP | HIP API, portability |
 | `03_matrix_addition_hip.cpp` | 2D matrix addition with HIP | Cross-platform 2D threading |
 | `04_device_info_hip.cpp` | HIP device properties and platform detection | HIP device queries, platform abstraction |
@@ -26,14 +26,14 @@ This directory contains practical examples that accompany Module 1 of the GPU Pr
 ## Prerequisites
 
 ### For CUDA Examples
-- NVIDIA GPU with compute capability 3.5+
-- NVIDIA drivers (version 450+)
-- CUDA Toolkit 11.0+
+- NVIDIA GPU with compute capability 5.0+
+- NVIDIA drivers 550+ recommended
+- CUDA Toolkit 12.0+ (Docker uses CUDA 12.9.1)
 - GCC/Clang compiler
 
 ### For HIP Examples
 - AMD GPU with ROCm support OR NVIDIA GPU
-- ROCm 4.0+ (for AMD) or CUDA 11.0+ (for NVIDIA backend)
+- ROCm 6.0+ (for AMD) or CUDA 12.0+ (for NVIDIA backend)
 - HIP compiler (hipcc)
 
 ## Quick Start
@@ -59,23 +59,25 @@ make help
 
 ### Manual Compilation
 
+Binaries are written to `build/` by the Makefile.
+
 **CUDA Examples:**
 ```bash
-nvcc -o vector_add 01_vector_addition_cuda.cu
-nvcc -o matrix_add 03_matrix_addition_cuda.cu
-nvcc -o device_info 04_device_info_cuda.cu
-nvcc -o performance 05_performance_comparison.cu
-nvcc -o debug 06_debug_example.cu
+nvcc -o build/01_vector_addition_cuda 01_vector_addition_cuda.cu
+nvcc -o build/03_matrix_addition_cuda 03_matrix_addition_cuda.cu
+nvcc -o build/04_device_info_cuda 04_device_info_cuda.cu
+nvcc -o build/05_performance_comparison_cuda 05_performance_comparison_cuda.cu
+nvcc -o build/06_debug_example_cuda 06_debug_example_cuda.cu
 ```
 
 **HIP Examples:**
 ```bash
-hipcc -o vector_add_hip 02_vector_addition_hip.cpp
-hipcc -o matrix_add_hip 03_matrix_addition_hip.cpp
-hipcc -o device_info_hip 04_device_info_hip.cpp
-hipcc -o performance_hip 05_performance_comparison_hip.cpp
-hipcc -o debug_hip 06_debug_example_hip.cpp
-hipcc -o cross_platform 07_cross_platform_comparison.cpp
+hipcc -o build/02_vector_addition_hip 02_vector_addition_hip.cpp
+hipcc -o build/03_matrix_addition_hip 03_matrix_addition_hip.cpp
+hipcc -o build/04_device_info_hip 04_device_info_hip.cpp
+hipcc -o build/05_performance_comparison_hip 05_performance_comparison_hip.cpp
+hipcc -o build/06_debug_example_hip 06_debug_example_hip.cpp
+hipcc -o build/07_cross_platform_comparison 07_cross_platform_comparison.cpp
 ```
 
 ## Example Descriptions
@@ -91,8 +93,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make vector_add_cuda
-./vector_add_cuda
+make
+./build/01_vector_addition_cuda
 ```
 
 **Expected Output:**
@@ -116,8 +118,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make vector_add_hip
-./vector_add_hip
+make hip
+./build/02_vector_addition_hip
 ```
 
 ### 3. Matrix Addition (CUDA)
@@ -131,8 +133,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make matrix_add_cuda
-./matrix_add_cuda
+make
+./build/03_matrix_addition_cuda
 ```
 
 ### 3b. Matrix Addition (HIP)
@@ -146,8 +148,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make matrix_add_hip
-./matrix_add_hip
+make hip
+./build/03_matrix_addition_hip
 ```
 
 ### 4. Device Information (CUDA)
@@ -161,8 +163,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make device_info_cuda
-./device_info_cuda
+make
+./build/04_device_info_cuda
 ```
 
 ### 4b. Device Information (HIP)
@@ -176,8 +178,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make device_info_hip
-./device_info_hip
+make hip
+./build/04_device_info_hip
 ```
 
 ### 5. Performance Comparison (CUDA)
@@ -191,8 +193,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make performance_cuda
-./performance_cuda
+make
+./build/05_performance_comparison_cuda
 ```
 
 ### 5b. Performance Comparison (HIP)
@@ -207,8 +209,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make performance_hip
-./performance_hip
+make hip
+./build/05_performance_comparison_hip
 ```
 
 ### 6. Debug Example (CUDA)
@@ -222,8 +224,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make debug_cuda
-./debug_cuda
+make debug
+./build/06_debug_example_cuda
 ```
 
 ### 6b. Debug Example (HIP)
@@ -238,8 +240,8 @@ Demonstrates:
 
 **Usage:**
 ```bash
-make debug_hip
-./debug_hip
+make debug hip
+./build/06_debug_example_hip
 ```
 
 ### 7. Cross-Platform Comparison

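Note on the manual compilation commands: they now write into `build/`, and compilers generally do not create a missing output directory, so `build/` has to exist first when the Makefile is bypassed. The sketch below derives the compile command from a source file name using the naming convention visible in this diff (output is `build/` plus the source file's stem). `compile_command` is a hypothetical helper, and the mapping of `.cu` to `nvcc` and `.cpp` to `hipcc` is an assumption that only holds for this module's examples.

```bash
# Hypothetical helper: print the manual compile command for one example.
# Assumes .cu sources are CUDA (nvcc) and .cpp sources are HIP (hipcc),
# as is the case for the files listed in this module.
compile_command() {
    src=$1
    case "$src" in
        *.cu)  printf 'nvcc -o build/%s %s\n'  "$(basename "$src" .cu)"  "$src" ;;
        *.cpp) printf 'hipcc -o build/%s %s\n' "$(basename "$src" .cpp)" "$src" ;;
        *)     echo "unsupported source file: $src" >&2; return 1 ;;
    esac
}

# Usage (requires nvcc or hipcc on PATH); create build/ before compiling:
# mkdir -p build && sh -c "$(compile_command 01_vector_addition_cuda.cu)"
```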
modules/module2/README.md

Lines changed: 37 additions & 22 deletions
@@ -1,7 +1,7 @@
-# Module 2: Multi-Dimensional Data Processing
+# Module 2: Advanced GPU Memory Management
 
 ## Overview
-This module explores multidimensional grid organization, thread mapping to data structures, image processing kernels, and matrix multiplication algorithms.
+This module focuses on GPU memory hierarchy mastery and performance optimization: shared memory tiling, memory coalescing, texture/read-only memory usage, unified memory, and bandwidth optimization.
 
 ## Learning Objectives
 After completing this module, you will be able to:
@@ -12,16 +12,31 @@ After completing this module, you will be able to:
 - Handle boundary conditions in multidimensional algorithms
 
 ## Module Content
-- **[content.md](content.md)** - Complete module content (Coming Soon)
-- **[examples/](examples/)** - Practical code examples (Coming Soon)
+- **[content.md](content.md)** - Complete module content
+- **[examples/](examples/)** - Practical code examples
 
-## Status: 🚧 Under Development
+## Quick Start
 
-This module is currently being developed. Check back soon for:
-- Comprehensive theory and explanations
-- Working code examples
-- Hands-on exercises
-- Performance benchmarks
+### Prerequisites
+- NVIDIA GPU with CUDA support OR AMD GPU with ROCm support
+- CUDA Toolkit 12.0+ or ROCm 6.0+ (Docker images provide CUDA 12.9.1 and ROCm 7.0)
+- C/C++ compiler (GCC, Clang, or MSVC)
+
+Recommended: use our Docker dev environment
+```
+./docker/scripts/run.sh --auto
+```
+
+### Build and Run
+```bash
+cd modules/module2/examples
+make # auto-detects your GPU and builds accordingly
+
+# Run a few examples (binaries in build/)
+./build/01_shared_memory_transpose_cuda # or _hip on AMD
+./build/02_memory_coalescing_cuda # or _hip on AMD
+./build/04_unified_memory_cuda
+```
 
 ## Topics to be Covered
 
@@ -30,20 +45,20 @@ This module is currently being developed. Check back soon for:
 - Grid size calculations for arbitrary data sizes
 - Thread-to-data mapping strategies
 
-### 2. Image Processing Applications
-- Image convolution kernels
-- Color space transformations
-- Image filtering and enhancement
+### 2. Memory Access Patterns
+- Coalesced vs strided access
+- Structure of Arrays vs Array of Structures
+- Read-only/texture cache benefits
 
-### 3. Matrix Operations
-- Matrix multiplication algorithms
-- Tiled matrix multiplication
-- Memory access optimization
+### 3. Shared Memory and Tiling
+- Tiled transpose with bank-conflict avoidance
+- Block-level cooperation and synchronization
+- Padding strategies to avoid bank conflicts
 
-### 4. Advanced Indexing
-- Row-major vs column-major layouts
-- Handling non-square matrices
-- Boundary checking techniques
+### 4. Unified Memory and Bandwidth
+- Unified memory prefetch and advice
+- Measuring and optimizing memory bandwidth
+- Analyzing profiler metrics for memory performance
 
 ---
 **Duration**: 6-8 hours

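The new Quick Start says `make` auto-detects the GPU, but the Makefile's detection logic is not part of this diff. As an illustration only, one common approach is to check which compiler is on `PATH`. The sketch below assumes that approach; `detect_backend` is a hypothetical name, and the repository's real logic may differ (for example, it may probe the driver rather than the toolchain).

```bash
# Hypothetical sketch: choose a backend from the compilers found on PATH.
# Prefers CUDA when both toolchains are installed; prints cuda, hip, or none.
detect_backend() {
    if command -v nvcc >/dev/null 2>&1; then
        echo cuda
    elif command -v hipcc >/dev/null 2>&1; then
        echo hip
    else
        echo none
    fi
}

# Usage: pick the matching binary suffix after building.
# backend=$(detect_backend)
# [ "$backend" != none ] && ./build/01_shared_memory_transpose_"$backend"
```

Checking for the compiler rather than the device keeps the sketch usable on machines without a GPU attached, at the cost of not confirming that a device is actually present.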
modules/module2/content.md

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # Module 2: Advanced GPU Memory Management and Optimization
 *Mastering GPU Memory Hierarchies and Performance Optimization*
 
+> Environment note: Examples are tested in Docker containers with CUDA 12.9.1 and ROCm 7.0 (rocm/dev-ubuntu-22.04:7.0-complete). Prefer Docker for reproducible builds.
+
 ## Learning Objectives
 After completing this module, you will be able to:
 - Master GPU memory hierarchy and optimization strategies
