Commit a7d9fbd

CUDA Topic Added

Addition of CUDA topic README.md and file structure.

1 parent dfe2c44 commit a7d9fbd

3 files changed: 180 additions & 0 deletions

roadmap/computer-science/cuda/README.md
# CUDA

## Overview

CUDA (Compute Unified Device Architecture) is NVIDIA’s parallel computing platform and API that enables software developers to write programs that run on GPUs. For high-frequency trading (HFT) and quantitative systems, CUDA unlocks the ability to accelerate latency-sensitive workloads such as real-time signal processing, backtesting, portfolio optimisation, and deep learning inference.

In a domain where nanoseconds matter, offloading computation to GPUs can deliver measurable performance gains in execution speed, throughput, and energy efficiency.

---

## Status: ⚪ Advanced

| Who should learn this? |
|------------------------|
| ✅ Quant developers seeking GPU acceleration |
| ✅ HFT engineers exploring hardware optimisation |
| ✅ AI/ML practitioners deploying models at low latency |
| ✅ Systems engineers building backtest engines or RL simulators |

---

## Prerequisites

- Strong C/C++ programming ability
- Understanding of parallel programming (OpenMP, multithreading, etc.)
- Familiarity with memory hierarchies and compiler toolchains
- Basic linear algebra and numerical computation
- Recommended: Completion of `systems-programming/`, `numerical-computing/`, and `parallel-computing/`

---

## Learning Objectives

- Understand the CUDA programming model and memory architecture
- Write, compile, and run custom CUDA kernels
- Profile and optimise GPU code for latency and throughput
- Integrate CUDA pipelines into backtesting, RL agents, or order book models
- Compare GPU-based vs CPU-based implementations in trading contexts

---

## Key Concepts

- **Kernels** – GPU-side functions executed by thousands of threads
- **Thread Blocks & Grids** – Organisation of parallel execution
- **Shared, Global, Constant Memory** – Understanding memory types and their access costs
- **Warp Divergence & Occupancy** – Performance tuning considerations
- **Pinned Memory & Streams** – Optimising CPU–GPU communication latency
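The thread/block/grid concepts above reduce to a simple index calculation inside each kernel: every thread computes its global element index as `blockIdx.x * blockDim.x + threadIdx.x` and guards against running past the data. A minimal CPU-side emulation in Python (for illustration only; `launch_1d` and `scale_kernel` are hypothetical names, not CUDA API):

```python
import math

def launch_1d(kernel, n, threads_per_block=256):
    """Emulate a 1-D CUDA launch on the CPU: visit every (block, thread)
    pair and compute the global index exactly as a kernel would with
    blockIdx.x * blockDim.x + threadIdx.x."""
    blocks = math.ceil(n / threads_per_block)  # grid size rounds up
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # the usual bounds guard: the last block overhangs
                kernel(i)

src = list(range(10))
dst = [0.0] * 10

def scale_kernel(i):
    # "Device code" analogue: one thread handles exactly one element.
    dst[i] = 2.0 * src[i]

# 10 elements with 4 threads per block -> a grid of 3 blocks,
# where 2 threads in the last block fail the bounds guard.
launch_1d(scale_kernel, n=10, threads_per_block=4)
```

The bounds guard is the detail beginners most often omit: grids are sized by rounding up, so the last block almost always contains threads with no work.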
---

## Applications in Algorithmic Trading

- **Accelerated Backtesting** – Speeding up historical simulations for large datasets
- **GPU-Driven Inference** – Running ML models at microsecond latency per decision
- **Real-Time Feature Extraction** – Tick-by-tick feature computation
- **Options Pricing & Monte Carlo** – Thousands of simulations in parallel
- **Market Microstructure Modelling** – High-resolution stochastic agent-based simulation
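Monte Carlo pricing is the clearest fit in the list above: every path is independent, so the loop body maps naturally to one GPU thread per path. A pure-Python reference pricer for a European call under geometric Brownian motion (parameter values are illustrative), the kind of scalar baseline one would port to a CUDA kernel and validate against:

```python
import math
import random

def mc_european_call(s0, k, r, sigma, t, n_paths, seed=42):
    """Plain Monte Carlo price of a European call under GBM.
    Each path is independent -- exactly the structure that maps to
    one CUDA thread (or cuRAND stream) per path on the GPU."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)              # one standard normal draw
        st = s0 * math.exp(drift + vol * z)  # terminal price for this path
        payoff_sum += max(st - k, 0.0)       # call payoff
    return math.exp(-r * t) * payoff_sum / n_paths

# At-the-money call: S0 = K = 100, r = 5%, sigma = 20%, T = 1 year.
price = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 50_000)
```

With these inputs the Black–Scholes value is roughly 10.45, so a 50,000-path estimate should land within a few tenths of that; the GPU port should reproduce the same statistics, not the same draws.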
---

## Study Materials

### 📚 Books

#### 📘 Beginner

| Title | Author(s) | Description | Link |
|-------|-----------|-------------|------|
| *CUDA by Example* | Jason Sanders, Edward Kandrot | Friendly introduction using C, with step-by-step projects | [NVIDIA Press](https://developer.nvidia.com/cuda-example) |
| *Hands-On GPU Programming with CUDA in Python* | Dr. Brian Tuomanen | Teaches Python-based CUDA via Numba and CuPy | [Packt](https://www.packtpub.com/product/hands-on-gpu-programming-with-cuda-in-python/9781788624290) |

#### 📗 Intermediate

| Title | Author(s) | Description | Link |
|-------|-----------|-------------|------|
| *Programming Massively Parallel Processors (4th Ed)* | David B. Kirk, Wen-mei W. Hwu | In-depth treatment of parallelism, optimisation, and hardware theory | [Morgan Kaufmann](https://www.elsevier.com/books/programming-massively-parallel-processors/kirk/978-0-12-822323-3) |
| *CUDA for Engineers* | Duane Storti, Mete Yurtoglu | Bridges performance computing with engineering applications | [Pearson](https://www.pearson.com/en-us/subject-catalog/p/cuda-for-engineers-an-introduction-to-parallel-programming/P200000003223) |
| *The CUDA Handbook* | Nicholas Wilt | Deep dive into CUDA architecture, compilation, memory models, and API design | [Amazon](https://www.amazon.com/CUDA-Handbook-Guide-Programming-GPUs/dp/0321809467) |

#### 📙 Advanced

| Title | Author(s) | Description | Link |
|-------|-----------|-------------|------|
| *GPU Parallel Program Development Using CUDA* | Tolga Soyata | Includes latency benchmarks and system-level design with GPUs | [Morgan Kaufmann](https://www.elsevier.com/books/gpu-parallel-program-development-using-cuda/soyata/978-0-12-416970-2) |
| *High Performance CUDA for Engineers and Scientists* | Massimiliano Fatica (NVIDIA) | Covers scientific workflows, CUDA tuning, memory models, and HPC strategies | [Springer](https://link.springer.com/book/10.1007/978-3-030-47060-9) |
| *High Performance Python* | Micha Gorelick, Ian Ozsvald | Though not CUDA-exclusive, discusses vectorisation and GPU workflows | [O'Reilly](https://www.oreilly.com/library/view/high-performance-python/9781449361747/) |

---

### 🎓 Courses

#### 📘 Beginner

| Course Title | Provider | Level | Description |
|--------------|----------|--------|-------------|
| [Intro to Parallel Programming (CS344)](https://www.udacity.com/course/intro-to-parallel-programming--cs344) | Udacity | Beginner | CUDA-focused introduction to data parallelism and GPU concepts |
| [MIT 6.189: Parallel Programming Intro](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-189-a-gentle-introduction-to-parallel-programming-january-iap-2007/) | MIT OCW | Beginner | Conceptual intro to parallel programming and shared memory |

#### 📗 Intermediate

| Course Title | Provider | Level | Description |
|--------------|----------|--------|-------------|
| [Parallel Programming with CUDA](https://developer.nvidia.com/parallel-thread-execution) | NVIDIA | Intermediate | Developer-focused CUDA tutorials and docs |
| [High-Performance Scientific Computing](https://github.com/HP-SCL/learning-cuda) | HP-SCL | Intermediate | Practical CUDA, OpenMP, and MPI examples with code |
| [CS193G: Programming Massively Parallel Processors](https://web.stanford.edu/class/cs193g/) | Stanford | Intermediate | CUDA C++, memory optimisation, project-driven course (archived) |

#### 📙 Advanced

| Course Title | Provider | Level | Description |
|--------------|----------|--------|-------------|
| [GPU Computing Specialisation (UIC)](https://www.coursera.org/specializations/gpu-computing) | Coursera | Advanced | Designed for HPC professionals; includes simulation and finance case studies |
| [GPU-Accelerated Computing with CUDA and Python](https://learnopencv.com/gpu-computing-with-cuda-and-python/) | LearnOpenCV | Advanced | Real-world examples including computer vision and ML inference pipelines |

---

### 🏅 Certifications & Developer Programs

| Credential | Provider | Description |
|------------|----------|-------------|
| **CUDA Programming Certificate** | NVIDIA DLI | Completion badge for hands-on CUDA C/C++ course via NVIDIA’s Deep Learning Institute |
| **Certified CUDA Developer** | NVIDIA | Recognition for successful completion of CUDA development workshops and assessments |
| **Jetson AI Specialist** | NVIDIA | Validates knowledge of deploying CUDA-accelerated AI models on edge devices |
| **NVIDIA Developer Program** | NVIDIA | Free access to CUDA SDKs, tools, and exclusive learning tracks |
| **Intel oneAPI GPU Programming Badge** *(optional)* | Intel | Demonstrates cross-vendor parallel compute skills (non-CUDA) |

---

## 🛠️ Tools & Libraries

- **NVIDIA Nsight Compute / Nsight Systems** – CUDA performance diagnostics and profiling
- **nvcc** – CUDA compiler for building `.cu` programs
- **CuPy / Numba / RAPIDS** – Python-based GPU acceleration frameworks
- **TorchScript + TensorRT** – GPU inference for ML workloads
- **Backtrader + Numba** – Accelerated strategy backtesting
- **Thrust** – STL-like C++ template library for parallel algorithms on CUDA
- **CUDA SDK Examples** – Starter kernel implementations from NVIDIA

---

## 🧪 Hands-On Projects

- Port a matrix multiplication function to CUDA and benchmark it
- Accelerate a tick data parser or streaming windowed average calculator
- Run an inference loop on GPU using PyTorch with TorchScript
- Profile execution time across CPU-only vs CUDA-enabled backtests
- Build a GPU-enabled Monte Carlo simulation for options pricing
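For the first project above, a correct CPU baseline is the starting point: it defines the reference output for validating the CUDA port and the timing to beat. A naive pure-Python sketch (illustrative only; a serious benchmark would use larger matrices and compare against BLAS/cuBLAS):

```python
def matmul(a, b):
    """Naive triple-loop matrix multiply over lists of lists -- the CPU
    baseline one would port to a CUDA kernel, typically assigning one
    thread per output element c[i][j]."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):          # these two loops become the thread grid
        for j in range(m):
            acc = 0.0
            for p in range(k):  # this loop stays inside the kernel
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c
```

The two outer loops are what the GPU parallelises away; the inner dot product is what each thread computes, and it is where shared-memory tiling pays off in the optimised version.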
---

## ✅ Assessment

- Can you explain when CUDA outperforms traditional CPU solutions?
- Can you write, compile, and profile a basic CUDA kernel?
- Can you integrate GPU acceleration into an existing Python/C++ trading pipeline?
- Do you understand the memory model and how to minimise divergence or contention?

---

## ❓ FAQs

**Q: Can I learn CUDA without an NVIDIA GPU?**
A: You can start with emulators or cloud instances, but true performance testing requires a compatible GPU.

**Q: Do I need to master CUDA if I use Python libraries like CuPy or Numba?**
A: Not necessarily, but understanding what’s happening under the hood will help you write better vectorised and accelerated code.
**Q: Is this useful outside of HFT?**
A: Absolutely — CUDA is used in ML training, video processing, simulation, and scientific computing.

---

## 🔗 Next Steps

- [Parallel Computing](../parallel-computing/) – Foundational knowledge for GPU programming
- [Numerical Computing](../numerical-computing/) – Algorithms that benefit from acceleration
- [Machine Learning](../machine-learning/) – Where inference and training need performance
- [Backtesting Engines](../../trading-systems/backtesting/) – Integrate GPU-optimised pipelines

roadmap/computer-science/cuda/resources/books/.gitkeep

Whitespace-only changes.

roadmap/computer-science/cuda/resources/courses/.gitkeep

Whitespace-only changes.
