Commit 1dad4e1

Updated the framework of blog and code blocks
1 parent a6a5b7a commit 1dad4e1

6 files changed: 32 additions & 33 deletions

blog/cuda-optimization.html

Lines changed: 5 additions & 24 deletions
@@ -7,7 +7,7 @@
 <!-- Tailwind CSS -->
 <script src="https://cdn.tailwindcss.com"></script>
 <!-- Font Awesome -->
-<link href="https://cdn.jsdelivr.net/npm/font-awesome@4.7.0/css/font-awesome.min.css" rel="stylesheet">
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css">
 <!-- Google Fonts - Inter -->
 <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">
 <!-- Prism.js for syntax highlighting -->
@@ -133,7 +133,7 @@ <h1 class="text-[clamp(1.8rem,3vw,2.5rem)] font-bold mb-6">Optimizing CUDA Kerne
 <h3>Why Optimization Matters (Numbers Included)</h3>
 <p>Our initial CUDA examples (in <code class="bg-gray-100 px-1 py-0.5 rounded">02_basic_kernels</code>) are great for learning, but they leave significant performance on the table. For example, our naive matrix multiplication kernel:</p>

-<code class="bg-gray-100 px-1 py-0.5 rounded block my-4">// Naive implementation (02_basic_kernels/matrix_mult.cu)
+<pre class="code-block"><code class="language-c">// Naive implementation (02_basic_kernels/matrix_mult.cu)
 __global__ void matrixMultiply(float *C, float *A, float *B, int N) {
 int row = blockIdx.y * blockDim.y + threadIdx.y;
 int col = blockIdx.x * blockDim.x + threadIdx.x;
@@ -145,7 +145,7 @@ <h3>Why Optimization Matters (Numbers Included)</h3>
 }
 C[row * N + col] = sum;
 }
-}</code>
+}</code></pre>

 <p>Runs at ~120 GFLOPS on an NVIDIA RTX 3090. With our new optimizations? <strong>1.8 TFLOPS</strong> — a 15x improvement. Here’s how we did it.</p>

@@ -155,7 +155,7 @@ <h3>Key Optimizations in the New Module</h3>
 <h3>1. Shared Memory Tiling</h3>
 <p>Reduces global memory access by reusing data in shared memory (NVIDIA’s on-chip cache). Our <code class="bg-gray-100 px-1 py-0.5 rounded">tiled_matrix_mult.cu</code> implements 16x16 tiles:</p>

-<code class="bg-gray-100 px-1 py-0.5 rounded block my-4">// Tiled implementation (04_cuda_optimization/tiled_matrix_mult.cu)
+<pre class="code-block"><code class="language-c">// Tiled implementation (04_cuda_optimization/tiled_matrix_mult.cu)
 __global__ void tiledMatrixMult(float *C, float *A, float *B, int N) {
 __shared__ float As[16][16]; // Shared memory tile for A
 __shared__ float Bs[16][16]; // Shared memory tile for B
@@ -173,26 +173,7 @@ <h3>1. Shared Memory Tiling</h3>
 sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
 __syncthreads();
 }
-}</code>
-<pre class="code-block"><code class="language-c">// Tiled implementation (04_cuda_optimization/tiled_matrix_mult.cu)
-__global__ void tiledMatrixMult(float *C, float *A, float *B, int N) {
-__shared__ float As[16][16]; // Shared memory tile for A
-__shared__ float Bs[16][16]; // Shared memory tile for B
-
-// ... (tile loading logic) ...
-
-// Reuse tiles for 16x16 output elements
-for (int t = 0; t < N/16; t++) {
-// Load tiles from global memory to shared memory
-As[threadIdx.y][threadIdx.x] = A[...];
-Bs[threadIdx.y][threadIdx.x] = B[...];
-__syncthreads();
-
-// Compute partial sum using shared memory
-sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
-__syncthreads();
-}
-}</code></pre>
+}</code></pre>

 <h3>2. Memory Coalescing</h3>
 <p>Our <code class="bg-gray-100 px-1 py-0.5 rounded">memory_coalescing_demo.cu</code> shows how to align global memory access with GPU memory banks, reducing latency by 70% for strided access patterns.</p>
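The blog's tiled CUDA kernel elides its tile-loading logic in the diff above, so the tiling idea itself can be hard to see. As a rough, self-contained CPU-side sketch (hypothetical illustration code, not part of the repository), here is blocked matrix multiplication in plain Python: each 16x16 sub-block is loaded once and reused across the whole tile of output elements, which is the same reuse the CUDA kernel gets from `__shared__` memory.

```python
# CPU sketch of shared-memory tiling: compute C = A * B one TILE x TILE
# block at a time, so each sub-block of A and B is reused TILE times
# before moving on -- the cache analogue of loading As/Bs into shared memory.
TILE = 16

def tiled_matmul(A, B, N):
    C = [[0.0] * N for _ in range(N)]
    for ti in range(0, N, TILE):          # tile row of C
        for tj in range(0, N, TILE):      # tile column of C
            for tk in range(0, N, TILE):  # tiles marched along the K dimension
                # The inner triple loop only touches TILE x TILE sub-blocks.
                for i in range(ti, min(ti + TILE, N)):
                    for j in range(tj, min(tj + TILE, N)):
                        s = 0.0
                        for k in range(tk, min(tk + TILE, N)):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C
```

The loop nest computes exactly what the naive kernel computes; only the traversal order changes, trading no extra arithmetic for far fewer trips to slow memory.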

blog/index.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 <!-- Tailwind CSS -->
 <script src="https://cdn.tailwindcss.com"></script>
 <!-- Font Awesome -->
-<link href="https://cdn.jsdelivr.net/npm/font-awesome@4.7.0/css/font-awesome.min.css" rel="stylesheet">
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css">
 <!-- Google Fonts - Inter -->
 <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">

blog/quantum-ibm-integration.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 <!-- Tailwind CSS -->
 <script src="https://cdn.tailwindcss.com"></script>
 <!-- Font Awesome -->
-<link href="https://cdn.jsdelivr.net/npm/font-awesome@4.7.0/css/font-awesome.min.css" rel="stylesheet">
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css">
 <!-- Google Fonts - Inter -->
 <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">

blog/rl-agent-improvements.html

Lines changed: 23 additions & 5 deletions
@@ -7,10 +7,16 @@
 <!-- Tailwind CSS -->
 <script src="https://cdn.tailwindcss.com"></script>
 <!-- Font Awesome -->
-<link href="https://cdn.jsdelivr.net/npm/font-awesome@4.7.0/css/font-awesome.min.css" rel="stylesheet">
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css">
 <!-- Google Fonts - Inter -->
 <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">
-
+<!-- Prism.js for syntax highlighting -->
+<link href="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/themes/prism-tomorrow.min.css" rel="stylesheet">
+<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/prism.min.js"></script>
+<!-- Add language support (e.g., Python, C++) -->
+<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-python.min.js"></script>
+<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-c.min.js"></script>
+
 <!-- Tailwind Configuration -->
 <script>
 tailwind.config = {
@@ -41,6 +47,18 @@
 .blog-content h3 {
 @apply text-xl font-semibold mt-8 mb-4;
 }
+/* Code block styles */
+.code-block {
+@apply bg-gray-900 text-gray-100 rounded-lg p-4 my-6 overflow-x-auto;
+font-family: 'Fira Code', 'SFMono-Regular', Menlo, Monaco, Consolas, monospace;
+line-height: 1.6;
+font-size: 0.875rem; /* 14px */
+}
+/* Inline code styles */
+.blog-content code:not(.code-block code) {
+@apply bg-gray-100 text-gray-800 px-1.5 py-0.5 rounded text-sm font-medium;
+font-family: 'Fira Code', monospace;
+}
 }
 </style>
 </head>
@@ -117,8 +135,8 @@ <h3>Key Contributions We’re Celebrating</h3>

 <h3>1. Priority Experience Replay (by @ml-engineer-jane)</h3>
 <p>Our original DQN used uniform experience replay, which wastes time on low-impact transitions. @ml-engineer-jane implemented prioritized replay, weighting transitions by their temporal difference (TD) error:</p>
-
-<code class="bg-gray-100 px-1 py-0.5 rounded block my-4"># In 03_advanced_agents/dqn_prioritized.py
+
+<pre class="code-block"><code class="language-python"># In 03_advanced_agents/dqn_prioritized.py
 class PrioritizedReplayBuffer:
 def __init__(self, capacity, alpha=0.6):
 self.capacity = capacity
@@ -139,7 +157,7 @@ <h3>1. Priority Experience Replay (by @ml-engineer-jane)</h3>
 probabilities /= probabilities.sum()

 indices = np.random.choice(len(self.memory), batch_size, p=probabilities)
-# ... (return weighted samples) ...</code>
+# ... (return weighted samples) ...</code></pre>

 <p>Result: DQN solves LunarLander in 400 episodes (down from 1,200) with 2x higher average reward.</p>

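The buffer in the diff above is heavily elided, so as a rough, self-contained sketch of the same proportional-prioritization idea, here is a minimal pure-Python version. This is hypothetical illustration code, not the repository's `dqn_prioritized.py` (which uses NumPy); it samples transition i with probability proportional to p_i^alpha, where p_i tracks the TD error.

```python
import random

class PrioritizedReplayBuffer:
    """Sketch of proportional prioritized replay: transitions are sampled
    with probability p_i**alpha / sum(p**alpha), so high-TD-error
    (surprising) transitions are replayed more often."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # alpha=0 recovers uniform replay
        self.memory = []            # stored transitions
        self.priorities = []        # one priority per transition

    def push(self, transition, td_error=1.0):
        # Evict the oldest transition once the buffer is full.
        if len(self.memory) >= self.capacity:
            self.memory.pop(0)
            self.priorities.pop(0)
        self.memory.append(transition)
        # Small epsilon keeps every priority nonzero and thus sampleable.
        self.priorities.append(abs(td_error) + 1e-5)

    def sample(self, batch_size):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = random.choices(range(len(self.memory)),
                                 weights=probs, k=batch_size)
        return [self.memory[i] for i in indices], indices
```

A full implementation would also return importance-sampling weights to correct the bias this non-uniform sampling introduces, and would update priorities after each learning step.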

privacy.html

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@
 <div class="container mx-auto px-4 py-3">
 <a href="index.html" class="flex items-center gap-2">
 <div class="w-10 h-10 rounded-lg bg-primary flex items-center justify-center">
-<i class="fa-solid fa-cloud-bolt text-white text-xl"></i>
+<i class="fas fa-bolt text-white text-xl"></i>
 </div>
 <span class="text-xl font-bold text-primary">AIComputing101</span>
 </a>

terms.html

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@
 <div class="container mx-auto px-4 py-3">
 <a href="index.html" class="flex items-center gap-2">
 <div class="w-10 h-10 rounded-lg bg-primary flex items-center justify-center">
-<i class="fa-solid fa-cloud-bolt text-white text-xl"></i>
+<i class="fas fa-bolt text-white text-xl"></i>
 </div>
 <span class="text-xl font-bold text-primary">AIComputing101</span>
 </a>
