<p>Our initial CUDA examples (in <code class="bg-gray-100 px-1 py-0.5 rounded">02_basic_kernels</code>) are great for learning, but they leave significant performance on the table. Our naive matrix multiplication kernel, for example, re-reads every element of A and B from global memory for each output element that uses it.</p>
<h3>1. Shared Memory Tiling</h3>
<p>The tiled kernel below stages 16&times;16 blocks of A and B in shared memory, so each value fetched from global memory is reused across 16 output elements:</p>
<pre class="code-block"><code class="language-cuda">#define TILE 16

// Tiled matrix multiply: C = A * B for square N x N matrices
// (assumes N is a multiple of TILE).
__global__ void matmul_tiled(const float *A, const float *B, float *C, int N) {
    __shared__ float As[TILE][TILE]; // Shared memory tile for A
    __shared__ float Bs[TILE][TILE]; // Shared memory tile for B

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    // Reuse each pair of tiles for 16x16 output elements
    for (int t = 0; t < N / TILE; t++) {
        // Load one tile of A and one tile of B from global to shared memory
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads(); // wait until both tiles are fully loaded

        // Compute the partial sum out of shared memory
        for (int k = 0; k < TILE; k++)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads(); // wait before the next iteration overwrites the tiles
    }
    C[row * N + col] = sum;
}</code></pre>
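<p>A matching launch pairs the 16&times;16 tile with a 16&times;16 thread block. The sketch below is illustrative; the kernel name and the <code class="bg-gray-100 px-1 py-0.5 rounded">d_A</code>-style device pointer names are ours, not necessarily the repo's:</p>
<pre class="code-block"><code class="language-cuda">// Illustrative launch: one 16x16 thread block per 16x16 output tile.
dim3 block(16, 16);
dim3 grid(N / 16, N / 16);                 // assumes N is a multiple of 16
matmul_tiled<<<grid, block>>>(d_A, d_B, d_C, N);
cudaDeviceSynchronize();                   // wait for the kernel to finish
</code></pre>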
<h3>2. Memory Coalescing</h3>
<p>Our <code class="bg-gray-100 px-1 py-0.5 rounded">memory_coalescing_demo.cu</code> shows how aligning a warp's global memory accesses to consecutive addresses lets the hardware coalesce them into a few wide transactions, cutting latency by 70% relative to strided access patterns.</p>
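<p>As a minimal sketch of the comparison such a demo makes (the kernel names and the <code class="bg-gray-100 px-1 py-0.5 rounded">stride</code> parameter are illustrative, not necessarily what <code class="bg-gray-100 px-1 py-0.5 rounded">memory_coalescing_demo.cu</code> contains): consecutive threads reading consecutive addresses coalesce into a few wide transactions, while a stride scatters a single warp across many.</p>
<pre class="code-block"><code class="language-cuda">// Illustrative sketch, not the repo's exact demo code.
// Coalesced: thread i reads element i -- a warp's 32 loads fall
// into a handful of contiguous 128-byte memory transactions.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: thread i reads element i * stride -- the same warp now
// touches up to 32 separate transactions, wasting most of each one.
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n) out[i] = in[i * stride];
}</code></pre>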
<p>Our original DQN used uniform experience replay, which wastes time on low-impact transitions. @ml-engineer-jane implemented prioritized replay, weighting transitions by their temporal difference (TD) error:</p>
<codeclass="bg-gray-100 px-1 py-0.5 rounded block my-4"># In 03_advanced_agents/dqn_prioritized.py
138
+
139
+
<preclass="code-block"><codeclass="language-python"># In 03_advanced_agents/dqn_prioritized.py