
Commit ba1f23e

Updated post on evolution of compilers

1 parent fcc982a · commit ba1f23e

1 file changed: 123 additions & 4 deletions

blog/evolution-of-compilers.html
@@ -31,9 +31,6 @@
     </script>
 
     <style type="text/tailwindcss">
-        /* Replaced Tailwind @apply rules with plain CSS equivalents to avoid linter warnings
-           Keep the original utility classes in the markup; these rules provide fallback styles
-           in environments where Tailwind's postcss processor isn't available. */
         .content-auto { content-visibility: auto; }
         .blog-content p { margin-bottom: 1.5rem; line-height: 1.7; }
         .blog-content h3 { font-size: 1.125rem; font-weight: 600; margin-top: 2rem; margin-bottom: 1rem; color: #1E293B; }
@@ -46,6 +43,16 @@
         .blog-content th { background-color: #F9FAFB; font-weight: 600; }
         .card-hover { transition: all 0.3s ease; }
         .card-hover:hover { box-shadow: 0 8px 20px rgba(0,0,0,0.08); transform: translateY(-4px); }
+        .diagram {
+            font-family: monospace;
+            white-space: pre;
+            background-color: #f8fafc;
+            padding: 1rem;
+            border-radius: 0.5rem;
+            border: 1px solid #e2e8f0;
+            margin: 1.5rem 0;
+            overflow-x: auto;
+        }
     </style>
 </head>
 <body class="font-inter bg-light text-dark antialiased">
@@ -120,6 +127,15 @@ <h1 class="text-[clamp(1.8rem,3vw,2.5rem)] font-bold mb-6 leading-tight">The Evo
             <h3>1. The Minimal Compiler: Foundations of "Lex → Parse → Codegen"</h3>
             <p>Your POC compiler is the "hello world" of compiler design—and it’s where every key concept starts. Here’s its role:</p>
 
+            <div class="diagram">
+Minimal Compiler Workflow
+-------------------------
+Source Code    → Tokens         → AST            → IR             → Machine Code
+("return 42;")   (via Lexer)      (via Parser)     (IR Generator)   (Codegen)
+                 [TOKEN_RETURN,   [ReturnStmt {    [IR_Return {     [movl $42, %eax;
+                  TOKEN_INTEGER]   value=42 }]      value=42 }]      ret]
+            </div>
+
             <ul>
                 <li><strong>Core Stages</strong>: It turns raw text (<code>return 42;</code>) into executable code via four steps:
                 <ol class="list-decimal pl-6 mt-2 mb-2">
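The four stages added in this hunk can be condensed into a few lines of runnable Python. This is a hypothetical illustration of the pipeline for the single statement `return 42;`, not the post's actual POC code:

```python
import re

def lex(source):
    # Lexer: raw text -> token stream
    tokens = []
    for word in re.findall(r"return|\d+|;", source):
        if word == "return":
            tokens.append(("TOKEN_RETURN", None))
        elif word.isdigit():
            tokens.append(("TOKEN_INTEGER", int(word)))
    return tokens

def parse(tokens):
    # Parser: tokens -> AST (a single ReturnStmt node)
    assert tokens[0][0] == "TOKEN_RETURN"
    return {"node": "ReturnStmt", "value": tokens[1][1]}

def lower(ast):
    # IR generator: AST -> flat, target-independent IR
    return [("IR_Return", ast["value"])]

def codegen(ir):
    # Codegen: IR -> x86-64 assembly text
    _op, value = ir[0]
    return f"movl ${value}, %eax\nret"

print(codegen(lower(parse(lex("return 42;")))))
```

Each function maps onto one column of the diagram above; a real compiler differs only in the size of each stage, not in their order.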
@@ -135,6 +151,18 @@ <h3>1. The Minimal Compiler: Foundations of "Lex → Parse → Codegen"</h3>
 
             <h3>2. Modern C++ Compilers (GCC, Clang): Optimizing for General-Purpose CPUs</h3>
             <p>As code grew more complex (think C++ templates, OOP, or multi-threading), compilers like GCC 14 or Clang 17 built on your POC’s core stages but added critical layers:</p>
+
+            <div class="diagram">
+Modern C++ Compiler (GCC/Clang)
+-------------------------------
+Source Code → Preprocessor       → Tokens/AST → Optimization Passes  → Machine Code
+(C++ Code)    (Macros, #include)   (Analysis)   (Constant folding,     (x86_64/ARM/
+                                                vectorization, etc.)    other ISAs)
+
+                                               Link to Libraries
+                                               (OpenBLAS, etc.)
+            </div>
+
             <ul>
                 <li><strong>Advanced Optimizations</strong>: They turn naive code into fast code:
                 <ul class="list-disc pl-6 mt-1 mb-1">
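Constant folding, one of the optimization passes named in the diagram above, is easy to sketch. This is an illustrative toy pass over a tiny expression AST (assumed representation, not GCC/Clang internals):

```python
def fold(expr):
    # expr is either an int literal, a variable name (str),
    # or a tuple ("+" | "*", lhs, rhs)
    if isinstance(expr, (int, str)):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        # Both operands known at compile time: evaluate now,
        # so no instruction is emitted for this subtree.
        return lhs + rhs if op == "+" else lhs * rhs
    return (op, lhs, rhs)

# (2 * 3) + x folds to 6 + x; the multiply never reaches codegen.
print(fold(("+", ("*", 2, 3), "x")))
```

An `-O2` build runs dozens of such rewrites (folding, inlining, vectorization) between the AST and the backend, each one a tree or IR transformation of this general shape.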
@@ -150,6 +178,27 @@ <h3>3. GPU Compilers: From Traditional (NVCC, ROCm HIP) to ML-Focused (Triton)<
 
             <h4>NVIDIA’s NVCC: CUDA Ecosystem Specialization</h4>
             <p>NVCC (NVIDIA CUDA Compiler) is tightly integrated with NVIDIA’s GPU hardware, prioritizing performance for NVIDIA’s SM (Streaming Multiprocessor) architecture:</p>
+
+            <div class="diagram">
+NVIDIA NVCC Workflow
+--------------------
+CUDA Source Code
+
+  ├─→ Host Code ─→ LLVM IR ─→ x86_64/ARM Assembly
+
+  └─→ Device Code
+
+        ├─→ Parse CUDA Extensions (__global__, threadIdx)
+
+        ├─→ Generate PTX (Parallel Thread Execution)
+
+        ├─→ Optimize for NVIDIA SM Architecture
+
+        └─→ Compile to Cubin (GPU-specific binary)
+
+              └─→ Link with CUDA Libraries (cuBLAS, cuFFT)
+            </div>
+
             <ul>
                 <li><strong>Heterogeneous Code Splitting</strong>:
                 <ul class="list-disc pl-6 mt-1 mb-1">
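The host/device fork at the top of the diagram can be shown with a toy partitioner. This is an assumed sketch of the routing decision only (NVCC's real driver works on parsed C++, not name/qualifier pairs):

```python
def split_translation_unit(functions):
    # functions: list of (name, qualifiers) pairs.
    # __global__ marks a kernel: it goes down the device pipeline
    # (PTX -> cubin); everything else is ordinary host code.
    host, device = [], []
    for name, qualifiers in functions:
        (device if "__global__" in qualifiers else host).append(name)
    return host, device

host, device = split_translation_unit([
    ("main",  []),              # host: LLVM IR -> x86_64/ARM assembly
    ("saxpy", ["__global__"]),  # device: PTX -> cubin for the target SM
])
print(host, device)
```

The same single-source file thus feeds two entirely different backends, which is the defining trait of a heterogeneous compiler.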
@@ -169,6 +218,27 @@ <h4>NVIDIA’s NVCC: CUDA Ecosystem Specialization</h4>
 
             <h4>AMD’s ROCm HIP: Portability-First Parallelism</h4>
             <p>AMD’s ROCm (Radeon Open Compute) uses HIP (Heterogeneous-Compute Interface for Portability) to balance cross-vendor compatibility with AMD GPU performance. It’s designed to let developers write code once and run it on both AMD and NVIDIA GPUs:</p>
+
+            <div class="diagram">
+AMD ROCm HIP Workflow
+---------------------
+HIP Source Code
+
+  ├─→ Host Code ─→ LLVM IR ─→ x86_64/ARM Assembly
+
+  └─→ Device Code
+
+        ├─→ Parse HIP Extensions (__global__, hipThreadIdx_x)
+
+        ├─→ Generate LLVM IR (with AMD GPU metadata)
+
+        ├─→ Optimize for AMD CDNA Architecture
+
+        └─→ Compile to Code Objects (AMD binary format)
+
+              └─→ Link with ROCm Libraries (hipBLAS, hipFFT)
+            </div>
+
             <ul>
                 <li><strong>HIP: A Familiar, Portable Abstraction</strong>:
                 <p>HIP mimics CUDA syntax (e.g., <code>__global__</code> for kernels) but compiles to AMD’s hardware via <strong>HIP-Clang</strong>—an LLVM-based compiler that parses HIP code and splits it into host/device paths.</p>
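The "write once" story rests partly on mechanical source translation, in the spirit of AMD's hipify tools. A minimal sketch, with an abridged and assumed mapping table (the real tools cover hundreds of API names):

```python
# CUDA API names -> HIP equivalents (tiny illustrative subset).
CUDA_TO_HIP = {
    "cudaMalloc":  "hipMalloc",
    "cudaMemcpy":  "hipMemcpy",
    "cudaFree":    "hipFree",
    "threadIdx.x": "hipThreadIdx_x",
}

def hipify(source):
    # Naive textual substitution; real hipify-clang rewrites the AST
    # so it never corrupts identifiers that merely contain these names.
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(hipify("cudaMalloc(&d_a, n); int i = threadIdx.x;"))
```

Because HIP keeps CUDA's shape (same launch model, near-identical API surface), the translated source then compiles for AMD via HIP-Clang or back to NVIDIA via NVCC.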
@@ -191,6 +261,30 @@ <h4>AMD’s ROCm HIP: Portability-First Parallelism</h4>
 
             <h4>Triton Compiler: ML-Focused GPU Programming for Everyone</h4>
             <p>Developed by OpenAI (now open-source), Triton represents a shift in GPU compiler design: it prioritizes <strong>ML workloads</strong> and <strong>programmer productivity</strong> without sacrificing performance. Unlike NVCC/HIP (which require low-level kernel writing), Triton lets developers write GPU-accelerated ML code in Python-like syntax.</p>
+
+            <div class="diagram">
+Triton Compiler Workflow
+------------------------
+Python-like Triton Code
+
+  ├─→ Parse Triton Syntax
+
+  ├─→ Generate Triton IR (ML-optimized)
+
+  ├─→ ML-specific Optimizations
+  │     (Autotuning, Tensor Coalescing, Operator Fusion)
+
+  └─→ Generate Target Code
+
+        ├─→ NVIDIA GPUs: PTX → Cubin
+
+        ├─→ AMD GPUs: LLVM IR → Code Objects
+
+        └─→ CPUs: LLVM IR → x86_64/ARM Assembly
+
+              └─→ Integration with PyTorch/TensorFlow
+            </div>
+
             <ul>
                 <li><strong>Core Philosophy</strong>: "Write once, run fast on any GPU." Triton abstracts away GPU-specific details (threads, warps, shared memory) so ML researchers can focus on algorithms, not hardware.</li>
                 <li><strong>Compilation Pipeline</strong>:
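Triton's programming model, a grid of "program" instances that each own one block of elements, can be emulated in plain Python with no GPU or `triton` package. A sketch of a blocked vector add (the block/mask idiom here mirrors Triton's style but is our own illustration):

```python
BLOCK = 4  # elements per program instance, analogous to a Triton BLOCK_SIZE

def add_kernel(pid, x, y, out):
    # Each program handles indices [pid*BLOCK, (pid+1)*BLOCK),
    # with min() masking off the out-of-range tail, the same job
    # a masked tl.load/tl.store pair does in real Triton.
    for i in range(pid * BLOCK, min((pid + 1) * BLOCK, len(x))):
        out[i] = x[i] + y[i]

def launch(x, y):
    out = [0] * len(x)
    grid = (len(x) + BLOCK - 1) // BLOCK  # number of program instances
    for pid in range(grid):               # on a GPU these run in parallel
        add_kernel(pid, x, y, out)
    return out

print(launch([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

The compiler's value is everything this sketch hides: choosing BLOCK via autotuning, coalescing loads, fusing adjacent ops, and lowering the per-program body to PTX or LLVM IR.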
@@ -264,6 +358,31 @@ <h4>Best Practices for Modern GPU Compilers</h4>
 
             <h3>4. CUDA-Q: Quantum-Classical Hybrids</h3>
             <p>The next frontier? Quantum computing. Compilers like NVIDIA’s CUDA-Q extend GPU compiler principles to quantum processors, linking classical CPU/GPU code with quantum circuits (e.g., <code>h(q)</code> for Hadamard gates) via a new abstraction layer: <strong>Quantum IR (QIR)</strong>.</p>
+
+            <div class="diagram">
+CUDA-Q Workflow
+---------------
+Quantum-Classical Code
+
+  ├─→ Classical Code (CPU/GPU)
+  │        │
+  │        └─→ Compiled via NVCC/HIP/Triton
+
+  └─→ Quantum Code
+
+        ├─→ Parse Quantum Operations (h(q), cnot(q1,q2))
+
+        ├─→ Generate Quantum IR (QIR)
+
+        ├─→ Translate to OpenQASM
+
+        └─→ Execute on
+
+              ├─→ Quantum Hardware (e.g., NVIDIA DGX Quantum)
+
+              └─→ Quantum Simulators (via cuQuantum)
+            </div>
+
             <p>CUDA-Q splits code into three paths: classical CPU/GPU logic (compiled via NVCC/HIP/Triton), quantum circuits (compiled to QIR → OpenQASM), and runtime integration with quantum hardware (e.g., NVIDIA DGX Quantum) or simulators (via <code>cuQuantum</code>).</p>
 
             <h3>The Evolutionary Thread: Abstraction + Specialization</h3>
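What a simulator on the quantum path conceptually does for <code>h(q)</code> is just linear algebra on a statevector. A pure-Python, single-qubit sketch (our own illustration; cuQuantum performs the same arithmetic at scale on GPUs):

```python
import math

def hadamard(state):
    # state holds the amplitudes (a, b) of |0> and |1>.
    # The Hadamard gate maps |0> -> (|0>+|1>)/sqrt(2), |1> -> (|0>-|1>)/sqrt(2).
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

state = hadamard((1.0, 0.0))                    # h on |0> yields |+>
probs = tuple(abs(amp) ** 2 for amp in state)   # Born rule: |amplitude|^2
print(probs)                                     # an even superposition
```

Simulating n qubits needs a 2^n-entry statevector, which is exactly why CUDA-Q leans on GPU memory bandwidth for simulation while real hardware remains small.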
@@ -437,4 +556,4 @@ <h4 class="text-white font-semibold mb-4">Stay Updated</h4>
         });
     </script>
 </body>
-</html>
+</html>
