fix vdcores path

jiange91 · jiange91 · commit 6cbbd7c5c17d · 2026-02-19T20:36:10.000-08:00
diff --git a/content/posts/vdcores.md b/content/posts/vdcores.md
@@ -23,7 +23,7 @@ In this post, we will cover:
 
 ## 1. GPUs Are Becoming Asynchronous, Kernel Programming Is Becoming Messy
 
-<img src="../../images/dae/simd_vs_decouple.png" alt="comparison" />
+<img src="../../images/vdcores/simd_vs_decouple.png" alt="comparison" />
 
 Modern GPUs are no longer "just" wide SIMD machines. They are increasingly asynchronous systems with {{< highlight-text >}}**heterogeneous resources**{{< /highlight-text >}} that each operate on their own timelines: tensor cores run independently, memory pipelines have their own queues, and async copy engines allow data movement to proceed concurrently with computation. Programming should adapt to this asynchronous style --- and the performance rewards for doing so are real.
 
@@ -37,7 +37,7 @@ This coupling amplifies complexity. Performance features like prefetching, pipel
 
 > We adopt the key principles by which [software systems](https://en.wikipedia.org/wiki/Actor_model) control the complexity of asynchrony: **resource/state isolation** and **asynchrony through message passing**, and rebuild GPU SMs as **decoupled cores**.
 
-<img src="../../images/dae/rt-overview.jpg" alt="runtime" />
+<img src="../../images/vdcores/rt-overview.jpg" alt="runtime" />
 
 In the VDCores model, virtual cores are the unit of execution and composition. Instead of a single monolithic kernel, execution is decomposed into independent instruction streams executed by loosely coupled cores.
 
@@ -57,13 +57,13 @@ We build VDCores by composing only 5 basic compute instructions and 23 memory/co
 
 VDCores do not get this edge by hand-tuning better kernels, but through a decoupled runtime and a flexible programming interface. We illustrate this with two examples.
 
-<img src="../../images/dae/performance.png" style="width: 55%;display: flex;justify-content: center;" alt="QWen-8B Performance" />
+<img src="../../images/vdcores/performance.png" style="width: 55%;display: flex;justify-content: center;" alt="QWen-8B Performance" />
 
 ### Example 1: **Free** "Prefetch" Non-Dependent Memory Buffers
 
 Consider an attention kernel followed by a linear projection with residual addition. In VDCores, we connect them by dependencies rather than manually fusing or staging them. (Note that VDCores has no notion of a kernel boundary; we mark the original kernel boundaries in the example only to make it easier to follow.)
 
-{{< dae/example1 >}}
+{{< vdcores/example1 >}}
 
 This is the key shift: {{< highlight-text >}} Overlap **emerges automatically**{{< /highlight-text >}} from runtime dependency resolution, without humans splitting code into explicit "prefetch" stages or manually fusing kernels to force concurrency.
 
@@ -73,7 +73,7 @@ Another secret sauce of VDCores is the **composability** of its components. Same
 
 Consider an MLP block: GEMV (Up + Gate) followed by SiLU activation and GEMV (Down). The input has shape [1, 4096], the Up and Gate outputs have shape [1, 12288], and the Down output has shape [1, 4096]. We can tile Gate and Up along the N dimension and Down along the K dimension.
 
-<img src="../../images/dae/example2.jpg" alt="flexible core composition: two schedules" style="width: 100%;" />
+<img src="../../images/vdcores/example2.jpg" alt="flexible core composition: two schedules" style="width: 100%;" />
 
 **Schedule 1** executes the operations in order and fuses SiLU with Up—straightforward and amenable to kernel fusion for optimization.
 
diff --git a/layouts/shortcodes/vdcores/example1.html b/layouts/shortcodes/vdcores/example1.html
@@ -18,11 +18,11 @@
       </div>
     </div>
     <div style="flex: 6;">
-      <img class="step-img" data-gallery="gallery-ex1" data-step="0" src="{{ "images/dae/example1-1.jpg" | relURL }}" alt="overview: two kernels with color-coded dependencies" style="width: 100%; display: none;" />
-      <img class="step-img" data-gallery="gallery-ex1" data-step="1" src="{{ "images/dae/example1-2.jpg" | relURL }}" alt="step 1: compute core waits for buffer 0 and 1" style="width: 100%; display: none;" />
-      <img class="step-img" data-gallery="gallery-ex1" data-step="2" src="{{ "images/dae/example1-3.jpg" | relURL }}" alt="step 2: prefetch weights during kernel 0" style="width: 100%; display: none;" />
-      <img class="step-img" data-gallery="gallery-ex1" data-step="3" src="{{ "images/dae/example1-4.jpg" | relURL }}" alt="step 3: proceed with matrix when buffer 3 ready" style="width: 100%; display: none;" />
-      <img class="step-img" data-gallery="gallery-ex1" data-step="4" src="{{ "images/dae/example1-5.jpg" | relURL }}" alt="step 4: prefetch residual before vec_add" style="width: 100%; display: none;" />
+      <img class="step-img" data-gallery="gallery-ex1" data-step="0" src="{{ "images/vdcores/example1-1.jpg" | relURL }}" alt="overview: two kernels with color-coded dependencies" style="width: 100%; display: none;" />
+      <img class="step-img" data-gallery="gallery-ex1" data-step="1" src="{{ "images/vdcores/example1-2.jpg" | relURL }}" alt="step 1: compute core waits for buffer 0 and 1" style="width: 100%; display: none;" />
+      <img class="step-img" data-gallery="gallery-ex1" data-step="2" src="{{ "images/vdcores/example1-3.jpg" | relURL }}" alt="step 2: prefetch weights during kernel 0" style="width: 100%; display: none;" />
+      <img class="step-img" data-gallery="gallery-ex1" data-step="3" src="{{ "images/vdcores/example1-4.jpg" | relURL }}" alt="step 3: proceed with matrix when buffer 3 ready" style="width: 100%; display: none;" />
+      <img class="step-img" data-gallery="gallery-ex1" data-step="4" src="{{ "images/vdcores/example1-5.jpg" | relURL }}" alt="step 4: prefetch residual before vec_add" style="width: 100%; display: none;" />
       <div style="display: flex; align-items: center; justify-content: center; gap: 1rem; margin-top: 0.4rem;">
         <span class="step-prev" data-gallery="gallery-ex1" style="cursor: pointer; color: #268BD2; font-size: 1rem; user-select: none; padding: 0.2rem 0.5rem; border: 1px solid #268BD2; border-radius: 4px; transition: background 0.2s;" onmouseover="this.style.background='#268BD2';this.style.color='#fff'" onmouseout="this.style.background='transparent';this.style.color='#268BD2'">&#8592; prev</span>
         <span class="step-counter" data-gallery="gallery-ex1" style="font-size: 0.8rem; color: #888;">1 / 5</span>