| 1 | +--- |
| 2 | +title: "Building AI Development Infrastructure: Hardware Upgrades and Workflow Optimization" |
| 3 | +description: Scaling up local LLM infrastructure and optimizing the development environment for multi-model AI workflows. |
| 4 | +date: 2025-10-21 |
| 5 | +--- |
| 6 | + |
| 7 | + |
| 8 | + |
| 9 | +## TL;DR |
| 10 | +My current hardware limits how many LLMs I can run concurrently. I'm upgrading storage |
| 11 | +(roughly 3x faster), migrating to Arch Linux (for stability), and planning a new build with |
| 12 | +256GB RAM and a faster CPU, while exploring AMD GPUs for a better VRAM/$ ratio. |
| 13 | + |
| 14 | +# The Challenge: Running Multiple LLMs Concurrently |
| 15 | + |
| 16 | +My recent work with local LLMs has revealed practical limitations in my development |
| 17 | +setup. In my [pgvector demo](/posts/notes/post-2-demo-pgvector-2025082001/), I ran |
| 18 | +into RAM exhaustion when loading multiple models simultaneously, and I've |
| 19 | +continued to bump into the rails of GPU memory management while experimenting with DSPy. |
| 20 | + |
| 21 | +For the prompt optimization experiments I'm pursuing, I need to run: |
| 22 | +* Embedding models for vector generation (e.g., text-embedding models) |
| 23 | +* Small inference models for rapid prototyping (Qwen-1.5B, Phi-3) |
| 24 | +* The largest models I can manage, for quality-comparison evals |
| 25 | + |
| 26 | +The pair of RTX 3060s I picked up on eBay last year just isn't cutting it. With |
| 27 | +8GB VRAM each, I can run a Qwen 1.5B model comfortably, but running larger models |
| 28 | +has been tough. |
| 29 | + |
| 30 | +Running on the CPU is painfully slow, and my system is both dated and unoptimized for |
| 31 | +my new AI hobby. |
| 32 | + |
| 33 | +## The Solution: A New Build |
| 34 | +I'm going to build a new PC and try to optimize for performance in a way that doesn't |
| 35 | +sacrifice capability or cost more than I'm willing to spend. |
| 36 | + |
| 37 | +So I'm going for an AMD Ryzen 9 9950X with 256GB RAM and a 13,400 MB/s SSD. I'm also |
| 38 | +going to try running vLLM with the Radeon RX 580 that got displaced by the 3060s and, if |
| 39 | +I can get it working, I'm going to look at the 32GB AMD video cards. They are |
| 40 | +considerably cheaper than comparable NVIDIA cards. |
| 41 | + |
| 42 | +As I explored in my [recent post on resource management](/posts/notes/post-4-next-ai-demo/), |
| 43 | +managing different profiles that balance VRAM, context length, and concurrent model loading |
| 44 | +in different ways is turning out to be table stakes for building anything interesting, but |
| 45 | +things get a lot easier with more available VRAM. |
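
To make the trade-off concrete, here is a minimal sketch of how such a profile could be budgeted against a VRAM limit. It assumes the usual approximation that memory is dominated by the weights plus a KV cache that grows linearly with context length. The `ModelProfile` class, the `fits` helper, the 10% headroom, and every number in the example are hypothetical illustrations, not measurements from my setup.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelProfile:
    """Rough memory profile for one loaded model (all fields are estimates)."""
    name: str
    params_billion: float   # parameter count, in billions
    bytes_per_param: float  # 2.0 for fp16, about 0.5 for 4-bit quantization
    n_layers: int
    n_kv_heads: int
    head_dim: int
    context_tokens: int
    kv_bytes: int = 2       # bytes per KV cache element (fp16)

    def weights_gib(self) -> float:
        return self.params_billion * 1e9 * self.bytes_per_param / GIB

    def kv_cache_gib(self) -> float:
        # One K and one V tensor per layer, per token, per KV head.
        elements = 2 * self.n_layers * self.n_kv_heads * self.head_dim * self.context_tokens
        return elements * self.kv_bytes / GIB

    def total_gib(self) -> float:
        return self.weights_gib() + self.kv_cache_gib()


def fits(profiles: list[ModelProfile], vram_gib: float, headroom: float = 0.9) -> bool:
    """True if all profiles fit together, leaving (1 - headroom) of VRAM free."""
    return sum(p.total_gib() for p in profiles) <= vram_gib * headroom


if __name__ == "__main__":
    # Hypothetical architecture numbers, chosen only for illustration.
    small = ModelProfile("small-1.5b", 1.5, 2.0, n_layers=28, n_kv_heads=2,
                         head_dim=128, context_tokens=8192)
    print(f"{small.name}: {small.total_gib():.2f} GiB, fits in 8 GiB: {fits([small], 8.0)}")
```

Activation memory, framework overhead, and fragmentation are ignored here, so real usage will be higher than the estimate.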
| 46 | + |
| 47 | +# Hardware Upgrade Path |
| 48 | + |
| 49 | +## Storage: The Unexpected Bottleneck |
| 50 | + |
| 51 | +While researching components for a new build, I discovered my current NVMe SSD was capped at |
| 52 | +2,400 MB/s. Current PCIe 4.0 drives, which my motherboard supports, are hitting 7,000 MB/s. |
| 53 | + |
| 54 | +Storage bandwidth matters. A $250 upgrade to a 2TB PCIe 4.0 drive roughly triples |
| 55 | +sequential throughput (about 2.9x on paper) while doubling capacity. For workflows that |
| 56 | +repeatedly load models from disk, this is a meaningful improvement. |
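
As a back-of-the-envelope check, the sketch below estimates the best-case time to read a model file at each drive's rated sequential speed. The 15 GB file size is a hypothetical example, and the helper names are mine. Real loads will be slower, since the estimate ignores deserialization, the page cache, and the transfer into VRAM.

```python
def load_seconds(model_gb: float, throughput_mb_s: float) -> float:
    """Best-case seconds to read a file of model_gb (decimal GB) sequentially."""
    return model_gb * 1000 / throughput_mb_s


def speedup(old_mb_s: float, new_mb_s: float) -> float:
    """Ratio of new to old sequential throughput."""
    return new_mb_s / old_mb_s


if __name__ == "__main__":
    model_gb = 15  # hypothetical model file size
    for label, mb_s in [("old NVMe", 2400), ("PCIe 4.0", 7000)]:
        print(f"{label}: {load_seconds(model_gb, mb_s):.2f} s")
    print(f"speedup: {speedup(2400, 7000):.2f}x")
```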
| 57 | + |
| 58 | +I'm resolved to maximize the performance of the SSD in the new build, and I've found a |
| 59 | +new 4TB M.2 drive rated at 13,400 MB/s to do just that. |
| 60 | + |
| 61 | +## Planning for Scale |
| 62 | + |
| 63 | +The goal is infrastructure that supports the experiments outlined in my upcoming demos: |
| 64 | +running multiple models simultaneously, testing hybrid VRAM/RAM configurations, and |
| 65 | +supporting longer context windows without slamming into memory barriers. |
| 66 | + |
| 67 | +As the new workhorse, this build will focus on: |
| 68 | +* More VRAM capacity for concurrent model loading |
| 69 | +* More system DRAM running as fast as possible (256GB) |
| 70 | +* High-bandwidth NVMe storage for model loading |
| 71 | + |
| 72 | +# Development Environment Upgrade: Linux Infrastructure |
| 73 | + |
| 74 | +With the new SSD in hand for my current workstation (a bit of a preview of toys to come), |
| 75 | +this was an opportune moment to address a persistent development environment issue. |
| 76 | + |
| 77 | +## The Stability Problem |
| 78 | + |
| 79 | +My primary development system ran Ubuntu 24.04 LTS with KDE Plasma 5, which had become |
| 80 | +increasingly unstable. I'm not sure if it was related to the inference experimentation or |
| 81 | +just buggy drivers, but having all the menu dropdowns in my desktop environment go black |
| 82 | +in the middle of the day is just not acceptable. |
| 83 | + |
| 84 | +After evaluating several distributions, I migrated to Arch Linux, in no small part to try |
| 85 | +out KDE Plasma 6. |
| 86 | + |
| 87 | +**Why Arch for AI development:** |
| 88 | +* Plasma 6 with Wayland is supposed to be more stable. |
| 89 | +* Rolling release model should make it easier to keep up with the latest software, which |
| 90 | + seems important given that AI tooling has to be aging faster than any other category of |
| 91 | + software. |
| 92 | +* AUR provides easy access to specialized tools (vLLM, various Python ML packages). |
| 93 | +* Pacman's dependency management avoids the snap/flatpak complications I've encountered |
| 94 | + on Ubuntu. |
| 95 | + |
| 96 | +The installation process was non-trivial (Arch's manual setup is definitely not streamlined), |
| 97 | +but the resulting system has so far been rock-solid for development work. |
| 98 | + |
| 99 | +# Workspace Ergonomics: A Brief Tangent |
| 100 | + |
| 101 | +With a stable foundation in place, I encountered an unexpected friction point: window |
| 102 | +management ergonomics. |
| 103 | + |
| 104 | +## Visual Density and Cognitive Load |
| 105 | + |
| 106 | +My workflow typically involves: |
| 107 | +* IDE (PyCharm with split editors for model code and prompts) |
| 108 | +* Multiple terminal windows (log tails, virtualenvs, git repositories) |
| 109 | +* Browser tabs (documentation, dashboards, etc.) |
| 110 | +* Note-taking application (experiment logs, parameter tracking, todo lists) |
| 111 | + |
| 112 | +Without visual separation between windows, my attention stumbles from one application to |
| 113 | +the next, especially when dark mode makes it even more difficult to track the boundaries. |
| 114 | +Frankly, it's a bit jarring to switch between applications and windows without the clear |
| 115 | +separation. |
| 116 | + |
| 117 | +But the KWin window gaps script I had relied on in Plasma 5 was abandoned. |
| 118 | + |
| 119 | +## Quick Fixes: Forking and Adapting |
| 120 | + |
| 121 | +I forked two projects to solve immediate workflow issues: |
| 122 | + |
| 123 | +**Window Gaps for Plasma 6:** Merged an outstanding PR and fixed some edge cases in the |
| 124 | +tiling mathematics. Added logic to detect window dragging events, which resolved a |
| 125 | +long-standing bug where other windows would resize unexpectedly during drag operations. |
| 126 | + |
| 127 | +**SuperPaper for multi-monitor wallpapers:** Used PyCharm's AI assistant to adapt this |
| 128 | +tool for my specific multi-monitor setup with activity-specific wallpaper configurations. |
| 129 | +This makes it really easy to tell which activity you're working in, and with the Window |
| 130 | +Gaps layout, it's visually stunning. |
| 131 | + |
| 132 | +Both of these were relatively minor modifications (a few hours of work), but the |
| 133 | +ergonomic improvement was substantial. When you're context-switching between model |
| 134 | +configurations, query results, and code multiple times per hour, reducing visual |
| 135 | +friction adds up. |
| 136 | + |
| 137 | +# Development Infrastructure Results |
| 138 | + |
| 139 | +The combined upgrades have tangibly improved my AI development workflow: |
| 140 | + |
| 141 | +**Storage performance:** Model loading times are noticeably shorter. vLLM's model |
| 142 | +caching is snappier, and Docker container startup times for my |
| 143 | +PostgreSQL/pgvector stack are much improved. |
| 144 | + |
| 145 | +**System stability:** No more desktop environment crashes. |
| 146 | + |
| 147 | +**Workspace clarity:** The visual separation between windows and the visual cues |
| 148 | +for the activities reduces cognitive load when context switching. |
| 149 | + |
| 150 | +# Next Steps |
| 151 | +Now that my main workstation is a little bit faster and a lot more stable, I'm ready for |
| 152 | +phase two of my AI development infrastructure upgrade: |
| 153 | + |
| 154 | +* a new AM5 PC built for speed |
| 155 | +* a KVM switch to quickly switch machines when needed |
| 156 | +* and I'll be bringing my Framework 13 laptop into the mix to take advantage of its |
| 157 | + Ryzen 7040 that includes an AI NPU supported by Ryzen AI Engine (XDNA). |
| 158 | + |
| 159 | +I'm excited to see how much more performant my pipeline experiments will be once I have |
| 160 | +both the GPU capacity for larger models and the compute firepower for running |
| 161 | +multiple small models at once. |
| 162 | + |
| 163 | +DSPy's compilation approach uses larger models to iteratively optimize prompts and |
| 164 | +workflows for smaller models, improving their performance through automated refinement. |
| 165 | + |
| 166 | +With DSPy's more efficient use of GPU resources and with significantly more GPU capacity |
| 167 | +at the ready, I'm hoping to build a system that can perform reasonably well for the |
| 168 | +advanced workflows I've been seeing in IDE-based AI assistants. Stitching together these |
| 169 | +examples will be a good test of the infrastructure and an interesting challenge. |
| 170 | + |
| 171 | +I've got a lot of build-time and wire organization efforts in my short-term future, |
| 172 | +but, long-term, I'm working toward a framework that will pair system designers with |
| 173 | +domain experts for curation and feedback. So, first, I'm growing the AI lab, then it's |
| 174 | +onward and upward to orchestrating inference, embedding, and agent tooling |
| 175 | +in a multi-host environment. |