Commit 43e90d9

FredworkLemmas and Fred McDavid authored
adds post on hardware and os upgrade (#6)
Co-authored-by: Fred McDavid <fred.mcdavid@queen.one>
1 parent 42fe6f6 commit 43e90d9

2 files changed

Lines changed: 175 additions & 0 deletions

@@ -0,0 +1,175 @@
1+
---
2+
title: "Building AI Development Infrastructure: Hardware Upgrades and Workflow Optimization"
3+
description: Scaling up local LLM infrastructure and optimizing the development environment for multi-model AI workflows.
4+
date: 2025-10-21
5+
---
6+
7+
![](images/office_screens.png "Multi-monitor development environment")
8+
9+
## TL;DR
10+
Current hardware limits concurrent LLM development. I'm upgrading storage (roughly 3x faster),
11+
migrating to Arch Linux (stability), and planning a new build with 256GB RAM, a faster
12+
CPU, and possibly AMD GPUs for a better VRAM/$ ratio.
13+
14+
# The Challenge: Running Multiple LLMs Concurrently
15+
16+
My recent work with local LLMs has revealed practical limitations in my development
17+
setup. In my [pgvector demo](/posts/notes/post-2-demo-pgvector-2025082001/), I ran
18+
into RAM exhaustion when loading multiple models simultaneously, and I've
19+
continued to bump into the rails of GPU memory management while experimenting with DSPy.
20+
21+
For the prompt optimization experiments I'm pursuing, I need to run:
22+
* Embedding models for vector generation (e.g., text-embedding models)
23+
* Small inference models for rapid prototyping (Qwen-1.5B, Phi-3)
24+
* The largest models I can manage, for quality-comparison evals
25+
26+
The pair of RTX 3060s I picked up on eBay last year just isn't cutting it. With
27+
8GB VRAM each, I can run a Qwen 1.5B model comfortably, but running larger models
28+
has been tough.
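
A back-of-the-envelope sketch of why the small models fit and the larger ones do not. It assumes the weight footprint is simply parameter count times bytes per parameter, and it ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function names and the list of model sizes are illustrative, not taken from any library.

```python
# Rough estimate of the VRAM needed just to hold model weights.
# Assumption: footprint = parameters x bytes per parameter.
# KV cache, activations, and framework overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(params_billion: float, dtype: str = "fp16") -> float:
    """Approximate weight footprint in GiB for a model of the given size."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 2**30


def fits(params_billion: float, vram_gib: float, dtype: str = "fp16") -> bool:
    """True if the weights alone fit in the given VRAM budget."""
    return weight_gib(params_billion, dtype) <= vram_gib


if __name__ == "__main__":
    # Illustrative model sizes, checked against a single 8 GiB card.
    for label, size in [("1.5B", 1.5), ("7B", 7.0), ("14B", 14.0)]:
        for dtype in ("fp16", "int4"):
            gib = weight_gib(size, dtype)
            print(f"{label} {dtype}: {gib:.1f} GiB, fits in 8 GiB: {fits(size, 8.0, dtype)}")
```

Under these assumptions a 1.5B model at fp16 needs a little under 3 GiB, while a 7B model at fp16 already exceeds a single 8 GiB card before any context is allocated.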
29+
30+
Running on the CPU is painfully slow and my system is both dated and unoptimized for
31+
my new AI hobby.
32+
33+
## The Solution: A New Build
34+
I'm going to build a new PC and try to optimize for performance in a way that doesn't
35+
sacrifice capability or cost more than I'm willing to spend.
36+
37+
So I'm going for an AMD Ryzen 9 9950X with 256GB RAM and a 13,400 MB/s SSD. I'm also
38+
going to try running vLLM with the Radeon RX 580 that got displaced by the 3060s and, if
39+
I can get it working, I'm going to look at the 32GB AMD video cards. They are
40+
considerably cheaper than the NVIDIA cards.
41+
42+
As I explored in my [recent post on resource management](/posts/notes/post-4-next-ai-demo/),
43+
managing different profiles that balance VRAM, context length, and concurrent model loading
44+
in different ways is turning out to be table stakes for building anything interesting, but
45+
things get a lot easier with more available VRAM.
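
To make the trade-off concrete, here is a minimal sketch of what such a profile could look like. It is not the tooling from the linked post: `ModelSpec`, `Profile`, and every number below are hypothetical, chosen only for illustration. The KV cache estimate uses the common approximation of 2 (keys and values) x layers x KV heads x head dimension x context length x bytes per element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of a model; all values are illustrative."""
    name: str
    weights_gib: float
    layers: int
    kv_heads: int
    head_dim: int
    kv_bytes: int = 2  # assume an fp16 KV cache

    def kv_cache_gib(self, context_len: int, batch: int = 1) -> float:
        # Factor of 2 covers both keys and values.
        total = (2 * self.layers * self.kv_heads * self.head_dim
                 * context_len * batch * self.kv_bytes)
        return total / 2**30


@dataclass(frozen=True)
class Profile:
    """A named set of (model, context length) pairs loaded together."""
    name: str
    models: tuple[tuple[ModelSpec, int], ...]

    def required_gib(self) -> float:
        return sum(m.weights_gib + m.kv_cache_gib(ctx) for m, ctx in self.models)

    def fits(self, vram_gib: float, headroom_gib: float = 1.0) -> bool:
        # Headroom is a guess at runtime overhead, not a measured value.
        return self.required_gib() + headroom_gib <= vram_gib


# Illustrative specs, not measurements of any particular checkpoint.
small = ModelSpec("small", weights_gib=3.0, layers=28, kv_heads=2, head_dim=128)
large = ModelSpec("large-4bit", weights_gib=4.5, layers=32, kv_heads=8, head_dim=128)

both = Profile("both-8k", ((small, 8192), (large, 8192)))
small_long = Profile("small-32k", ((small, 32768),))

if __name__ == "__main__":
    for profile in (both, small_long):
        for budget in (8.0, 12.0):
            print(f"{profile.name}: needs {profile.required_gib():.2f} GiB, "
                  f"fits in {budget:.0f} GiB: {profile.fits(budget)}")
```

With these made-up numbers, loading both models at an 8k context does not fit in 8 GiB but does fit in 12 GiB, which is the kind of result that makes extra VRAM simplify everything else.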
46+
47+
# Hardware Upgrade Path
48+
49+
## Storage: The Unexpected Bottleneck
50+
51+
While researching components for a new build, I discovered my current NVMe SSD was capped at
52+
2,400 MB/s. Current PCIe 4.0 drives that my motherboard supports are hitting 7,000 MB/s.
53+
54+
Storage bandwidth matters. A $250 upgrade to a 2TB PCIe 4.0 drive nearly triples
55+
sequential throughput while doubling capacity. For workflows that repeatedly load models
56+
from disk, this is a meaningful improvement.
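
The arithmetic behind that claim, as a small sketch. The two throughput figures come from the paragraph above; the 14 GB model file is a hypothetical size chosen for illustration, and the result is a best case that assumes the drive sustains its rated sequential speed with no other bottleneck.

```python
def load_seconds(file_gb: float, throughput_mb_s: float) -> float:
    """Best-case sequential read time, using decimal units (1 GB = 1000 MB)."""
    return file_gb * 1000 / throughput_mb_s


OLD_MB_S = 2400   # current NVMe drive
NEW_MB_S = 7000   # PCIe 4.0 replacement
MODEL_GB = 14.0   # hypothetical model file size, for illustration only

old = load_seconds(MODEL_GB, OLD_MB_S)
new = load_seconds(MODEL_GB, NEW_MB_S)

if __name__ == "__main__":
    print(f"old drive: {old:.1f} s, new drive: {new:.1f} s, speedup: {old / new:.2f}x")
```

Real load times also depend on the filesystem, page cache, and how the inference server deserializes weights, so this is an upper bound on the benefit rather than a prediction.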
57+
58+
I'm resolved to maximize the performance of the SSD in the new build and I've found a
59+
new 4TB M.2 drive with a 13,400 MB/s transfer speed to do just that.
60+
61+
## Planning for Scale
62+
63+
The goal is infrastructure that supports the experiments outlined in my upcoming demos:
64+
running multiple models simultaneously, testing hybrid VRAM/RAM configurations, and
65+
supporting longer context windows without slamming into memory barriers.
66+
67+
As the new workhorse, this build will focus on:
68+
* More VRAM capacity for concurrent model loading
69+
* More system DRAM running as fast as possible (256GB)
70+
* High-bandwidth NVMe storage for model loading
71+
72+
# Development Environment Upgrade: Linux Infrastructure
73+
74+
With the new SSD in hand for my current workstation (a bit of a preview of toys to come),
75+
this was an opportune moment to address a persistent development environment issue.
76+
77+
## The Stability Problem
78+
79+
My primary development system ran Ubuntu 24.04 LTS with KDE Plasma 5, which had become
80+
increasingly unstable. I'm not sure if it was related to the inference experimentation or
81+
just buggy drivers, but having all the menu dropdowns in my desktop environment go black
82+
in the middle of the day is just not acceptable.
83+
84+
After evaluating several distributions, I migrated to Arch Linux, in no small part to try
85+
out KDE Plasma 6.
86+
87+
**Why Arch for AI development:**
88+
* Plasma 6 with Wayland is supposed to be more stable.
89+
* The rolling release model should make it easier to keep up with the latest software, which
90+
seems important given that AI tooling appears to age faster than any other category of
91+
software.
92+
* AUR provides easy access to specialized tools (vLLM, various Python ML packages).
93+
* Pacman's dependency management avoids the snap/flatpak complications I've encountered
94+
on Ubuntu.
95+
96+
The installation process was non-trivial (Arch's manual setup is definitely not streamlined),
97+
but the resulting system has so far been rock-solid for development work.
98+
99+
# Workspace Ergonomics: A Brief Tangent
100+
101+
With a stable foundation in place, I encountered an unexpected friction point: window
102+
management ergonomics.
103+
104+
## Visual Density and Cognitive Load
105+
106+
My workflow typically involves:
107+
* IDE (PyCharm with split editors for model code and prompts)
108+
* Multiple terminal windows (log tails, virtualenvs, git repositories)
109+
* Browser tabs (documentation, dashboards, etc.)
110+
* Note-taking application (experiment logs, parameter tracking, todo lists)
111+
112+
Without visual separation between windows, my attention stumbles from one application to
113+
the next, especially when dark mode makes it even more difficult to track the boundaries.
114+
Frankly, it's a bit jarring to switch between applications and windows without the clear
115+
separation.
116+
117+
But the KWin window gaps script I had relied on in Plasma 5 was abandoned.
118+
119+
## Quick Fixes: Forking and Adapting
120+
121+
I forked two projects to solve immediate workflow issues:
122+
123+
**Window Gaps for Plasma 6:** Merged an outstanding PR and fixed some edge cases in the
124+
tiling mathematics. Added logic to detect window dragging events, which resolved a
125+
long-standing bug where other windows would resize unexpectedly during drag operations.
126+
127+
**SuperPaper for multi-monitor wallpapers:** Used PyCharm's AI assistant to adapt this
128+
tool for my specific multi-monitor setup with activity-specific wallpaper configurations.
129+
This makes it really easy to tell which activity you're working in and, with the Window
130+
Gaps layout, it's visually stunning.
131+
132+
Both of these were relatively minor modifications (a few hours of work), but the
133+
ergonomic improvement was substantial. When you're context-switching between model
134+
configurations, query results, and code multiple times per hour, reducing visual
135+
friction adds up.
136+
137+
# Development Infrastructure Results
138+
139+
The combined upgrades have tangibly improved my AI development workflow:
140+
141+
**Storage performance:** Model loading times reduced significantly. vLLM's model
142+
caching is noticeably snappier, and Docker container startup times for my
143+
PostgreSQL/pgvector stack are much improved.
144+
145+
**System stability:** No more desktop environment crashes.
146+
147+
**Workspace clarity:** The visual separation between windows and the visual cues
148+
for the activities reduces cognitive load when context switching.
149+
150+
# Next Steps
151+
Now that my main workstation is a little bit faster and a lot more stable, I'm ready for
152+
phase two of my AI development infrastructure upgrade:
153+
154+
* a new AM5 PC built for speed
155+
* a KVM switch to quickly switch machines when needed
156+
* and I'll be bringing my Framework 13 laptop into the mix to take advantage of its
157+
Ryzen 7040 that includes an AI NPU supported by Ryzen AI Engine (XDNA).
158+
159+
I'm excited to see how much more performant my pipeline experiments will be with the
160+
addition of GPU capacity for larger models and the compute firepower for running
161+
multiple small models at once.
162+
163+
DSPy's compilation approach uses larger models to iteratively optimize prompts and
164+
workflows for smaller models, improving their performance through automated refinement.
165+
166+
With DSPy's more efficient use of GPU resources and with significantly more GPU capacity
167+
at the ready, I'm hoping to build a system that can perform reasonably well for the
168+
advanced workflows I've been seeing in IDE-based AI assistants. Stitching together these
169+
examples will be a good test of the infrastructure and an interesting challenge.
170+
171+
I've got a lot of build-time and wire organization efforts in my short-term future,
172+
but, long-term, I'm working toward a framework that will pair system designers with
173+
domain experts for curation and feedback. So, first, I'm growing the AI lab, then it's
174+
onward and upward to orchestrating inference, embedding, and agent tooling
175+
in a multi-host environment.
