| 1 | +--- |
| 2 | +title: "Demo: Pgvector" |
| 3 | +description: A vector database implemented using Pgvector and PostgreSQL. |
| 4 | +date: 2025-08-20 |
| 5 | +--- |
| 6 | + |
| 7 | +# Terra firma: playing at scale |
| 8 | +[Image: caption truncated, ends "creepin' outside my office window"] |
| 9 | + |
| 10 | +## My last demo was impressive, but pitiable in a lot of ways |
| 11 | +For the past year or so, I've been thinking about how people can benefit from |
| 12 | +LLMs and I've been noodling out a design for sharing context in an interesting |
| 13 | +way with friends, coworkers, and customers, but I've hardly touched any code |
| 14 | +that actually does anything interesting with an LLM or any other more typically |
| 15 | +AI-adjacent construct. |
| 16 | + |
| 17 | +But just before that, I coded up a quick demo that showed off a simple RAG |
| 18 | +pipeline. It worked remarkably well but it was dead simple: a lightweight |
| 19 | +model, a Chroma vector store, and some custom chunking code. |
| 20 | + |
| 21 | +A few weeks later, I built something similar at work to mine Basecamp |
| 22 | +conversations for support information. Again, though just a demo, the |
| 23 | +results were pretty badass. |
| 24 | + |
| 25 | +## There were some pretty obvious scaling limitations: |
| 26 | +* At some point, I figured I'd want to put so much data in the database that it |
| 27 | + wouldn't fit in memory, and Chroma was an in-memory vector database. I want to |
| 28 | + see the day when we can search curated libraries that house vast amounts of |
| 29 | + text, so persistent, non-resident, non-super-expensive storage is crucial. |
| 30 | +* The model I chose didn't fit in VRAM, so I had to run it on the CPU, which |
| 31 | + made it pretty slow (but not terrible, really). |
| 32 | +* It had to download the model every time it ran. |
| 33 | + |
| 34 | +## Scoping out the future |
| 35 | +Because I know I'll be crossing these bridges at some point, I've been eyeing |
| 36 | +a bunch of answers to the demo's shortcomings. PostgreSQL is an easy choice if |
| 37 | +it works since I've been using it for years. Caching the model is an obvious |
| 38 | +upgrade too. |
| 39 | + |
| 40 | +Docling was a bit of an unknown, but it performed admirably, as did vLLM in a |
| 41 | +Docker container, once I'd upgraded my drivers to the 580 version. |
| 42 | + |
| 43 | +## This demo |
| 44 | +[Demo: Pgvector](https://github.com/FredworkLemmas/demo_pgvector) is simply a |
| 45 | +proof-of-concept that shows off the same sort of RAG pipeline and query solution |
| 46 | +I'd built in the past, but with some enhancements: |
| 47 | + |
| 48 | +* A PostgreSQL-backed vector index, removing in-memory constraints. |
| 49 | +* Docling for cleaner chunking strategies. |
| 50 | +* vLLM with model caching, which makes small models easy to run repeatedly. |
| 51 | +* A Dockerized GPU environment, which turned out to be easier to configure than |
| 52 | + in the past. |
| 53 | +* EPUB ingestion: Project Gutenberg unlocked! |
| 54 | + |
| 55 | +## Reflections |
| 56 | +* Embedding dimensions are strict: mismatches are non-negotiable. The dimension |
| 57 | + is fixed when the DB table is created. |
| 58 | +* Model quality has improved: Qwen 1.5B was unexpectedly strong for its size. |
| 59 | +* Search thresholds were surprisingly low: semantic similarity scores were far |
| 60 | + lower than I expected, making me wonder if it's actually possible to set the |
| 61 | + threshold as a constant. It may need to reflect the content somehow. The oddly |
| 62 | + low number also makes me think I should be baking in some sort of full-text |
| 63 | + search (which happens to be pretty easy with PostgreSQL). |
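
The point about embedding dimensions can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the table name `chunks`, the column names, the helper names, and the dimension `384` are all hypothetical. It relies only on pgvector's `vector(N)` column type and its `[x,y,z]` text literal form, and it checks the length on the client so a mismatch fails before the row ever reaches PostgreSQL.

```python
import re

# Hypothetical value: this must match the embedding model's output size.
EMBEDDING_DIM = 384


def create_table_sql(table: str, dim: int) -> str:
    """Build DDL for a pgvector-backed chunk table with a fixed dimension."""
    # Identifiers cannot be bound as query parameters, so validate the name.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    if dim <= 0:
        raise ValueError("dimension must be positive")
    return (
        "CREATE EXTENSION IF NOT EXISTS vector;\n"
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    id BIGSERIAL PRIMARY KEY,\n"
        "    content TEXT NOT NULL,\n"
        f"    embedding vector({dim}) NOT NULL\n"
        ");"
    )


def to_vector_literal(embedding, dim: int) -> str:
    """Render floats as a pgvector text literal, rejecting wrong lengths."""
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


if __name__ == "__main__":
    print(create_table_sql("chunks", EMBEDDING_DIM))
```

Switching to an embedding model with a different output size would mean a new column or table under this sketch, which is why the dimension is worth pinning in one named constant.
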
| 64 | + |
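
One way to explore the threshold question is sketched below. Nothing here comes from the demo itself: the function names, the `0.8` ratio, and the `k=60` constant are illustrative assumptions. The first helper converts pgvector's cosine distance (the `<=>` operator) into a similarity. The second replaces a constant cutoff with one relative to the best hit in each result set. The third merges a vector ranking with a full-text ranking using reciprocal rank fusion, a common technique swapped in here as one possible way to combine the two.

```python
def cosine_similarity_from_distance(distance: float) -> float:
    """pgvector's <=> returns cosine distance; similarity is 1 - distance."""
    return 1.0 - distance


def filter_relative(hits, ratio=0.8):
    """Keep hits scoring at least `ratio` times the best score in this set.

    `hits` is a list of (doc_id, similarity) pairs. Assumes positive scores.
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    return [(doc, score) for doc, score in hits if score >= best * ratio]


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each id earns 1 / (k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


if __name__ == "__main__":
    vector_hits = [("a", 0.40), ("b", 0.35), ("c", 0.10)]
    print(filter_relative(vector_hits))
    print(reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "d"]]))
```

With a relative cutoff, a best score of 0.40 still lets close neighbours through, even though an absolute threshold of, say, 0.7 would have returned nothing.
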
| 65 | +## Closing thoughts |
| 66 | +There were no "Eureka!" moments with this demo, but it was pretty easy to get |
| 67 | +all the moving parts in place and working, and the future is bright: |
| 68 | + |
| 69 | +* the AI assistant in PyCharm was super-helpful with some key bits of this |
| 70 | + effort. |
| 71 | +* with a vector database that can scale beyond a machine's RAM capacity, a |
| 72 | + surprisingly capable but smallish model, and some solid caching options with |
| 73 | + vLLM, it looks like reliable performance on modest hardware is indeed |
| 74 | + possible. |
| 75 | + |
| 76 | +## What's next? |
| 77 | +* There's a lot to be done in the context department - searching chunks is cool |
| 78 | + but it's a subset of what a real-world use case will need. |
| 79 | +* It seems like there are some good ways to integrate MCP capabilities. |
| 80 | +* I'd like to see bigger models and multi-modal I/O. |
| 81 | + |
| 82 | + |
| 83 | +# Additional Notes |
| 84 | +THERE ARE NO TESTS!! HERE BE DRAGONS! RUN AWAY! |