Commit 9bb5e84

Merge pull request #8 from MeshJS/feat/generic-postgres
Migrate from supabase to generic postgres db
2 parents 3be46f0 + f52a31b commit 9bb5e84

44 files changed

Lines changed: 3099 additions & 1272 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1 @@
1+
.cursor/

mimir-docs/app/(home)/page.mdx

Lines changed: 3 additions & 3 deletions
@@ -190,9 +190,9 @@ import { Separator } from "@/components/ui/separator";
190190
<Database className="size-5 text-primary" />
191191
</div>
192192
<div>
193-
<h3 className="mb-1 font-semibold leading-snug">Supabase Vector Store</h3>
193+
<h3 className="mb-1 font-semibold leading-snug">PostgreSQL Vector Store</h3>
194194
<div className="text-sm text-muted-foreground leading-relaxed">
195-
Built on Supabase for reliable, scalable vector storage. Fast semantic search with configurable similarity thresholds and match counts.
195+
Built on PostgreSQL with pgvector for reliable, scalable vector storage. Works with any Postgres provider (Supabase, Neon, self-hosted). Fast semantic search with configurable similarity thresholds and match counts.
196196
</div>
197197
</div>
198198
</div>
@@ -238,7 +238,7 @@ import { Separator } from "@/components/ui/separator";
238238
</div>
239239
<h3 className="mb-2 font-semibold text-lg leading-snug">Store</h3>
240240
<div className="text-sm text-muted-foreground leading-relaxed">
241-
Embeddings are stored in Supabase with source URLs, metadata, and full context. Everything is indexed and ready for semantic search.
241+
Embeddings are stored in PostgreSQL (pgvector) with source URLs, metadata, and full context. Everything is indexed and ready for semantic search.
242242
</div>
243243
</div>
244244

mimir-docs/app/layout.tsx

Lines changed: 3 additions & 2 deletions
@@ -14,7 +14,7 @@ export const metadata: Metadata = {
1414
default: 'Mimir - Contextual RAG for Code & Documentation',
1515
template: '%s | Mimir',
1616
},
17-
description: 'Mimir is a comprehensive contextual RAG (Retrieval Augmented Generation) system with MCP integration. Ingest code and documentation from multiple GitHub repositories into Supabase vector store. Supports TypeScript, Python, and more. OpenAI-compatible API and MCP protocol for AI assistants.',
17+
description: 'Mimir is a comprehensive contextual RAG (Retrieval Augmented Generation) system with MCP integration. Ingest code and documentation from multiple GitHub repositories into PostgreSQL (pgvector). Supports TypeScript, Python, and more. OpenAI-compatible API and MCP protocol for AI assistants.',
1818
keywords: [
1919
'RAG',
2020
'Retrieval Augmented Generation',
@@ -24,7 +24,8 @@ export const metadata: Metadata = {
2424
'Documentation Search',
2525
'Codebase Search',
2626
'Vector Database',
27-
'Supabase',
27+
'PostgreSQL',
28+
'pgvector',
2829
'TypeScript',
2930
'Python',
3031
'AI Documentation',

mimir-docs/content/docs/configuration.mdx

Lines changed: 111 additions & 45 deletions
@@ -13,10 +13,11 @@ Configuration in Mimir is done through environment variables in a `.env` file. T
1313
Mimir uses environment variables organized into these categories:
1414

1515
1. **Server Configuration** - API keys and server settings
16-
2. **Supabase Configuration** - Database connection and settings
16+
2. **Database Configuration** - PostgreSQL connection and settings
1717
3. **GitHub Repositories** - Where to fetch code and documentation
1818
4. **Parser Configuration** - What to extract from code
19-
5. **LLM Configuration** - Embedding and chat model settings
19+
5. **Documentation Configuration** - Documentation URL generation
20+
6. **LLM Configuration** - Embedding and chat model settings
2021

2122
## 1. Server Configuration
2223

@@ -48,64 +49,71 @@ If webhooks fail or aren't configured, this sets up a scheduled backup that peri
4849
<Separator />
4950
</div>
5051

51-
## 2. Supabase Configuration
52+
## 2. Database Configuration
5253

53-
#### Supabase URL (required)
54+
Mimir works with any PostgreSQL database that supports the pgvector extension (Supabase, Neon, self-hosted, etc.).
55+
56+
#### Database URL (required)
5457

5558
```tsx
56-
MIMIR_SUPABASE_URL=https://your-project.supabase.co
59+
MIMIR_DATABASE_URL=postgresql://user:password@host:5432/database
5760
```
5861

59-
Your Supabase project's API endpoint. Mimir uses it to connect to your database and store/query vector embeddings. Find this in your Supabase dashboard under Project Settings → API → "Project URL".
62+
Your PostgreSQL connection string. Mimir uses it to connect directly to your database and store/query vector embeddings.
6063

61-
#### Supabase Service Role Key (required)
64+
<InfoCard>
65+
**Docker and managed Postgres (Supabase, Neon, etc.):** If the container cannot reach your database, run with **host network**: `docker run --rm --network host -v $(pwd)/.env:/app/.env:ro mimir-rag:local`. That uses the host's network and usually clears connection issues. Alternatively, for Supabase use the **Session Pooler** connection string (port 6543) from Dashboard → Settings → Database → Connection Pooling → Session mode.
66+
</InfoCard>
6267

63-
```tsx
64-
MIMIR_SUPABASE_SERVICE_ROLE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
65-
```
68+
#### Embedding Dimension
6669

67-
This key gives Mimir full access to your Supabase database. It's needed to create tables, insert embeddings, and perform vector searches. Find it in Supabase dashboard → Project Settings → API → "service_role" key (not the "anon" key).
70+
The default schema uses `vector(3072)` for embeddings. If your embedding model uses a different dimension, update the database schema:
6871

69-
**⚠️ Security Note:** This key has full database access. Never commit it to version control or expose it publicly.
72+
1. Check your embedding model's dimension (e.g., OpenAI `text-embedding-3-small` uses 1536, `text-embedding-3-large` uses 3072)
73+
2. Update `prisma/migrations/0_init/migration.sql` to change `vector(3072)` to your model's dimension
74+
3. Update `prisma/schema.prisma` to change `Unsupported("vector(3072)")` to match
75+
4. Run migrations: `make setup-db`
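The dimension change in the steps above can be sketched as a small helper. This is only an illustration, not part of Mimir: `schemaDimensionFor` and `rewriteVectorDimension` are hypothetical names, and the lookup table contains only the two OpenAI models mentioned in the steps.

```typescript
// Dimensions for the two models named in the steps above.
// Any other model would need its own entry (assumption: you look it up yourself).
const KNOWN_DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
};

// Hypothetical helper: returns the N to use in vector(N), or undefined if the model is unknown.
function schemaDimensionFor(model: string): number | undefined {
  return KNOWN_DIMENSIONS[model];
}

// Hypothetical helper: rewrites every vector(N) occurrence in a SQL migration
// or Prisma schema string to the given dimension.
function rewriteVectorDimension(source: string, dimension: number): string {
  return source.replace(/vector\(\d+\)/g, `vector(${dimension})`);
}
```

Applied to the text of `prisma/migrations/0_init/migration.sql` and `prisma/schema.prisma`, the same replacement keeps both files in agreement before running `make setup-db`.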
7076

71-
#### Supabase Table (optional)
77+
#### Similarity Threshold (optional)
7278

7379
```tsx
74-
MIMIR_SUPABASE_TABLE=docs
80+
MIMIR_DATABASE_SIMILARITY_THRESHOLD=0.2
7581
```
7682

77-
**Default:** `"docs"`
83+
**Default:** `0.2`
84+
**Range:** 0.0 to 1.0
7885

79-
Specifies which table in your Supabase database stores the document chunks. Use different table names if you have multiple Mimir instances or want to separate different documentation sets.
86+
Controls how similar a document chunk must be to your query to be included in results. Lower values (0.1-0.3) return more results but may include less relevant content. Higher values (0.5-0.8) return fewer, more precise matches.
8087

81-
#### Supabase Database Password (optional)
88+
#### Match Count (optional)
8289

8390
```tsx
84-
MIMIR_SUPABASE_DB_PASSWORD=your-database-password
91+
MIMIR_DATABASE_MATCH_COUNT=10
8592
```
8693

87-
Mimir can automatically construct the full database connection URL from your Supabase URL and password. If provided, Mimir combines this with `MIMIR_SUPABASE_URL` to create `DATABASE_URL` automatically.
94+
**Default:** `10`
8895

89-
#### Similarity Threshold (optional)
96+
Limits how many document chunks are returned per query from vector search. More chunks provide more context but increase API costs and response time. Fewer chunks are faster but may miss relevant information.
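How the similarity threshold and match count interact can be sketched as follows. This is an assumption-laden illustration: it treats similarity as a score between 0 and 1 where higher means closer, and the `ScoredChunk` type and `selectMatches` function are hypothetical. Mimir may well apply the same limits inside the SQL query rather than in application code.

```typescript
// Hypothetical shape of a search hit; similarity is assumed to lie in [0, 1].
interface ScoredChunk {
  id: string;
  similarity: number;
}

// Keep chunks at or above the threshold, best first, capped at matchCount.
// Defaults mirror the documented defaults (0.2 and 10).
function selectMatches(
  chunks: ScoredChunk[],
  threshold = 0.2,
  matchCount = 10,
): ScoredChunk[] {
  return chunks
    .filter((chunk) => chunk.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, matchCount);
}
```

Raising the threshold shrinks the candidate set before the cap applies, so a high threshold can return fewer chunks than the match count allows.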
97+
98+
#### BM25 Match Count (optional)
9099

91100
```tsx
92-
MIMIR_SUPABASE_SIMILARITY_THRESHOLD=0.2
101+
MIMIR_DATABASE_BM25_MATCH_COUNT=10
93102
```
94103

95-
**Default:** `0.2`
96-
**Range:** 0.0 to 1.0
104+
**Default:** `10`
97105

98-
Controls how similar a document chunk must be to your query to be included in results. Lower values (0.1-0.3) return more results but may include less relevant content. Higher values (0.5-0.8) return fewer, more precise matches.
106+
Limits how many document chunks are returned per query from full-text (BM25) search. Used when hybrid search is enabled.
99107

100-
#### Match Count (optional)
108+
#### Enable Hybrid Search (optional)
101109

102110
```tsx
103-
MIMIR_SUPABASE_MATCH_COUNT=10
111+
MIMIR_DATABASE_ENABLE_HYBRID_SEARCH=true
104112
```
105113

106-
**Default:** `10`
114+
**Default:** `true`
107115

108-
Limits how many document chunks are returned per query. More chunks provide more context but increase API costs and response time. Fewer chunks are faster but may miss relevant information.
116+
When enabled, combines vector similarity search with full-text (BM25) search for better results. Disable if you only want vector search.
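The database settings in this section can be read and validated with a small loader. The sketch below is not Mimir's actual configuration code: `DatabaseConfig`, `numberFrom`, and `loadDatabaseConfig` are hypothetical names, and treating only the literal string `false` as "disabled" is an assumption. The variable names and defaults are the ones documented above.

```typescript
// Hypothetical config shape for the documented MIMIR_DATABASE_* variables.
interface DatabaseConfig {
  url: string;
  similarityThreshold: number;
  matchCount: number;
  bm25MatchCount: number;
  enableHybridSearch: boolean;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the documented default when unset.
function numberFrom(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${key} must be a number, got "${raw}"`);
  }
  return value;
}

function loadDatabaseConfig(env: Env): DatabaseConfig {
  const url = env["MIMIR_DATABASE_URL"];
  if (!url) throw new Error("MIMIR_DATABASE_URL is required");

  const similarityThreshold = numberFrom(env, "MIMIR_DATABASE_SIMILARITY_THRESHOLD", 0.2);
  if (similarityThreshold < 0 || similarityThreshold > 1) {
    throw new Error("MIMIR_DATABASE_SIMILARITY_THRESHOLD must be between 0.0 and 1.0");
  }

  // Assumption: any value other than "false" (case-insensitive) leaves hybrid search on.
  const hybridRaw = env["MIMIR_DATABASE_ENABLE_HYBRID_SEARCH"] ?? "true";

  return {
    url,
    similarityThreshold,
    matchCount: numberFrom(env, "MIMIR_DATABASE_MATCH_COUNT", 10),
    bm25MatchCount: numberFrom(env, "MIMIR_DATABASE_BM25_MATCH_COUNT", 10),
    enableHybridSearch: hybridRaw.trim().toLowerCase() !== "false",
  };
}
```

In a Node.js process this would typically be called as `loadDatabaseConfig(process.env)` after the `.env` file has been loaded.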
109117

110118
<div className="my-6">
111119
<Separator />
@@ -117,7 +125,7 @@ Mimir fetches code and documentation from GitHub repositories. You can configure
117125

118126
### Single Repository
119127

120-
For most projects, start with a single repository:
128+
For most projects, start with a single repository that contains both code and documentation:
121129

122130
```tsx
123131
MIMIR_GITHUB_URL=https://github.com/your-org/your-repo
@@ -126,7 +134,7 @@ MIMIR_GITHUB_TOKEN=ghp_your_token_here
126134
```
127135

128136
<InfoCard>
129-
**MIMIR_GITHUB_URL:** The main repository URL. Mimir will fetch both code and documentation from here.
137+
**MIMIR_GITHUB_URL:** The main repository URL. Mimir will fetch both code and documentation from here. Required for single-repo setup. Optional if using separate code/docs repos or multiple repos.
130138
</InfoCard>
131139

132140
<InfoCard>
@@ -137,12 +145,20 @@ MIMIR_GITHUB_TOKEN=ghp_your_token_here
137145
**MIMIR_GITHUB_TOKEN:** GitHub personal access token. Required for private repositories or to avoid rate limits on public repos.
138146
</InfoCard>
139147

148+
<InfoCard>
149+
**MIMIR_GITHUB_DIRECTORY:** Base directory to start from in the main repo. Optional.
150+
</InfoCard>
151+
152+
<InfoCard>
153+
**MIMIR_GITHUB_INCLUDE_DIRECTORIES:** Comma-separated list of directories to include from the main repo. Optional.
154+
</InfoCard>
155+
140156
### Separate Code and Documentation Repos
141157

142-
If your code and docs are in different repositories:
158+
If your code and docs are in different repositories, use separate configuration:
143159

144160
```tsx
145-
# Code repository (TypeScript, Python, etc.)
161+
# Code repository (TypeScript, Python, Rust, etc.)
146162
MIMIR_GITHUB_CODE_URL=https://github.com/your-org/code-repo
147163
MIMIR_GITHUB_CODE_DIRECTORY=src
148164
MIMIR_GITHUB_CODE_INCLUDE_DIRECTORIES=src,lib
@@ -153,6 +169,8 @@ MIMIR_GITHUB_DOCS_DIRECTORY=docs
153169
MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES=docs,guides
154170
```
155171

172+
**Note:** When using `MIMIR_GITHUB_CODE_URL` or `MIMIR_GITHUB_DOCS_URL`, those take precedence over `MIMIR_GITHUB_URL` for that type. `MIMIR_GITHUB_URL` is used as a fallback if neither `MIMIR_GITHUB_CODE_URL` nor `MIMIR_GITHUB_DOCS_URL` is set.
173+
156174
<InfoCard>
157175
**DIRECTORY:** Base directory to start from. If your code is in `src/`, set this to `src` to avoid indexing root-level files.
158176
</InfoCard>
@@ -163,7 +181,7 @@ MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES=docs,guides
163181

164182
### Multiple Repositories
165183

166-
For larger projects with multiple codebases or documentation sources:
184+
For larger projects with multiple codebases or documentation sources, use numbered environment variables:
167185

168186
```tsx
169187
# ============================================
@@ -182,13 +200,16 @@ MIMIR_GITHUB_CODE_REPO_2_DIRECTORY=packages
182200
# ============================================
183201
MIMIR_GITHUB_DOCS_REPO_1_URL=https://github.com/your-org/docs1
184202
MIMIR_GITHUB_DOCS_REPO_1_DIRECTORY=docs
203+
MIMIR_GITHUB_DOCS_REPO_1_INCLUDE_DIRECTORIES=docs,guides
185204
MIMIR_GITHUB_DOCS_REPO_1_BASE_URL=https://docs.example.com
186205
MIMIR_GITHUB_DOCS_REPO_1_CONTENT_PATH=content/docs
187206

188207
MIMIR_GITHUB_DOCS_REPO_2_URL=https://github.com/your-org/docs2
189208
MIMIR_GITHUB_DOCS_REPO_2_BASE_URL=https://docs2.example.com
190209
```
191210

211+
**Note:** When using multiple repos (numbered variables), the single-repo variables (`MIMIR_GITHUB_CODE_URL`, `MIMIR_GITHUB_DOCS_URL`) are ignored. Number repos starting from 1 and increment sequentially (1, 2, 3, etc.). Each repo can have its own `DIRECTORY`, `INCLUDE_DIRECTORIES`, and `EXCLUDE_PATTERNS` settings.
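The precedence rules in the two notes above can be sketched as a resolver. This is one possible reading, not Mimir's implementation: `resolveRepoUrls` is a hypothetical function, and resolving code and docs independently (so that `MIMIR_GITHUB_URL` fills in for whichever type has no dedicated URL) is an assumption the notes do not fully settle.

```typescript
type Env = Record<string, string | undefined>;
type RepoKind = "CODE" | "DOCS";

// Resolve the repository URLs for one kind of content.
// 1. Numbered variables (REPO_1, REPO_2, ...) win when present; numbering must be sequential from 1.
// 2. Otherwise the single-repo variable for that kind is used.
// 3. Otherwise MIMIR_GITHUB_URL is the fallback.
function resolveRepoUrls(env: Env, kind: RepoKind): string[] {
  const numbered: string[] = [];
  for (let n = 1; ; n++) {
    const url = env[`MIMIR_GITHUB_${kind}_REPO_${n}_URL`];
    if (!url) break; // a gap ends the sequence, so repos after it are not seen
    numbered.push(url);
  }
  if (numbered.length > 0) return numbered;

  const single = env[`MIMIR_GITHUB_${kind}_URL`] || env["MIMIR_GITHUB_URL"];
  return single ? [single] : [];
}
```

Stopping at the first gap illustrates why the numbering should increment without skipping: under this reading, a `REPO_3` defined without a `REPO_2` would be silently missed.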
212+
192213
<InfoCard>
193214
**EXCLUDE_PATTERNS:** Comma-separated patterns to skip. Useful for excluding test files, build artifacts, or generated code.
194215
</InfoCard>
@@ -243,11 +264,43 @@ Prevents test files and other non-production code from being indexed. This keeps
243264

244265
Common test patterns are excluded automatically if not specified.
245266

267+
#### Include Directories (optional)
268+
269+
```tsx
270+
MIMIR_GITHUB_INCLUDE_DIRECTORIES=src,lib,packages
271+
```
272+
273+
Comma-separated list of directories to include when parsing. Only files in these directories will be indexed. Useful for large repositories where you only want specific folders.
274+
246275
<div className="my-6">
247276
<Separator />
248277
</div>
249278

250-
## 5. LLM Configuration
279+
## 5. Documentation Configuration (Optional)
280+
281+
Configure how documentation URLs are generated in search results.
282+
283+
#### Documentation Base URL (optional)
284+
285+
```tsx
286+
MIMIR_DOCS_BASE_URL=https://docs.example.com
287+
```
288+
289+
The base URL where your documentation is hosted. Used to generate clickable links in search results when repository paths don't have per-repo `BASE_URL` configured.
290+
291+
#### Documentation Content Path (optional)
292+
293+
```tsx
294+
MIMIR_DOCS_CONTENT_PATH=content/docs
295+
```
296+
297+
The path prefix in your repository where documentation content lives. Used to correctly map repository paths to documentation URLs.
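One way the base URL and content path could combine into a link is sketched below. The exact mapping Mimir uses is not shown in this document, so `toDocsUrl` is hypothetical, and stripping the `.md`/`.mdx` extension and a trailing `index` segment are assumptions based on common documentation-site routing.

```typescript
// Hypothetical mapping from a repository file path to a hosted documentation URL.
function toDocsUrl(repoPath: string, baseUrl: string, contentPath = ""): string {
  const prefix = contentPath.replace(/^\/+|\/+$/g, "");
  let relative = repoPath.replace(/^\/+/, "");

  // Drop the content path prefix (e.g. "content/docs") when the file lives under it.
  if (prefix !== "" && (relative === prefix || relative.startsWith(prefix + "/"))) {
    relative = relative.slice(prefix.length);
  }

  relative = relative
    .replace(/^\/+/, "")
    .replace(/\.mdx?$/, "") // assumption: pages are served without the file extension
    .replace(/(^|\/)index$/, ""); // assumption: index files map to their directory

  const base = baseUrl.replace(/\/+$/, "");
  return relative === "" ? base : `${base}/${relative}`;
}
```

Under these assumptions, `content/docs/configuration.mdx` with a base URL of `https://docs.example.com` and a content path of `content/docs` would map to `https://docs.example.com/configuration`.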
298+
299+
<div className="my-6">
300+
<Separator />
301+
</div>
302+
303+
## 6. LLM Configuration
251304

252305
Mimir uses LLMs for two purposes: creating embeddings (vector representations) and generating chat responses. You can use different providers for each.
253306

@@ -331,6 +384,25 @@ MIMIR_LLM_CHAT_TEMPERATURE=0
331384

332385
Controls randomness (0.0 to 2.0). Lower values (0-0.3) give more deterministic, factual answers. Higher values (0.7-1.0) give more creative responses.
333386

387+
#### Chat Max Output Tokens (optional)
388+
389+
```tsx
390+
MIMIR_LLM_CHAT_MAX_OUTPUT_TOKENS=8000
391+
```
392+
393+
**Default:** `8000`
394+
395+
Maximum number of tokens the chat model can generate in a single response. Increase for longer responses, decrease to limit response length.
396+
397+
#### Custom Base URLs (optional)
398+
399+
```tsx
400+
MIMIR_LLM_EMBEDDING_BASE_URL=https://api.openai.com/v1
401+
MIMIR_LLM_CHAT_BASE_URL=https://api.openai.com/v1
402+
```
403+
404+
Override the default API base URL for your LLM provider. Useful for self-hosted models or custom API endpoints.
405+
334406
### Mixing Providers
335407

336408
You can use different providers for embeddings and chat:
@@ -349,21 +421,14 @@ This allows you to optimize for cost (cheap embeddings) and quality (better chat
349421

350422
## Complete Example
351423

352-
Here's a complete `.env` file example with all common settings:
424+
Here's a minimal `.env` file example with required settings:
353425

354426
```tsx
355427
# Server (Required)
356428
MIMIR_SERVER_API_KEY=your-generated-api-key
357429

358-
# Supabase (Required)
359-
MIMIR_SUPABASE_URL=https://your-project.supabase.co
360-
MIMIR_SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
361-
362-
# Supabase (Optional)
363-
MIMIR_SUPABASE_DB_PASSWORD=your-db-password
364-
MIMIR_SUPABASE_TABLE=docs
365-
MIMIR_SUPABASE_SIMILARITY_THRESHOLD=0.2
366-
MIMIR_SUPABASE_MATCH_COUNT=10
430+
# Database (Required)
431+
MIMIR_DATABASE_URL=postgresql://user:password@host:5432/database
367432

368433
# GitHub - Single Repository
369434
MIMIR_GITHUB_URL=https://github.com/your-org/your-repo
@@ -379,9 +444,10 @@ MIMIR_LLM_EMBEDDING_API_KEY=sk-your-openai-key
379444
MIMIR_LLM_CHAT_PROVIDER=openai
380445
MIMIR_LLM_CHAT_MODEL=gpt-4
381446
MIMIR_LLM_CHAT_API_KEY=sk-your-openai-key
382-
MIMIR_LLM_CHAT_TEMPERATURE=0
383447
```
384448

449+
For a complete example with all optional settings, see `.env.example` in the mimir-rag directory.
450+
385451
## Next Steps
386452

387453
- Learn about [Deployment](/docs/deployment) options

mimir-docs/content/docs/deployment.mdx

Lines changed: 18 additions & 4 deletions
@@ -34,6 +34,19 @@ docker run --rm \
3434
mimir-rag:latest
3535
```
3636

37+
### Database connection from Docker
38+
39+
If the container cannot reach your database (e.g. connection timeouts or "Not IPv4 compatible" with Supabase/Neon), run the image with **host network mode**. The app then uses the host’s network and can connect to the DB without changing your connection string:
40+
41+
```bash
42+
docker run --rm \
43+
--network host \
44+
--env-file .env \
45+
mimir-rag:latest
46+
```
47+
48+
With host network, the app listens on port 3000 on the host; omit `-p 3000:3000`. On Linux this usually resolves DB connection issues. On macOS/Windows, host network behaves differently—prefer the Session Pooler (port 6543) for Supabase or a connection string that works from inside the container.
49+
3750
<div className="my-6">
3851
<Separator />
3952
</div>
@@ -79,11 +92,12 @@ Hetzner is a popular choice for deploying Mimir due to its simplicity and cost-e
7992
```bash
8093
docker run -d \
8194
--name mimir-rag \
82-
-p 3000:3000 \
95+
--network host \
8396
--env-file .env \
8497
--restart unless-stopped \
8598
mimir-rag:latest
8699
```
100+
Using `--network host` avoids database connection issues with managed Postgres (Supabase, Neon, etc.). The server listens on port 3000 on the host; do not use `-p 3000:3000` when using host network.
87101

88102
That's it! Your Mimir instance is now running on Hetzner.
89103

@@ -145,7 +159,7 @@ Deploy Mimir on your own infrastructure for full control.
145159
### Requirements
146160

147161
- Node.js 20 or later
148-
- Access to Supabase database
162+
- Access to PostgreSQL database with pgvector extension
149163
- LLM provider API keys
150164
- Process manager (PM2 recommended)
151165

@@ -193,7 +207,7 @@ pm2 startup
193207
Before deploying to production:
194208

195209
- [ ] Generate a secure API key
196-
- [ ] Set up Supabase with proper security settings
210+
- [ ] Set up PostgreSQL database with proper security settings
197211
- [ ] Configure GitHub webhook secret (if using webhooks)
198212
- [ ] Set appropriate similarity threshold and match count
199213
- [ ] Configure exclude patterns to avoid indexing test files
@@ -226,5 +240,5 @@ Expected response:
226240

227241
Mimir is stateless and can be scaled horizontally:
228242
- Run multiple instances behind a load balancer
229-
- All instances share the same Supabase database
243+
- All instances share the same PostgreSQL database
230244
- Use sticky sessions if needed (not required for most use cases)
