A containerised RAG stack for your Markdown vault:
4
4
- Indexes Markdown with **Markdown-header splitting** first, then **sentence-aware fallback**, and finally **char-based** fallback.
5
5
- Persists embeddings in **Chroma**.
6
-
- Uses **Ollama** for both generator (**Granite 4.0 Tiny-H**) and embedder (**nomic-embed-text**).
6
+
- Uses **Ollama** for both generator (**Gemma 4 26B Q4**) and embedder (**nomic-embed-text**).
7
+
- Ollama runs **on the host** (Metal GPU on macOS) for faster inference and embedding; the containers talk to it via `host.containers.internal`.
7
8
- **Watchdog** sidecar auto-reindexes on vault changes (debounced).
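
The header, sentence, then character fallback order named in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's indexer: the function name `split_markdown` and the `max_chars` limit are assumptions made for the example.

```python
import re


def split_markdown(text: str, max_chars: int = 800) -> list[str]:
    """Header -> sentence -> char fallback chunking (illustrative only)."""
    # 1) Split before each Markdown header line so a header stays with its body.
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks: list[str] = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # 2) Oversized section: pack whole sentences until the limit is reached.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", section):
            candidate = f"{current} {sentence}".strip()
            if len(sentence) > max_chars:
                # 3) A single sentence is still too long: hard-cut by characters.
                if current:
                    chunks.append(current)
                    current = ""
                chunks.extend(
                    sentence[i:i + max_chars]
                    for i in range(0, len(sentence), max_chars)
                )
            elif len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = sentence
        if current:
            chunks.append(current)
    return chunks
```

Each stage only runs when the previous one leaves a piece above the limit, so short notes stay as one readable chunk per header.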
8
9
9
10
## Quick start
10
-
1. Edit `.env` and set `HOST_VAULT_PATH` to your Markdown vault absolute path.
11
-
2. `make up`
12
-
3. `make pull` (first run to cache models)
13
-
4. Bring up the API (if it's not running) and start chatting:
11
+
12
+
1. Install and start [Ollama](https://ollama.com) on your host machine (it must be running before the stack starts).
13
+
2. Pull the required models:
14
+
```bash
make ollama-bootstrap
```
17
+
3. Edit `.env` and set `HOST_VAULT_PATH` to your Markdown vault absolute path.
18
+
4. Start the stack:
19
+
```bash
make up
```
22
+
5. Start chatting:
23
+
```bash
./chat.sh
```
26
+
27
+
## Changing models
28
+
29
+
Model names are defined as variables at the top of the `Makefile`:
30
+
31
+
```makefile
GENERATOR_MODEL ?= gemma4-26b-q4xl:latest
EMBED_MODEL ?= nomic-embed-text
```
35
+
36
+
To switch models, override them on the command line — no file edits required:
37
+
14
38
```bash
# Pull and verify the new models first
make ollama-bootstrap GENERATOR_MODEL=llama3.2:latest

# Then start the stack with the same override
make up GENERATOR_MODEL=llama3.2:latest
```
17
45
46
+
The values are exported from Make and picked up by `docker-compose.yml` as environment variables. If you want a permanent change, edit the two lines in `Makefile` directly.
47
+
48
+
> **Note:** changing `EMBED_MODEL` requires a full reindex (`make reindex`) because
49
+
> the new embedding model will produce incompatible vectors.
50
+
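
On the application side, reading these exported variables could look roughly like the sketch below. It is an assumption about how a settings module might consume them, not the contents of `app/settings.py`: `ModelConfig`, `load_model_config`, and `needs_full_reindex` are hypothetical names, and the defaults are copied from the values quoted in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelConfig:
    generator_model: str
    embed_model: str
    ollama_base_url: str


def load_model_config(env: Optional[Mapping[str, str]] = None) -> ModelConfig:
    """Read model names from the environment, falling back to the README defaults."""
    env = os.environ if env is None else env
    return ModelConfig(
        generator_model=env.get("GENERATOR_MODEL", "gemma4-26b-q4xl:latest"),
        embed_model=env.get("EMBED_MODEL", "nomic-embed-text"),
        ollama_base_url=env.get(
            "OLLAMA_BASE_URL", "http://host.containers.internal:11434"
        ),
    )


def needs_full_reindex(indexed_with: Optional[str], config: ModelConfig) -> bool:
    """True when the stored vectors were produced by a different embedding model."""
    return indexed_with is not None and indexed_with != config.embed_model
```

Recording the embedder name alongside the index (here the hypothetical `indexed_with` value) is one way to detect the incompatible-vectors case from the note above before serving queries.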
18
51
## Manual calls
19
52
- Reindex: `make reindex` (also happens on startup, and on changes via watcher)
- `HOST_VAULT_PATH`: absolute path to your Markdown vault on the host.
58
-
- `OLLAMA_BASE_URL`: override to use host Ollama (see “Use host Ollama”).
91
+
- **Makefile variables** (source of truth for model names):
92
+
- `GENERATOR_MODEL`: LLM used for answering (default `gemma4-26b-q4xl:latest`).
93
+
- `EMBED_MODEL`: embedding model (default `nomic-embed-text`).
59
94
- **Settings** (`app/settings.py`):
60
95
- `index_path`: Chroma persistence directory.
61
96
- `vault_path`: container path for mounted vault.
62
-
- `embed_model`: embedder name (e.g., `nomic-embed-text`).
63
-
- `generator_model`: LLM for answering (e.g., `ibm/granite4:tiny-h`).
64
97
- `timezone`: used for date parsing and display.
65
98
66
99
- **Container env (docker-compose.yml)**:
100
+
- `OLLAMA_BASE_URL`: points to `http://host.containers.internal:11434` so containers reach host Ollama.
67
101
- `REINDEX_ON_START`: when `true`, `app/run.sh` calls `POST /reindex/scan` after the API boots to enqueue only changed/removed files since the last index state.
- `RAG_URL`, `RAG_FILES_URL` (watcher): endpoints for full and partial reindex (defaults are fine in docker-compose).
@@ -83,26 +117,68 @@ markdown-rag/
83
117
84
118
## Indexing & retrieval behavior
85
119
- **Chunking**: header → sentence → char fallbacks to produce readable chunks.
86
-
- **Metadata stored**: `title`, `source`, `entry_date` (when detected), `people` (derived from title, filename, headings, and parent folders). Vector store metadata is sanitized to primitives.
87
-
- **Embeddings include metadata**: Each chunk text is prefixed with `[title] [people] [source] [date]` to strengthen person and title relevance.
120
+
- **Metadata stored**: `title`, `source`, `entry_date` (from date headings, frontmatter `date` field, or file mtime, in that priority order), `tags` (from frontmatter), `entities` (derived from title, filename, headings, and parent folders). Vector store metadata is sanitized to primitives.
121
+
- **Embeddings include metadata**: Each chunk text is prefixed with `[title] [entities] [source] [date] [tags]` to strengthen relevance in vector search.
88
122
- **Dates**:
89
-
- Query rules like “today”, “last 2 weeks”, or explicit ranges parsed by `date_parser.py`.
123
+
- Query rules like "today", "last 2 weeks", or explicit ranges parsed by `date_parser.py`.
90
124
- Retrieval filters strictly by date when a concrete window is parsed; otherwise a name-only fallback is used to avoid empty results.
91
125
- **People**:
92
126
- Names are extracted from queries (quotes/multi-word preferred; common non-name tokens filtered out).
93
-
- Retrieval requires all detected names to match `metadata.people` (or title/source) when any names are found.
127
+
- Retrieval requires all detected names to match `metadata.entities` (or title/source) when any names are found.
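
The metadata rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, metadata values are assumed to be already sanitized to plain strings (for example `entities` as a comma-separated string), and file mtime is interpreted in UTC for simplicity.

```python
from datetime import date, datetime, timezone
from typing import Optional


def pick_entry_date(
    heading_date: Optional[date], frontmatter_date: Optional[date], mtime: float
) -> date:
    """Priority order: date heading, then frontmatter `date`, then file mtime."""
    if heading_date is not None:
        return heading_date
    if frontmatter_date is not None:
        return frontmatter_date
    return datetime.fromtimestamp(mtime, tz=timezone.utc).date()


def embedding_text(chunk: str, meta: dict) -> str:
    """Prefix the chunk with bracketed metadata fields, skipping empty ones."""
    fields = ("title", "entities", "source", "entry_date", "tags")
    prefix = " ".join(f"[{meta[key]}]" for key in fields if meta.get(key))
    return f"{prefix} {chunk}".strip()


def matches_all_names(names: list[str], meta: dict) -> bool:
    """Every detected name must appear in entities, title, or source."""
    haystack = " ".join(
        str(meta.get(key, "")) for key in ("entities", "title", "source")
    ).lower()
    return all(name.lower() in haystack for name in names)
```

For example, `embedding_text("Body", {"title": "Standup", "entities": "Alice"})` yields `[Standup] [Alice] Body`, which is the text that would be embedded rather than the bare chunk.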
- `make test-install` → create `.venv` and install test dependencies
105
-
- `make test` → run the unit test suite with coverage report
130
+
131
+
### Ollama (host)
132
+
133
+
| Target | Description |
134
+
|--------|-------------|
135
+
|`make ollama-bootstrap`| Pull `GENERATOR_MODEL` and `EMBED_MODEL` to the host Ollama. Safe to re-run — `ollama pull` skips models that are already current. Run this before first `make up` and whenever you change model names. |
136
+
|`make ollama-status`| Show host Ollama version and list all pulled models alongside the model names required by the stack. |
|`make machine-start`| Start an existing Podman VM. |
175
+
176
+
### Tests
177
+
178
+
| Target | Description |
179
+
|--------|-------------|
180
+
|`make test-install`| One-time setup: create `.venv` and install test dependencies. |
181
+
|`make test`| Run the full test suite with coverage report. |
106
182
107
183
## Testing
108
184
@@ -132,21 +208,16 @@ make test # run all tests with coverage
132
208
|`name_parser.py`| 83% |
133
209
134
210
## Troubleshooting
135
-
- **No results for sentence queries with a name**: ensure your notes have the person name in title, filename, headings, or a parent folder (so it gets into `people`). Run `make reindex`.
211
+
- **No results for sentence queries with a name**: ensure your notes have the person name in title, filename, headings, or a parent folder (so it gets into `entities`). Run `make reindex`.
136
212
- **List-valued metadata error**: we sanitize metadata to primitives; if you changed metadata shapes, re-run `make reindex`.
137
-
- **Persist errors with Chroma**: new `langchain-chroma` handles persistence automatically; explicit `persist()` isn’t required.
138
-
- **Using host Ollama**: set `OLLAMA_BASE_URL=http://host.containers.internal:11434` in `docker-compose.yml` and remove bundled `ollama` service if desired.
213
+
- **Ollama not reachable**: ensure `ollama serve` is running on the host before `make up`. Verify with `make ollama-status`.
214
+
- **Wrong model loaded**: the stack reads `GENERATOR_MODEL` / `EMBED_MODEL` at container start. If you changed them, run `make down && make up GENERATOR_MODEL=<new>`.
215
+
- **Embedding model changed**: requires a full reindex, because vectors from different embedding models are incompatible. Run `make reindex` after switching `EMBED_MODEL`.
139
216
140
217
## Watcher behavior
141
218
- The watcher debounces file events and calls `POST /reindex/files` with exact changed paths.
142
219
- If partial reindex fails, it falls back to `POST /reindex` (full) to self-heal.
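
The debounce-then-fallback behavior described above might be structured like this sketch. It is not the watcher's actual code: `DebouncedReindexer`, the `quiet_seconds` window, the injected `post` callable, and the `{"paths": [...]}` payload shape are all assumptions for illustration; only the two endpoint paths come from this README.

```python
import time
from typing import Callable, Optional


class DebouncedReindexer:
    """Collect changed paths and flush them once the vault has been quiet."""

    def __init__(self, post: Callable[[str, dict], bool], quiet_seconds: float = 2.0):
        self._post = post  # returns True when the request succeeded
        self._quiet = quiet_seconds
        self._pending: set = set()
        self._last_event = 0.0

    def on_change(self, path: str, now: Optional[float] = None) -> None:
        # Every new event restarts the quiet window.
        self._pending.add(path)
        self._last_event = time.monotonic() if now is None else now

    def maybe_flush(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if not self._pending or now - self._last_event < self._quiet:
            return False
        paths = sorted(self._pending)
        self._pending.clear()
        # Partial reindex first; fall back to a full reindex if it fails.
        if not self._post("/reindex/files", {"paths": paths}):
            self._post("/reindex", {})
        return True
```

A file-system event handler would call `on_change` for each event and a timer loop would call `maybe_flush`; passing `now` explicitly keeps the logic easy to test without sleeping.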
143
220
144
-
145
-
## Use host Ollama
146
-
- Change `OLLAMA_BASE_URL` env for `rag` service to `http://host.containers.internal:11434`.
147
-
- Optionally remove the `ollama` service.
148
-
149
221
## Notes
150
222
- The loader **ignores** `.obsidian/` and expands `[[wikilinks]]` to their alias or target text.
151
223
- Citations include front-matter fields when present (e.g., `title`, `tags`).
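
As a rough illustration of the loader rules in these notes, the sketch below expands `[[target|alias]]` links to the alias when present and to the target otherwise, and skips anything under `.obsidian/`. The names `expand_wikilinks` and `is_ignored` are hypothetical, and the regular expression is an assumption about the link syntax rather than the loader's real pattern.

```python
import re
from pathlib import PurePosixPath

# [[Target]] or [[Target|Alias]]; group 2 is the optional alias.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def expand_wikilinks(text: str) -> str:
    """Replace each wikilink with its alias if given, else its target text."""
    return WIKILINK.sub(lambda m: (m.group(2) or m.group(1)).strip(), text)


def is_ignored(path: str) -> bool:
    """Skip files that live anywhere under an `.obsidian/` directory."""
    return ".obsidian" in PurePosixPath(path).parts
```

With this sketch, `Met [[Alice Smith|Alice]] about [[Project X]].` becomes `Met Alice about Project X.`, so the embedded text reads naturally while still containing the linked names.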