Commit 419682e

Switched to gemma4 for inference, updated date_parser to avoid slow fallbacks to LLM, changed to support thinking model responses.
1 parent 763195e commit 419682e

14 files changed

Lines changed: 373 additions & 286 deletions

CLAUDE.md

Lines changed: 15 additions & 12 deletions
@@ -4,11 +4,12 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 ## Stack overview

-Containerised RAG system for Obsidian Markdown vaults. Uses **Podman** (not Docker) via `podman compose`. Three services:
+Containerised RAG system for Obsidian Markdown vaults. Uses **Podman** (not Docker) via `podman compose`. Two services:
 - **rag** (`app/`) — FastAPI server + indexer, built from `app/Dockerfile`
-- **ollama** — local LLM and embedding model server
 - **watcher** (`app/watcher.py`) — watchdog sidecar that posts changed paths to the RAG API

+Ollama runs **on the host** (Metal GPU on macOS); containers reach it via `host.containers.internal:11434`. There is no `ollama` container service.
+
 Embeddings and chunks persist in **Chroma** (`/index/chroma` volume). Index state (mtimes + chunk counts per file) is tracked in `index_state.json` alongside the Chroma DB.

 ## Common commands
@@ -22,8 +23,8 @@ make logs-watcher # tail watcher logs
 make ps # show container status
 make shell # bash into rag container

-# First-time model pull (after make up)
-make pull
+# First-time model pull (run before make up)
+make ollama-bootstrap

 # Indexing
 make reindex # full incremental reindex
@@ -32,14 +33,14 @@ make reindex-files # partial reindex for specific files (prompts for paths)
 make reindex-status # check last reindex result

 # Debugging retrieval
-make debug-retrieve # vector search only, no metadata
-make debug-retrieve-dated # vector search with metadata (date, entities, etc.)
+make retrieve # vector search only, no metadata
+make retrieve-dated # vector search with metadata (date, entities, etc.)
 make parse-dates # test date parsing on a query

 # Querying
 make ask # single question, blocking
 make ask-stream # streaming answer
-./chat.sh # interactive chat loop
+make chat # interactive chat loop (python3 chat.py)

 # MCP
 make mcp-install # install scripts/requirements.txt for MCP server
@@ -59,15 +60,15 @@ make test # run full suite with coverage report
 ### Data flow

 1. **Indexing**: `.md` files → `md_loader.py` (front-matter parse + wikilink expansion) → `indexer.py` date-heading split → markdown header split → sentence chunking → char fallback → spaCy entity extraction → Chroma upsert
-2. **Query**: question → `date_parser.py` (regex rules, LLM fallback) + `name_parser.py` (heuristic regex, prefers quoted names) → augmented vector search with Chroma `where` filter → entity post-filter → optional recency sort → Ollama generate
+2. **Query**: question → `date_parser.py` (regex rules, `dateparser` library fallback) + `name_parser.py` (heuristic regex, prefers quoted names) → augmented vector search with Chroma `where` filter → entity post-filter → optional recency sort → Ollama generate

 ### Key files in `app/`

 | File | Role |
 |------|------|
 | `rag_server.py` | FastAPI app; `_retrieve()` is the core retrieval function |
 | `indexer.py` | `build_index()` / `build_index_files()` + chunking logic; `_iter_chunks()` is the main pipeline |
-| `date_parser.py` | `DateParser.parse()` — regex-first, LLM fallback for ambiguous phrases |
+| `date_parser.py` | `DateParser.parse()` — regex-first, `dateparser` library fallback for ambiguous phrases |
 | `name_parser.py` | `extract_entities_from_text()` (spaCy, used at index time); `extract_name_terms()` (heuristic regex, used at query time) |
 | `md_loader.py` | `load_markdown_docs()` + `_expand_wikilinks()` |
 | `settings.py` | All config via env vars; all consumed through `settings` singleton |
@@ -86,12 +87,13 @@ make test # run full suite with coverage report

 - `source` — vault-relative path
 - `title` — from front matter or filename
-- `entry_date` — ISO date from date heading or file mtime fallback
+- `entry_date` — ISO date from date heading > frontmatter `date` field > file mtime (priority order)
 - `entry_date_ts` — Unix timestamp of `entry_date` (for Chroma `$gte`/`$lte` numeric filters)
 - `entities` — comma-separated `prefix:Value` strings from spaCy NER (PERSON, ORG, GPE, WORK_OF_ART), merged from file-level and chunk-level extraction
+- `tags` — from frontmatter `tags` field (list or string, normalised to space-separated string)
 - `chunk_index` — position within the file

-Each chunk's embedded text is prefixed with `[title: ...] [entities: ...] [source: ...] [date: ...]` to strengthen metadata relevance in vector search.
+Each chunk's embedded text is prefixed with `[title: ...] [entities: ...] [source: ...] [date: ...] [tags: ...]` to strengthen metadata relevance in vector search.

 ### Retrieval logic (`rag_server._retrieve`)

@@ -125,7 +127,7 @@ All settings are in `app/settings.py` via env vars. Key ones:
 - `RAG_URL` — base URL for the running RAG API (default: `http://localhost:8000`)

 **Tools:**
-- `search_notes(question, top_k)` — semantic search via `/debug/retrieve-dated`; requires RAG stack running
+- `search_notes(question, top_k)` — semantic search via `/retrieve/dated`; requires RAG stack running
 - `read_note`, `list_notes`, `create_note`, `update_note`, `delete_note`, `lint_note` — provided by obsidian-mcp-guard (path safety, write-vault isolation, mdlint-obsidian validation built in)

 `fastmcp` and `obsidian-mcp-guard` are real installed packages in the test venv; no stubbing needed. `conftest.py` sets `HOST_VAULT_PATH=/tmp/test-vault-root` so `create_vault_server()` doesn't error at import time.
@@ -154,3 +156,4 @@ Tests live in `tests/` and run **locally** (no container). Use `make test` to ru
 - `entry_date_ts` was added later; a full reindex is needed on existing installations to backfill it.
 - Chroma persistence is automatic (`PersistentClient`); do not call `.persist()` explicitly.
 - The watcher uses `RAG_FILES_URL` to call `/reindex/files`; if that fails it falls back to `/reindex` (full).
+- `NUM_PREDICT` defaults to `-1` (unlimited). Do **not** set a low value (e.g. 256 or 800) — thinking models (like gemma4) consume their entire token budget reasoning before generating any response, so a low cap produces empty answers.
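
The chunk metadata rules described in the CLAUDE.md changes above (the `entry_date` priority order, `tags` normalisation, and the embedded-text prefix) can be illustrated with a minimal sketch. The function names below are hypothetical and the indexer's real code is not part of this diff; the UTC handling of mtime and the comma handling in string tags are assumptions.

```python
from datetime import date, datetime, timezone
from typing import Optional, Sequence, Union


def resolve_entry_date(heading: Optional[date], frontmatter: Optional[date], mtime: float) -> date:
    """Pick entry_date in priority order: date heading > frontmatter `date` > file mtime."""
    if heading is not None:
        return heading
    if frontmatter is not None:
        return frontmatter
    # Assumption: mtime is read as UTC; the real indexer may use its configured timezone.
    return datetime.fromtimestamp(mtime, tz=timezone.utc).date()


def normalise_tags(tags: Union[str, Sequence[str], None]) -> str:
    """Normalise a frontmatter `tags` value (list or string) to a space-separated string."""
    if not tags:
        return ""
    if isinstance(tags, str):
        # Assumption: string tags may be comma- or whitespace-separated.
        parts = tags.replace(",", " ").split()
    else:
        parts = [str(t).strip() for t in tags if str(t).strip()]
    return " ".join(parts)


def embed_prefix(title: str, entities: str, source: str, entry_date: date, tags: str) -> str:
    """Build the metadata prefix that is prepended to a chunk before embedding."""
    return (
        f"[title: {title}] [entities: {entities}] [source: {source}] "
        f"[date: {entry_date.isoformat()}] [tags: {tags}]"
    )
```

Under these assumptions a file with no date heading and no frontmatter date falls through to its mtime, and both `["work", "meeting"]` and `"work, meeting"` normalise to `work meeting`.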

Makefile

Lines changed: 7 additions & 7 deletions
@@ -13,7 +13,7 @@ GENERATOR_MODEL ?= gemma4-26b-q4xl:latest
 EMBED_MODEL ?= nomic-embed-text
 export GENERATOR_MODEL EMBED_MODEL

-.PHONY: up down logs logs-watcher ollama-bootstrap ollama-status reindex reindex-scan reindex-files reindex-status debug-retrieve debug-retrieve-dated parse-dates ask ask-stream chat shell check ps restart machine-start machine-init test-install test
+.PHONY: up down logs logs-watcher ollama-bootstrap ollama-status reindex reindex-scan reindex-files reindex-status retrieve retrieve-dated parse-dates ask ask-stream chat shell check ps restart machine-start machine-init test-install test

 up:
 	podman compose -f docker-compose.yml up -d --build
@@ -69,21 +69,21 @@ reindex-files:
 reindex-status:
 	curl -s -X GET http://localhost:8000/reindex/status | jq .

-debug-retrieve:
+retrieve:
 	@read -p "Query: " Q; \
-	curl -s -G "http://localhost:8000/debug/retrieve" \
+	curl -s -G "http://localhost:8000/retrieve" \
 	  --data-urlencode "q=$$Q" \
 	  --data-urlencode "k=5" | jq .

-debug-retrieve-dated:
+retrieve-dated:
 	@read -p "Query: " Q; \
-	curl -s -G "http://localhost:8000/debug/retrieve-dated" \
+	curl -s -G "http://localhost:8000/retrieve/dated" \
 	  --data-urlencode "q=$$Q" \
 	  --data-urlencode "k=5" | jq .

 parse-dates:
 	@read -p "Query: " Q; \
-	curl -s -G "http://localhost:8000/debug/parse-dates" \
+	curl -s -G "http://localhost:8000/utils/parse-dates" \
 	  --data-urlencode "q=$$Q" | jq .

 ask:
@@ -99,7 +99,7 @@ ask-stream:
 	  -d "$$(jq -n --arg q "$$Q" '{question:$$q}')" ; echo

 chat:
-	bash ./chat.sh
+	.venv/bin/python ./chat.py

 shell:
 	podman exec -it markdown-rag bash
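
The `chat` target now runs `chat.py`, which the README change in this commit describes as handling "think-tag filtering". The script itself is not shown here, so the following is only a minimal sketch of what such a filter might look like. It assumes the model wraps its reasoning in `<think>...</think>`; the tag name and the `strip_think` helper are assumptions, not taken from the commit.

```python
import re

# Assumption: reasoning is delimited by <think>...</think>; adjust to the model's real format.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL | re.IGNORECASE)
# An opening tag with no closing tag: generation stopped while the model was still reasoning.
_OPEN_THINK = re.compile(r"<think>.*\Z", re.DOTALL | re.IGNORECASE)


def strip_think(text: str) -> str:
    """Remove reasoning blocks so only the user-facing answer remains."""
    text = _THINK_BLOCK.sub("", text)  # drop complete reasoning blocks
    text = _OPEN_THINK.sub("", text)   # drop a truncated, unclosed block
    return text.strip()
```

The unclosed-tag case is the one the `NUM_PREDICT` note in CLAUDE.md warns about: if the token cap is reached during reasoning, nothing is left after filtering and the answer comes back empty.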

README.md

Lines changed: 22 additions & 15 deletions
@@ -21,7 +21,7 @@ A containerised RAG stack for your Markdown vault:
 ```
 5. Start chatting:
 ```bash
-./chat.sh
+make chat
 ```

 ## Changing models
@@ -57,7 +57,7 @@ curl -X POST http://localhost:8000/query -H "Content-Type: application/json" \
 ```

 ## Architecture
-- **rag_server** (`app/rag_server.py`): FastAPI app exposing debug and chat endpoints.
+- **rag_server** (`app/rag_server.py`): FastAPI app exposing search, utility, and chat endpoints.
 - **indexer** (`app/indexer.py`): Loads markdown, splits into chunks, extracts metadata, embeds and upserts to Chroma.
 - **name/date parsing**: `app/name_parser.py`, `app/date_parser.py` detect people terms and date ranges.
 - **watcher** (`app/watcher.py`): Monitors the vault and triggers partial reindex.
@@ -75,13 +75,13 @@ markdown-rag/
     indexer.py # Indexing pipeline and Chroma access
     md_loader.py # Markdown loading + wikilink expansion
     name_parser.py # Name detection (query + indexing)
-    date_parser.py # Date range parsing (regex + LLM fallback)
+    date_parser.py # Date range parsing (regex + dateparser fallback)
     watcher.py # Vault filesystem watcher
     system_prompt.txt # System prompt used for answering
     run.sh # Entrypoint used by container
   docker-compose.yml
   Makefile
-  chat.sh # Simple local chat helper
+  chat.py # Interactive chat CLI (streaming, think-tag filtering)
   README.md
 ```

@@ -103,14 +103,21 @@ markdown-rag/
 - `RAG_URL`, `RAG_FILES_URL` (watcher): endpoints for full and partial reindex (defaults are fine in docker-compose).

 ## API Endpoints (selected)
-- `GET /debug/parse-dates?q=...` → parsed `{start,end}`.
-- `GET /debug/retrieve?q=...&k=5` → top-k candidates (no dates in response).
-- `GET /debug/retrieve-dated?q=...&k=5` → candidates with metadata (source, entry_date, people, title, snippet).
+
+### Search
+- `GET /retrieve?q=...&k=5` → top-k candidates from vector search (source, title, entry_date, snippet).
+- `GET /retrieve/dated?q=...&k=5` → top-k candidates with full metadata; response includes `filter` showing the parsed date range that was applied.
+
+### Indexing
 - `POST /reindex` → full incremental reindex.
 - `POST /reindex/scan` → enumerate vault and queue only changed/removed files since last index state, then partial reindex.
 - `POST /reindex/files` → partial reindex of given `{"files": ["path.md", ...]}` relative to the vault.
 - `GET /reindex/status` → last reindex summary.

+### Utilities
+- `GET /utils/parse-dates?q=...` → parsed `{start, end}` date range for a query string; useful for verifying date extraction.
+- `POST /utils/split-by-date` → show how a markdown document is split by date headings; POST form field `text` or upload a `file`.
+
 ## Startup indexing
 - On container start, if `REINDEX_ON_START=true`, `app/run.sh` triggers `POST /reindex/scan`.
 - The scan compares current vault mtimes vs `index_state.json` and queues only changed/removed files, then calls the same partial reindex worker path the watcher uses.
@@ -156,15 +163,15 @@ markdown-rag/
 | `make reindex-files` | Partial reindex for specific vault-relative paths (prompts for input). |
 | `make reindex-status` | Show last reindex result. |

-### Querying / debugging
+### Querying

 | Target | Description |
 |--------|-------------|
 | `make ask` | Interactive single question (blocking). |
 | `make ask-stream` | Interactive single question (streaming). |
-| `make debug-retrieve` | Vector search only, no metadata in response. |
-| `make debug-retrieve-dated` | Vector search with full metadata (date, entities, etc.). |
-| `make parse-dates` | Test date parsing on a query. |
+| `make retrieve` | Vector search, returns source/title/date/snippet per result. |
+| `make retrieve-dated` | Vector search with full metadata; shows the date filter that was applied. |
+| `make parse-dates` | Show the parsed date range for a query; useful for verifying date extraction. |

 ### Podman machine (macOS)

@@ -197,15 +204,15 @@ make test # run all tests with coverage

 ### Coverage

-230 tests across 8 files; overall coverage ~93% on `app/` modules:
+222 tests across 8 files; overall coverage ~91% on `app/` modules:

 | Module | Coverage |
 |--------|----------|
 | `settings.py`, `md_loader.py` | 100% |
 | `rag_server.py` | 97% |
-| `date_parser.py`, `watcher.py` | 96% |
-| `indexer.py` | 84% |
-| `name_parser.py` | 83% |
+| `watcher.py` | 96% |
+| `date_parser.py` | 93% |
+| `indexer.py`, `name_parser.py` | 83% |

 ## Troubleshooting
 - **No results for sentence queries with a name**: ensure your notes have the person name in title, filename, headings, or a parent folder (so it gets into `entities`). Run `make reindex`.
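
This commit renames the three `/debug/...` endpoints, so any script that still calls the old paths needs updating. As a hedged illustration, the sketch below maps the old paths to the new ones listed in the README change and builds a query URL. `RENAMED_PATHS` and `build_url` are hypothetical helpers, not part of the repository; the default base URL is the one used in the Makefile.

```python
from typing import Optional
from urllib.parse import urlencode

# Old path -> new path, as renamed in this commit.
RENAMED_PATHS = {
    "/debug/retrieve": "/retrieve",
    "/debug/retrieve-dated": "/retrieve/dated",
    "/debug/parse-dates": "/utils/parse-dates",
}


def build_url(path: str, q: str, k: Optional[int] = None,
              base: str = "http://localhost:8000") -> str:
    """Return a GET URL for a search/utility endpoint, translating pre-rename paths."""
    path = RENAMED_PATHS.get(path, path)  # unchanged paths pass through as-is
    params = {"q": q}
    if k is not None:
        params["k"] = str(k)
    return f"{base}{path}?{urlencode(params)}"
```

For example, `build_url("/debug/retrieve-dated", "meetings last week", k=5)` yields a URL on `/retrieve/dated`, matching what `make retrieve-dated` sends with `curl -G --data-urlencode`.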

app/date_parser.py

Lines changed: 41 additions & 44 deletions
@@ -4,10 +4,11 @@
 from datetime import datetime, timedelta
 from zoneinfo import ZoneInfo
 from typing import Optional
-import json
-import httpx

-from settings import settings
+try:
+    import dateparser  # type: ignore[import]
+except ImportError:  # pragma: no cover
+    dateparser = None  # type: ignore[assignment]

 ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
 DMY_SLASH = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
@@ -49,6 +50,15 @@
 )
 FORTNIGHT_RE = re.compile(r"\b(?:last|past|previous)?\s*fortnight\b", re.IGNORECASE)

+_MON = r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
+# Matches "in April", "during March 2025", "April 2025" (year required when no preposition,
+# to avoid treating the auxiliary verb "may" as a month name).
+MONTH_ONLY_RE = re.compile(
+    r"(?:(?:in|during|for|of)\s+(?P<mp>" + _MON + r")(?:\s+(?P<yp>\d{4}))?(?!\s*\d)"
+    r"|(?P<mn>" + _MON + r")\s+(?P<yn>\d{4}))",
+    re.IGNORECASE,
+)
+
 RANGE_RE = re.compile(
     r"\b(?:between\s+(?P<between_a>.+?)\s+and\s+(?P<between_b>.+?)|from\s+(?P<from_a>.+?)\s+(?:to|until)\s+(?P<from_b>.+?)|since\s+(?P<since>.+?)|after\s+(?P<after>.+?)|before\s+(?P<before>.+?))\b",
     re.IGNORECASE,
@@ -233,54 +243,41 @@ def parse(self, q: str, tz_name: str) -> tuple[Optional[str], Optional[str]]:
             if not start and not end:
                 start = end = iso

+        # Bare month name: "in April", "during March 2025", "April 2025"
+        if not start and not end:
+            mm = MONTH_ONLY_RE.search(q)
+            if mm:
+                mon_str = mm.group('mp') or mm.group('mn')
+                yr_str = mm.group('yp') or mm.group('yn')
+                mon = self._parse_month(mon_str)
+                yr = int(yr_str) if yr_str else now.year
+                if mon:
+                    start, end = self._month_bounds(datetime(yr, mon, 1, tzinfo=tz))
+
         # Normalize order
         if start and end and start > end:
             start, end = end, start

         if start or end:
             return start, end

-        # LLM fallback for ambiguous phrases
+        # dateparser fallback for phrases not covered by the regex rules above
+        # (e.g. "a few weeks ago", "early March", "Q1 2025", "last Tuesday").
+        # dateparser is pure-Python — no network call, no model load.
         try:
-            prompt = (
-                "You are a date range extractor. Given the current date/time and a user query, "
-                "return a JSON object with keys start, end. Use ISO YYYY-MM-DD dates or null.\n"
-                "Rules: start <= end when both present; interpret relative phrases relative to the current date/time and timezone.\n"
-                "Output ONLY JSON. No extra text.\n\n"
-                f"Current date/time: {now.strftime('%Y-%m-%d %H:%M')} {settings.timezone}\n"
-                f"Query: {q}\n"
-            )
-            payload = {
-                "model": settings.generator_model,
-                "prompt": prompt,
-                "options": {
-                    "temperature": 0,
-                    "num_ctx": getattr(settings, "num_ctx", 2048),
-                    "num_predict": 128,
-                    "keep_alive": "5m",
+            parsed = dateparser.parse(
+                q,
+                settings={
+                    "PREFER_DATES_FROM": "past",
+                    "PREFER_DAY_OF_MONTH": "first",
+                    "RETURN_AS_TIMEZONE_AWARE": True,
+                    "TIMEZONE": tz_name,
+                    "TO_TIMEZONE": tz_name,
                 },
-                "stream": False,
-            }
-            with httpx.Client(base_url=settings.ollama_base_url, timeout=20.0) as client:
-                r = client.post("/api/generate", json=payload)
-                r.raise_for_status()
-                data = r.json().get("response", "{}")
-            obj = json.loads(data)
-            s = obj.get("start")
-            e = obj.get("end")
-            # validate
-            def _is_iso(d: Optional[str]) -> bool:
-                if not d:
-                    return False
-                try:
-                    datetime.fromisoformat(d)
-                    return True
-                except Exception:
-                    return False
-            s = s if _is_iso(s) else None
-            e = e if _is_iso(e) else None
-            if s and e and s > e:
-                s, e = e, s
-            return s, e
+            )
+            if parsed:
+                d = parsed.date().isoformat()
+                return d, d
         except Exception:
-            return None, None
+            pass
+        return None, None
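
To see how the new bare-month rule behaves in isolation, here is a small standalone sketch. The pattern is copied from the diff above; `month_range`, the `_MONTH_NUM` lookup and the use of `calendar.monthrange` are stand-ins for `DateParser._parse_month` and `DateParser._month_bounds`, whose real implementations are not shown in this commit.

```python
import calendar
import re
from datetime import date
from typing import Optional, Tuple

_MON = r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
MONTH_ONLY_RE = re.compile(
    r"(?:(?:in|during|for|of)\s+(?P<mp>" + _MON + r")(?:\s+(?P<yp>\d{4}))?(?!\s*\d)"
    r"|(?P<mn>" + _MON + r")\s+(?P<yn>\d{4}))",
    re.IGNORECASE,
)

# Hypothetical lookup keyed on the first three letters of the matched month name.
_MONTH_NUM = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def month_range(q: str, today: date) -> Optional[Tuple[str, str]]:
    """Return (first_day, last_day) ISO strings for a bare month in q, else None."""
    m = MONTH_ONLY_RE.search(q)
    if not m:
        return None
    mon_str = m.group("mp") or m.group("mn")
    yr_str = m.group("yp") or m.group("yn")
    month = _MONTH_NUM[mon_str[:3].lower()]
    year = int(yr_str) if yr_str else today.year  # no year given: assume the current year
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()
```

A bare "may" with no preposition and no year is not matched, which is the case the source comment calls out. One thing worth checking against the real test suite: the pattern has no trailing word boundary, so in this sketch a query such as "notes for Mary" matches the `Mar` prefix and resolves to March.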

app/name_parser.py

Lines changed: 7 additions & 0 deletions
@@ -120,6 +120,13 @@ def extract_entities_from_text(text: str) -> List[str]:
     "His",
     "Her",
     "Their",
+    # month names (full and abbreviated) — prevent date words being treated as names
+    "January", "February", "March", "April", "May", "June",
+    "July", "August", "September", "October", "November", "December",
+    "Jan", "Feb", "Mar", "Apr", "Jun", "Jul", "Aug", "Sep", "Sept", "Oct", "Nov", "Dec",
+    # day names
+    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
+    "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun",
     # common non-name tokens seen in titles
     "Notes",
     "Note",
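
The effect of adding month and day names to this stoplist can be shown with a simplified filter. This is only a sketch: `candidate_names` and the capitalised-word regex are hypothetical, and the real `extract_name_terms()` heuristics are not part of this diff. The stoplist below is a subset of the entries visible in the hunk.

```python
import re

# Subset of the stoplist from the diff: pronouns, month names, day names, title tokens.
NON_NAME_TOKENS = {
    "His", "Her", "Their",
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
    "Jan", "Feb", "Mar", "Apr", "Jun", "Jul", "Aug", "Sep", "Sept", "Oct", "Nov", "Dec",
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
    "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun",
    "Notes", "Note",
}

# Assumption: a candidate name is a single capitalised ASCII word.
_CAPITALISED = re.compile(r"\b[A-Z][a-z]+\b")


def candidate_names(query: str) -> list:
    """Return capitalised tokens from the query that are not on the stoplist."""
    return [t for t in _CAPITALISED.findall(query) if t not in NON_NAME_TOKENS]
```

Without the month and day entries, a query like "what did Alice tell Bob last March on Tuesday?" would also yield `March` and `Tuesday` as name terms, and those would then be applied as entity filters alongside the date range.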
