codeafix
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 15 additions & 12 deletions
diff --git a/Makefile b/Makefile
Lines changed: 7 additions & 7 deletions
diff --git a/README.md b/README.md
Lines changed: 22 additions & 15 deletions
diff --git a/app/date_parser.py b/app/date_parser.py
Lines changed: 41 additions & 44 deletions
diff --git a/app/name_parser.py b/app/name_parser.py
Lines changed: 7 additions & 0 deletions
@@ -4,11 +4,12 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Stack overview
 
-Containerised RAG system for Obsidian Markdown vaults. Uses **Podman** (not Docker) via `podman compose`. Three services:
+Containerised RAG system for Obsidian Markdown vaults. Uses **Podman** (not Docker) via `podman compose`. Two services:
 - **rag** (`app/`) — FastAPI server + indexer, built from `app/Dockerfile`
-- **ollama** — local LLM and embedding model server
 - **watcher** (`app/watcher.py`) — watchdog sidecar that posts changed paths to the RAG API
 
+Ollama runs **on the host** (Metal GPU on macOS); containers reach it via `host.containers.internal:11434`. There is no `ollama` container service.
+
 Embeddings and chunks persist in **Chroma** (`/index/chroma` volume). Index state (mtimes + chunk counts per file) is tracked in `index_state.json` alongside the Chroma DB.
 
 ## Common commands
@@ -22,8 +23,8 @@ make logs-watcher   # tail watcher logs
 make ps             # show container status
 make shell          # bash into rag container
 
-# First-time model pull (after make up)
-make pull
+# First-time model pull (run before make up)
+make ollama-bootstrap
 
 # Indexing
 make reindex         # full incremental reindex
@@ -32,14 +33,14 @@ make reindex-files   # partial reindex for specific files (prompts for paths)
 make reindex-status  # check last reindex result
 
 # Debugging retrieval
-make debug-retrieve        # vector search only, no metadata
-make debug-retrieve-dated  # vector search with metadata (date, entities, etc.)
+make retrieve              # vector search; returns source/title/date/snippet per result
+make retrieve-dated        # vector search with full metadata; shows the date filter applied
 make parse-dates           # test date parsing on a query
 
 # Querying
 make ask            # single question, blocking
 make ask-stream     # streaming answer
-./chat.sh           # interactive chat loop
+make chat           # interactive chat loop (runs chat.py via .venv/bin/python)
 
 # MCP
 make mcp-install    # install scripts/requirements.txt for MCP server
@@ -59,15 +60,15 @@ make test           # run full suite with coverage report
 ### Data flow
 
 1. **Indexing**: `.md` files → `md_loader.py` (front-matter parse + wikilink expansion) → `indexer.py` date-heading split → markdown header split → sentence chunking → char fallback → spaCy entity extraction → Chroma upsert
-2. **Query**: question → `date_parser.py` (regex rules, LLM fallback) + `name_parser.py` (heuristic regex, prefers quoted names) → augmented vector search with Chroma `where` filter → entity post-filter → optional recency sort → Ollama generate
+2. **Query**: question → `date_parser.py` (regex rules, `dateparser` library fallback) + `name_parser.py` (heuristic regex, prefers quoted names) → augmented vector search with Chroma `where` filter → entity post-filter → optional recency sort → Ollama generate
 
 ### Key files in `app/`
 
 | File | Role |
 |------|------|
 | `rag_server.py` | FastAPI app; `_retrieve()` is the core retrieval function |
 | `indexer.py` | `build_index()` / `build_index_files()` + chunking logic; `_iter_chunks()` is the main pipeline |
-| `date_parser.py` | `DateParser.parse()` — regex-first, LLM fallback for ambiguous phrases |
+| `date_parser.py` | `DateParser.parse()` — regex-first, `dateparser` library fallback for ambiguous phrases |
 | `name_parser.py` | `extract_entities_from_text()` (spaCy, used at index time); `extract_name_terms()` (heuristic regex, used at query time) |
 | `md_loader.py` | `load_markdown_docs()` + `_expand_wikilinks()` |
 | `settings.py` | All config via env vars; all consumed through `settings` singleton |
@@ -86,12 +87,13 @@ make test           # run full suite with coverage report
 
 - `source` — vault-relative path
 - `title` — from front matter or filename
-- `entry_date` — ISO date from date heading or file mtime fallback
+- `entry_date` — ISO date from date heading > frontmatter `date` field > file mtime (priority order)
 - `entry_date_ts` — Unix timestamp of `entry_date` (for Chroma `$gte`/`$lte` numeric filters)
 - `entities` — comma-separated `prefix:Value` strings from spaCy NER (PERSON, ORG, GPE, WORK_OF_ART), merged from file-level and chunk-level extraction
+- `tags` — from frontmatter `tags` field (list or string, normalised to space-separated string)
 - `chunk_index` — position within the file
 
-Each chunk's embedded text is prefixed with `[title: ...] [entities: ...] [source: ...] [date: ...]` to strengthen metadata relevance in vector search.
+Each chunk's embedded text is prefixed with `[title: ...] [entities: ...] [source: ...] [date: ...] [tags: ...]` to strengthen metadata relevance in vector search.
 
 ### Retrieval logic (`rag_server._retrieve`)
 
@@ -125,7 +127,7 @@ All settings are in `app/settings.py` via env vars. Key ones:
 - `RAG_URL` — base URL for the running RAG API (default: `http://localhost:8000`)
 
 **Tools:**
-- `search_notes(question, top_k)` — semantic search via `/debug/retrieve-dated`; requires RAG stack running
+- `search_notes(question, top_k)` — semantic search via `/retrieve/dated`; requires RAG stack running
 - `read_note`, `list_notes`, `create_note`, `update_note`, `delete_note`, `lint_note` — provided by obsidian-mcp-guard (path safety, write-vault isolation, mdlint-obsidian validation built in)
 
 `fastmcp` and `obsidian-mcp-guard` are real installed packages in the test venv; no stubbing needed. `conftest.py` sets `HOST_VAULT_PATH=/tmp/test-vault-root` so `create_vault_server()` doesn't error at import time.
@@ -154,3 +156,4 @@ Tests live in `tests/` and run **locally** (no container). Use `make test` to ru
 - `entry_date_ts` was added later; a full reindex is needed on existing installations to backfill it.
 - Chroma persistence is automatic (`PersistentClient`); do not call `.persist()` explicitly.
 - The watcher uses `RAG_FILES_URL` to call `/reindex/files`; if that fails it falls back to `/reindex` (full).
+- `NUM_PREDICT` defaults to `-1` (unlimited). Do **not** set a low value (e.g. 256 or 800): thinking models (like gemma4) can spend their entire token budget on reasoning before emitting any answer text, so a low cap produces empty answers.
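
The chunk metadata rules described in the CLAUDE.md changes above (the `entry_date` priority order, the numeric `entry_date_ts`, and `tags` normalisation) can be illustrated with a minimal sketch. All function names here are hypothetical and do not come from `indexer.py`; using UTC for timestamps and treating the end date as an inclusive midnight bound are assumptions of this sketch, not confirmed project behaviour.

```python
from datetime import date, datetime, timezone
from typing import Optional, Union


def resolve_entry_date(
    heading_date: Optional[str],
    frontmatter_date: Optional[str],
    mtime: float,
) -> str:
    """Pick an ISO date using heading > front matter > file mtime priority."""
    for candidate in (heading_date, frontmatter_date):
        if candidate:
            try:
                return date.fromisoformat(candidate).isoformat()
            except ValueError:
                continue  # malformed value: fall through to the next source
    # Assumption: UTC. The real project may use a configured timezone instead.
    return datetime.fromtimestamp(mtime, tz=timezone.utc).date().isoformat()


def entry_date_ts(iso_date: str) -> int:
    """Unix timestamp for midnight UTC of the given ISO date."""
    d = date.fromisoformat(iso_date)
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())


def normalise_tags(tags: Union[str, list, None]) -> str:
    """Collapse a front matter tags value (list or string) to a space-separated string."""
    if not tags:
        return ""
    if isinstance(tags, str):
        parts = tags.replace(",", " ").split()
    else:
        parts = [str(t).strip() for t in tags if str(t).strip()]
    return " ".join(parts)


def date_range_where(start: str, end: str) -> dict:
    """Build a Chroma-style `where` filter over the numeric timestamp field."""
    return {
        "$and": [
            {"entry_date_ts": {"$gte": entry_date_ts(start)}},
            {"entry_date_ts": {"$lte": entry_date_ts(end)}},
        ]
    }
```

Storing the date as a number alongside the ISO string is what makes the `$gte`/`$lte` range comparison possible, which is why a full reindex is needed to backfill `entry_date_ts` on older installations.
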
@@ -13,7 +13,7 @@ GENERATOR_MODEL ?= gemma4-26b-q4xl:latest
 EMBED_MODEL     ?= nomic-embed-text
 export GENERATOR_MODEL EMBED_MODEL
 
-.PHONY: up down logs logs-watcher ollama-bootstrap ollama-status reindex reindex-scan reindex-files reindex-status debug-retrieve debug-retrieve-dated parse-dates ask ask-stream chat shell check ps restart machine-start machine-init test-install test
+.PHONY: up down logs logs-watcher ollama-bootstrap ollama-status reindex reindex-scan reindex-files reindex-status retrieve retrieve-dated parse-dates ask ask-stream chat shell check ps restart machine-start machine-init test-install test
 
 up:
 	podman compose -f docker-compose.yml up -d --build
@@ -69,21 +69,21 @@ reindex-files:
 reindex-status:
 	curl -s -X GET http://localhost:8000/reindex/status | jq .
 
-debug-retrieve:
+retrieve:
 	@read -p "Query: " Q; \
-	curl -s -G "http://localhost:8000/debug/retrieve" \
+	curl -s -G "http://localhost:8000/retrieve" \
 	  --data-urlencode "q=$$Q" \
 	  --data-urlencode "k=5" | jq .
 
-debug-retrieve-dated:
+retrieve-dated:
 	@read -p "Query: " Q; \
-	curl -s -G "http://localhost:8000/debug/retrieve-dated" \
+	curl -s -G "http://localhost:8000/retrieve/dated" \
 	  --data-urlencode "q=$$Q" \
 	  --data-urlencode "k=5" | jq .
 
 parse-dates:
 	@read -p "Query: " Q; \
-	curl -s -G "http://localhost:8000/debug/parse-dates" \
+	curl -s -G "http://localhost:8000/utils/parse-dates" \
 	  --data-urlencode "q=$$Q" | jq .
 
 ask:
@@ -99,7 +99,7 @@ ask-stream:
 	  -d "$$(jq -n --arg q "$$Q" '{question:$$q}')" ; echo
 
 chat:
-	bash ./chat.sh
+	.venv/bin/python ./chat.py
 
 shell:
 	podman exec -it markdown-rag bash
 
@@ -21,7 +21,7 @@ A containerised RAG stack for your Markdown vault:
    ```
 5. Start chatting:
    ```bash
-   ./chat.sh
+   make chat
    ```
 
 ## Changing models
@@ -57,7 +57,7 @@ curl -X POST http://localhost:8000/query -H "Content-Type: application/json" \
 ```
 
 ## Architecture
-- **rag_server** (`app/rag_server.py`): FastAPI app exposing debug and chat endpoints.
+- **rag_server** (`app/rag_server.py`): FastAPI app exposing search, utility, and chat endpoints.
 - **indexer** (`app/indexer.py`): Loads markdown, splits into chunks, extracts metadata, embeds and upserts to Chroma.
 - **name/date parsing**: `app/name_parser.py`, `app/date_parser.py` detect people terms and date ranges.
 - **watcher** (`app/watcher.py`): Monitors the vault and triggers partial reindex.
@@ -75,13 +75,13 @@ markdown-rag/
     indexer.py           # Indexing pipeline and Chroma access
     md_loader.py         # Markdown loading + wikilink expansion
     name_parser.py       # Name detection (query + indexing)
-    date_parser.py       # Date range parsing (regex + LLM fallback)
+    date_parser.py       # Date range parsing (regex + dateparser fallback)
     watcher.py           # Vault filesystem watcher
     system_prompt.txt    # System prompt used for answering
     run.sh               # Entrypoint used by container
   docker-compose.yml
   Makefile
-  chat.sh               # Simple local chat helper
+  chat.py               # Interactive chat CLI (streaming, think-tag filtering)
   README.md
 ```
 
@@ -103,14 +103,21 @@ markdown-rag/
   - `RAG_URL`, `RAG_FILES_URL` (watcher): endpoints for full and partial reindex (defaults are fine in docker-compose).
 
 ## API Endpoints (selected)
-- `GET /debug/parse-dates?q=...` → parsed `{start,end}`.
-- `GET /debug/retrieve?q=...&k=5` → top-k candidates (no dates in response).
-- `GET /debug/retrieve-dated?q=...&k=5` → candidates with metadata (source, entry_date, people, title, snippet).
+
+### Search
+- `GET /retrieve?q=...&k=5` → top-k candidates from vector search (source, title, entry_date, snippet).
+- `GET /retrieve/dated?q=...&k=5` → top-k candidates with full metadata; response includes `filter` showing the parsed date range that was applied.
+
+### Indexing
 - `POST /reindex` → full incremental reindex.
 - `POST /reindex/scan` → enumerate vault and queue only changed/removed files since last index state, then partial reindex.
 - `POST /reindex/files` → partial reindex of given `{"files": ["path.md", ...]}` relative to the vault.
 - `GET /reindex/status` → last reindex summary.
 
+### Utilities
+- `GET /utils/parse-dates?q=...` → parsed `{start, end}` date range for a query string; useful for verifying date extraction.
+- `POST /utils/split-by-date` → show how a markdown document is split by date headings; send the document as form field `text` or as an uploaded `file`.
+
 ## Startup indexing
 - On container start, if `REINDEX_ON_START=true`, `app/run.sh` triggers `POST /reindex/scan`.
 - The scan compares current vault mtimes vs `index_state.json` and queues only changed/removed files, then calls the same partial reindex worker path the watcher uses.
@@ -156,15 +163,15 @@ markdown-rag/
 | `make reindex-files` | Partial reindex for specific vault-relative paths (prompts for input). |
 | `make reindex-status` | Show last reindex result. |
 
-### Querying / debugging
+### Querying
 
 | Target | Description |
 |--------|-------------|
 | `make ask` | Interactive single question (blocking). |
 | `make ask-stream` | Interactive single question (streaming). |
-| `make debug-retrieve` | Vector search only, no metadata in response. |
-| `make debug-retrieve-dated` | Vector search with full metadata (date, entities, etc.). |
-| `make parse-dates` | Test date parsing on a query. |
+| `make retrieve` | Vector search, returns source/title/date/snippet per result. |
+| `make retrieve-dated` | Vector search with full metadata; shows the date filter that was applied. |
+| `make parse-dates` | Show the parsed date range for a query; useful for verifying date extraction. |
 
 ### Podman machine (macOS)
 
@@ -197,15 +204,15 @@ make test           # run all tests with coverage
 
 ### Coverage
 
-230 tests across 8 files; overall coverage ~93% on `app/` modules:
+222 tests across 8 files; overall coverage ~91% on `app/` modules:
 
 | Module | Coverage |
 |--------|----------|
 | `settings.py`, `md_loader.py` | 100% |
 | `rag_server.py` | 97% |
-| `date_parser.py`, `watcher.py` | 96% |
-| `indexer.py` | 84% |
-| `name_parser.py` | 83% |
+| `watcher.py` | 96% |
+| `date_parser.py` | 93% |
+| `indexer.py`, `name_parser.py` | 83% |
 
 ## Troubleshooting
 - **No results for sentence queries with a name**: ensure your notes have the person name in title, filename, headings, or a parent folder (so it gets into `entities`). Run `make reindex`.
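
The renamed endpoints in the README changes above (`/retrieve`, `/retrieve/dated`, `/utils/parse-dates`) can also be called from Python rather than through the `make` targets. The following is a minimal sketch using only the standard library; the helper names are hypothetical, the base URL is the `localhost:8000` address used by the Makefile targets, and the response shape is not assumed beyond being JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # same address the Makefile curl targets use


def build_url(path: str, **params) -> str:
    """Build a GET URL with URL-encoded query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def get_json(path: str, **params):
    """Fetch and decode a JSON response; requires the RAG stack to be running."""
    with urlopen(build_url(path, **params), timeout=30) as resp:
        return json.load(resp)


def inspect_query(question: str, k: int = 5) -> None:
    """Print the parsed date range, then the dated retrieval results."""
    print(get_json("/utils/parse-dates", q=question))
    print(get_json("/retrieve/dated", q=question, k=k))
```

With the stack up, `inspect_query("what did I write in April 2025")` mirrors running `make parse-dates` followed by `make retrieve-dated` with the same query.
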
 
@@ -4,10 +4,11 @@
 from datetime import datetime, timedelta
 from zoneinfo import ZoneInfo
 from typing import Optional
-import json
-import httpx
 
-from settings import settings
+try:
+    import dateparser  # type: ignore[import]
+except ImportError:  # pragma: no cover
+    dateparser = None  # type: ignore[assignment]
 
 ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
 DMY_SLASH = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
@@ -49,6 +50,15 @@
 )
 FORTNIGHT_RE = re.compile(r"\b(?:last|past|previous)?\s*fortnight\b", re.IGNORECASE)
 
+_MON = r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
+# Matches "in April", "during March 2025", "April 2025" (year required when no preposition,
+# to avoid treating the auxiliary verb "may" as a month name).
+MONTH_ONLY_RE = re.compile(
+    r"\b(?:(?:in|during|for|of)\s+(?P<mp>" + _MON + r")\b(?:\s+(?P<yp>\d{4}))?(?!\s*\d)"
+    r"|(?P<mn>" + _MON + r")\s+(?P<yn>\d{4})\b)",
+    re.IGNORECASE,
+)
+
 RANGE_RE = re.compile(
     r"\b(?:between\s+(?P<between_a>.+?)\s+and\s+(?P<between_b>.+?)|from\s+(?P<from_a>.+?)\s+(?:to|until)\s+(?P<from_b>.+?)|since\s+(?P<since>.+?)|after\s+(?P<after>.+?)|before\s+(?P<before>.+?))\b",
     re.IGNORECASE,
@@ -233,54 +243,41 @@ def parse(self, q: str, tz_name: str) -> tuple[Optional[str], Optional[str]]:
                 if not start and not end:
                     start = end = iso
 
+        # Bare month name: "in April", "during March 2025", "April 2025"
+        if not start and not end:
+            mm = MONTH_ONLY_RE.search(q)
+            if mm:
+                mon_str = mm.group('mp') or mm.group('mn')
+                yr_str = mm.group('yp') or mm.group('yn')
+                mon = self._parse_month(mon_str)
+                yr = int(yr_str) if yr_str else now.year
+                if mon:
+                    start, end = self._month_bounds(datetime(yr, mon, 1, tzinfo=tz))
+
         # Normalize order
         if start and end and start > end:
             start, end = end, start
 
         if start or end:
             return start, end
 
-        # LLM fallback for ambiguous phrases
+        # dateparser fallback for phrases not covered by the regex rules above
+        # (e.g. "a few weeks ago", "early March", "Q1 2025", "last Tuesday").
+        # dateparser runs in-process: no network call, no model load.
         try:
-            prompt = (
-                "You are a date range extractor. Given the current date/time and a user query, "
-                "return a JSON object with keys start, end. Use ISO YYYY-MM-DD dates or null.\n"
-                "Rules: start <= end when both present; interpret relative phrases relative to the current date/time and timezone.\n"
-                "Output ONLY JSON. No extra text.\n\n"
-                f"Current date/time: {now.strftime('%Y-%m-%d %H:%M')} {settings.timezone}\n"
-                f"Query: {q}\n"
-            )
-            payload = {
-                "model": settings.generator_model,
-                "prompt": prompt,
-                "options": {
-                    "temperature": 0,
-                    "num_ctx": getattr(settings, "num_ctx", 2048),
-                    "num_predict": 128,
-                    "keep_alive": "5m",
+            parsed = dateparser.parse(
+                q,
+                settings={
+                    "PREFER_DATES_FROM": "past",
+                    "PREFER_DAY_OF_MONTH": "first",
+                    "RETURN_AS_TIMEZONE_AWARE": True,
+                    "TIMEZONE": tz_name,
+                    "TO_TIMEZONE": tz_name,
                 },
-                "stream": False,
-            }
-            with httpx.Client(base_url=settings.ollama_base_url, timeout=20.0) as client:
-                r = client.post("/api/generate", json=payload)
-                r.raise_for_status()
-                data = r.json().get("response", "{}")
-            obj = json.loads(data)
-            s = obj.get("start")
-            e = obj.get("end")
-            # validate
-            def _is_iso(d: Optional[str]) -> bool:
-                if not d:
-                    return False
-                try:
-                    datetime.fromisoformat(d)
-                    return True
-                except Exception:
-                    return False
-            s = s if _is_iso(s) else None
-            e = e if _is_iso(e) else None
-            if s and e and s > e:
-                s, e = e, s
-            return s, e
+            )
+            if parsed:
+                d = parsed.date().isoformat()
+                return d, d
         except Exception:
-            return None, None
+            pass
+        return None, None
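
The bare-month rule added in the `date_parser.py` changes above can be exercised in isolation with a minimal sketch. The pattern follows the one in the diff, with word boundaries around the month name so that an abbreviation such as `Apr` cannot match as a prefix of `April` when the trailing-digit lookahead rejects the full name. `month_range` and `_MONTH_NUMBER` are hypothetical stand-ins for the `_parse_month` and `_month_bounds` helpers, whose real implementations are not shown in the diff.

```python
import calendar
import re
from datetime import date
from typing import Optional, Tuple

_MON = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?"
    r"|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)
# Preposition + month (year optional), or month + required year.
MONTH_ONLY_RE = re.compile(
    r"\b(?:(?:in|during|for|of)\s+(?P<mp>" + _MON + r")\b(?:\s+(?P<yp>\d{4}))?(?!\s*\d)"
    r"|(?P<mn>" + _MON + r")\s+(?P<yn>\d{4})\b)",
    re.IGNORECASE,
)
_MONTH_NUMBER = {
    abbr: i
    for i, abbr in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def month_range(query: str, today: date) -> Optional[Tuple[str, str]]:
    """Return (first_day, last_day) ISO dates for a bare month mention, else None."""
    m = MONTH_ONLY_RE.search(query)
    if not m:
        return None
    mon_str = m.group("mp") or m.group("mn")
    yr_str = m.group("yp") or m.group("yn")
    month = _MONTH_NUMBER[mon_str[:3].lower()]
    year = int(yr_str) if yr_str else today.year  # no year given: use the current one
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()
```

Requiring either a preposition or an explicit year is what keeps a query such as "may I see my notes" from being read as the month of May.
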
@@ -120,6 +120,13 @@ def extract_entities_from_text(text: str) -> List[str]:
     "His",
     "Her",
     "Their",
+    # month names (full and abbreviated) — prevent date words being treated as names
+    "January", "February", "March", "April", "May", "June",
+    "July", "August", "September", "October", "November", "December",
+    "Jan", "Feb", "Mar", "Apr", "Jun", "Jul", "Aug", "Sep", "Sept", "Oct", "Nov", "Dec",
+    # day names
+    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
+    "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun",
     # common non-name tokens seen in titles
     "Notes",
     "Note",