Add a local MCP server for the documentation corpus#365

Draft
miharp wants to merge 1 commit into OpenVoxProject:master from miharp:feat/docs-mcp-server


miharp commented Jun 23, 2026

Contributor

Summary

Adds a local Model Context Protocol server
(tools/mcp/) that exposes the OpenVox documentation to MCP-aware tools (Claude
Code, Cursor, GitHub Copilot, Codex, …) so an assistant can search and read the
docs in-workflow. It is the local, self-hosted counterpart to a hosted "Ask AI"
widget: no third party, no API key, no quota, and queries stay on the machine.

Implements #364. Depends on #358 for the live-corpus default (see below).

[Demo GIF: Claude Code answering by calling the openvox-docs MCP server]

How it works

The corpus is the site's llms.txt / llms-full.txt files, which already ship
as machine-readable plain text with stable per-project (# Project:) and per-doc
(--- / ## title / Source:) delimiters, so the server parses them into one
document per page without scraping HTML.
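The delimiter-based parsing described above can be sketched as a small line-by-line state machine. This is only an illustration, not the code in corpus.py: the exact layout of llms-full.txt (a `# Project:` line, then `---`, `## title`, and `Source:` for each page), the `Doc` fields, and the `example.invalid` URLs are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """One documentation page (hypothetical record shape)."""
    project: str
    title: str
    url: str = ""
    lines: list = field(default_factory=list)

    @property
    def body(self) -> str:
        return "\n".join(self.lines).strip()


def parse_llms_full(text: str) -> list[Doc]:
    """Split an llms-full.txt style file into one Doc per page."""
    docs: list[Doc] = []
    project = ""
    current = None       # page currently being filled, if any
    after_rule = False   # True once '---' was seen and a '## title' is expected
    for line in text.splitlines():
        if line.startswith("# Project:"):
            # A new project closes the current page.
            project = line.split(":", 1)[1].strip()
            current = None
        elif line.strip() == "---":
            current = None
            after_rule = True
        elif after_rule and line.startswith("## "):
            current = Doc(project, line[3:].strip())
            docs.append(current)
            after_rule = False
        elif current is not None:
            # 'Source:' counts only as the first line after the title.
            if line.startswith("Source:") and not current.url and not current.lines:
                current.url = line.split(":", 1)[1].strip()
            else:
                current.lines.append(line)
    return docs


# Assumed sample input; the real files may differ in detail.
SAMPLE = """\
# Project: OpenVox

---
## Installing
Source: https://example.invalid/openvox/install

Install the agent package.

---
## Upgrading
Source: https://example.invalid/openvox/upgrade

Upgrade the server first.

# Project: OpenBolt

---
## Getting started
Source: https://example.invalid/openbolt/start

Run a task.
"""

docs = parse_llms_full(SAMPLE)
```

Because a `## ` heading starts a new page only when it directly follows a `---` rule, second-level headings inside a page body stay part of that page.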

  • corpus.py — fetches the two files from the live site by default, with
    on-disk caching + conditional requests and an offline fallback to the last good
    copy; OPENVOX_DOCS_SOURCE reads a local _site/ build instead.
  • search.py — BM25 keyword ranking (rank-bm25) over the parsed bodies,
    title-weighted, with a query-centered snippet.
  • server.py — FastMCP server exposing the tools below.
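As a rough illustration of the ranking step, the sketch below implements Okapi BM25 by hand using only the standard library. It is not the rank-bm25 package and not the code in search.py. The tokenizer, the title weighting (done here by repeating title tokens), the snippet heuristic, and the sample pages are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Hand-rolled Okapi BM25; stands in for rank-bm25 in this sketch."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokens) for tokens in corpus]
        self.lens = [len(tokens) for tokens in corpus]
        self.avgdl = sum(self.lens) / len(self.lens) if self.lens else 0.0
        df = Counter()
        for tf in self.tfs:
            df.update(tf.keys())
        n = len(corpus)
        # Smoothed idf that stays non-negative for very common terms.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def scores(self, query_tokens):
        out = []
        for tf, dl in zip(self.tfs, self.lens):
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl) if self.avgdl else self.k1
            out.append(sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in query_tokens if t in tf
            ))
        return out


def snippet(body, query, width=80):
    """Return a window of the body centered on the earliest query-term hit."""
    lower = body.lower()
    hits = [i for i in (lower.find(t) for t in tokenize(query)) if i >= 0]
    if not hits:
        return body[:width]
    start = max(0, min(hits) - width // 2)
    return body[start:start + width]


def search(pages, query, limit=5, title_weight=3):
    """Rank pages by BM25 over title (repeated title_weight times) plus body."""
    corpus = [tokenize(p["title"]) * title_weight + tokenize(p["body"]) for p in pages]
    scored = zip(BM25(corpus).scores(tokenize(query)), pages)
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [
        {"title": p["title"], "score": round(s, 3), "snippet": snippet(p["body"], query)}
        for s, p in ranked[:limit] if s > 0
    ]


# Hypothetical pages for illustration only.
PAGES = [
    {"title": "Installing the agent",
     "body": "Download the package and install the agent on each node."},
    {"title": "Configuring the server",
     "body": "Edit the server settings file, then restart the service."},
    {"title": "Writing modules",
     "body": "Modules bundle manifests; install them from the forge."},
]

results = search(PAGES, "install agent")
```

Repeating title tokens is one simple way to weight titles with a single-field ranker; scoring title and body separately and combining the scores would be another.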

Tools

| Tool | Purpose |
| --- | --- |
| list_projects | List the documentation projects. |
| list_docs(project?) | List pages as {project, title, url}. |
| search_docs(query, project?, limit) | BM25 search → {title, url, project, score, snippet}. |
| get_doc(ref, max_chars=40000) | Full text of one page (URL/path/title); capped, since the single-page references are large. |
| refresh_corpus() | Force a re-fetch and rebuild the index. |
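To make the get_doc behavior concrete, here is a minimal sketch of resolving a ref by URL, path, or title and then capping the text. The matching order, the dict-shaped records, the returned keys (including `truncated` and `error`), and the `example.invalid` URL are assumptions for illustration; the actual tool may resolve and report differently.

```python
from urllib.parse import urlparse


def resolve_ref(docs, ref):
    """Match ref against the full URL, then the URL path, then the title."""
    wanted = ref.strip()
    for doc in docs:
        if doc["url"] == wanted:
            return doc
    # Compare paths without trailing slashes so '/a/b' and '/a/b/' agree.
    path = urlparse(wanted).path if "://" in wanted else wanted
    path = path.rstrip("/")
    for doc in docs:
        if path and urlparse(doc["url"]).path.rstrip("/") == path:
            return doc
    for doc in docs:
        if doc["title"].lower() == wanted.lower():
            return doc
    return None


def get_doc(docs, ref, max_chars=40000):
    """Return one page's text, cut to max_chars (hypothetical result shape)."""
    doc = resolve_ref(docs, ref)
    if doc is None:
        return {"error": f"no document matches {ref!r}"}
    body = doc["body"]
    return {
        "title": doc["title"],
        "url": doc["url"],
        "text": body[:max_chars],
        "truncated": len(body) > max_chars,
    }


# Hypothetical corpus with a single page.
DOCS = [{
    "title": "Installing",
    "url": "https://example.invalid/openvox/install/",
    "body": "x" * 50,
}]

by_path = get_doc(DOCS, "/openvox/install", max_chars=10)
by_title = get_doc(DOCS, "installing")
missing = get_doc(DOCS, "no-such-page")
```

Reporting whether the text was cut lets a caller decide whether to ask again with a larger max_chars.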

Tests & CI

  • pytest — 30 unit tests covering corpus parsing, remote fetch/caching (304
    reuse + offline fallback), BM25 ranking, and every tool (~93% coverage).
  • smoke_test.py — launches the server over stdio against a throwaway corpus and
    exercises all five tools end-to-end (network- and model-free).
  • .github/workflows/mcp.yml — runs both on Python 3.10 and 3.13, path-filtered
    to tools/mcp/**.
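The 304-reuse and offline-fallback paths those tests cover can be sketched with the standard library alone. This is an assumed shape rather than the code in corpus.py: `fetch_cached`, the cache file layout, and the injectable `opener` parameter (included so the function can be exercised without a network) are all hypothetical.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path


def fetch_cached(url, cache_dir, opener=urllib.request.urlopen):
    """Fetch url with validators from the cache; reuse the cached copy on failure."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    name = url.rstrip("/").rsplit("/", 1)[-1]
    body_path = cache_dir / name
    meta_path = cache_dir / (name + ".meta.json")

    meta = {}
    if body_path.exists() and meta_path.exists():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))

    # Send the stored validators so the server can answer 304 Not Modified.
    request = urllib.request.Request(url)
    if meta.get("etag"):
        request.add_header("If-None-Match", meta["etag"])
    if meta.get("last_modified"):
        request.add_header("If-Modified-Since", meta["last_modified"])

    try:
        with opener(request, timeout=10) as response:
            text = response.read().decode("utf-8")
            headers = response.headers
    except OSError:
        # urllib raises HTTPError for 304 and URLError when offline; both are
        # OSError subclasses, and both fall back to the last good copy here.
        if body_path.exists():
            return body_path.read_text(encoding="utf-8")
        raise

    body_path.write_text(text, encoding="utf-8")
    meta_path.write_text(json.dumps({
        "etag": headers.get("ETag"),
        "last_modified": headers.get("Last-Modified"),
    }), encoding="utf-8")
    return text
```

This sketch treats every failure the same way once a cached copy exists; a fuller version would likely distinguish a 304 from a genuine error so that stale data can be logged or reported.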

Dependency on #358

The default corpus source is the live site, which 404s until #358's llms.txt /
llms-full.txt are deployed. Until then the server runs against a local build via
OPENVOX_DOCS_SOURCE=$PWD/_site (documented in the README). Opening as a draft
for that reason; ready to finalize once #358 ships.

Notes

  • New subproject with its own Python toolchain; excluded from the Jekyll build.
  • The demo GIF is committed and embedded in tools/mcp/README.md, regenerable
    from tools/mcp/demo/demo.tape with VHS.

Add tools/mcp: a local, stdio Model Context Protocol server that exposes
the OpenVox documentation to MCP-aware tools (Claude Code, Cursor, Claude
Desktop) so an assistant can search and read the docs in-workflow. It is
the self-hosted counterpart to a hosted "Ask AI" widget: no third party,
no API key, no quota, and queries stay on the machine.

The corpus is the site's llms.txt / llms-full.txt files, which already
ship as machine-readable plain text with stable per-project and per-doc
delimiters, so the server parses them into one document per page without
scraping HTML.

- corpus.py: fetch llms.txt / llms-full.txt from the live site (default)
  with on-disk caching and conditional requests, falling back to the last
  good copy when offline; OPENVOX_DOCS_SOURCE reads a local _site build
  instead. Parses both into structured Doc records.
- search.py: BM25 keyword search (rank-bm25) over the parsed bodies, with
  the title weighted and a query-centered snippet.
- server.py: FastMCP app exposing list_projects, list_docs, search_docs,
  get_doc, and refresh_corpus.

Exclude tools/ from the Jekyll build so the server isn't published with
the site.

Signed-off-by: Michael Harp <mike@mikeharp.com>
miharp force-pushed the feat/docs-mcp-server branch from 7495cba to dfa1f98 on June 23, 2026 16:07