Add a local MCP server for the documentation corpus #365
Draft
miharp wants to merge 1 commit into
Add tools/mcp: a local, stdio Model Context Protocol server that exposes the OpenVox documentation to MCP-aware tools (Claude Code, Cursor, Claude Desktop) so an assistant can search and read the docs in-workflow. It is the self-hosted counterpart to a hosted "Ask AI" widget: no third party, no API key, no quota, and queries stay on the machine.

The corpus is the site's llms.txt / llms-full.txt files, which already ship as machine-readable plain text with stable per-project and per-doc delimiters, so the server parses them into one document per page without scraping HTML.

- corpus.py: fetch llms.txt / llms-full.txt from the live site (default) with on-disk caching and conditional requests, falling back to the last good copy when offline; OPENVOX_DOCS_SOURCE reads a local _site build instead. Parses both into structured Doc records.
- search.py: BM25 keyword search (rank-bm25) over the parsed bodies, with the title weighted and a query-centered snippet.
- server.py: FastMCP app exposing list_projects, list_docs, search_docs, get_doc, and refresh_corpus.

Exclude tools/ from the Jekyll build so the server isn't published with the site.

Signed-off-by: Michael Harp <mike@mikeharp.com>
Summary
Adds a local Model Context Protocol server (tools/mcp/) that exposes the OpenVox documentation to MCP-aware tools (Claude Code, Cursor, GitHub Copilot, Codex, …) so an assistant can search and read the docs in-workflow. It is the local, self-hosted counterpart to a hosted "Ask AI" widget: no third party, no API key, no quota, and queries stay on the machine.
Implements #364. Depends on #358 for the live-corpus default (see below).
How it works
The corpus is the site's llms.txt / llms-full.txt files, which already ship as machine-readable plain text with stable per-project (# Project:) and per-doc (--- / ## title / Source:) delimiters, so the server parses them into one document per page without scraping HTML.
- corpus.py fetches them from the live site by default, with on-disk caching + conditional requests and an offline fallback to the last good copy; OPENVOX_DOCS_SOURCE reads a local _site/ build instead.
- search.py runs BM25 keyword search (rank-bm25) over the parsed bodies, title-weighted, with a query-centered snippet.
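The ranking described above can be sketched without the rank-bm25 dependency. This is a hand-rolled Okapi BM25 using the same k1=1.5 / b=0.75 defaults as rank-bm25's BM25Okapi but a Lucene-style non-negative IDF; title weighting is approximated by repeating title tokens title_weight times, and the snippet is a fixed-width window around the earliest query-term match. BM25Index, snippet, tokenize, and every default here are assumptions; the PR's search.py may weight titles and cut snippets differently.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphanumeric/underscore runs, no stemming.
TOKEN_RE = re.compile(r"[a-z0-9_]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def snippet(body: str, query_tokens: list[str], width: int = 80) -> str:
    """Return about `width` chars of body centred on the earliest query-term hit."""
    lower = body.lower()
    hits = [pos for pos in (lower.find(t) for t in query_tokens) if pos >= 0]
    if not hits:
        return body[:width].strip()
    start = max(0, min(hits) - width // 2)
    end = min(len(body), start + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(body) else ""
    return prefix + body[start:end].strip() + suffix


class BM25Index:
    """Okapi BM25 over (title, body) pairs; title tokens count title_weight times."""

    def __init__(self, docs: list[tuple[str, str]], title_weight: int = 3,
                 k1: float = 1.5, b: float = 0.75) -> None:
        self.docs = docs
        self.k1, self.b = k1, b
        # Per-document term frequencies, with title tokens repeated to boost them.
        self.tfs = [Counter(tokenize(title) * title_weight + tokenize(body))
                    for title, body in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        total = sum(self.lengths)
        self.avgdl = total / len(docs) if total else 1.0  # guard an empty corpus
        df = Counter(term for tf in self.tfs for term in tf)  # document frequency
        n = len(docs)
        # Lucene-style IDF: log(1 + (N - df + 0.5) / (df + 0.5)) is never negative.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def _score(self, query_tokens: list[str], i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)  # length normalization
        score = 0.0
        for t in query_tokens:
            f = tf.get(t, 0)
            if f:
                score += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return score

    def search(self, query: str, limit: int = 5) -> list[dict]:
        q = tokenize(query)
        scored = [(self._score(q, i), i) for i in range(len(self.docs))]
        # Drop non-matching docs, then sort by descending score (stable on ties).
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
        return [{"title": self.docs[i][0], "score": round(s, 3),
                 "snippet": snippet(self.docs[i][1], q)}
                for s, i in ranked[:limit]]
```

Repeating title tokens is the simplest way to weight a field inside a single-field BM25; a per-field score sum is the usual alternative when titles and bodies need independent length normalization.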
Tools
- list_projects
- list_docs(project?): {project, title, url} per doc
- search_docs(query, project?, limit): {title, url, project, score, snippet} per hit
- get_doc(ref, max_chars=40000)
- refresh_corpus()
Tests & CI
- pytest: 30 unit tests covering corpus parsing, remote fetch/caching (304 reuse + offline fallback), BM25 ranking, and every tool (~93% coverage).
- smoke_test.py: launches the server over stdio against a throwaway corpus and exercises all five tools end-to-end (network- and model-free).
- .github/workflows/mcp.yml: runs both on Python 3.10 and 3.13, path-filtered to tools/mcp/**.
Dependency on #358
The default corpus source is the live site, which 404s until #358's llms.txt / llms-full.txt are deployed. Until then the server runs against a local build via OPENVOX_DOCS_SOURCE=$PWD/_site (documented in the README). Opening as a draft for that reason; ready to finalize once #358 ships.
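One way to get the source selection and caching behaviour described here with only the standard library: OPENVOX_DOCS_SOURCE short-circuits to a local build; otherwise the file is revalidated with If-None-Match / If-Modified-Since, and the cached copy is served on a 304, on an HTTP error, or when the site is unreachable. fetch_cached, load_corpus_text, the cache file naming, and the assumption that llms-full.txt sits at the root of both the site and _site/ are hypothetical, not the PR's corpus.py.

```python
import json
import os
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path, timeout: float = 10.0) -> str:
    """Fetch url, revalidating a cached copy; fall back to it on any failure."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    name = url.rstrip("/").rsplit("/", 1)[-1]
    body_path = cache_dir / name
    meta_path = cache_dir / f"{name}.meta.json"
    req = urllib.request.Request(url)
    # Only send validators when a cached body exists, so a 304 can always be served.
    if body_path.exists() and meta_path.exists():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        if meta.get("etag"):
            req.add_header("If-None-Match", meta["etag"])
        if meta.get("last_modified"):
            req.add_header("If-Modified-Since", meta["last_modified"])
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
            meta = {"etag": resp.headers.get("ETag"),
                    "last_modified": resp.headers.get("Last-Modified")}
    except OSError:
        # urllib raises HTTPError for 304 Not Modified as well as 404/5xx;
        # HTTPError, URLError and socket timeouts are all OSError subclasses.
        if body_path.exists():
            return body_path.read_text(encoding="utf-8")
        raise
    # Fresh 200 response: store the body and its validators as the last good copy.
    body_path.write_text(text, encoding="utf-8")
    meta_path.write_text(json.dumps(meta), encoding="utf-8")
    return text


def load_corpus_text(base_url: str, cache_dir: Path) -> str:
    """Read llms-full.txt from OPENVOX_DOCS_SOURCE if set, else from the site."""
    local = os.environ.get("OPENVOX_DOCS_SOURCE")
    if local:
        return (Path(local) / "llms-full.txt").read_text(encoding="utf-8")
    return fetch_cached(base_url.rstrip("/") + "/llms-full.txt", cache_dir)
```

Treating the 304 like any other failure keeps the control flow to a single except branch; the trade-off is that a stale copy is also served silently after a 404, which a real implementation might want to log.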
Notes
tools/mcp/README.md, regenerable from tools/mcp/demo/demo.tape with VHS.