Commit 3141462

agent: docs: add source adapters ADR (#57)
Merged by Vision under delegated forge ownership. Gilfoyle implemented issue #50, Heimdall independently verified head 22ad69d, and all CI/Security/CodeQL/CodeRabbit checks passed.
1 parent 575c17b commit 3141462

1 file changed

Lines changed: 128 additions & 0 deletions

@@ -0,0 +1,128 @@
# ADR-001: Source Adapters

- **Status:** Accepted
- **Date:** 2026-05-29
- **Deciders:** @ayhammouda
- **Roadmap refs:** principles 2.1, 2.2, 2.7

## Context and Problem Statement

`python-docs-mcp-server` needs documentation answers that are precise,
version-aware, and trustworthy inside MCP clients. The project therefore cannot
treat "source" as an arbitrary search result or scraped mirror. The first layer
of the architecture is a source-connector layer that accepts a version or
package identifier, reaches only canonical upstream sources, and hands stable
artifacts to ingestion.

Two source adapters exist today:

- CPython documentation source: `build-index` uses pinned CPython documentation
  build targets from
  [`src/mcp_server_python_docs/ingestion/cpython_versions.py`](../../src/mcp_server_python_docs/ingestion/cpython_versions.py),
  clones `python/cpython` at the configured tag, installs the configured Sphinx
  pin in a dedicated build virtual environment, and runs `sphinx-build -b json`
  before ingesting generated JSON files through
  [`src/mcp_server_python_docs/ingestion/sphinx_json.py`](../../src/mcp_server_python_docs/ingestion/sphinx_json.py).
  Symbol inventory ingestion uses `objects.inv` through
  [`src/mcp_server_python_docs/ingestion/inventory.py`](../../src/mcp_server_python_docs/ingestion/inventory.py).
- PyPI metadata source:
  [`src/mcp_server_python_docs/services/package_docs.py`](../../src/mcp_server_python_docs/services/package_docs.py)
  backs `lookup_package_docs` with `GET /pypi/<project>/json` from the official
  PyPI JSON API. It returns package-declared PyPI, documentation, homepage,
  source, and repository URLs from controlled metadata fields and does not crawl
  pages or perform generic web search.

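As a rough illustration of the PyPI adapter's "controlled metadata fields"
behavior, the sketch below picks declared URLs out of an already-parsed PyPI
JSON response. It is not the code in `package_docs.py`: the function name, the
output keys, and the `ALLOWED_LABELS` table are hypothetical. Only the `info`,
`package_url`, `project_urls`, and `home_page` response fields come from the
PyPI JSON API itself.

```python
from typing import Any

PYPI_JSON_URL = "https://pypi.org/pypi/{project}/json"

# Hypothetical allowlist: lowercased project_urls labels -> output keys.
# Labels outside this table (for example "Funding") are ignored.
ALLOWED_LABELS = {
    "documentation": "documentation",
    "docs": "documentation",
    "homepage": "homepage",
    "source": "source",
    "source code": "source",
    "repository": "repository",
}


def extract_declared_urls(project: str, payload: dict[str, Any]) -> dict[str, str]:
    """Return package-declared URLs from a parsed PyPI JSON response."""
    info = payload.get("info") or {}
    urls: dict[str, str] = {
        # Record where the metadata came from so results stay traceable.
        "metadata_source": PYPI_JSON_URL.format(project=project),
    }
    if info.get("package_url"):
        urls["pypi"] = info["package_url"]
    # project_urls can be null in the response, hence the `or {}`.
    for label, url in (info.get("project_urls") or {}).items():
        key = ALLOWED_LABELS.get(label.strip().lower())
        if key and key not in urls and isinstance(url, str):
            urls[key] = url
    # Use the legacy home_page field only when no homepage label was declared.
    if "homepage" not in urls and info.get("home_page"):
        urls["homepage"] = info["home_page"]
    return urls
```

The function never follows any of the returned URLs, which is what keeps the
lookup a metadata read rather than a crawl.
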
This ADR records the contract for those adapters so later documentation
ecosystems can clone the layer boundary without weakening the trust model.

## Decision Drivers

- Principle 2.1: canonical source only. CPython comes from pinned upstream tags;
  PyPI package links come from PyPI project metadata. Scraped mirrors and
  third-party indexers are outside the contract.
- Principle 2.2: offline-first runtime. MCP docs queries should read the local
  index and cache, not reach remote documentation services at query time.
- Principle 2.7: layered design with stable contracts. Source connectors must
  have explicit inputs, outputs, and invariants so ingestion and downstream
  retrieval layers do not depend on source-specific behavior.
- The contract must describe current behavior only. Future adapters, such as
  other language ecosystems, should clone the contract rather than be documented
  as existing features.

## Considered Options

1. Keep source behavior implicit in ingestion and service code.
   - Rejected because future work would have to infer the trust boundary from
     implementation details, increasing the chance of accidental mirror,
     indexer, or runtime-network drift.
2. Allow generic web or third-party docs providers as source adapters.
   - Rejected because this conflicts with principle 2.1 and would make results
     less reproducible and less auditable.
3. Document a narrow source-connector contract for the adapters that exist
   today.
   - Accepted because it matches the current code and gives future adapters a
     stable layer boundary to copy.

## Decision Outcome
<!-- Canonical source only; pinned, reproducible; PyPI metadata is the one
controlled network lookup and is not a query-time call. -->

The source-connector layer is limited to canonical upstream sources. CPython
documentation builds are pinned by version-specific CPython tags and Sphinx
pins, then converted into canonical ingestion artifacts by the build pipeline.
PyPI package documentation discovery is limited to the official PyPI JSON API
and allowlisted project metadata fields.

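To make "pinned by version-specific CPython tags and Sphinx pins" concrete, here
is a minimal sketch of how a pinned build target could be expanded into the
clone, virtual-environment, and `sphinx-build -b json` steps. The
`DocsBuildTarget` shape, the `EXAMPLE_BUILD_CONFIG` name, and the tag and Sphinx
values are illustrative placeholders, not the contents of
`CPYTHON_DOCS_BUILD_CONFIG`. The sketch only builds the command lists; it does
not run them.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DocsBuildTarget:
    """One pinned build target (hypothetical shape)."""

    git_tag: str     # upstream python/cpython tag to clone
    sphinx_pin: str  # pip requirement installed into the build venv


# Placeholder pins for illustration only; not the project's real values.
EXAMPLE_BUILD_CONFIG = {
    "3.13": DocsBuildTarget(git_tag="v3.13.0", sphinx_pin="sphinx==8.1.3"),
}


def build_commands(version: str, workdir: Path) -> list[list[str]]:
    """Return the commands for one pinned docs build, in execution order."""
    target = EXAMPLE_BUILD_CONFIG[version]  # KeyError for unsupported versions
    checkout = workdir / "cpython"
    venv = workdir / "venv"
    out = workdir / "json"
    bin_dir = venv / "bin"  # POSIX venv layout; Windows uses "Scripts"
    return [
        # Shallow clone of the canonical upstream at the pinned tag.
        ["git", "clone", "--depth", "1", "--branch", target.git_tag,
         "https://github.com/python/cpython.git", str(checkout)],
        # Dedicated virtual environment so the Sphinx pin stays isolated.
        ["python", "-m", "venv", str(venv)],
        [str(bin_dir / "python"), "-m", "pip", "install", target.sphinx_pin],
        # JSON builder output is what ingestion reads afterwards.
        [str(bin_dir / "sphinx-build"), "-b", "json",
         str(checkout / "Doc"), str(out)],
    ]
```

Because every input to the commands comes from the pinned target, two runs for
the same version produce the same command sequence, which is the property the
reproducibility claim depends on.
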
`lookup_package_docs` is the documented exception to the offline-first rule: it
performs a controlled PyPI metadata lookup when the package lookup runs. That is
a build/lookup-time metadata call, not a docs-query-time call against the local
stdlib documentation index, and it is not a general-purpose web fetch.

Future source adapters should clone this contract: accept a stable identifier,
retrieve canonical upstream artifacts, hand those artifacts to ingestion, and
avoid third-party indexers or scraped mirrors.

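One way a future adapter might clone this contract is through a small
structural interface. The sketch below is an assumption about shape only: the
`SourceAdapter`, `SourceArtifacts`, `StaticAdapter`, and `ingest` names do not
appear in the codebase referenced by this ADR, and the toy adapter performs no
I/O.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SourceArtifacts:
    """Stable artifacts handed from a source adapter to ingestion."""

    identifier: str         # the stable identifier that was requested
    upstream: str           # canonical upstream the artifacts came from
    paths: tuple[str, ...]  # locations of the retrieved artifacts


class SourceAdapter(Protocol):
    """Hypothetical contract: stable identifier in, canonical artifacts out."""

    canonical_upstream: str

    def fetch(self, identifier: str) -> SourceArtifacts: ...


class StaticAdapter:
    """Toy adapter that only shows the shape of an implementation."""

    canonical_upstream = "https://github.com/python/cpython"

    def fetch(self, identifier: str) -> SourceArtifacts:
        return SourceArtifacts(
            identifier=identifier,
            upstream=self.canonical_upstream,
            paths=(f"{identifier}/objects.inv",),
        )


def ingest(adapter: SourceAdapter, identifier: str) -> SourceArtifacts:
    """Accept artifacts only when they name the adapter's canonical upstream."""
    artifacts = adapter.fetch(identifier)
    if artifacts.upstream != adapter.canonical_upstream:
        raise ValueError("artifacts did not come from the canonical upstream")
    return artifacts
```

Keeping the upstream on the artifact itself lets ingestion reject a silent
fallback source without knowing how the adapter fetched anything.
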
### Consequences

**Positive:** The source boundary is auditable, reproducible, and easy to test
against roadmap principles. CPython docs builds can be rebuilt from pinned
upstream tags, and PyPI package URLs are traceable to package-declared metadata.
Downstream ingestion, storage, retrieval, budget, serializer, cache, and
transport layers can rely on source artifacts without knowing source-specific
network details.

**Negative / risks:** CPython builds depend on GitHub availability and the
ability to build each pinned CPython docs tree with the configured Sphinx pin.
PyPI metadata quality depends on what each package declares, so results may be
missing, stale, or incomplete. The `lookup_package_docs` exception must remain
narrow; expanding it into page crawling or arbitrary web search would violate
the contract.

## Layer Contract (principle 2.7)

- **Inputs:** A stable source identifier. For CPython documentation, the input
  is a supported Python `X.Y` version resolved through
  `CPYTHON_DOCS_BUILD_CONFIG`. For PyPI metadata, the input is a package name
  normalized into a PyPI project identifier.
- **Outputs:** Canonical artifacts handed to ingestion or presentation. CPython
  outputs are `objects.inv` symbol data and Sphinx JSON documentation pages that
  ingestion stores in the local index. PyPI outputs are package-declared project,
  documentation, homepage, source, and repository URLs plus the metadata source
  URL returned by `lookup_package_docs`.
- **Invariants:** Source adapters use canonical upstreams only; CPython content
  is pinned and reproducible by tag and Sphinx pin; docs queries use local
  indexed artifacts and do not call remote documentation services at query time;
  PyPI metadata lookup is the sole documented network exception; adapters do not
  use scraped mirrors, third-party indexers, generic web search, or silent
  fallback sources.

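The input and invariant rules above could be enforced with a few small checks.
This is a sketch under stated assumptions: `SUPPORTED_VERSIONS` and
`CANONICAL_HOSTS` are placeholder sets, the function names are hypothetical, and
the name normalization shown is the PEP 503 rule, which the project may or may
not use.

```python
import re
from urllib.parse import urlsplit

# Placeholder values; the real set would derive from CPYTHON_DOCS_BUILD_CONFIG.
SUPPORTED_VERSIONS = frozenset({"3.12", "3.13"})
# Assumed canonical hosts for the two adapters described in this ADR.
CANONICAL_HOSTS = frozenset({"github.com", "pypi.org"})


def resolve_python_version(raw: str) -> str:
    """Accept only a supported `X.Y` version string."""
    version = raw.strip()
    if not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError(f"expected an X.Y version, got {raw!r}")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported Python version: {version}")
    return version


def normalize_project_name(name: str) -> str:
    """Normalize a package name using the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def require_canonical(url: str) -> str:
    """Reject any URL that is not HTTPS on an allowlisted upstream host."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname not in CANONICAL_HOSTS:
        raise ValueError(f"non-canonical source: {url}")
    return url
```

Failing loudly on an unknown host, instead of trying another source, is what
rules out silent fallback.
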
## Links

- STRATEGIC-ROADMAP-2026-05-29.md §2.1, §2.2, §2.7
- [`src/mcp_server_python_docs/ingestion/cpython_versions.py`](../../src/mcp_server_python_docs/ingestion/cpython_versions.py)
- [`src/mcp_server_python_docs/__main__.py`](../../src/mcp_server_python_docs/__main__.py)
- [`src/mcp_server_python_docs/ingestion/sphinx_json.py`](../../src/mcp_server_python_docs/ingestion/sphinx_json.py)
- [`src/mcp_server_python_docs/ingestion/inventory.py`](../../src/mcp_server_python_docs/ingestion/inventory.py)
- [`src/mcp_server_python_docs/services/package_docs.py`](../../src/mcp_server_python_docs/services/package_docs.py)
- [`README.md`](../../README.md) "Why not Context7 or generic docs retrieval?"
  and "PyPI package docs lookup"
