| 1 | +# Agent Context — CPython source SHA pin |
| 2 | + |
| 3 | +> One-read working context for issue `[v0.3.0] ingestion — pin CPython source by commit SHA`. |
| 4 | +> PARTIAL issue: you do the pin + verification; the human writes the SECURITY.md prose. |
| 5 | + |
| 6 | +## 1. Roadmap excerpt |
| 7 | + |
| 8 | +> **Build-time supply-chain hardening** (roadmap §4, v0.3.0): Pin CPython source |
| 9 | +> by SHA, not by tag. Document the threat model in SECURITY.md (the `build-index` |
| 10 | +> CPython clone is the largest non-runtime attack surface). Verify Sphinx-build |
| 11 | +> environment isolation. |
| 12 | +> |
| 13 | +> **Decision 5.10 (locked):** Build-time supply chain (the `build-index` CPython |
| 14 | +> clone) is an explicit risk area; threat model documented in SECURITY.md; |
| 15 | +> CPython source pinned by SHA. |
| 16 | +
|
| 17 | +## 2. Code touch-points |
| 18 | + |
| 19 | +- `src/mcp_server_python_docs/ingestion/cpython_versions.py` |
| 20 | + - `CPythonDocsBuildConfig(TypedDict)` — add `sha: str`. |
| 21 | + - `CPYTHON_DOCS_BUILD_CONFIG` — five entries, currently `{"tag": ..., "sphinx_pin": ...}`: |
| 22 | + `3.10→v3.10.20`, `3.11→v3.11.15`, `3.12→v3.12.13`, `3.13→v3.13.13`, `3.14→v3.14.4`. |
| 23 | + Add the resolved SHA to each. Resolve with: |
| 24 | + `git ls-remote https://github.com/python/cpython.git 'refs/tags/<tag>^{}'` |
| 25 | + (the `^{}` suffix requests the dereferenced commit; a bare `refs/tags/<tag>` pattern returns only the annotated-tag object, which is not the value to pin). |
| 26 | +- `src/mcp_server_python_docs/__main__.py:210–226` — the clone: |
| 27 | + `git clone --depth 1 --branch config["tag"] https://github.com/python/cpython.git <clone_dir>`. |
| 28 | + After it, add: `rev = git -C <clone_dir> rev-parse HEAD`; if `rev != config["sha"]`, |
| 29 | + log a clear error and **abort this version's build** (raise / skip-with-failure — |
| 30 | + match the existing error-handling style in this function; do not silently continue). |
| 31 | +- `tests/test_ingestion.py:53` — existing assertion |
| 32 | + `config["tag"].startswith(f"v{version}.")`. Add a sibling assertion that |
| 33 | + `config["sha"]` matches `^[0-9a-f]{40}$`. |
| 34 | + |
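The touch-points above could fit together roughly as follows. This is a minimal sketch, not the project's actual code: `verify_clone_sha`, `PinMismatchError`, and the injectable `run` parameter are hypothetical names introduced for illustration, and the `str` type for `sphinx_pin` is an assumption. The real change should match the error-handling style already used in `__main__.py`.

```python
import subprocess
from typing import TypedDict


class CPythonDocsBuildConfig(TypedDict):
    tag: str
    sphinx_pin: str  # assumed type; keep whatever the existing definition uses
    sha: str  # new field: the commit SHA the tag is expected to resolve to


class PinMismatchError(RuntimeError):
    """Hypothetical error type, used here only to make the abort explicit."""


def verify_clone_sha(clone_dir, config, run=subprocess.run):
    """Return the checked-out commit SHA, or raise if it differs from the pin.

    `run` is injectable only so the check can be unit-tested offline.
    """
    result = run(
        ["git", "-C", clone_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    rev = result.stdout.strip()
    if rev != config["sha"]:
        # Abort this version's build instead of silently continuing.
        raise PinMismatchError(
            f"{config['tag']}: cloned HEAD {rev} does not match "
            f"pinned SHA {config['sha']}"
        )
    return rev
```

Calling this immediately after the `git clone` step, before any docs are built, keeps the tag-based shallow fetch while making the pinned SHA the condition for proceeding.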
| 35 | +## 3. Patterns to follow |
| 36 | + |
| 37 | +- `tests/test_ingestion.py` iterates `CPYTHON_DOCS_BUILD_CONFIG.items()` for the |
| 38 | + tag assertion — extend that same loop for the SHA assertion. No new fixtures. |
| 39 | +- The clone block already uses `subprocess.run([...], check=True, capture_output=True, text=True)` |
| 40 | + — reuse that idiom for the `rev-parse` call. |
| 41 | + |
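One way the extended loop might look, shown with a stand-in dictionary so the sketch runs on its own. `EXAMPLE_CONFIG` and `check_pins` are hypothetical names, and the SHA values are obvious placeholders rather than real CPython commits. In the actual test the loop would iterate `CPYTHON_DOCS_BUILD_CONFIG.items()` directly.

```python
import re

# Same pattern as the planned assertion: exactly 40 lowercase hex characters.
SHA_RE = re.compile(r"^[0-9a-f]{40}$")

# Stand-in for CPYTHON_DOCS_BUILD_CONFIG. The SHAs are placeholders only.
EXAMPLE_CONFIG = {
    "3.12": {"tag": "v3.12.13", "sha": "0" * 40},
    "3.13": {"tag": "v3.13.13", "sha": "f" * 40},
}


def check_pins(configs):
    for version, config in configs.items():
        # Existing tag assertion.
        assert config["tag"].startswith(f"v{version}.")
        # Sibling SHA assertion. fullmatch also rejects a trailing newline,
        # which `$` on its own would tolerate under re.match.
        assert SHA_RE.fullmatch(config["sha"]), (
            f"{version}: sha is not 40 lowercase hex characters"
        )


check_pins(EXAMPLE_CONFIG)
```

Using `fullmatch` is a small design choice: it stops a SHA pasted with stray trailing whitespace from passing the format check and then failing the string comparison at build time.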
| 42 | +## 4. Known pitfalls |
| 43 | + |
| 44 | +- **`git clone --branch` cannot take a raw SHA**; it accepts only a branch or tag |
| 45 | + name. Keep the tag-based shallow clone and make the **SHA a post-clone |
| 46 | + verification gate**, not the fetch ref. That is the integrity win: a moved or re-created |
| 47 | + tag now fails the build instead of silently changing canonical content. |
| 48 | +- Use the **dereferenced commit SHA** (peeled tag), not the annotated tag object's |
| 49 | + own SHA — `rev-parse HEAD` after checkout gives the commit; match that. |
| 50 | +- **Do not edit `SECURITY.md`** (forbidden). Draft the threat-model paragraph in |
| 51 | + the PR body + decision log below for a human to paste. |
| 52 | +- A full `build-index` clones over the network and takes minutes — do not gate the |
| 53 | + PR on it. The unit tests cover the config + verification logic offline. |
| 54 | +- Don't bump any tag to a newer CPython point release; pin the SHA of the |
| 55 | + **current** tag only. |
| 56 | + |
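The peeled-versus-tag-object pitfall can be illustrated with a small parser for `git ls-remote` output, which prints one `<sha><TAB><ref>` pair per line. This is a sketch under assumptions: `peeled_commit` is a hypothetical helper, not something in the repository, and the SHAs in the usage below are placeholders.

```python
def peeled_commit(ls_remote_output, tag):
    """Pick the commit SHA for `tag` from `git ls-remote` output.

    Prefers the peeled `refs/tags/<tag>^{}` entry (the commit behind an
    annotated tag). Falls back to the plain ref, which already names the
    commit when the tag is lightweight.
    """
    refs = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref.strip()] = sha.strip()

    peeled = f"refs/tags/{tag}^{{}}"
    plain = f"refs/tags/{tag}"
    if peeled in refs:
        return refs[peeled]
    if plain in refs:
        return refs[plain]
    raise KeyError(f"no ref found for tag {tag!r}")


# Placeholder output: the first SHA stands for the tag object, the second
# for the commit it points at.
sample = (
    "1" * 40 + "\trefs/tags/v3.12.13\n"
    + "2" * 40 + "\trefs/tags/v3.12.13^{}\n"
)
print(peeled_commit(sample, "v3.12.13"))
```

For the sample above the helper selects the second SHA, the one `rev-parse HEAD` would report after the tag is checked out, and that is the value to record in the config.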
| 57 | +## 5. Decision log |
| 58 | + |
| 59 | +- Resolved SHAs (tag → 40-hex commit), one line each: |
| 60 | + - 3.10 / v3.10.20 → |
| 61 | + - 3.11 / v3.11.15 → |
| 62 | + - 3.12 / v3.12.13 → |
| 63 | + - 3.13 / v3.13.13 → |
| 64 | + - 3.14 / v3.14.4 → |
| 65 | +- Where/how the verification aborts on mismatch: |
| 66 | +- **Draft SECURITY.md threat-model paragraph (for human to paste):** |
| 67 | + > |