Skip to content

Commit cf7a956

Browse files
ayhammoudaclaude
andcommitted
docs(phase-09): complete phase execution — compare_versions verified (5/5)
Phase goal achieved: compare_versions MCP tool ships and is verified functional against a production-shaped index (6 tools registered). Verifier ran the full gate independently: ruff clean, pyright 0 errors, 284 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent a115d38 commit cf7a956

2 files changed

Lines changed: 122 additions & 2 deletions

File tree

.planning/ROADMAP.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,11 +8,11 @@
88

99
## Backlog (post-v0.1.6)
1010

11-
Triaged 2026-05-26 — all three phases remain in scope. Phase 09 is up next.
11+
Triaged 2026-05-26 — all three phases remain in scope. Phase 10 is up next.
1212

1313
| Phase | Title | Tool surface | Issue | Status |
1414
|-------|-------|--------------|-------|--------|
15-
| 09 | `compare_versions(symbol, v1, v2)` | New MCP tool | [#32](https://github.com/ayhammouda/python-docs-mcp-server/issues/32) | **All plans done** — 5/5, verifying |
15+
| 09 | `compare_versions(symbol, v1, v2)` | New MCP tool | [#32](https://github.com/ayhammouda/python-docs-mcp-server/issues/32) | **✓ Complete** — 5/5 plans, verified 2026-05-29 |
1616
| 10 | `whatsnew_for_version(version)` | New MCP tool | [#33](https://github.com/ayhammouda/python-docs-mcp-server/issues/33) | Backlog |
1717
| 11 | `detect_python_version` v2 (venv-aware) | Existing tool enhancement | [#34](https://github.com/ayhammouda/python-docs-mcp-server/issues/34) | Backlog |
1818

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
---
2+
phase: 09-compare-versions
3+
verified: 2026-05-29T00:10:00Z
4+
status: passed
5+
score: 5/5 success criteria + all PLAN must_haves verified
6+
overrides_applied: 0
7+
re_verification: # not applicable — initial verification
8+
previous_status: none
9+
gaps: []
10+
---
11+
12+
# Phase 09: compare_versions(symbol, v1, v2) Verification Report
13+
14+
**Phase Goal:** Ship the `compare_versions(symbol, v1, v2)` MCP tool — a new tool that diffs a Python stdlib symbol's documentation between two versions (GitHub issue #32).
15+
**Verified:** 2026-05-29T00:10:00Z
16+
**Status:** passed
17+
**Re-verification:** No — initial verification
18+
19+
## Goal Achievement
20+
21+
### Observable Truths (CONTEXT.md success criteria — the contract)
22+
23+
| # | Truth | Status | Evidence |
24+
| --- | --- | --- | --- |
25+
| 1 | `compare_versions("asyncio.TaskGroup", "3.10", "3.11")` returns a clear diff showing the symbol was newly introduced in 3.11 (`added`/`new_in` marker) | ✓ VERIFIED | `compare.py:202-213` added branch sets `change="added"` + extracts `new_in`. Test `test_compare_added_in_v2` asserts `change=="added"` and `new_in=="3.11"` — passing. |
26+
| 2 | Comparing identical versions returns an empty diff with explicit "no change" marker | ✓ VERIFIED | `compare.py:196-199` returns `change="unchanged"`. Test `test_compare_identical_versions` asserts unchanged + all delta fields None/empty — passing. |
27+
| 3 | Missing-version cases return an actionable error with the indexed-version list | ✓ VERIFIED | `compare.py:180-181` calls `validate_version` first; `version_resolution.py:29` raises `VersionNotFoundError` with available list (MVER-03). Test `test_compare_unknown_version_raises_with_indexed_list` asserts message names `3.99`, `3.10`, `3.11` — passing. |
28+
| 4 | Token cost of a typical diff is under 300 tokens | ✓ VERIFIED | `models.py` emits only non-None delta fields; `_SECTION_DIFF_MAX_CHARS=600`. Test `test_compare_diff_is_token_frugal` asserts `~tokens < 300` and serialized < 1200 bytes — passing (smoke check per L1). |
29+
| 5 | Integration test exercises 3 representative symbols (changed, unchanged, missing) | ✓ VERIFIED | `tests/test_compare_versions.py` covers added/removed/changed/unchanged + both error paths + heuristics + M2 fallback + CR-01/WR-01/WR-02 regressions (21 tests, all passing). |
30+
31+
**Score:** 5/5 success criteria verified.
32+
33+
### CMPR acceptance criteria (CONTEXT.md §Requirements)
34+
35+
| Criterion | Status | Evidence |
36+
| --- | --- | --- |
37+
| CMPR-01: accepts `(symbol, v1, v2)`, returns structured diff (signature, docstring delta, deprecation flag, see-also +/-) | ✓ VERIFIED | `CompareService.compare` returns `CompareVersionsResult` with `signature_delta`, `section_diff`, `deprecated_in`, `see_also_added/removed`. Each has dedicated passing tests. |
38+
| CMPR-02: both versions must be indexed; else actionable error pointing at build-index | ✓ VERIFIED | `validate_version` raises `VersionNotFoundError` listing indexed versions before any symbol work. |
39+
| CMPR-03: JSON-serializable, token-frugal — no full re-printing of unchanged paragraphs | ✓ VERIFIED | Pydantic model `model_dump()`; section_diff truncated to 600 chars with line-boundary marker (WR-02); unchanged-case returns no diff body. |
40+
41+
### Required Artifacts
42+
43+
| Artifact | Expected | Status | Details |
44+
| --- | --- | --- | --- |
45+
| `src/mcp_server_python_docs/services/compare.py` | CompareService + compare() | ✓ VERIFIED | 312 lines; 4 branches, fixed H2 ordering, CR-01/WR-01/WR-02 fixes present. Wired into app_context + server. |
46+
| `src/mcp_server_python_docs/models.py` | CompareVersionsResult model | ✓ VERIFIED | `class CompareVersionsResult(BaseModel)` with all delta fields + `note`, every field has `Field(description=...)`. `signature_delta` (not `signature_change`), advisory description present. |
47+
| `tests/test_compare_versions.py` | Behavioral tests, production-shaped fixture | ✓ VERIFIED | 21 tests passing. Fixture stores EXTENSIONLESS `documents.slug` (`doc_slug = slug[:-5]`, line 145) matching ingestion ground truth — false-confidence masking is resolved. |
48+
| `src/mcp_server_python_docs/app_context.py` | `compare_service` field | ✓ VERIFIED | `compare_service: CompareService` (line 29). |
49+
| `src/mcp_server_python_docs/server.py` | `compare_versions` tool + lifespan wiring | ✓ VERIFIED | `CompareService(db, content_svc)` (line 163), passed to AppContext (line 189), `@mcp.tool def compare_versions` (line 393) delegating to `app_ctx.compare_service.compare`. |
50+
| `tests/test_services.py` | 6-tool registration | ✓ VERIFIED | `test_six_tools_registered` asserts `len(tools) == 6` (line 450/455); schema test inspects `compare_versions` params (line 480-485). |
51+
| `README.md` | six-tool list + compare_versions row | ✓ VERIFIED | "six read-only MCP tools" (line 46), "exposes six MCP tools" (line 180), `compare_versions` table row (line 189). |
52+
| `.github/INTEGRATION-TEST.md` | all six tools + compare_versions call step | ✓ VERIFIED | Test 1 checklist lists all six tools incl `lookup_package_docs` (L2 gap closed) and a `compare_versions` call step (line 53). |
53+
| `09-01-data-shape-spike-SUMMARY.md` | Locked regex patterns | ✓ VERIFIED | `## Locked regex patterns` section; regexes match compare.py verbatim. |
54+
55+
### Key Link Verification
56+
57+
| From | To | Via | Status | Details |
58+
| --- | --- | --- | --- | --- |
59+
| compare.py | cache.create_symbol_cache | import + `self._resolve(symbol, version)` | ✓ WIRED | `create_symbol_cache` returns `Callable[[str,str], CachedSymbol|None]`; `CachedSymbol` has `.uri`/`.anchor`. |
60+
| compare.py | content.ContentService.get_docs | `self._content.get_docs(slug, version, anchor)` | ✓ WIRED | Called in `_section_text`; CR-01 candidate logic present. |
61+
| compare.py | version_resolution.validate_version | import + call | ✓ WIRED | Called first in compare(); raises VersionNotFoundError from its own module. |
62+
| compare.py | observability.log_tool_call | `@log_tool_call("compare_versions")` | ✓ WIRED | Decorator present on `compare` (line 168). |
63+
| compare.py | VersionNotFoundError | (must NOT import — H4) | ✓ VERIFIED | grep confirms NOT imported; ruff F401 clean. |
64+
| server.app_lifespan | CompareService | `CompareService(db, content_svc)` | ✓ WIRED | Line 163. |
65+
| server.create_server | AppContext.compare_service | `app_ctx.compare_service.compare(...)` | ✓ WIRED | Line 407. |
66+
| models.py | server.py | batch import incl CompareVersionsResult | ✓ WIRED | Line 30-37. |
67+
68+
### Data-Flow Trace (Level 4 — CR-01 focus)
69+
70+
| Concern | Finding | Status |
71+
| --- | --- | --- |
72+
| `documents.slug` shape (ingestion ground truth) | `sphinx_json.py:429` `slug = current_page_name` (EXTENSIONLESS); `:439` `doc_uri = f"{current_page_name}.html"`. Symbol URIs carry `.html`. | ✓ confirmed |
73+
| Test fixture slug shape | `test_compare_versions.py:145` `doc_slug = slug[:-5] if slug.endswith(".html")` → stores EXTENSIONLESS slug, mirroring production. Old `.html`-in-both shape that masked CR-01 is gone. | ✓ FLOWING |
74+
| compare `_section_text` resolution | `compare.py:153` tries `(page[:-5], page)` extensionless-first, mirroring `ranker._document_candidates`. Resolves real-index symbols instead of falling into M2 fallback. | ✓ FLOWING |
75+
| CR-01 regression guard | `test_section_text_resolves_extensionless_slug_cr01` resolves `library/json.html#json.dumps` against extensionless slug AND asserts end-to-end `some.old_func` yields `deprecated_in="3.11"`, `note is None` (real metadata, not page-unavailable note). Passing. | ✓ FLOWING |
76+
77+
### Behavioral Spot-Checks
78+
79+
| Behavior | Command | Result | Status |
80+
| --- | --- | --- | --- |
81+
| Runtime tool enumeration = 6 incl compare_versions | `create_server(); asyncio.run(mcp.list_tools())` | count=6, names include `compare_versions` | ✓ PASS |
82+
| Compare suite | `pytest tests/test_compare_versions.py tests/test_services.py::TestToolRegistration` | 21 passed | ✓ PASS |
83+
| Deprecation regex version-agnostic | `_extract_deprecated_in("...3.12...")` | returns `3.12` | ✓ PASS |
84+
| Full gate — ruff | `uv run ruff check src/ tests/` | All checks passed | ✓ PASS |
85+
| Full gate — pyright | `uv run pyright src/` | 0 errors, 0 warnings | ✓ PASS |
86+
| Full gate — pytest | `uv run pytest --tb=short -q` | 284 passed in 8.13s | ✓ PASS |
87+
88+
### Code Review Resolution Cross-Check (09-REVIEW.md, commit 0e16b34)
89+
90+
| Finding | Claimed Fix | Codebase Evidence | Status |
91+
| --- | --- | --- | --- |
92+
| CR-01 (blocker): slug-derivation mismatch | extensionless-first candidates + production-shaped fixture | `compare.py:153` + fixture `:145` + regression test | ✓ FIXED |
93+
| WR-01: see-also over-capture | blank-line-bounded window | `compare.py:97-108` + `test_extract_see_also_excludes_unrelated_body_links_wr01` | ✓ FIXED |
94+
| WR-02: mid-line diff truncation | line-boundary truncation + marker | `compare.py:267-284` + `test_section_diff_truncates_on_line_boundary_wr02` | ✓ FIXED |
95+
| WR-03/WR-04 | accepted by-design (2-version semantics; advisory naming) | model descriptions align | ✓ ACCEPTED |
96+
| IN-03: stale `test_create_server_has_three_tools` name | cosmetic, not fixed | name still stale at `test_services.py:411` (subset check, still correct) | ℹ️ INFO |
97+
98+
### Anti-Patterns Found
99+
100+
| File | Line | Pattern | Severity | Impact |
101+
| --- | --- | --- | --- | --- |
102+
| (none) || No TBD/FIXME/XXX/HACK/PLACEHOLDER/stub in compare.py, models.py, or test file |||
103+
| test_services.py | 411 | Stale test name `test_create_server_has_three_tools` (asserts 4-tool subset) | ℹ️ Info | Cosmetic only (IN-03, accepted in review); assertions correct, six-tool count covered by sibling test. Not a blocker. |
104+
105+
### Issue #32 Hygiene
106+
107+
`git log --oneline origin/main..HEAD | grep -iE "Closes|Fixes|Resolves" | grep "#32"` → ZERO matches. Closing keyword reserved for PR body per #35 retrospective. ✓ Clean.
108+
109+
### Human Verification Required
110+
111+
None. All success criteria are programmatically verified via the seeded fixture and runtime tool enumeration. Optional: live-index end-to-end MCP Inspector run per INTEGRATION-TEST.md Test 1 — covered by the documented runbook and not required for goal achievement (the CR-01 fix + production-shaped fixture close the previously-masked real-index gap).
112+
113+
### Gaps Summary
114+
115+
No gaps. The phase goal is achieved: a functional `compare_versions(symbol, v1, v2)` MCP tool is registered as the sixth tool, wired through AppContext and the FastMCP lifespan, returning a JSON-serializable token-frugal `CompareVersionsResult` across all four diff branches. The previously-identified blocker (CR-01 slug mismatch that degraded the tool to presence-only on real indexes) is fixed and guarded by a regression test using a production-shaped extensionless-slug fixture — verified against ingestion ground truth in `sphinx_json.py`. All canonical gates pass (ruff clean, pyright 0 errors, 284 tests).
116+
117+
---
118+
119+
_Verified: 2026-05-29T00:10:00Z_
120+
_Verifier: Claude (gsd-verifier)_

0 commit comments

Comments
 (0)