
Commit 6321ff4

ayhammouda, bluecloud-gilfoyle[bot], and claude authored
feat: cache retrieved docs and add PyPI package docs lookup (#9)
* feat: cache docs and expose PyPI package docs lookup
* fix: bound PyPI metadata response reads
* fix: address PR 9 review cache and PyPI errors
* docs: align PyPI lookup scope
* test: add PR 9 MCP smoke test plan
* docs: add PR 9 smoke failure fix plan
* fix: restore stdio transport stdout
* fix: return retrievable slugs for symbol hits
* fix: address PR #9 review feedback

  - package_docs: catch UnicodeDecodeError on non-UTF-8 PyPI responses so they surface as the controlled empty-result path instead of leaking a tool-level exception
  - ranker/SymbolHit: return None for page-only symbol anchors so get_docs() retrieves the whole page; SymbolHit.anchor is now str | None
  - persistent_cache: serialize execute()/commit()/stats updates with threading.Lock; per the Python sqlite3 docs, check_same_thread=False alone does not make a connection safe for concurrent writes, and without serialization concurrent put() raised SystemError under load

  Minor follow-ups picked up along the way: _empty_result helper in package_docs, drop unused PackageDocsInput model, use storage.db.get_cache_dir/get_index_path helpers from server.py, import _NO_ANCHOR_KEY constant in smoke test.

  Tests: UTF-8 decode path, page-only anchor=None contract, concurrent put() across 20 threads. 261/261 pass.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: harden cache constructor and tighten package-docs types

  - persistent_cache: move _fingerprint_index() inside the try-except so a missing index.db disables the cache cleanly instead of raising FileNotFoundError out of the constructor and crashing server startup (CodeRabbit Major)
  - models: tighten PackageDocsSource.kind and PackageDocsResult.trust_boundary to Literal types so invalid values fail validation at construction (CodeRabbit nitpick); add PackageKind alias and propagate it through package_docs._source signature, with a cast at the dynamic _ALLOWED-derived call site (runtime-safe: the allowlist enumeration matches the Literal exactly)

  Tests: cache disables gracefully on missing index, PackageDocsSource rejects unknown kind, PackageDocsResult rejects unknown trust_boundary.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: bluecloud-gilfoyle[bot] <262642412+bluecloud-gilfoyle[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 8f6736c commit 6321ff4

19 files changed

Lines changed: 2191 additions & 68 deletions

PR9_MCP_TEST_PLAN.md

Lines changed: 393 additions & 0 deletions
@@ -0,0 +1,393 @@
# PR #9 MCP Test Plan: Persistent `get_docs` Cache + `lookup_package_docs`

PR: https://github.com/ayhammouda/python-docs-mcp-server/pull/9
Branch: `fix/open-issues-cache-pypi-docs`
Purpose: validate the PR through actual MCP/tool-level behavior, not only unit tests.

## Goals

Confirm that:

1. Existing stdlib docs tools still work.
2. `get_docs` returns correct content.
3. Persistent cache is written and reused across server restarts.
4. Cache identity is correct: version, slug, anchor, `max_chars`, `start_index`, and index fingerprint matter.
5. Cache failure is best-effort and does not break retrieval.
6. `lookup_package_docs` returns controlled PyPI-declared docs/homepage/source/repository links.
7. PyPI error modes return controlled notes rather than internal errors.
8. MCP annotations and tool count are coherent.

## Preconditions

```bash
cd /srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review
git checkout fix/open-issues-cache-pypi-docs || git checkout review-pr-9
git pull --ff-only origin fix/open-issues-cache-pypi-docs
uv sync --all-extras
```

Baseline gates:

```bash
uv run ruff check src/ tests/
uv run pyright src/
uv run pytest --tb=short -q
uv build
```

Expected:

- Ruff passes.
- Pyright passes for `src/`.
- Pytest passes: currently expected `254 passed, 3 skipped`.
- Build succeeds.

## Test Area 1: Tool Registration / MCP Contract

### 1.1 Verify five tools are registered

Run a small introspection script or use the MCP client harness if available.

Expected tools:

- `search_docs`
- `get_docs`
- `lookup_package_docs`
- `list_versions`
- `detect_python_version`

Expected annotations:

- stdlib tools: `readOnlyHint=True`, `openWorldHint=False`
- `lookup_package_docs`: `readOnlyHint=True`, `openWorldHint=True`

Pass criteria:

- Exactly five tools are exposed.
- `lookup_package_docs` is visibly open-world because it calls PyPI.

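The pass criteria in 1.1 could be checked mechanically along the following lines. This is a minimal sketch, not the project's harness: it assumes the tool listing has already been converted into plain dicts with `name` and `annotations` keys, and the names `EXPECTED_OPEN_WORLD`, `check_tool_contract`, and `sample_tools` are hypothetical.

```python
# Hypothetical expectation table: tool name -> expected openWorldHint.
# A real harness would compare this against the MCP client's tool listing.
EXPECTED_OPEN_WORLD = {
    "search_docs": False,
    "get_docs": False,
    "lookup_package_docs": True,
    "list_versions": False,
    "detect_python_version": False,
}


def check_tool_contract(tools: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the contract holds."""
    problems: list[str] = []
    names = [tool["name"] for tool in tools]
    if sorted(names) != sorted(EXPECTED_OPEN_WORLD):
        problems.append(f"unexpected tool set: {sorted(names)}")
    for tool in tools:
        expected_open = EXPECTED_OPEN_WORLD.get(tool["name"])
        if expected_open is None:
            continue  # already reported as part of the unexpected tool set
        annotations = tool.get("annotations", {})
        if annotations.get("readOnlyHint") is not True:
            problems.append(f"{tool['name']}: readOnlyHint should be True")
        if annotations.get("openWorldHint") is not expected_open:
            problems.append(f"{tool['name']}: openWorldHint should be {expected_open}")
    return problems


# Assumed sample data standing in for a real tool listing.
sample_tools = [
    {"name": name, "annotations": {"readOnlyHint": True, "openWorldHint": is_open}}
    for name, is_open in EXPECTED_OPEN_WORLD.items()
]
print(check_tool_contract(sample_tools))
```

An empty list corresponds to "exactly five tools, with the expected annotations"; any other output names the tool and hint that deviates.
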
## Test Area 2: `get_docs` Functional Retrieval

### 2.1 Full page retrieval

Call:

```text
get_docs(slug="library/json.html", version="3.12", max_chars=1000, start_index=0)
```

Expected:

- Result contains JSON documentation content.
- `slug == "library/json.html"`
- `version == "3.12"`
- `anchor is null`
- `char_count > 0`

### 2.2 Section retrieval

First use `search_docs` to find a valid section anchor for `json`, then call:

```text
get_docs(slug="library/json.html", version="3.12", anchor=<valid_anchor>, max_chars=1000, start_index=0)
```

Expected:

- Result is section-scoped.
- `anchor == <valid_anchor>`
- Content is not the full page.

### 2.3 Empty anchor remains invalid

Call:

```text
get_docs(slug="library/json.html", version="3.12", anchor="", max_chars=1000, start_index=0)
```

Expected:

- Controlled tool error / page-not-found style response.
- It must **not** return a cached full-page response.

This specifically verifies the `anchor=None` vs `anchor=""` cache fix.

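To illustrate why `anchor=None` and `anchor=""` must map to different cache identities, here is a minimal sketch of a key function. It is not the project's implementation: the sentinel value `NO_ANCHOR` and the function `cache_key` are hypothetical (the commit message mentions a `_NO_ANCHOR_KEY` constant, whose actual value is not shown here), and the index fingerprint is left out for brevity.

```python
from typing import Optional

# Hypothetical sentinel for "no anchor requested"; the real constant may differ.
NO_ANCHOR = "\x00no-anchor"


def cache_key(version: str, slug: str, anchor: Optional[str],
              max_chars: int, start_index: int) -> tuple:
    """Build a cache identity in which anchor=None and anchor="" never collide."""
    anchor_key = NO_ANCHOR if anchor is None else anchor
    return (version, slug, anchor_key, max_chars, start_index)


full_page = cache_key("3.12", "library/json.html", None, 1000, 0)
empty_anchor = cache_key("3.12", "library/json.html", "", 1000, 0)
print(full_page != empty_anchor)
```

If a key function collapsed `None` to `""` instead, the 2.3 request could be served from the row written by the 2.1 request, which is the failure this test guards against.
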
## Test Area 3: Persistent Cache Behavior

Before running, locate the cache path under the platform cache directory. Expected filename:

```text
retrieved-docs-cache.sqlite3
```

Likely under:

```text
~/.cache/mcp-python-docs/retrieved-docs-cache.sqlite3
```

or the platform cache directory used by the app.

### 3.1 Cache file creation

1. Delete the cache file if present.
2. Start the MCP server/client.
3. Call `get_docs` for `library/json.html`.
4. Stop the server.

Expected:

- Cache file exists.
- SQLite table `retrieved_docs_cache` exists.
- At least one row is present.

Suggested inspection:

```bash
sqlite3 <cache-path>/retrieved-docs-cache.sqlite3 \
  "SELECT version, slug, anchor, max_chars, start_index, length(result_json) FROM retrieved_docs_cache;"
```

### 3.2 Cache survives restart

1. Start the server again.
2. Call the same `get_docs` request.

Expected:

- Same response content.
- No user-visible behavior change.
- If logs expose cache hits/misses, the second call should be a hit.

### 3.3 Cache key separates pagination/budget

Call:

```text
get_docs(slug="library/json.html", version="3.12", max_chars=500, start_index=0)
get_docs(slug="library/json.html", version="3.12", max_chars=1000, start_index=0)
get_docs(slug="library/json.html", version="3.12", max_chars=500, start_index=100)
```

Expected:

- Separate cache rows for each identity.
- Results are not cross-contaminated.

### 3.4 Corrupt cache is best-effort

1. Stop the server.
2. Replace the cache file with invalid bytes:

```bash
printf 'not sqlite' > <cache-path>/retrieved-docs-cache.sqlite3
```

3. Start the server.
4. Call `get_docs(slug="library/json.html", version="3.12")`.

Expected:

- Docs retrieval still succeeds.
- A warning is logged about the disabled/skipped persistent cache.
- No internal server error.

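The best-effort behavior in 3.4 can be reproduced in isolation with the standard `sqlite3` module. The sketch below is an assumption about the general shape of such a guard, not the project's `persistent_cache` code: `open_cache_or_none` is a hypothetical helper and its two-column table is a simplified stand-in for the real schema.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_cache_or_none(path: Path):
    """Return a usable connection, or None when the file is not a valid SQLite DB."""
    conn = None
    try:
        conn = sqlite3.connect(path)
        # connect() is lazy; running a statement forces SQLite to read the file header.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS retrieved_docs_cache "
            "(key TEXT PRIMARY KEY, result_json TEXT)"  # simplified, hypothetical schema
        )
        return conn
    except sqlite3.DatabaseError:
        if conn is not None:
            conn.close()
        return None  # the caller would fall back to uncached retrieval


with tempfile.TemporaryDirectory() as tmp:
    corrupt = Path(tmp) / "retrieved-docs-cache.sqlite3"
    corrupt.write_bytes(b"not sqlite")  # same bytes as the printf step above
    print(open_cache_or_none(corrupt) is None)
```

Catching `sqlite3.DatabaseError` also covers its subclass `sqlite3.OperationalError`, so one handler is enough for the "disable the cache and carry on" path.
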
## Test Area 4: `lookup_package_docs` Happy Path

### 4.1 Known package with docs/source

Call:

```text
lookup_package_docs(package="requests")
```

Expected:

- `metadata_source == "https://pypi.org/pypi/requests/json"`
- `trust_boundary == "pypi-declared-metadata"`
- `package` is the canonical name from PyPI if available.
- `version` is non-empty.
- `sources` includes the PyPI project URL and likely homepage/source/docs links.
- Every source URL is `http://` or `https://`.
- No web search / unofficial mirror fallback.

### 4.2 Normalization

Call:

```text
lookup_package_docs(package="Sample_Project")
```

Expected:

- Metadata source normalizes to:

```text
https://pypi.org/pypi/sample-project/json
```

- Returned package may be the PyPI canonical name.

### 4.3 Missing package

Call:

```text
lookup_package_docs(package="definitely-not-a-real-package-vision-test-xyz")
```

Expected:

- `sources == []`
- `note` contains a package-not-found / PyPI 404 style message.
- No internal error.

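The expected URL in 4.2 is consistent with the name normalization described in PEP 503 (lowercase, with runs of `-`, `_`, and `.` collapsed to a single `-`). The following sketch assumes the service normalizes this way; `normalize_package_name` and `pypi_metadata_url` are illustrative names, not the project's API.

```python
import re


def normalize_package_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_metadata_url(name: str) -> str:
    """Build the PyPI JSON metadata URL for the normalized project name."""
    return f"https://pypi.org/pypi/{normalize_package_name(name)}/json"


print(pypi_metadata_url("Sample_Project"))
# https://pypi.org/pypi/sample-project/json
```
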
## Test Area 5: PyPI Failure Handling

These may require monkeypatching, a fake fetcher, or temporary network blocking if they are not practical to trigger through live MCP.

### 5.1 Non-404 HTTP errors

Simulate PyPI `429` or `503`.

Expected:

- Controlled result:

```text
sources=[]
note="PyPI returned HTTP 429."
```

or equivalent code.

### 5.2 Network/JSON failure

Simulate:

- `URLError`
- timeout
- invalid JSON body

Expected:

- Controlled result note:

```text
Unable to retrieve PyPI metadata: <ErrorType>.
```

- No internal server error.

### 5.3 Oversized PyPI JSON body

Simulate a response larger than 5 MiB.

Expected:

- The service reads at most `5 MiB + 1 byte`.
- Controlled result:

```text
sources=[]
note="PyPI metadata exceeded size limit."
```

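The "`5 MiB + 1 byte`" expectation in 5.3 reflects a common bounded-read pattern: reading one byte past the limit is enough to tell "exactly at the limit" from "over the limit" without buffering the whole body. A minimal sketch, using an in-memory stream in place of a real HTTP response; `MAX_METADATA_BYTES` and `read_bounded` are hypothetical names.

```python
import io

MAX_METADATA_BYTES = 5 * 1024 * 1024  # 5 MiB, the limit named in 5.3


def read_bounded(stream, limit: int = MAX_METADATA_BYTES):
    """Read at most limit + 1 bytes; return None when the body exceeds the limit."""
    data = stream.read(limit + 1)
    if len(data) > limit:
        return None  # caller would map this to the "exceeded size limit" note
    return data


small = read_bounded(io.BytesIO(b"{}"))
oversized = read_bounded(io.BytesIO(b"x" * (MAX_METADATA_BYTES + 10)))
print(small, oversized)
```

A fake fetcher for this test case could hand the service a stream like the oversized one above and assert on the resulting note.
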
## Test Area 6: Scope / Trust Boundary

Use a package or fake response with broad `project_urls`, e.g. labels:

- `Documentation`
- `Homepage`
- `Source`
- `Repository`
- `Issues`
- `Changelog`
- `Community mirror`
- `Tutorial`

Expected:

Included:

- Documentation
- Homepage
- Source
- Repository

Excluded/skipped:

- Issues
- Changelog
- Community mirror
- Tutorial

The result note should mention the ignored labels that fall outside the controlled allowlist.

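A fake response for this test area, together with the filtering it is meant to exercise, might look like the sketch below. The allowlist contents mirror the "Included" labels above, but the matching rule (case-insensitive exact label match plus an `http`/`https` scheme check) is an assumption, and `ALLOWED_LABELS`, `filter_project_urls`, and the `example.invalid` URLs are all illustrative.

```python
# Hypothetical allowlist; the project's own labels and matching rules may differ.
ALLOWED_LABELS = {"documentation", "homepage", "source", "repository"}


def filter_project_urls(project_urls: dict):
    """Split declared URLs into allowed (label, url) pairs and ignored labels."""
    allowed, ignored = [], []
    for label, url in project_urls.items():
        is_http = url.startswith(("http://", "https://"))
        if label.strip().lower() in ALLOWED_LABELS and is_http:
            allowed.append((label, url))
        else:
            ignored.append(label)
    return allowed, ignored


# Fake PyPI-style project_urls covering every label in the list above.
declared = {
    "Documentation": "https://docs.example.invalid/",
    "Homepage": "https://example.invalid/",
    "Source": "https://example.invalid/src",
    "Repository": "https://example.invalid/repo",
    "Issues": "https://example.invalid/issues",
    "Changelog": "https://example.invalid/changelog",
    "Community mirror": "https://mirror.example.invalid/",
    "Tutorial": "https://example.invalid/tutorial",
}
allowed, ignored = filter_project_urls(declared)
print([label for label, _ in allowed])
print(ignored)
```

The second list is what a result note about ignored labels would be built from.
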
## Suggested Manual MCP Smoke Script

If direct MCP client execution is awkward, use a minimal Python script that imports the services and mimics the tool layer:

```python
from mcp_server_python_docs.services.package_docs import PackageDocsService

# Happy path, a name needing normalization, and a name that should 404.
for pkg in ["requests", "Sample_Project", "definitely-not-a-real-package-vision-test-xyz"]:
    print(PackageDocsService().lookup(pkg).model_dump())
```

For `get_docs`, prefer an actual MCP invocation or the server lifespan, because the cache path is wired up there.

## Pass / Fail Summary Template

Return results in this format:

```markdown
## PR #9 MCP Test Results

### Environment
- Commit:
- OS:
- Python:
- Cache path:

### Gates
- ruff:
- pyright src:
- pytest:
- uv build:

### MCP Tool Tests
- Tool registration:
- get_docs full page:
- get_docs section:
- empty anchor behavior:
- cache file creation:
- cache survives restart:
- cache key separation:
- corrupt cache fallback:
- lookup_package_docs requests:
- missing package:
- failure simulation:
- scope/trust boundary:

### Verdict
PASS / FAIL

### Notes / Bugs Found
- ...
```

## Final Acceptance Criteria

The PR is considered MCP-smoke-test ready if:

- Local gates pass.
- Live MCP/tool invocation returns correct stdlib docs.
- Cache file is created and reused after restart.
- Corrupt cache does not break docs retrieval.
- PyPI lookup returns only controlled PyPI-declared metadata.
- PyPI expected failures return controlled notes.
- No internal errors are observed for expected failure modes.
