Skip to content

Commit 68c24e1

Browse files
earayuclaude
andauthored
feat(phase9 #96 D10.d): cursor placeholder + kw-only sentinel on split search tools (#1717)
* feat(phase9 #96 D10.d): cursor placeholder + kw-only sentinel on split search tools D10.d task #96 same-lane follow-up per `[D10 spec amendment]` thread (msg=b9b7072a) PM canonical decisions: - Drift #5: add `*,` kw-only barrier after `query` on the 3 collection-scoped split tools (`vector_search` / `graph_search` / `fulltext_search`), aligning with the D10.c read-primitive precedent established in #1714. `web_search` keeps its existing wire signature per §B.4 / amendment (parameter canonicalization deferred to D10.h cutover). - Drift #4 (c): publish `cursor: str | None = None` placeholder on the same 3 tools so external MCP clients see the canonical surface, but the body explicitly fails with `CursorError("cursor_invalid", "search pagination cursor is not yet implemented", details={"reason": "search_not_paginated", "tool": ...})` on any non-null value. `cursor=None` continues to mean "first page" (current behavior). Real search pagination requires a backend capability that will land in a dedicated D11+ upgrade. `fulltext_search` also gains `rerank: bool = True` to match the §B.3 spec; previously the rerank flag was reachable only via the omnibus `search_collection`. Drifts #1 (`SearchResultItem` with chunk_id) and #2 (`web_search` canonicalization) intentionally NOT in this PR — both deferred to the D10.h cutover lane per the amendment thread. Drift #3 (capability annotations) belongs to D10.f (task #98). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(phase9 #96 D10.d): cursor=="" → first page, NotImplementedError on truthy cursor Per Weston blocker review (msg=177a1dd8) on PR #1717 + architect sign-off (msg=ebfcdabe): 1. **Empty-string cursor must equal None** — both should preserve the existing single-page `top_k` behavior. The previous `is not None` guard incorrectly raised on `cursor=""`. Switched to `if cursor:` (truthiness check) so the loud-fail only triggers on truly non-empty cursor values. 2. **Stop pretending feature-not-implemented is a malformed cursor** — the canonical `CursorError("cursor_invalid", ...)` describes wire-level malformed cursors. Using it for "search pagination is not implemented" camouflages a missing capability as a client-side cursor bug. Switched to plain `NotImplementedError("search pagination is not yet implemented (tool=..., reason= search_not_paginated)")` so the loud-fail accurately surfaces the deferred-capability semantic. Test updates: - `cursor=""` removed from the bad-cursor parametrize (it is no longer a bad cursor) - New `test_collection_scoped_split_tools_treat_none_and_empty_cursor_as_first_page` monkeypatches httpx + get_api_key and asserts that both `None` and `""` cursor pass through the guard and reach the backend search call - Loud-fail test renamed to `_reject_non_empty_cursor` and asserts `NotImplementedError` (not `CursorError`) with a clear "not implemented" message including the originating tool and reason Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 0ed49eb commit 68c24e1

4 files changed

Lines changed: 241 additions & 21 deletions

File tree

aperag/mcp/tools/search_fulltext.py

Lines changed: 23 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,8 +21,14 @@
2121
2222
Wire shape: returns the existing ``SearchResult`` dict shape with
2323
``recall_type = "fulltext_search"`` for produced items. §B canonical
24-
shape follow-up gated on the chunk_id propagation amendment thread —
25-
see ``search_vector.py`` module docstring.
24+
shape follow-up settled by the ``[D10 spec amendment]`` thread
25+
(msg=b9b7072a) — defer canonical SearchResultItem to D10.h cutover.
26+
27+
The ``cursor`` parameter is a placeholder per the same thread (Drift
28+
#4 (c)): signature lands now, body raises ``NotImplementedError``
29+
on any non-empty value until real search pagination ships.
30+
``None`` and ``""`` both preserve single-page ``top_k`` behavior per
31+
the Weston blocker review (msg=177a1dd8).
2632
"""
2733

2834
from __future__ import annotations
@@ -42,8 +48,11 @@
4248
async def fulltext_search(
4349
collection_id: str,
4450
query: str,
51+
*,
4552
top_k: int = 5,
4653
keywords: list[str] | None = None,
54+
rerank: bool = True,
55+
cursor: str | None = None,
4756
) -> Dict[str, Any]:
4857
"""Full-text keyword search within a collection (§B.3).
4958
@@ -76,12 +85,23 @@ async def fulltext_search(
7685
top_k: Maximum number of results to return (default: 5).
7786
keywords: Optional explicit keyword list overriding the
7887
auto-extracted keywords from the query.
88+
rerank: Whether to apply reranker on returned candidates
89+
(default: True).
90+
cursor: Pagination cursor placeholder (§B.3 / amendment
91+
msg=b9b7072a Drift #4 (c)). ``None`` and ``""`` return
92+
first page; any non-empty value raises
93+
``NotImplementedError`` with a clear "not implemented"
94+
message until real search pagination ships.
7995
8096
Returns:
8197
Search results with ``items`` carrying ``recall_type =
8298
"fulltext_search"``. Highlight snippets / matched terms are
8399
surfaced via ``items[*].metadata``.
84100
"""
101+
if cursor:
102+
raise NotImplementedError(
103+
"search pagination is not yet implemented (tool=fulltext_search, reason=search_not_paginated)"
104+
)
85105
try:
86106
api_key = get_api_key()
87107

@@ -91,6 +111,7 @@ async def fulltext_search(
91111

92112
search_data: Dict[str, Any] = {
93113
"query": query,
114+
"rerank": rerank,
94115
"fulltext_search": fulltext_payload,
95116
}
96117

aperag/mcp/tools/search_graph.py

Lines changed: 20 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -20,9 +20,15 @@
2020
until the D10.h cutover.
2121
2222
Wire shape: returns the existing ``SearchResult`` dict shape with
23-
``recall_type = "graph_search"`` for produced items. §B canonical shape
24-
follow-up gated on the chunk_id propagation amendment thread — see
25-
``search_vector.py`` module docstring.
23+
``recall_type = "graph_search"`` for produced items. §B canonical
24+
shape follow-up settled by the ``[D10 spec amendment]`` thread
25+
(msg=b9b7072a) — defer canonical SearchResultItem to D10.h cutover.
26+
27+
The ``cursor`` parameter is a placeholder per the same thread (Drift
28+
#4 (c)): signature lands now, body raises ``NotImplementedError``
29+
on any non-empty value until real search pagination ships.
30+
``None`` and ``""`` both preserve single-page ``top_k`` behavior per
31+
the Weston blocker review (msg=177a1dd8).
2632
"""
2733

2834
from __future__ import annotations
@@ -42,7 +48,9 @@
4248
async def graph_search(
4349
collection_id: str,
4450
query: str,
51+
*,
4552
top_k: int = 5,
53+
cursor: str | None = None,
4654
) -> Dict[str, Any]:
4755
"""Knowledge-graph search within a collection (§B.2).
4856
@@ -74,12 +82,21 @@ async def graph_search(
7482
collection_id: The ID of the collection to search.
7583
query: The natural-language search query.
7684
top_k: Maximum number of results to return (default: 5).
85+
cursor: Pagination cursor placeholder (§B.2 / amendment
86+
msg=b9b7072a Drift #4 (c)). ``None`` and ``""`` return
87+
first page; any non-empty value raises
88+
``NotImplementedError`` with a clear "not implemented"
89+
message until real search pagination ships.
7790
7891
Returns:
7992
Search results with ``items`` carrying ``recall_type =
8093
"graph_search"``. Graph-specific fields (entity / relation / path)
8194
are surfaced via ``items[*].metadata``.
8295
"""
96+
if cursor:
97+
raise NotImplementedError(
98+
"search pagination is not yet implemented (tool=graph_search, reason=search_not_paginated)"
99+
)
83100
try:
84101
api_key = get_api_key()
85102

aperag/mcp/tools/search_vector.py

Lines changed: 35 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -23,10 +23,24 @@
2323
``aperag/api/v2/collections/{id}/searches`` backend (``recall_type =
2424
"vector_search"`` for produced items). The §B canonical
2525
``SearchResult`` / ``SearchResultItem`` with ``chunk_id`` /
26-
``section_path`` / ``heading_anchor`` is deferred to a D10.d follow-up
27-
PR after the chunk_id propagation question is resolved via a
28-
``[D10 spec amendment]`` thread (current backend does not surface
29-
``chunk_id`` in the public response shape — see PR description).
26+
``section_path`` / ``heading_anchor`` exposure was settled by the
27+
``[D10 spec amendment]`` thread (msg=b9b7072a) — Drift #1 resolution
28+
(c): defer the canonical shape to the D10.h cutover lane (one-shot
29+
``aperag/domains/retrieval/`` upgrade) so this lane stays focused on
30+
the split surface + omnibus deprecation.
31+
32+
The ``cursor`` parameter is a placeholder per the same amendment
33+
thread (Drift #4 resolution (c)): the signature is published now so
34+
external MCP clients see the canonical shape, but the body explicitly
35+
raises ``NotImplementedError("search pagination is not yet
36+
implemented")`` on any **non-empty** value. The architect sign-off
37+
(msg=ebfcdabe) plus Weston's blocker review (msg=177a1dd8) explicitly
38+
forbid reusing the canonical ``CursorError("cursor_invalid", ...)``
39+
malformed-wire semantic for this "feature not implemented" case —
40+
that would camouflage missing capability as a client-side cursor bug.
41+
``cursor=None`` and ``cursor=""`` both keep the existing single-page
42+
``top_k`` behavior (Weston msg=177a1dd8 lock); only truthy /
43+
non-empty cursor values trigger the loud-fail.
3044
"""
3145

3246
from __future__ import annotations
@@ -46,9 +60,11 @@
4660
async def vector_search(
4761
collection_id: str,
4862
query: str,
63+
*,
4964
top_k: int = 5,
5065
similarity_threshold: float | None = None,
5166
rerank: bool = True,
67+
cursor: str | None = None,
5268
) -> Dict[str, Any]:
5369
"""Vector similarity search within a collection (§B.1).
5470
@@ -83,11 +99,26 @@ async def vector_search(
8399
similarity_threshold: Minimum similarity score [0, 1]; ``None``
84100
uses the collection's default threshold.
85101
rerank: Whether to apply reranker on returned candidates (default: True).
102+
cursor: Pagination cursor placeholder (§B.1). ``None`` and
103+
``""`` both return the first page (current behavior); any
104+
non-empty value raises ``NotImplementedError`` with a
105+
``"search pagination is not yet implemented"`` message per
106+
the ``[D10 spec amendment]`` (msg=b9b7072a + sign-off
107+
msg=ebfcdabe + Weston msg=177a1dd8). Real search
108+
pagination requires a backend capability that is not yet
109+
available and will land in a dedicated D11+ upgrade.
86110
87111
Returns:
88112
Search results with ``items`` ranked by vector similarity. Each
89113
item carries ``recall_type = "vector_search"``.
90114
"""
115+
if cursor:
116+
# ``cursor`` is truthy iff non-None and non-empty (Weston
117+
# msg=177a1dd8 lock: ``None`` and ``""`` both preserve
118+
# single-page ``top_k`` behavior).
119+
raise NotImplementedError(
120+
"search pagination is not yet implemented (tool=vector_search, reason=search_not_paginated)"
121+
)
91122
try:
92123
api_key = get_api_key()
93124

tests/unit_test/mcp/test_search_split.py

Lines changed: 163 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,8 @@
3939
import inspect
4040
from pathlib import Path
4141

42+
import pytest
43+
4244
# Importing ``aperag.mcp.server`` triggers the bottom-of-module
4345
# registration block which loads ``aperag.mcp.tools.search_*`` and
4446
# triggers ``@mcp_server.tool`` decorators. The tool functions are
@@ -88,41 +90,68 @@ def _params(func) -> dict[str, inspect.Parameter]:
8890
return dict(inspect.signature(func).parameters)
8991

9092

93+
def _kw_only_params(func) -> set[str]:
94+
return {
95+
name
96+
for name, param in inspect.signature(func).parameters.items()
97+
if param.kind is inspect.Parameter.KEYWORD_ONLY
98+
}
99+
100+
91101
def test_vector_search_signature_matches_b1_lock():
92-
"""``vector_search`` signature follows §B.1: collection_id + query
93-
positional, then keyword-tunable top_k / similarity_threshold /
94-
rerank.
102+
"""``vector_search`` signature follows §B.1: ``collection_id`` +
103+
``query`` positional, then a ``*,`` kw-only barrier with
104+
``top_k`` / ``similarity_threshold`` / ``rerank`` / ``cursor``.
105+
The kw-only enforcement aligns with the D10.c precedent (per
106+
`[D10 spec amendment]` msg=b9b7072a Drift #5 resolution).
95107
"""
96108
params = _params(vector_search)
97109
assert list(params)[:2] == ["collection_id", "query"]
98-
for name in ("top_k", "similarity_threshold", "rerank"):
99-
assert name in params, f"§B.1 requires `{name}` parameter"
110+
assert _kw_only_params(vector_search) == {
111+
"top_k",
112+
"similarity_threshold",
113+
"rerank",
114+
"cursor",
115+
}, "§B.1 requires kw-only `top_k / similarity_threshold / rerank / cursor`."
100116
assert params["top_k"].default == 5
101117
# similarity_threshold defaults to None to mean "use collection default".
102118
assert params["similarity_threshold"].default is None
103119
assert params["rerank"].default is True
120+
# cursor placeholder per amendment msg=b9b7072a Drift #4 (c).
121+
assert params["cursor"].default is None
104122

105123

106124
def test_graph_search_signature_matches_b2_lock():
107-
"""``graph_search`` signature follows §B.2: collection_id + query +
108-
top_k. The §B.2 spec mentions ``depth`` / ``entity_types`` as
125+
"""``graph_search`` signature follows §B.2 + amendment msg=b9b7072a:
126+
collection_id + query positional then kw-only top_k + cursor.
127+
The §B.2 spec mentions ``depth`` / ``entity_types`` as
109128
forward-looking knobs; first-cut keeps them out until the backend
110129
surface for graph traversal is wired (deferred to D10.d follow-up).
111130
"""
112131
params = _params(graph_search)
113132
assert list(params)[:2] == ["collection_id", "query"]
114-
assert "top_k" in params
133+
assert _kw_only_params(graph_search) == {"top_k", "cursor"}, "§B.2 requires kw-only `top_k / cursor`."
115134
assert params["top_k"].default == 5
135+
assert params["cursor"].default is None
116136

117137

118138
def test_fulltext_search_signature_matches_b3_lock():
119-
"""``fulltext_search`` signature follows §B.3: collection_id +
120-
query + top_k + optional keyword override.
139+
"""``fulltext_search`` signature follows §B.3 + amendment
140+
msg=b9b7072a: collection_id + query positional then kw-only
141+
top_k + keywords + rerank + cursor.
121142
"""
122143
params = _params(fulltext_search)
123144
assert list(params)[:2] == ["collection_id", "query"]
124-
assert "top_k" in params and params["top_k"].default == 5
125-
assert "keywords" in params and params["keywords"].default is None
145+
assert _kw_only_params(fulltext_search) == {
146+
"top_k",
147+
"keywords",
148+
"rerank",
149+
"cursor",
150+
}, "§B.3 requires kw-only `top_k / keywords / rerank / cursor`."
151+
assert params["top_k"].default == 5
152+
assert params["keywords"].default is None
153+
assert params["rerank"].default is True
154+
assert params["cursor"].default is None
126155

127156

128157
def test_web_search_signature_preserves_existing_wire_for_b4():
@@ -139,6 +168,128 @@ def test_web_search_signature_preserves_existing_wire_for_b4():
139168
assert "source" in params and params["source"].default == ""
140169

141170

171+
# --- 2b. Cursor placeholder explicit-not-silent (§B / amendment Drift #4) ---
172+
173+
174+
@pytest.mark.asyncio
175+
@pytest.mark.parametrize(
176+
"tool",
177+
[vector_search, graph_search, fulltext_search],
178+
ids=["vector_search", "graph_search", "fulltext_search"],
179+
)
180+
@pytest.mark.parametrize(
181+
"bad_cursor",
182+
["garbage", "AAAA", "{}", "0"],
183+
ids=["garbage", "base64_no_payload", "empty_json", "single_zero"],
184+
)
185+
async def test_collection_scoped_split_tools_reject_non_empty_cursor(tool, bad_cursor):
186+
"""Per `[D10 spec amendment]` msg=b9b7072a Drift #4 (c) +
187+
architect sign-off msg=ebfcdabe + Weston blocker msg=177a1dd8:
188+
any non-empty cursor must loud-fail with ``NotImplementedError``
189+
bearing a "search pagination is not yet implemented" message.
190+
191+
The sign-off explicitly forbids reusing
192+
``CursorError("cursor_invalid", ...)`` for this case — the canonical
193+
cursor codes describe wire-level malformed / expired / drifted
194+
cursors, and the "feature not implemented yet" semantic must not
195+
be camouflaged as a malformed-cursor error.
196+
197+
``cursor=None`` and ``cursor=""`` are covered by the
198+
happy-path-passthrough test below; this case only exercises the
199+
truthy-cursor loud-fail path.
200+
"""
201+
202+
with pytest.raises(NotImplementedError) as excinfo:
203+
await tool("collection-id", "query", cursor=bad_cursor)
204+
205+
message = str(excinfo.value)
206+
assert "search pagination is not yet implemented" in message, (
207+
f"{tool.__name__} loud-fail message must clearly state that "
208+
f"search pagination is not implemented (got {message!r})."
209+
)
210+
assert tool.__name__ in message, (
211+
"Loud-fail message must identify the originating tool so logs can attribute the rejection."
212+
)
213+
assert "search_not_paginated" in message, (
214+
"Loud-fail message must surface the deferred-pagination reason so clients understand they should not retry."
215+
)
216+
217+
218+
@pytest.mark.asyncio
219+
@pytest.mark.parametrize(
220+
"tool",
221+
[vector_search, graph_search, fulltext_search],
222+
ids=["vector_search", "graph_search", "fulltext_search"],
223+
)
224+
@pytest.mark.parametrize(
225+
"noop_cursor",
226+
[None, ""],
227+
ids=["none", "empty_string"],
228+
)
229+
async def test_collection_scoped_split_tools_treat_none_and_empty_cursor_as_first_page(tool, noop_cursor, monkeypatch):
230+
"""Weston blocker msg=177a1dd8 lock: ``None`` *and* ``""`` cursor
231+
must both preserve the existing single-page ``top_k`` behavior
232+
(``""`` is **not** a malformed cursor; it is the canonical
233+
"no pagination requested" wire value).
234+
235+
We stub ``get_api_key()`` and the outbound httpx call so the test
236+
only validates that the cursor guard does **not** raise — it does
237+
not exercise the backend search path itself.
238+
"""
239+
240+
import aperag.mcp.tools.search_fulltext as sft
241+
import aperag.mcp.tools.search_graph as sg
242+
import aperag.mcp.tools.search_vector as sv
243+
244+
monkeypatch.setattr(sv, "get_api_key", lambda: "test-token")
245+
monkeypatch.setattr(sg, "get_api_key", lambda: "test-token")
246+
monkeypatch.setattr(sft, "get_api_key", lambda: "test-token")
247+
248+
class _FakeResponse:
249+
status_code = 200
250+
251+
def json(self):
252+
return {"items": []}
253+
254+
class _FakeAsyncClient:
255+
def __init__(self, *args, **kwargs):
256+
pass
257+
258+
async def __aenter__(self):
259+
return self
260+
261+
async def __aexit__(self, exc_type, exc, tb):
262+
return False
263+
264+
async def post(self, url, headers=None, json=None):
265+
return _FakeResponse()
266+
267+
import httpx
268+
269+
monkeypatch.setattr(httpx, "AsyncClient", _FakeAsyncClient)
270+
271+
# No exception expected — the cursor guard must not raise on
272+
# ``None`` or ``""``.
273+
result = await tool("collection-id", "query", cursor=noop_cursor)
274+
assert isinstance(result, dict), (
275+
f"{tool.__name__} must return the SearchResult dict for None/empty cursor (got {type(result).__name__})."
276+
)
277+
278+
279+
@pytest.mark.asyncio
280+
async def test_web_search_does_not_carry_cursor_param():
281+
"""``web_search`` is intentionally cursor-less — its provider
282+
chain (Jina / DDG) is provider-bounded, not paginated, per §B.4.
283+
The amendment thread (msg=b9b7072a Drift #4) only added cursor
284+
placeholders to the 3 collection-scoped split tools.
285+
"""
286+
287+
params = _params(web_search)
288+
assert "cursor" not in params, (
289+
"web_search must not carry a cursor parameter — its scope is provider-bounded per §B.4 (no pagination)."
290+
)
291+
292+
142293
# --- 3. Deprecation banner on omnibus aliases ------------------------
143294

144295

0 commit comments

Comments
 (0)