Skip to content

Commit 2515918

Browse files
earayuclaude
andcommitted
fix(collection): task #87 P1-D3 — convert vector_backend to computed_field
Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f / msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate + PM msg=03c821b0 fix-forward direction lock: the previous regular-field ``Optional[VectorBackendInfo]`` implementation leaked the deployment projection onto every input shape that referenced ``Collection``, including ``Collection-Input`` itself, ``Agent-Input.collections``, and ``CreateTurnRequest.collections``. That contradicted the read-only output projection lock from architect msg=0044261f. Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field`` property so OpenAPI input/output schemas auto-split: - ``Collection-Output`` now lists ``vector_backend`` with ``readonly: true`` (verified in regenerated ``web/src/api-v2/schema.d.ts``). - ``Collection-Input`` no longer carries ``vector_backend`` (verified by grep + new contract test). - ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` / ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``, so the deployment-wide setting can no longer be passed as a per-collection override on agent / chat-turn requests. The ``build_collection_response`` constructor no longer passes ``vector_backend`` (computed fields are not accepted as input); the property reads ``settings.vector_db_type`` lazily on each serialization. Two new contract tests: - ``test_collection_input_schema_does_not_expose_vector_backend``: pin the input/output JSON Schema split + ``readOnly`` flag on the output side. Asserts ``CollectionCreate`` / ``CollectionUpdate`` also do not surface ``vector_backend``. - ``test_collection_constructor_ignores_vector_backend_input``: defensive — even if a malicious caller stuffs ``vector_backend`` into a ``model_validate`` payload, Pydantic ignores it and the computed property still reflects the deployment setting. Sediment: cuiwenbo own-up CR miss — implement-time only verified the ``CollectionConfig`` placement (one defense layer) and missed the ``Collection`` self-reuse-as-input second layer. dongdong + Weston + huangzhangshu independently caught via OpenAPI generated-schema gate. mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify must grep ``references Collection`` to catch input/output reuse risk, not only direct form-input shape" (continuing the trust-framing-miss family from PR #1935 / #1938 / #1940). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 7fa9177 commit 2515918

4 files changed

Lines changed: 142 additions & 38 deletions

File tree

aperag/domains/knowledge_base/schemas.py

Lines changed: 30 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,7 @@
3737
from datetime import datetime
3838
from typing import Any, Literal, Optional
3939

40-
from pydantic import AnyUrl, BaseModel, Field, conint
40+
from pydantic import AnyUrl, BaseModel, Field, computed_field, conint
4141

4242
from aperag.schema.common import (
4343
Chunk,
@@ -46,6 +46,7 @@
4646
PaginatedResponse,
4747
VectorBackendInfo,
4848
VisionChunk,
49+
project_vector_backend_info,
4950
)
5051

5152
__all__ = [
@@ -99,32 +100,40 @@ class Collection(BaseModel):
99100
# msg=e4120886 reproducer).
100101
summary: Optional[str] = Field(None, description="Auto-generated long-form knowledge-base summary")
101102
config: Optional[CollectionConfig] = None
102-
# task #61 P1-D3 (PR for #87): read-only projection of the deployment
103-
# vector backend identity + capability matrix. **Intentionally placed
104-
# on Collection (read response) and NOT inside CollectionConfig.**
105-
# ``CollectionConfig`` is reused as the create/update input shape, so
106-
# putting vector_backend there would expose the field on the OpenAPI
107-
# ``CollectionCreate``/``CollectionUpdate`` input schemas and let
108-
# callers mistake a deployment-wide setting for a per-collection
109-
# editable knob (per dongdong msg=c2593fdd + PM msg=caf7e4df + spec
110-
# P1-D3 read-only projection lock). Projected by
111-
# ``collection_service.build_collection_response()`` from
112-
# ``settings.vector_db_type``; ``None`` for unknown backends so the FE
113-
# can render a placeholder without a hard failure.
114-
vector_backend: Optional[VectorBackendInfo] = Field(
115-
None,
116-
description=(
117-
"Read-only deployment vector backend identity + static capability "
118-
"matrix. Projected from ``settings.vector_db_type``; not editable "
119-
"per collection."
120-
),
121-
)
122103
status: Optional[Literal["ACTIVE", "INACTIVE", "DELETED"]] = None
123104
created: Optional[datetime] = None
124105
updated: Optional[datetime] = None
125106
is_published: Optional[bool] = Field(False, description="Whether the collection is published to marketplace")
126107
published_at: Optional[datetime] = Field(None, description="Publication time, null when not published")
127108

109+
# task #61 P1-D3 (PR for #87): read-only projection of the deployment
110+
# vector backend identity + capability matrix. Modelled as a Pydantic
111+
# v2 ``@computed_field`` so it is **only** present on the OpenAPI
112+
# output schema (``Collection-Output``) and is **not** part of any
113+
# input shape that reuses :class:`Collection` (e.g.
114+
# ``CollectionCreate``/``CollectionUpdate``, plus the request-side
115+
# composites ``Agent-Input.collections`` /
116+
# ``CreateTurnRequest.collections`` that inherit ``Collection-Input``).
117+
# Per dongdong msg=fa88e97b BLOCKER: marking ``vector_backend`` as a
118+
# plain ``Optional[VectorBackendInfo]`` field would have leaked the
119+
# deployment-wide setting onto every input shape that references
120+
# ``Collection`` and let callers think they could submit a
121+
# per-collection override on agent / chat requests — exactly the
122+
# error this projection lock is meant to prevent (per architect
123+
# msg=0044261f read-only output projection lock + PM msg=caf7e4df).
124+
@computed_field # type: ignore[prop-decorator]
125+
@property
126+
def vector_backend(self) -> Optional[VectorBackendInfo]:
127+
# Lazy import keeps the module import graph compatible with the
128+
# G1 boundary rules: ``aperag.config`` is allowed from a domain
129+
# schema, but importing it at module-load time on every domain
130+
# schema introduces a settings-side-effect at import which a few
131+
# tests intentionally avoid. Doing the import inside the property
132+
# also matches the existing ``utils.py`` lazy-resolver style.
133+
from aperag.config import settings
134+
135+
return project_vector_backend_info(settings.vector_db_type)
136+
128137

129138
class Document(BaseModel):
130139
id: Optional[str] = None

aperag/domains/knowledge_base/service/collection_service.py

Lines changed: 10 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -67,9 +67,8 @@
6767
SearchResultItem,
6868
SearchResultList,
6969
)
70-
from aperag.config import settings
7170
from aperag.exceptions import ValidationException
72-
from aperag.schema.common import PageResult, project_vector_backend_info
71+
from aperag.schema.common import PageResult
7372
from aperag.schema.utils import dumpCollectionConfig, parseCollectionConfig
7473
from aperag.utils.constant import QuotaType
7574
from aperag.utils.utils import utc_now
@@ -176,7 +175,15 @@ async def validate_collection_models(self, user: str, config) -> None:
176175
)
177176

178177
async def build_collection_response(self, instance: CollectionRow) -> Collection:
179-
"""Build Collection response object for API return."""
178+
"""Build Collection response object for API return.
179+
180+
``Collection.vector_backend`` is a Pydantic v2 ``@computed_field``
181+
(per task #61 P1-D3 / PR for #87 — read-only projection of the
182+
deployment vector backend identity + capability matrix). It is
183+
populated automatically on serialization, so it is intentionally
184+
**not** passed as a constructor argument — passing it would also
185+
not work because computed fields are not accepted as input.
186+
"""
180187
return Collection(
181188
id=instance.id,
182189
title=instance.title,
@@ -192,16 +199,6 @@ async def build_collection_response(self, instance: CollectionRow) -> Collection
192199
type=instance.type,
193200
status=getattr(instance, "status", None),
194201
config=parseCollectionConfig(instance.config),
195-
# task #61 P1-D3 (PR for #87): static read-only projection of
196-
# the deployment vector backend identity + capability matrix.
197-
# The value is identical for every collection in the
198-
# deployment because ``settings.vector_db_type`` is a
199-
# deployment-wide env var (``aperag/config.py``); the
200-
# projection is intentionally not persisted per row. Returns
201-
# ``None`` when the configured backend is not in the static
202-
# capability matrix so the FE can render a placeholder
203-
# without a hard failure on misconfigured deployments.
204-
vector_backend=project_vector_backend_info(settings.vector_db_type),
205202
created=instance.gmt_created.isoformat(),
206203
updated=instance.gmt_updated.isoformat(),
207204
)

tests/unit_test/contracts/test_vector_backend_capability_matrix.py

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -142,6 +142,107 @@ def test_project_vector_backend_info_returns_none_for_falsy_input() -> None:
142142
assert project_vector_backend_info("") is None
143143

144144

145+
def test_collection_input_schema_does_not_expose_vector_backend() -> None:
146+
"""Pin OpenAPI input/output split for the deployment vector-backend
147+
projection.
148+
149+
``Collection.vector_backend`` is a Pydantic v2 ``@computed_field``,
150+
so the *input* JSON Schema generated by Pydantic must not list it
151+
while the *output* JSON Schema must (with ``readOnly: true``). The
152+
composite request shapes that reuse :class:`Collection`
153+
(``CollectionCreate`` / ``CollectionUpdate``, plus the agent /
154+
chat-turn request bodies whose ``collections`` field embeds
155+
``Collection-Input`` in the FastAPI / openapi-typescript output)
156+
therefore inherit the same property automatically.
157+
158+
The dongdong msg=fa88e97b BLOCKER caught the previous
159+
``Optional[VectorBackendInfo]`` regular-field implementation
160+
leaking onto every input shape that referenced ``Collection``; this
161+
test pins the fix so a future refactor cannot regress to a regular
162+
field without flipping the gate red.
163+
"""
164+
165+
from aperag.domains.knowledge_base.schemas import (
166+
Collection,
167+
CollectionCreate,
168+
CollectionUpdate,
169+
)
170+
171+
output_schema = Collection.model_json_schema(mode="serialization")
172+
input_schema = Collection.model_json_schema(mode="validation")
173+
174+
output_props = output_schema.get("properties", {})
175+
input_props = input_schema.get("properties", {})
176+
177+
# Output schema must surface the projection + mark it read-only.
178+
assert "vector_backend" in output_props
179+
assert output_props["vector_backend"].get("readOnly") is True
180+
181+
# Input schema must NOT surface the projection — calling
182+
# ``Collection(vector_backend=...)`` is rejected anyway, but we
183+
# still want OpenAPI consumers (FE typed schema, agent request
184+
# body, chat turn request body) to never see it as an editable
185+
# knob in the first place.
186+
assert "vector_backend" not in input_props
187+
188+
# The composite request schemas embed Collection / CollectionConfig
189+
# but never accept ``vector_backend`` directly either.
190+
create_props = CollectionCreate.model_json_schema().get("properties", {})
191+
update_props = CollectionUpdate.model_json_schema().get("properties", {})
192+
assert "vector_backend" not in create_props
193+
assert "vector_backend" not in update_props
194+
195+
196+
def test_collection_constructor_ignores_vector_backend_input() -> None:
197+
"""Defensive: even if a caller submits a ``vector_backend`` payload
198+
on the input shape, Pydantic v2 silently ignores it (computed
199+
fields cannot be set from input) and the resulting object's
200+
``vector_backend`` is still derived from the deployment setting.
201+
202+
The combination of:
203+
* computed-field-only output (asserted by
204+
:func:`test_collection_input_schema_does_not_expose_vector_backend`)
205+
* input being silently ignored (this test)
206+
means a malicious / mistaken caller cannot override the deployment
207+
projection by stuffing ``vector_backend`` into a request body.
208+
"""
209+
210+
from aperag.domains.knowledge_base.schemas import Collection
211+
212+
instance = Collection.model_validate(
213+
{
214+
"id": "col1",
215+
"vector_backend": {
216+
"type": "qdrant",
217+
"capabilities": {
218+
"supports_atomic_batch_upsert": False,
219+
"supports_filter_or_with_empty_parts": False,
220+
"supports_legacy_mode": True,
221+
},
222+
},
223+
}
224+
)
225+
226+
# The computed property is recomputed from settings on access; the
227+
# input payload is discarded. We only assert that the value does
228+
# NOT equal the malicious payload — the actual projection depends
229+
# on the active ``settings.vector_db_type`` in the test runtime.
230+
rendered = instance.model_dump()
231+
assert rendered.get("vector_backend") != {
232+
"type": "qdrant",
233+
"capabilities": {
234+
"supports_atomic_batch_upsert": False,
235+
"supports_filter_or_with_empty_parts": False,
236+
"supports_legacy_mode": True,
237+
},
238+
} or rendered.get("vector_backend") == project_vector_backend_info(
239+
# If the test deployment happens to be qdrant the values may
240+
# coincide; in that case the equality is fine because it came
241+
# from the projection helper, not the input payload.
242+
"qdrant"
243+
).model_dump() # type: ignore[union-attr]
244+
245+
145246
def test_vector_backend_info_pydantic_round_trip() -> None:
146247
"""Sanity: the Pydantic model round-trips through ``model_dump`` /
147248
``model_validate`` so the generated OpenAPI schema mirrors the

web/src/api-v2/schema.d.ts

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2843,8 +2843,6 @@ export interface components {
28432843
*/
28442844
summary?: string | null;
28452845
config?: components["schemas"]["CollectionConfig-Input"] | null;
2846-
/** @description Read-only deployment vector backend identity + static capability matrix. Projected from ``settings.vector_db_type``; not editable per collection. */
2847-
vector_backend?: components["schemas"]["VectorBackendInfo"] | null;
28482846
/** Status */
28492847
status?: ("ACTIVE" | "INACTIVE" | "DELETED") | null;
28502848
/** Created */
@@ -2882,8 +2880,6 @@ export interface components {
28822880
*/
28832881
summary?: string | null;
28842882
config?: components["schemas"]["CollectionConfig-Output"] | null;
2885-
/** @description Read-only deployment vector backend identity + static capability matrix. Projected from ``settings.vector_db_type``; not editable per collection. */
2886-
vector_backend?: components["schemas"]["VectorBackendInfo"] | null;
28872883
/** Status */
28882884
status?: ("ACTIVE" | "INACTIVE" | "DELETED") | null;
28892885
/** Created */
@@ -2901,6 +2897,7 @@ export interface components {
29012897
* @description Publication time, null when not published
29022898
*/
29032899
published_at?: string | null;
2900+
readonly vector_backend: components["schemas"]["VectorBackendInfo"] | null;
29042901
};
29052902
/** CollectionConfig */
29062903
"CollectionConfig-Input": {

0 commit comments

Comments
 (0)