Input: Design documents from /specs/056-output-schema-validation/
Prerequisites: plan.md, spec.md, research.md, data-model.md, contracts/validator.md, quickstart.md
Tests: TDD is mandatory (Constitution V + repo CLAUDE.md). Every implementation task is preceded by a failing test.
Organization: Grouped by user story (US1 P1, US2 P2, US3 P3). US1 is the standalone MVP.
- [P]: Parallelizable (different files, no dependency on an incomplete task) — suitable to fan out to a subagent.
- Paths are repo-root-relative.
- T001 Promote
github.com/santhosh-tekuri/jsonschema/v6from indirect to a direct require ingo.mod(move the line out of the indirect block; rungo mod tidyand confirmgo build ./...still compiles). Capture the exact version (v6.0.2). - T002 Create the empty package skeleton
internal/outputvalidation/doc.gowith a package comment describing the pure-validator contract (no server/storage imports allowed).
These unblock every story. Config + metadata field must exist before wiring.
- T003 [P] Add failing test
internal/config/config_test.go(or extend existing)TestDefaultOutputValidationConfigasserting defaults{Mode:"warn", MaxBytes:5<<20, MaxDepth:64, MissingStructuredContent:"allow"}and helper behaviour (IsEnabledfalse only for "off";IsStrict; nil pointer ⇒ defaults). - T004 Implement
OutputValidationConfigstruct, root-config fieldOutputValidation *OutputValidationConfig(jsonoutput_validation,omitempty),DefaultOutputValidationConfig(), and helpers ininternal/config/config.go; make T003 pass. Wire the default into config load/normalisation alongsideSensitiveDataDetection. - T005 [P] Add
OutputSchemaJSON stringfield (jsonoutput_schema_json,omitempty) toToolMetadataininternal/config/config.go.
Goal: A tool with a declared outputSchema returning a non-conforming structuredContent is blocked (strict) or forwarded+audited (warn); conforming output passes through byte-identical; no schema ⇒ no-op.
Independent Test: go test ./internal/outputvalidation/... + go test ./internal/server/ -run ContentForward + the E2E stub-tool scenario all green; conforming structuredContent is byte-identical.
- T006 [P] [US1] Write failing
internal/outputvalidation/validator_test.go: table tests for (a) empty schema ⇒ Pass; (b) nil structured ⇒ Pass; (c) conforming structured ⇒ Pass; (d) violating structured ⇒ Violate with non-empty Reason; (e) uncompilable schema ⇒ Pass + warn logged once; (f) cache reuse (second Validate with same key doesn't recompile — assert via a compile counter or timing hook). - T007 [P] [US1] Write failing
internal/server/content_forward_test.go: (a) success path forwardsStructuredContentbyte-identical (deep-equal + no pointer mutation); (b) injected validator returning Violate in strict ⇒ caller-visible block signal; (c) Violate in warn ⇒ original payload forwarded, verdict surfaced; (d)IsErrorupstream result ⇒ validator not invoked.
- T008 [US1] Implement
Validator,New(...),Validate(toolKey, schemaJSON, structured),Verdict/Outcomeininternal/outputvalidation/validator.gopercontracts/validator.md: per-tool compiled-schemasync.Mapcache keyed by toolKey+schema-hash; uncompilable ⇒ sentinel + one-time warn; never mutatestructured. Make T006 pass (guards stubbed/permissive for now — real guards land in US2). - T009 [US1] Add the validator hook to
forwardContentResultininternal/server/content_forward.go: accept an optional validator + the captured schema + mode posture; on success forward unchanged; return a verdict to the caller. Make T007 pass. Keep the function signature change backward-compatible for the non-validating callers (or add a sibling wrapper). - T010 [US1] Capture the output schema at discovery: in
internal/upstream/core/client.go(~line 284, beside theParamsJSONmarshal) populateToolMetadata.OutputSchemaJSONfromtool.RawOutputSchema(fallback: marshaltool.OutputSchema). Add/extend a client test asserting a tool with an output schema yields a non-emptyOutputSchemaJSONand one without yields empty. - T011 [US1] Wire validation into
handleCallToolVariantat bothforwardContentResultcall sites (internal/server/mcp.go:1794and:2166): look up the capturedOutputSchemaJSONforserver:tool, skip whenresult.IsErrorormode=off, run the validator, and translate the verdict — strict ⇒emitActivityPolicyDecision(server,tool,sid,"blocked",reason)+mcp.NewToolResultError(...); warn ⇒emitActivityPolicyDecision(...,"warning",reason)+ forward unchanged. Hold a*outputvalidation.Validatoron the server struct, constructed from config at startup. - T012 [US1] Integration test
internal/server/*_test.goexercising T011: a fake upstream tool with an output schema returning conforming → forwarded; violating → blocked in strict (error result) and forwarded in warn; assert exactly onepolicy_decisionactivity record per failure with the tool/mode/reason.
Checkpoint: US1 delivers the MVP — schema validation end-to-end with strict/warn and audit records.
Goal: byte-size and nesting-depth guards run before schema validation; a breach is a guard-violation verdict under the active mode.
Independent Test: go test ./internal/outputvalidation/ -run Guard green; oversized/deep payloads blocked in strict / tagged in warn before compilation.
- T013 [P] [US2] Write failing
internal/outputvalidation/guards_test.go: (a) payload overmax_bytes⇒ guard verdictGuardHit="max_bytes", schema NOT compiled; (b) nesting deeper thanmax_depth⇒GuardHit="max_depth"; (c) within both ⇒ proceeds to schema validation; (d) depth walk handles arrays + objects + scalars without stack blowup. - T014 [US2] Implement
guards.go(byte-size via one-time marshal, recursive depth walk with an explicit bound) and call guards first insideValidate; make T013 pass and ensure the US1 "cache not recompiled on guard breach" expectation holds. - T015 [US2] Extend the server integration test (T012) with a tool whose response trips each guard; assert guard-violation handling + activity record in both modes.
Checkpoint: proxy is protected from pathological structured payloads.
Goal: mode off/warn/strict + missing_structured_content posture honoured end-to-end; failures discoverable via the activity CLI/API.
Independent Test: toggling mode changes behaviour as specified; mcpproxy activity list/show surface validation records with tool/mode/reason.
- T016 [P] [US3] Test:
mode=off⇒ validator never invoked (no activity records, payload untouched) — add to server integration test. - T017 [US3] Implement the
missing_structured_contentposture in the T011 caller: declared-schema + nilstructuredContent⇒ no-op in warn; in strict, block iff posture=block. Add a test covering the ContextForge #4042 trap (declared schema, text-only response, warn ⇒ forwarded; strict+allow ⇒ forwarded; strict+block ⇒ blocked). - T018 [P] [US3] Test that a validation failure record is retrievable via
GET /api/v1/activity?type=policy_decisionand viamcpproxy activity show <id>with tool/mode/reason fields populated (reuse existing activity test harness).
Checkpoint: feature is configurable and auditable.
- T019 [US1] Extend
scripts/test-api-e2e.sh(or add a sibling stub MCP server undere2e//scripts/) with an upstream tool declaring anoutputSchema; assert via curl: strict-mode block returns an error mentioning schema validation, warn-mode forwards, and apolicy_decisionactivity record appears. This is the mandatory curl-based verification. - T020 [P] Run
go test -race ./internal/outputvalidation/... ./internal/config/... ./internal/server/...and fix any race/flake. - T021 [P] Write
docs/features/output-schema-validation.md(config block, modes table, activity queries) mirroringquickstart.md; add the REST/MCP behaviour note. Do NOT expandCLAUDE.md(40k char CI gate — at most one line if any). - T022 [P] Run
./scripts/run-linter.sh(golangci-lint) and resolve all findings in touched files. - T023 Update
oas/swagger.yamlonly if a new/changed REST field is exposed (validation surfaces via existing/api/v1/activity; likely no OAS change — confirm with./scripts/verify-oas-coverage.sh). - T024 Final
make build(personal) +go build -tags server ./cmd/mcpproxy(server) to confirm both editions compile unaffected (FR-A12).
- Setup (T001–T002) → blocks everything.
- Foundational (T003–T005) → blocks all stories (config + metadata field).
- US1 (T006–T012) → MVP; depends on Foundational. Within US1: T006/T007/T010 tests are [P]; T008 before T009 before T011 before T012.
- US2 (T013–T015) → depends on US1's validator (T008) and server wiring (T011).
- US3 (T016–T018) → depends on US1 wiring (T011); independent of US2.
- Polish (T019–T024) → after the stories it verifies; T020/T021/T022 are [P].
- Test-writing wave (after Foundational): T006, T007, T010-test in parallel subagents — different files, no shared state.
- Docs + lint + race (T020, T021, T022) run concurrently at the end.
- US2 and US3 can proceed in parallel once US1 (T011) lands — they touch disjoint logic (guards vs. config posture/observability), coordinating only on the shared server integration test file (serialize edits to that one file).
- MVP = Phase 1 + 2 + US1 (T001–T012) — shippable on its own: schema validation with strict/warn + audit.
- Layer US2 (guards) then US3 (config posture + observability).
- E2E + polish last; verify with curl (T019) per the mandate.