feat(graph): require evidence on manual concept creation #358

Merged: aaronsb merged 16 commits into main from feature/concept-evidence-required, Mar 26, 2026
Conversation

@aaronsb (Owner) commented Mar 26, 2026

Summary

  • Manually created concepts now require evidence_text (min 10 chars) — stored as an Instance node linked to the concept's synthetic source
  • New POST /concepts/{id}/evidence endpoint to add evidence to existing concepts
  • New add_evidence action on the MCP concept tool
  • evidence_text parameter added to MCP graph tool (create concept + queue operations)
  • Fixes pre-existing TS build failures by adding missing @types/* dev dependencies

Context

Concepts created via the graph MCP tool were born with zero evidence instances — completely ungrounded (e.g. c_341dd030bb48 had "Unexplored [0% conf]"). The synthetic source tracked who created the concept but not why. This closes that gap by requiring evidence at creation time and providing a way to add evidence after the fact.

Test plan

  • Create concept without evidence_text → rejected with validation error
  • Create concept with evidence_text → succeeds, concept has 1 evidence instance
  • Add evidence to existing concept → instance created, grounding score improves
  • CLI tests pass (46/46)
  • API tests pass (1180/1180, 1 pre-existing unrelated failure excluded)
  • TypeScript build succeeds cleanly
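The reject/accept split in the first two test-plan items boils down to a field constraint on the request model. A minimal Pydantic sketch of that rule (the max_length value here is an illustrative assumption, not taken from the PR):

```python
from pydantic import BaseModel, Field, ValidationError

class EvidenceCreate(BaseModel):
    # min 10 chars per the PR's validation rule; max_length is illustrative
    evidence_text: str = Field(..., min_length=10, max_length=10_000)

# "too short" is 9 characters, so construction raises ValidationError
try:
    EvidenceCreate(evidence_text="too short")
    print("accepted")
except ValidationError:
    print("rejected")

ok = EvidenceCreate(evidence_text="grounded in section 3 of the source doc")
print(ok.evidence_text)
```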

aaronsb added 15 commits March 15, 2026 15:33
Proposes replacing hardcoded model lists with a database-backed catalog
per provider, adding OpenRouter as a fourth inference endpoint, and
integrating per-model pricing into cost estimation.
…R-800)

- Add provider_model_catalog table (migration 059) with pricing, capabilities,
  and curation fields per model per provider
- Seed catalog with known models and pricing (migration 060)
- Add OpenRouterProvider using OpenAI SDK with custom base URL
- Add fetch_model_catalog() to all providers for dynamic discovery
- Add model_catalog.py with upsert/query/enable/default/pricing helpers
- Add admin API endpoints: GET/POST/PUT /admin/models/catalog/*
- Add configure.py 'models' subcommand: list, refresh, enable, disable, default, price
- Update CostEstimator to read pricing from catalog before env vars
- Wire OpenRouter into provider factory, rate limiter, and configure.py
- Update list_available_models() on all providers to prefer catalog
Replace hardcoded OpenAI setup in guided-init.sh with interactive flow:
- Step 4: Choose provider (OpenAI, Anthropic, OpenRouter)
- Step 5: Enter and validate API key
- Step 6: Refresh model catalog, present filtered menu, user picks model

OpenRouter shows curated subset (GPT-4o, Claude, Gemini, Llama, etc.)
with option [0] to show all 200+ models. Ollama noted as post-init config.

Also adds --tsv, --category, --limit flags to configure.py models list
for machine-parseable output used by the init script.
Needed by OpenRouter and Ollama fetch_model_catalog() which use
requests.get() directly rather than the OpenAI SDK.
Follow existing pattern — INSERT INTO public.schema_migrations at the
end of each migration file.
058 uses AGE Cypher DDL (CREATE VLABEL/ELABEL) which can fail on cold
start when AGE isn't fully initialized. Wrap each label creation in
EXCEPTION handler so failures log a notice instead of aborting — labels
get created on first use anyway.

Also add ON_ERROR_STOP=1 to psql in migrate-db.sh so failed migrations
don't self-register via the INSERT at the end of the file. Previously,
a mid-file failure would still record the migration as applied, then
the runner would exit, blocking all subsequent migrations.
Split migrations into two phases:
- schema/migrations/     — standard SQL (tables, indexes, permissions)
- schema/migrations-warm/ — AGE/Cypher DDL (needs running graph engine)

Move 058 (precreate_graph_labels) to migrations-warm. After cold
migrations, restart postgres so AGE is fully initialized, verify it's
healthy, then apply warm migrations.

Also:
- Migration runner continues past failures instead of stopping
- configure.py ai-provider handles missing catalog table gracefully
- Runner supports --warm flag to select migration directory
…input

The ai_extraction_config table has a CHECK constraint that only allowed
openai, anthropic, ollama. Add openrouter via migration 061.

Also remove -s flag from API key read so input is visible — easier to
verify the key was pasted correctly.
NUMERIC(12,6) overflows on high-cost OpenRouter models (image generation,
specialized). Use unconstrained NUMERIC which handles arbitrary precision.
No backwards compatibility needed — merge table creation, seed data,
openrouter constraint, and NUMERIC widening into one migration.
[$] cycles through: unsorted → cheapest first → most expensive first.
[0] toggles between curated and full model list. Both compose — sort
applies to whichever list is currently shown.
The catalog ID is the second positional arg, which argparse maps to provider_name rather than model_id. Fall back to provider_name for actions that expect a catalog ID.
4096 truncates extraction responses for verbose models via OpenRouter.
Bump to 16384 as interim fix — proper solution is catalog-driven token
limits per model.
- Add --max-tokens flag to configure.py ai-provider
- Store max_tokens in ai_extraction_config table (column already existed)
- OpenRouterProvider reads max_tokens from config (default 16384)
- Init flow prompts for max tokens with press-enter-to-accept default
- Factory passes max_tokens from database config to provider
…e endpoint

Manually created concepts were born with zero evidence instances, making them
ungrounded. Now evidence_text is required when creating concepts via API/CLI/MCP
(except match_only mode and LLM extraction). Evidence is stored as an Instance
node linked to the concept's synthetic source.

Also adds:
- POST /concepts/{id}/evidence endpoint for adding evidence to existing concepts
- add_evidence action on the MCP concept tool
- evidence_text parameter on the MCP graph tool (create concept + queue)
- Missing @types dev deps that were causing pre-existing TS build failures
@aaronsb (Owner, Author) commented Mar 26, 2026

Code Review: Evidence-Required for Manual Concept Creation

Scope: Evidence feature changes only (models, routes, service, CLI types, MCP client, MCP server, graph-operations). Not reviewing ADR-800 model catalog changes bundled in this PR.

Overall Assessment: Solid feature with clean separation across the stack. The validation enforcement, Instance node creation, and add_evidence endpoint follow existing patterns well. A few issues worth addressing before merge.


Finding 1: Audit Action Mismatch (Bug)

Location: api/app/routes/concepts.py:314

Problem: The add_evidence endpoint uses AuditAction.CREATE_CONCEPT for its audit log, but this is an evidence creation action, not a concept creation.

Why it matters: Audit logs become unreliable for forensic analysis. Someone filtering audit events for concept creation will get false positives from evidence additions, and there's no way to filter for evidence additions specifically.

Suggestion: Add an ADD_EVIDENCE variant to AuditAction (or use a distinct string). The AuditAction enum in audit_service.py already follows a pattern of one enum per operation type (CREATE_CONCEPT, UPDATE_CONCEPT, DELETE_CONCEPT, CREATE_EDGE, etc.).
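A sketch of the enum extension; the existing member values shown here are assumptions for illustration, based only on the variant names cited above:

```python
from enum import Enum

class AuditAction(str, Enum):
    # existing variants (string values assumed for illustration)
    CREATE_CONCEPT = "create_concept"
    UPDATE_CONCEPT = "update_concept"
    DELETE_CONCEPT = "delete_concept"
    CREATE_EDGE = "create_edge"
    # new variant: evidence additions become distinctly filterable
    # in audit logs instead of masquerading as concept creations
    ADD_EVIDENCE = "add_evidence"
```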


Finding 2: Missing response_model on Evidence Endpoint

Location: api/app/routes/concepts.py:283-286

Problem: The add_evidence endpoint has no response_model declared, unlike every other endpoint in this file. The service returns a raw dict, which FastAPI will serialize without Pydantic validation.

Why it matters: Two consequences: (1) OpenAPI docs won't describe the response shape, so clients have to guess; (2) no guarantee the response matches EvidenceResponse in the TypeScript types — if the service dict shape drifts, the API silently returns unexpected fields.

Suggestion: Add response_model=EvidenceResponse to the route decorator, then either return an EvidenceResponse(...) from the route or let FastAPI coerce the service's dict through the model. Since EvidenceCreate (the request model) only carries evidence_text, this needs a dedicated response model alongside it in api/app/models/concepts.py (near lines 97-105). Consider adding:

class EvidenceResponse(BaseModel):
    concept_id: str
    instance_id: str
    source_id: str
    evidence_text: str

This mirrors the TypeScript EvidenceResponse interface already defined in cli/src/types/index.ts:1064-1069.


Finding 3: Query Safety — Raw _execute_cypher in add_evidence

Location: api/app/services/concept_service.py:264-270

Problem: The add_evidence method uses self.age_client._execute_cypher(...) directly to verify the concept exists. Per the API Way (ADR-048), _execute_cypher can match vocabulary nodes and is the "UNSAFE" pattern. The existing methods (create_concept, list_concepts, update_concept) also use _execute_cypher, so this is consistent within the file — but it's worth noting as pre-existing debt.

Why it matters now: The new add_evidence method's MATCH query (MATCH (c:Concept {concept_id: $concept_id})) is actually safe since it targets the Concept label explicitly with a parameterized ID. The parameters are passed correctly (not string-interpolated), so injection risk is mitigated. This is more of a consistency note than a blocking issue.

Suggestion: Non-blocking. If the facade gains a get_concept_by_id method in the future, migrate these queries. The parameterized approach used here is safe in practice.


Finding 4: No Evidence Validation in Graph Tool Queue Operations

Location: cli/src/mcp/graph-operations.ts:655-667

Problem: The graph tool's queue operation (executeQueueOp) passes evidence_text through to createConcept, which is correct. But neither the queue parsing nor executeQueueOp validates that evidence_text is present before calling create. Validation happens server-side (good), but a missing evidence_text in a multi-op queue will fail partway through the batch — succeeding on some operations and failing on the concept create.

Why it matters: Partial queue failures are hard to recover from. The user creates edges and concepts in a single queue, and one missing field causes a mid-batch abort.

Suggestion: Consider adding client-side pre-validation in the queue executor that checks for evidence_text when op === 'create' and entity === 'concept', similar to the existing if (!op.label || !op.ontology) throw new Error(...) check at line 657-658. This gives a fast-fail before any mutations happen.
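Illustrated in Python for brevity (the actual queue executor lives in graph-operations.ts), the fast-fail pass might look like this; the op field names mirror the queue shape described above and are assumptions:

```python
def prevalidate_queue(ops: list) -> None:
    """Hypothetical fast-fail pass: reject the whole queue before any
    mutation runs if a concept create is missing evidence_text."""
    for i, op in enumerate(ops):
        if op.get("op") == "create" and op.get("entity") == "concept":
            if not op.get("evidence_text"):
                raise ValueError(
                    f"queue op {i}: evidence_text is required to create a concept"
                )
```

Running this before executing any op means a malformed batch is rejected whole, instead of aborting mid-batch after some edges have already been created.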


Finding 5: Missing Test Coverage for Evidence Features

Location: tests/api/test_concepts.py

Problem: The test file has no tests for the new POST /concepts/{id}/evidence endpoint and no tests verifying the evidence_text requirement on concept creation. The existing test_create_concept_success test still creates a concept without evidence_text (which would now fail for non-LLM creation methods in the real service).

Why it matters: The PR description says tests pass (1180/1180), but none of those tests exercise the new evidence validation path or the add_evidence endpoint. The mock-based tests sidestep the service logic entirely, so the evidence requirement isn't tested even indirectly.

Suggestion: Add tests for:

  1. POST /concepts without evidence_text → service raises ValueError → 400
  2. POST /concepts with evidence_text → success, instance created
  3. POST /concepts/{id}/evidence → success path (201, correct response shape)
  4. POST /concepts/{id}/evidence on non-existent concept → 404
  5. POST /concepts/{id}/evidence auth/RBAC → requires graph:write
  6. Evidence text below min length (< 10 chars) → validation error
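A stand-in for the service-layer rule these tests should pin down; the exempt creation methods come from the PR description, but the function itself and its string values are hypothetical:

```python
from typing import Optional

# per the PR: LLM extraction and match_only mode skip the evidence requirement
EXEMPT_METHODS = {"llm_extraction", "match_only"}

def require_evidence(evidence_text: Optional[str], creation_method: str) -> None:
    """Hypothetical sketch of the conditional validation: manual creation
    paths must supply grounded evidence; automated extraction is exempt."""
    if creation_method in EXEMPT_METHODS:
        return
    if not evidence_text or len(evidence_text.strip()) < 10:
        raise ValueError(
            "evidence_text (min 10 chars) is required for manual concept creation"
        )
```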

Finding 6: creation_method Hardcoded to API in add_evidence

Location: api/app/services/concept_service.py:281

Problem: add_evidence always passes CreationMethod.API when creating the synthetic source, regardless of how the evidence was actually submitted. If evidence is added via MCP (concept tool, add_evidence action), the synthetic source still says "created via api".

Why it matters: Provenance becomes inaccurate. The synthetic source text will say "created via api" even when the evidence came through MCP or CLI, which undermines the purpose of tracking creation method.

Suggestion: Accept an optional creation_method parameter in add_evidence() and pass it from the route. The MCP handler in mcp-server.ts:1718-1734 could pass creation_method: 'mcp'. Alternatively, if this is intentional (evidence creation is always attributed to the API regardless of entry point), document that decision.
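A sketch of threading the entry point through; the enum values, the synthetic-source text format, and the return shape are all assumptions for illustration:

```python
from enum import Enum

class CreationMethod(str, Enum):
    API = "api"
    MCP = "mcp"
    CLI = "cli"

def add_evidence(concept_id: str, evidence_text: str,
                 creation_method: CreationMethod = CreationMethod.API) -> dict:
    # the synthetic source now records the caller's actual entry point
    # instead of hardcoding CreationMethod.API
    source_text = f"created via {creation_method.value}"
    return {"concept_id": concept_id,
            "evidence_text": evidence_text,
            "source_text": source_text}
```

The route keeps the default, while the MCP handler passes creation_method explicitly, so provenance reflects the real entry point.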


Positive Observations

  • Clean EvidenceCreate model with proper min/max length validation via Pydantic field constraints
  • _create_evidence_instance is well-extracted as a reusable private method, shared between create_concept and add_evidence
  • encodeURIComponent(conceptId) in the client — good defensive practice against path traversal
  • The evidence_text field on ConceptCreate is Optional[str] with enforcement in the service rather than at the model level — this correctly allows LLM extraction to skip evidence while requiring it for manual creation. The conditional validation at service layer (line 117) is the right place for this business rule.
  • TypeScript types (EvidenceCreate, EvidenceResponse, ConceptCreate.evidence_text) are properly aligned with the Python models

Summary

Category Count
Bug (audit action) 1
API contract gap (missing response_model) 1
Missing tests 1 (blocking)
Provenance accuracy 1
Client-side validation opportunity 1
Query safety (pre-existing, non-blocking) 1

Recommendation: Fix findings 1, 2, and 5 before merge. Finding 6 is worth addressing if feasible. Findings 3 and 4 are non-blocking improvements.


AI-assisted review via Claude

- Add ADD_EVIDENCE audit action instead of reusing CREATE_CONCEPT
- Add EvidenceResponse model and response_model on evidence endpoint
- Accept creation_method parameter in add_evidence service method
- Add tests for evidence endpoint (success, validation, 404, auth)
- Add tests for evidence_text requirement on concept creation
aaronsb merged commit e5b46fb into main, Mar 26, 2026
6 checks passed