feat(graph): require evidence on manual concept creation #358

Merged: aaronsb merged 16 commits into main from feature/concept-evidence-required, Mar 26, 2026
Conversation

@aaronsb (Owner) commented Mar 26, 2026

Summary

  • Manually created concepts now require evidence_text (min 10 chars) — stored as an Instance node linked to the concept's synthetic source
  • New POST /concepts/{id}/evidence endpoint to add evidence to existing concepts
  • New add_evidence action on the MCP concept tool
  • evidence_text parameter added to MCP graph tool (create concept + queue operations)
  • Fixes pre-existing TS build failures by adding missing @types/* dev dependencies

Context

Concepts created via the graph MCP tool were born with zero evidence instances — completely ungrounded (e.g. c_341dd030bb48 had "Unexplored [0% conf]"). The synthetic source tracked who created the concept but not why. This closes that gap by requiring evidence at creation time and providing a way to add evidence after the fact.

Test plan

  • Create concept without evidence_text → rejected with validation error
  • Create concept with evidence_text → succeeds, concept has 1 evidence instance
  • Add evidence to existing concept → instance created, grounding score improves
  • CLI tests pass (46/46)
  • API tests pass (1180/1180, 1 pre-existing unrelated failure excluded)
  • TypeScript build succeeds cleanly
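The reject/accept split in the first two test-plan items boils down to a field constraint on the request model. A minimal Pydantic sketch of that rule (the max_length value here is an illustrative assumption, not taken from the PR):

```python
from pydantic import BaseModel, Field, ValidationError

class EvidenceCreate(BaseModel):
    # min 10 chars per the PR's validation rule; max_length is illustrative
    evidence_text: str = Field(..., min_length=10, max_length=10_000)

# "too short" is 9 characters, so construction raises ValidationError
try:
    EvidenceCreate(evidence_text="too short")
    print("accepted")
except ValidationError:
    print("rejected")

ok = EvidenceCreate(evidence_text="grounded in section 3 of the source doc")
print(ok.evidence_text)
```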

aaronsb added 15 commits March 15, 2026 15:33
Proposes replacing hardcoded model lists with a database-backed catalog
per provider, adding OpenRouter as a fourth inference endpoint, and
integrating per-model pricing into cost estimation.
…R-800)

- Add provider_model_catalog table (migration 059) with pricing, capabilities,
  and curation fields per model per provider
- Seed catalog with known models and pricing (migration 060)
- Add OpenRouterProvider using OpenAI SDK with custom base URL
- Add fetch_model_catalog() to all providers for dynamic discovery
- Add model_catalog.py with upsert/query/enable/default/pricing helpers
- Add admin API endpoints: GET/POST/PUT /admin/models/catalog/*
- Add configure.py 'models' subcommand: list, refresh, enable, disable, default, price
- Update CostEstimator to read pricing from catalog before env vars
- Wire OpenRouter into provider factory, rate limiter, and configure.py
- Update list_available_models() on all providers to prefer catalog
Replace hardcoded OpenAI setup in guided-init.sh with interactive flow:
- Step 4: Choose provider (OpenAI, Anthropic, OpenRouter)
- Step 5: Enter and validate API key
- Step 6: Refresh model catalog, present filtered menu, user picks model

OpenRouter shows curated subset (GPT-4o, Claude, Gemini, Llama, etc.)
with option [0] to show all 200+ models. Ollama noted as post-init config.

Also adds --tsv, --category, --limit flags to configure.py models list
for machine-parseable output used by the init script.
Needed by OpenRouter and Ollama fetch_model_catalog() which use
requests.get() directly rather than the OpenAI SDK.
Follow existing pattern — INSERT INTO public.schema_migrations at the
end of each migration file.
058 uses AGE Cypher DDL (CREATE VLABEL/ELABEL) which can fail on cold
start when AGE isn't fully initialized. Wrap each label creation in
EXCEPTION handler so failures log a notice instead of aborting — labels
get created on first use anyway.

Also add ON_ERROR_STOP=1 to psql in migrate-db.sh so failed migrations
don't self-register via the INSERT at the end of the file. Previously,
a mid-file failure would still record the migration as applied, then
the runner would exit, blocking all subsequent migrations.
Split migrations into two phases:
- schema/migrations/     — standard SQL (tables, indexes, permissions)
- schema/migrations-warm/ — AGE/Cypher DDL (needs running graph engine)

Move 058 (precreate_graph_labels) to migrations-warm. After cold
migrations, restart postgres so AGE is fully initialized, verify it's
healthy, then apply warm migrations.

Also:
- Migration runner continues past failures instead of stopping
- configure.py ai-provider handles missing catalog table gracefully
- Runner supports --warm flag to select migration directory
…input

The ai_extraction_config table has a CHECK constraint that only allowed
openai, anthropic, ollama. Add openrouter via migration 061.

Also remove -s flag from API key read so input is visible — easier to
verify the key was pasted correctly.
NUMERIC(12,6) overflows on high-cost OpenRouter models (image generation,
specialized). Use unconstrained NUMERIC which handles arbitrary precision.
No backwards compatibility needed — merge table creation, seed data,
openrouter constraint, and NUMERIC widening into one migration.
[$] cycles through: unsorted → cheapest first → most expensive first.
[0] toggles between curated and full model list. Both compose — sort
applies to whichever list is currently shown.
The catalog ID is the second positional arg, which argparse maps to provider_name rather than model_id. Fall back to provider_name for actions that expect a catalog ID.
4096 truncates extraction responses for verbose models via OpenRouter.
Bump to 16384 as interim fix — proper solution is catalog-driven token
limits per model.
- Add --max-tokens flag to configure.py ai-provider
- Store max_tokens in ai_extraction_config table (column already existed)
- OpenRouterProvider reads max_tokens from config (default 16384)
- Init flow prompts for max tokens with press-enter-to-accept default
- Factory passes max_tokens from database config to provider
…e endpoint

Manually created concepts were born with zero evidence instances, making them
ungrounded. Now evidence_text is required when creating concepts via API/CLI/MCP
(except match_only mode and LLM extraction). Evidence is stored as an Instance
node linked to the concept's synthetic source.

Also adds:
- POST /concepts/{id}/evidence endpoint for adding evidence to existing concepts
- add_evidence action on the MCP concept tool
- evidence_text parameter on the MCP graph tool (create concept + queue)
- Missing @types dev deps that were causing pre-existing TS build failures
@aaronsb (Owner, Author) commented Mar 26, 2026

Code Review: Evidence-Required for Manual Concept Creation

Scope: Evidence feature changes only (models, routes, service, CLI types, MCP client, MCP server, graph-operations). Not reviewing ADR-800 model catalog changes bundled in this PR.

Overall Assessment: Solid feature with clean separation across the stack. The validation enforcement, Instance node creation, and add_evidence endpoint follow existing patterns well. A few issues worth addressing before merge.


Finding 1: Audit Action Mismatch (Bug)

Location: api/app/routes/concepts.py:314

Problem: The add_evidence endpoint uses AuditAction.CREATE_CONCEPT for its audit log, but this is an evidence creation action, not a concept creation.

Why it matters: Audit logs become unreliable for forensic analysis. Someone filtering audit events for concept creation will get false positives from evidence additions, and there's no way to filter for evidence additions specifically.

Suggestion: Add an ADD_EVIDENCE variant to AuditAction (or use a distinct string). The AuditAction enum in audit_service.py already follows a pattern of one enum per operation type (CREATE_CONCEPT, UPDATE_CONCEPT, DELETE_CONCEPT, CREATE_EDGE, etc.).
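A sketch of the enum extension; the existing member values shown here are assumptions for illustration, based only on the variant names cited above:

```python
from enum import Enum

class AuditAction(str, Enum):
    # existing variants (string values assumed for illustration)
    CREATE_CONCEPT = "create_concept"
    UPDATE_CONCEPT = "update_concept"
    DELETE_CONCEPT = "delete_concept"
    CREATE_EDGE = "create_edge"
    # new variant: evidence additions become distinctly filterable
    # in audit logs instead of masquerading as concept creations
    ADD_EVIDENCE = "add_evidence"
```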


Finding 2: Missing response_model on Evidence Endpoint

Location: api/app/routes/concepts.py:283-286

Problem: The add_evidence endpoint has no response_model declared, unlike every other endpoint in this file. The service returns a raw dict, which FastAPI will serialize without Pydantic validation.

Why it matters: Two consequences: (1) OpenAPI docs won't describe the response shape, so clients have to guess; (2) no guarantee the response matches EvidenceResponse in the TypeScript types — if the service dict shape drifts, the API silently returns unexpected fields.

Suggestion: Add response_model=EvidenceResponse to the route decorator, then either return an EvidenceResponse(...) from the route or let FastAPI coerce the service's dict through the model. Since EvidenceCreate (the request model) only carries evidence_text, this needs a dedicated response model alongside it in api/app/models/concepts.py (near lines 97-105). Consider adding:

class EvidenceResponse(BaseModel):
    concept_id: str
    instance_id: str
    source_id: str
    evidence_text: str

This mirrors the TypeScript EvidenceResponse interface already defined in cli/src/types/index.ts:1064-1069.


Finding 3: Query Safety — Raw _execute_cypher in add_evidence

Location: api/app/services/concept_service.py:264-270

Problem: The add_evidence method uses self.age_client._execute_cypher(...) directly to verify the concept exists. Per the API Way (ADR-048), _execute_cypher can match vocabulary nodes and is the "UNSAFE" pattern. The existing methods (create_concept, list_concepts, update_concept) also use _execute_cypher, so this is consistent within the file — but it's worth noting as pre-existing debt.

Why it matters now: The new add_evidence method's MATCH query (MATCH (c:Concept {concept_id: $concept_id})) is actually safe since it targets the Concept label explicitly with a parameterized ID. The parameters are passed correctly (not string-interpolated), so injection risk is mitigated. This is more of a consistency note than a blocking issue.

Suggestion: Non-blocking. If the facade gains a get_concept_by_id method in the future, migrate these queries. The parameterized approach used here is safe in practice.


Finding 4: No Evidence Validation in Graph Tool Queue Operations

Location: cli/src/mcp/graph-operations.ts:655-667

Problem: The graph tool's queue operation (executeQueueOp) passes evidence_text through to createConcept, which is correct. But neither the queue parsing nor executeQueueOp validates that evidence_text is present before calling create. Validation happens server-side (good), but a missing evidence_text in a multi-op queue will fail partway through the batch — succeeding on some operations and failing on the concept create.

Why it matters: Partial queue failures are hard to recover from. The user creates edges and concepts in a single queue, and one missing field causes a mid-batch abort.

Suggestion: Consider adding client-side pre-validation in the queue executor that checks for evidence_text when op === 'create' and entity === 'concept', similar to the existing if (!op.label || !op.ontology) throw new Error(...) check at line 657-658. This gives a fast-fail before any mutations happen.
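Illustrated in Python for brevity (the actual queue executor lives in graph-operations.ts), the fast-fail pass might look like this; the op field names mirror the queue shape described above and are assumptions:

```python
def prevalidate_queue(ops: list) -> None:
    """Hypothetical fast-fail pass: reject the whole queue before any
    mutation runs if a concept create is missing evidence_text."""
    for i, op in enumerate(ops):
        if op.get("op") == "create" and op.get("entity") == "concept":
            if not op.get("evidence_text"):
                raise ValueError(
                    f"queue op {i}: evidence_text is required to create a concept"
                )
```

Running this before executing any op means a malformed batch is rejected whole, instead of aborting mid-batch after some edges have already been created.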


Finding 5: Missing Test Coverage for Evidence Features

Location: tests/api/test_concepts.py

Problem: The test file has no tests for the new POST /concepts/{id}/evidence endpoint and no tests verifying the evidence_text requirement on concept creation. The existing test_create_concept_success test still creates a concept without evidence_text (which would now fail for non-LLM creation methods in the real service).

Why it matters: The PR description says tests pass (1180/1180), but none of those tests exercise the new evidence validation path or the add_evidence endpoint. The mock-based tests sidestep the service logic entirely, so the evidence requirement isn't tested even indirectly.

Suggestion: Add tests for:

  1. POST /concepts without evidence_text → service raises ValueError → 400
  2. POST /concepts with evidence_text → success, instance created
  3. POST /concepts/{id}/evidence → success path (201, correct response shape)
  4. POST /concepts/{id}/evidence on non-existent concept → 404
  5. POST /concepts/{id}/evidence auth/RBAC → requires graph:write
  6. Evidence text below min length (< 10 chars) → validation error
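A stand-in for the service-layer rule these tests should pin down; the exempt creation methods come from the PR description, but the function itself and its string values are hypothetical:

```python
from typing import Optional

# per the PR: LLM extraction and match_only mode skip the evidence requirement
EXEMPT_METHODS = {"llm_extraction", "match_only"}

def require_evidence(evidence_text: Optional[str], creation_method: str) -> None:
    """Hypothetical sketch of the conditional validation: manual creation
    paths must supply grounded evidence; automated extraction is exempt."""
    if creation_method in EXEMPT_METHODS:
        return
    if not evidence_text or len(evidence_text.strip()) < 10:
        raise ValueError(
            "evidence_text (min 10 chars) is required for manual concept creation"
        )
```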

Finding 6: creation_method Hardcoded to API in add_evidence

Location: api/app/services/concept_service.py:281

Problem: add_evidence always passes CreationMethod.API when creating the synthetic source, regardless of how the evidence was actually submitted. If evidence is added via MCP (concept tool, add_evidence action), the synthetic source still says "created via api".

Why it matters: Provenance becomes inaccurate. The synthetic source text will say "created via api" even when the evidence came through MCP or CLI, which undermines the purpose of tracking creation method.

Suggestion: Accept an optional creation_method parameter in add_evidence() and pass it from the route. The MCP handler in mcp-server.ts:1718-1734 could pass creation_method: 'mcp'. Alternatively, if this is intentional (evidence creation is always attributed to the API regardless of entry point), document that decision.
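A sketch of threading the entry point through; the enum values, the synthetic-source text format, and the return shape are all assumptions for illustration:

```python
from enum import Enum

class CreationMethod(str, Enum):
    API = "api"
    MCP = "mcp"
    CLI = "cli"

def add_evidence(concept_id: str, evidence_text: str,
                 creation_method: CreationMethod = CreationMethod.API) -> dict:
    # the synthetic source now records the caller's actual entry point
    # instead of hardcoding CreationMethod.API
    source_text = f"created via {creation_method.value}"
    return {"concept_id": concept_id,
            "evidence_text": evidence_text,
            "source_text": source_text}
```

The route keeps the default, while the MCP handler passes creation_method explicitly, so provenance reflects the real entry point.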


Positive Observations

  • Clean EvidenceCreate model with proper min/max length validation via Pydantic field constraints
  • _create_evidence_instance is well-extracted as a reusable private method, shared between create_concept and add_evidence
  • encodeURIComponent(conceptId) in the client — good defensive practice against path traversal
  • The evidence_text field on ConceptCreate is Optional[str] with enforcement in the service rather than at the model level — this correctly allows LLM extraction to skip evidence while requiring it for manual creation. The conditional validation at service layer (line 117) is the right place for this business rule.
  • TypeScript types (EvidenceCreate, EvidenceResponse, ConceptCreate.evidence_text) are properly aligned with the Python models

Summary

Category Count
Bug (audit action) 1
API contract gap (missing response_model) 1
Missing tests 1 (blocking)
Provenance accuracy 1
Client-side validation opportunity 1
Query safety (pre-existing, non-blocking) 1

Recommendation: Fix findings 1, 2, and 5 before merge. Finding 6 is worth addressing if feasible. Findings 3 and 4 are non-blocking improvements.


AI-assisted review via Claude

- Add ADD_EVIDENCE audit action instead of reusing CREATE_CONCEPT
- Add EvidenceResponse model and response_model on evidence endpoint
- Accept creation_method parameter in add_evidence service method
- Add tests for evidence endpoint (success, validation, 404, auth)
- Add tests for evidence_text requirement on concept creation
aaronsb merged commit e5b46fb into main, Mar 26, 2026
6 checks passed