Log-group tagging, CKAN civic-AI caveats, and JSON-RPC error-code fix by brendanbabb · Pull Request #2 · codeforanchorage/OpenContext

brendanbabb · 2026-05-14T21:25:53Z

Merges the boston/security-hardening-and-aws-docs branch into main. Four commits:

| Commit | Summary |
|---|---|
| `5a7ede2` | CKAN: PARTIAL-dataset warning + `include_resource_totals` for whole-dataset counts |
| `8256db3` | Tag CloudWatch log groups with `Project = "mcp-server"` (Lambda + new API Gateway access log group) so the mcp-observability project can discover them via the Resource Groups Tagging API. Already deployed and verified in staging and prod. |
| `9479ea7` | Add civic-AI safety caveats to CKAN response formatters (provenance, sample size, freshness, stringly-typed columns, NULL-like frequency, field-name validation, search ambiguity, `search_and_query` composite). +44 tests. |
| `6fd1ccf` | Return JSON-RPC `-32601` (Method not found) for unknown methods instead of `-32603` (Internal error). |

Validation

- Full test suite: 441 passed (`pytest tests/`).
- `ruff check`: clean on all changed files.
- `ruff format`: applied; all changed files formatted.
- Terraform: `terraform validate` clean; `terraform fmt` clean on touched files.
- The log-group tagging is already live: the staging (`boston-ckan-mcp-staging`) and prod (`boston-opencontext-mcp-prod`) log groups are confirmed discoverable via `resourcegroupstaggingapi get-resources`.

Notes

- The CKAN civic-AI changes (`9479ea7`) were pre-existing work in the working tree, committed as-is with formatting normalized; they were not authored in the session that opened this PR.
- Untracked scratch artifacts (`.claude/`, `terraform/aws/tfplan`, `probe_*.json`, `live_*.json`) were intentionally not committed.

🤖 Generated with Claude Code

…aset counts

Regression report from prod: when GPT-4o was asked "How many 311 service requests does Boston have in total?", the original (unenhanced) MCP forced a `search_datasets` → `get_dataset` → iterate-`query_data` chain that surfaced per-year counts. The enhanced `search_and_query` short-circuits that chain: it picks the rolling NEW SYSTEM resource (~9,790 rows, the recent ~30-day operational view), presents it as canonical, and the model reports 9,790 as "the total". 22 archives' worth of data are quietly missed.

Root cause: making one tool excellent at the 80% case ("give me data") creates a path-of-least-resistance regression for the 20% case ("give me a total"), and the path of least resistance wins.

Fixes:

1. A prominent PARTIAL DATASET ANSWER block, prepended whenever `search_and_query` auto-picked one of N queryable resources (i.e. neither `resource_name` nor `resource_index` was provided). It tells the model the answer covers one resource only and names the three escape hatches: `include_resource_totals=true`, `resource_name=` for a specific archive, or `execute_sql` with `UNION ALL`. The block is suppressed when the model explicitly picked a resource, since it got what it asked for.
2. A new `include_resource_totals=true` flag. When set, it runs `COUNT(*)` in parallel (`asyncio.gather`) against every queryable resource of the matched dataset. The output prepends a "Per-resource totals" block with a GRAND TOTAL line plus a per-resource breakdown. One follow-up call gives the model the full breakdown that previously took 22 sequential calls.
3. The existing siblings-list block is suppressed when `include_resource_totals` is set, since the totals block already lists every resource with its count.

Live verification (Boston prod CKAN, this commit):

- `search_and_query("311")` with no flags → PARTIAL block at the top, the 9,790 rolling-window count, plus the siblings list and the path to a real total.
- `search_and_query("311", include_resource_totals=true)` → "GRAND TOTAL across 17 resources: 3402330" with a per-year breakdown (2011: 58262 ... 2026: 108033). Single call.

Tests: +3 (49 -> 52). Coverage: the PARTIAL warning fires on auto-pick, no warning appears when `resource_name` is explicit, and the parallel COUNTs sum to the correct grand total.
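Fixes 1 and 2 can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the plugin's code: `partial_warning`, `resource_totals`, and the injected `run_count` coroutine are hypothetical names, the banner text is abbreviated, suppressing the banner when only one resource is queryable is an assumption, and the quoted resource-id table name assumes CKAN DataStore SQL conventions. It shows the suppression rule (no banner when `resource_name` or `resource_index` is given) and the concurrent `COUNT(*)` fan-out with `asyncio.gather`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical: an injected coroutine that runs one SQL statement and returns an int.
CountFn = Callable[[str], Awaitable[int]]


def partial_warning(
    n_queryable: int, resource_name: Optional[str], resource_index: Optional[int]
) -> str:
    """Banner text when search_and_query auto-picked one of several resources."""
    if resource_name is not None or resource_index is not None:
        return ""  # the caller chose a resource explicitly: no warning
    if n_queryable <= 1:
        return ""  # assumption: a single queryable resource is not "partial"
    return (
        f"PARTIAL DATASET ANSWER: covers 1 of {n_queryable} queryable resources.\n"
        "For a whole-dataset answer use include_resource_totals=true, "
        "resource_name=<archive>, or execute_sql with UNION ALL.\n"
    )


async def resource_totals(resource_ids: Sequence[str], run_count: CountFn) -> str:
    """Run COUNT(*) on every resource concurrently; format a GRAND TOTAL block."""
    # Assumption: CKAN DataStore SQL addresses a resource by its id as a quoted table name.
    counts = await asyncio.gather(
        *(run_count(f'SELECT COUNT(*) FROM "{rid}"') for rid in resource_ids)
    )
    # gather returns results in argument order, so zip pairs each id with its count.
    lines = [f"GRAND TOTAL across {len(resource_ids)} resources: {sum(counts)}"]
    lines += [f"  {rid}: {count}" for rid, count in zip(resource_ids, counts)]
    return "Per-resource totals\n" + "\n".join(lines)
```

Because `asyncio.gather` schedules every count at once, wall-clock time for I/O-bound calls is roughly that of the slowest single `COUNT(*)` rather than the sum of sequential round trips.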

Add `Project = mcp-server` tags so the mcp-observability project can discover these log groups via the Resource Groups Tagging API.

- Lambda log group (`/aws/lambda/...`): add a tags block.
- API Gateway access log group (`/aws/apigateway/...-access`): this group did not exist, so create it and wire `access_log_settings` on the prod stage to emit access logs to it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
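To check discoverability from the consumer side, a client can ask the Resource Groups Tagging API for log groups carrying the tag. The function below is a hypothetical sketch (`find_tagged_log_groups` is not part of this PR): it expects a boto3 `resourcegroupstaggingapi` client, assumes `logs:log-group` is the resource-type filter for CloudWatch Logs log groups, and follows `PaginationToken` until the API returns an empty one.

```python
from typing import Any, Dict, List


def find_tagged_log_groups(
    client: Any, key: str = "Project", value: str = "mcp-server"
) -> List[str]:
    """Return ARNs of CloudWatch Logs log groups tagged key=value.

    `client` is expected to be boto3.client("resourcegroupstaggingapi");
    any object with a compatible get_resources(**kwargs) method works.
    """
    arns: List[str] = []
    token = ""
    while True:
        kwargs: Dict[str, Any] = {
            "TagFilters": [{"Key": key, "Values": [value]}],
            # Assumption: "logs:log-group" selects CloudWatch Logs log groups.
            "ResourceTypeFilters": ["logs:log-group"],
        }
        if token:
            kwargs["PaginationToken"] = token  # only send a token after page one
        resp = client.get_resources(**kwargs)
        arns.extend(m["ResourceARN"] for m in resp.get("ResourceTagMappingList", []))
        token = resp.get("PaginationToken", "")
        if not token:  # an empty token means there are no more pages
            return arns
```

With credentials for the target account, `find_tagged_log_groups(boto3.client("resourcegroupstaggingapi"))` should list both the Lambda and the API Gateway access log groups once the tags described above are applied.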

Wire provenance, sample-size, freshness, type-fidelity, and field-validation guidance into the CKAN plugin's response formatters so the model gets explicit warnings instead of silently over-trusting open-data responses:

- Provenance headers plus echoed query params on every response (`_wrap_response`, `_format_provenance_header`, `_params_repr`).
- Sample-size caveats: SMALL SAMPLE / single-record banners so counts and percentages are not generalized from tiny result sets.
- Data freshness: flag datasets last edited beyond their update cadence (`_parse_ckan_iso`, `_frequency_days`, `_format_freshness_caveat`).
- Stringly-typed columns: warn when date/number values sit in TEXT columns.
- NULL-like frequency: DATA QUALITY caveat for columns that are mostly empty / "N/A" / "Unknown".
- Field-name validation with "did you mean" suggestions against the real schema; search-ambiguity detection across candidate datasets.
- `search_and_query` composite formatter for one-call keyword-to-rows lookups.

Adds 44 tests covering the new formatters. Full suite: 441 passed. Pre-existing work from the working tree; committed as-is (formatting normalized with ruff).
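Three of these caveats reduce to small pure functions. The sketch below is illustrative only: the names (`freshness_caveat`, `null_like_caveat`, `unknown_field_message`), the cadence table, the NULL-like token set, the 50% threshold, and treating offset-less CKAN timestamps as UTC are all assumptions, not the plugin's `_format_freshness_caveat` or its schema-validation code. The "did you mean" suggestions use the standard library's `difflib.get_close_matches`.

```python
import difflib
from datetime import datetime, timezone
from typing import Iterable, Optional, Sequence

# Hypothetical cadence table; CKAN "frequency" labels vary by portal.
CADENCE_DAYS = {"daily": 1, "weekly": 7, "monthly": 31, "quarterly": 92, "annually": 366}
# Hypothetical set of placeholder values treated as missing (compared lowercased).
NULL_LIKE = {"", "n/a", "na", "unknown", "none", "null"}


def parse_ckan_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; assume UTC when no offset is present."""
    dt = datetime.fromisoformat(value)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def freshness_caveat(metadata_modified: str, frequency: str, now: datetime) -> Optional[str]:
    days = CADENCE_DAYS.get(frequency.strip().lower())
    if days is None:
        return None  # unknown cadence: say nothing rather than guess
    age = (now - parse_ckan_timestamp(metadata_modified)).days
    if age > days:
        return f"STALE DATA: last updated {age} days ago; expected cadence is {frequency} ({days} days)."
    return None


def null_like_caveat(
    column: str, values: Sequence[Optional[str]], threshold: float = 0.5
) -> Optional[str]:
    if not values:
        return None
    empty = sum(1 for v in values if v is None or str(v).strip().lower() in NULL_LIKE)
    share = empty / len(values)
    if share >= threshold:
        return f"DATA QUALITY: {share:.0%} of '{column}' values are empty or NULL-like."
    return None


def unknown_field_message(requested: str, schema: Iterable[str]) -> Optional[str]:
    fields = list(schema)
    if requested in fields:
        return None  # valid field: no message
    guesses = difflib.get_close_matches(requested, fields, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(guesses)}?" if guesses else ""
    return f"Unknown field '{requested}'.{hint}"
```

For example, a request for a misspelled `case_staus` against a schema containing `case_status` yields a message suggesting `case_status`, rather than an empty result the model might misread as "no matching rows".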

An unrecognized `method` was raised as a bare `ValueError` and caught by the generic handler, which mapped it to -32603 ("Internal error"). Per JSON-RPC 2.0 that is a client error, not a server fault: well-behaved clients could not tell "you called something that does not exist" apart from "the server broke."

Introduce `MethodNotFoundError` and map it to -32601 ("Method not found") in `handle_request`; everything else still maps to -32603. This also aligns `core/mcp_server.py` with `server/http_handler.py`, which already uses -32601.

Found while smoke-testing the prod MCP server after the log-group tagging change; not a regression from that work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
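The mapping can be sketched as a dedicated exception caught before the generic handler. Only `MethodNotFoundError`, the `handle_request` name, and the two error codes come from the commit message; the `dispatch` helper, the handler registry, and the `data` field are hypothetical illustration, not the code in `core/mcp_server.py`.

```python
from typing import Any, Callable, Dict

JSONRPC_METHOD_NOT_FOUND = -32601
JSONRPC_INTERNAL_ERROR = -32603

Handler = Callable[[Dict[str, Any]], Any]


class MethodNotFoundError(Exception):
    """Raised when a request names a method the server does not implement."""


def dispatch(method: str, handlers: Dict[str, Handler], params: Dict[str, Any]) -> Any:
    try:
        handler = handlers[method]
    except KeyError:
        raise MethodNotFoundError(method) from None
    return handler(params)


def handle_request(request: Dict[str, Any], handlers: Dict[str, Handler]) -> Dict[str, Any]:
    req_id = request.get("id")
    try:
        result = dispatch(request.get("method", ""), handlers, request.get("params") or {})
        return {"jsonrpc": "2.0", "id": req_id, "result": result}
    except MethodNotFoundError as exc:
        # Client error: the method does not exist.
        error = {"code": JSONRPC_METHOD_NOT_FOUND, "message": "Method not found", "data": str(exc)}
    except Exception:
        # Anything else is still treated as a server fault.
        error = {"code": JSONRPC_INTERNAL_ERROR, "message": "Internal error"}
    return {"jsonrpc": "2.0", "id": req_id, "error": error}
```

The order of the `except` clauses matters: because `MethodNotFoundError` subclasses `Exception`, listing the generic clause first would swallow it and reintroduce the -32603 behavior.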

brendanbabb and others added 4 commits May 1, 2026 15:04

brendanbabb merged commit d43a042 into main May 14, 2026
2 checks passed

brendanbabb deleted the boston/security-hardening-and-aws-docs branch May 14, 2026 21:42
