Commit e9479ca

Merge pull request #1 from codeforanchorage/boston/security-hardening-and-aws-docs
Boston fork: security hardening, upstream-portal protection, AWS hosting docs
2 parents 087ff41 + a8e9fea commit e9479ca

23 files changed

Lines changed: 4218 additions & 268 deletions

.github/workflows/release.yml

Lines changed: 24 additions & 2 deletions
@@ -112,7 +112,16 @@ jobs:
           echo "Version: $VERSION"
 
       - name: Install Python dependencies into package directory
-        run: uv pip install --target ./package -r requirements.txt
+        # Pin --python-version and --python-platform so wheel resolution is
+        # independent of the runner's ambient interpreter and OS. Matches
+        # the uv path in scripts/deploy.sh and the python3.11 runtime pinned
+        # in terraform/aws/main.tf.
+        run: |
+          uv pip install -r requirements.txt \
+            --target ./package \
+            --python-platform x86_64-manylinux2014 \
+            --python-version 3.11 \
+            --no-compile
 
       - name: Copy application code into package
         run: |
@@ -122,9 +131,22 @@ jobs:
           cp examples/boston-opendata/config.yaml package/config.yaml
 
       - name: Create Lambda ZIP
+        # Use Python stdlib zipfile to match scripts/deploy.sh and
+        # .github/workflows/infra.yml — avoids depending on the `zip` binary.
         run: |
           cd package
-          zip -r ../opencontext-lambda-${{ steps.get_version.outputs.version }}.zip .
+          python - "../opencontext-lambda-${{ steps.get_version.outputs.version }}.zip" <<'PY'
+          import os
+          import sys
+          import zipfile
+
+          zip_path = sys.argv[1]
+          with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as z:
+              for root, _, files in os.walk("."):
+                  for name in files:
+                      path = os.path.join(root, name)
+                      z.write(path, os.path.relpath(path, "."))
+          PY
 
       - name: Upload Lambda ZIP artifact
         uses: actions/upload-artifact@v4
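
A quick check on the archive produced by the "Create Lambda ZIP" step can catch a packaging mistake before upload. The sketch below is not part of the workflow: `missing_entries` is a hypothetical helper, and the paths you pass to it are your own assumptions about what the Lambda needs (for example the `config.yaml` copied into `package/` above).

```python
import zipfile


def missing_entries(zip_path, required):
    """Return the required archive paths that are absent from the ZIP."""
    with zipfile.ZipFile(zip_path) as z:
        # Tolerate "./a/b" style member names that some packers emit.
        names = {n[2:] if n.startswith("./") else n for n in z.namelist()}
    return [path for path in required if path not in names]
```

Running it against the built ZIP with a short list of must-have files, and failing the job when the result is non-empty, would surface a missing handler or config file at build time rather than at cold start.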

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -217,6 +217,10 @@ terraform/*/lambda-deployment.zip
 **/*.tfstate
 **/*.tfstate.*
 
+# Terraform backend config — contains the deployer's AWS account ID in the
+# S3 bucket name. Each fork ships its own. See terraform/aws/backend.tf.example.
+terraform/aws/backend.tf
+
 # OpenContext client binaries
 opencontext-client
 opencontext-client-*

CLAUDE.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Build & Development Commands
+
+```bash
+# Install dependencies (uv preferred, pip fallback)
+uv sync # or: pip install -r requirements.txt
+
+# Run local MCP server (no Lambda needed)
+python3 scripts/local_server.py # Serves on http://localhost:8000/mcp
+# Or: python3 local_server.py # Alternate entry point, serves on / and /mcp
+
+# Validate config
+python3 -c "from core.validators import load_and_validate_config; load_and_validate_config('config.yaml')"
+
+# Tests
+uv run pytest tests/ -n auto # All tests, parallel
+uv run pytest tests/test_ckan_plugin.py -v # Single file
+uv run pytest tests/test_ckan_plugin.py::TestClass::test_name -v # Single test
+uv run pytest tests/ --cov=core --cov=plugins --cov-report=term-missing # With coverage (80% minimum)
+
+# Linting (ruff)
+uv run ruff check core/ plugins/ server/ tests/ # Check
+uv run ruff check core/ plugins/ server/ tests/ --fix # Auto-fix
+uv run ruff format core/ plugins/ server/ tests/ # Format
+
+# Pre-commit hooks
+pre-commit run --all-files
+
+# Go client (requires Go 1.21+)
+cd client && make build
+
+# Deploy to AWS
+./scripts/deploy.sh --environment staging
+```
+
+## Architecture
+
+**Core rule: One Fork = One MCP Server.** Each deployment runs exactly ONE plugin. This is enforced at config validation time (`core/validators.py`) and at runtime (`PluginManager.load_plugins()`). To deploy multiple MCP servers, fork the repo per plugin.
+
+**Request flow:**
+```
+Claude (stdio) → Go client (client/) or stdio_bridge.py → HTTP POST /mcp
+→ Lambda (server/adapters/aws_lambda.py) or local_server.py
+→ server/http_handler.py → core/mcp_server.py (JSON-RPC 2.0)
+→ core/plugin_manager.py → Plugin → External API
+```
+
+**Key modules:**
+- `core/interfaces.py` — Abstract bases: `MCPPlugin`, `DataPlugin`, plus `ToolDefinition`, `ToolResult`, `PluginType` enum
+- `core/plugin_manager.py` — Discovers plugins by scanning `plugins/` and `custom_plugins/` for `plugin.py` files. Registers tools with `pluginname__toolname` prefix. Routes `tools/call` to the correct plugin.
+- `core/mcp_server.py` — Handles MCP JSON-RPC methods: `initialize`, `tools/list`, `tools/call`, `ping`
+- `core/validators.py` — Loads config from `config.yaml` (local) or `OPENCONTEXT_CONFIG` env var (Lambda). Enforces single-plugin rule.
+- `server/adapters/aws_lambda.py` — AWS Lambda entry point (handler: `server.adapters.aws_lambda.lambda_handler`). Also `server/lambda_handler.py` as legacy entry point.
+- `server/http_handler.py` — Cloud-agnostic HTTP handler shared by Lambda and local server
+- `stdio_bridge.py` — Python stdio-to-HTTP bridge for connecting Claude Desktop/Code to the local server (alternative to Go client)
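
The `pluginname__toolname` convention mentioned for `core/plugin_manager.py` can be illustrated with a pair of small helpers. This is a sketch of the naming scheme only, not the repository's implementation: `qualify`, `split_qualified`, and `SEPARATOR` are hypothetical names, and splitting on the first `__` is an assumption.

```python
SEPARATOR = "__"


def qualify(plugin_name, tool_name):
    """Build the public tool name from a plugin name and a bare tool name."""
    return f"{plugin_name}{SEPARATOR}{tool_name}"


def split_qualified(qualified_name):
    """Split a public tool name back into (plugin_name, tool_name)."""
    # Assumption: the first "__" separates plugin from tool.
    plugin_name, sep, tool_name = qualified_name.partition(SEPARATOR)
    if not sep or not plugin_name or not tool_name:
        raise ValueError(f"not a prefixed tool name: {qualified_name!r}")
    return plugin_name, tool_name
```

A router built this way would call `split_qualified` on the `name` argument of a `tools/call` request and dispatch the bare tool name to the matching plugin.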
+
+**Built-in plugins** (`plugins/`): `ckan`, `arcgis`, `socrata` — each implements `DataPlugin` with `search_datasets`, `get_dataset`, `query_data`. Custom plugins go in `custom_plugins/` and are auto-discovered.
+
+## Plugin Development
+
+New plugins must implement `MCPPlugin` (or `DataPlugin` for data sources). Place in `custom_plugins/<name>/plugin.py`. The class must define `plugin_name`, `plugin_type`, `plugin_version` and implement `initialize()`, `shutdown()`, `get_tools()`, `execute_tool()`, `health_check()`. Tool names are auto-prefixed — return bare names from `get_tools()`.
+
+## Configuration
+
+Copy `config-example.yaml` to `config.yaml`. Enable exactly one plugin. Config supports `${ENV_VAR}` substitution. For Lambda, config is serialized to the `OPENCONTEXT_CONFIG` env var by Terraform.
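
To make the `${ENV_VAR}` substitution concrete, here is a minimal sketch of how such placeholders could be expanded in the raw config text before YAML parsing. It is not taken from `core/validators.py`; `substitute_env` is a hypothetical helper, and raising on an unset variable is an assumption about the desired behavior.

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text, env=None):
    """Replace ${NAME} placeholders with values from env (default: os.environ)."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly rather than leave the placeholder in place.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, text)
```

Passing an explicit `env` mapping keeps the function easy to test without touching the process environment.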
+
+## CI
+
+GitHub Actions (`.github/workflows/ci.yml`) runs ruff lint/format, pip-audit, pytest with coverage, and Go tests on push to main/develop and on PRs.

README.md

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@ See [Getting Started](docs/GETTING_STARTED.md) for full setup.
 | [Getting Started](docs/GETTING_STARTED.md) | Setup and usage |
 | [Architecture](docs/ARCHITECTURE.md) | System design and plugins |
 | [Deployment](docs/DEPLOYMENT.md) | AWS, Terraform, monitoring |
+| [AWS Deployment (Boston)](docs/AWS_DEPLOYMENT.md) | Boston fork: region, concurrency, domain, packaging |
+| [Security](docs/SECURITY.md) | SQL hardening, rate limits, upstream-portal protection |
 | [Testing](docs/TESTING.md) | Local testing (Terminal, Claude, MCP Inspector) |
 
 

config.yaml

Lines changed: 28 additions & 1 deletion
@@ -1 +1,28 @@
-examples/dc-arcgis/config.yaml
+---
+server_name: "Boston OpenData MCP"
+description: "City of Boston open data MCP server - Safe, conversational access to Boston's open data"
+organization: "City of Boston"
+
+plugins:
+  ckan:
+    enabled: true
+    base_url: "https://data.boston.gov/"
+    portal_url: "https://data.boston.gov/"
+    city_name: "Boston"
+    timeout: 120
+
+  arcgis:
+    enabled: false
+    portal_url: "https://data-boston.hub.arcgis.com"
+    city_name: "Boston"
+    timeout: 120
+
+aws:
+  region: "us-west-2"
+  lambda_name: "boston-ckan-mcp-staging"
+  lambda_memory: 512
+  lambda_timeout: 120
+
+logging:
+  level: "INFO"
+  format: "json"

docs/AWS_DEPLOYMENT.md

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
+# AWS Deployment (Boston fork)
+
+This document describes how the Boston-specific deployment of OpenContext is hosted on AWS, what changed relative to the upstream defaults, and how to operate the stack. It complements [DEPLOYMENT.md](DEPLOYMENT.md), which covers the upstream single-Lambda/API-Gateway architecture.
+
+- **Public endpoint (prod):** `https://boston-data.codeforanchorage.org`
+- **Upstream data source:** Boston CKAN portal at `https://data.boston.gov/`
+- **Runtime:** AWS Lambda (Python 3.11) behind API Gateway, us-west-2
+
+> **Design constraint:** this fork's top operational priority is **not overwhelming `data.boston.gov`**. It is a shared civic resource, not our infrastructure. Every defensive control below — reserved Lambda concurrency, API Gateway rate limits and daily quota, enforced `LIMIT` on SQL, clamped aggregation limits, body-size caps — exists to keep this MCP server from becoming the noisiest client on that portal. See [SECURITY.md §1](SECURITY.md#1-protecting-the-upstream-data-portal) for the full rationale.
+
+---
+
+## 1. What changed in this fork
+
+The upstream deployment assumes a single-region (us-east-1) Lambda with a standard rate-limited API Gateway in front of it. This fork makes the following operational changes:
+
+### 1.1 Region moved to us-west-2
+
+Terraform variables and the deploy script default to `us-west-2`:
+
+- `terraform/aws/prod.tfvars`, `terraform/aws/staging.tfvars`: `aws_region = "us-west-2"`
+- `config.yaml`: `aws.region: "us-west-2"`
+
+The move is for co-location with other Code for Anchorage infrastructure and has no functional effect on the Lambda. Cost numbers in [DEPLOYMENT.md](DEPLOYMENT.md#cost-us-east-1) still apply; us-west-2 pricing is effectively identical for Lambda and API Gateway.
+
+### 1.2 Terraform backend extracted and renamed
+
+The upstream `main.tf` hard-coded an `opencontext-terraform-state` bucket in us-east-1. This fork moves the backend into its own file so the bootstrap account+region+bucket are explicit, and renames the bucket to the convention used by `scripts/setup-backend.sh`:
+
+`terraform/aws/backend.tf` (new file):
+
+```hcl
+terraform {
+  backend "s3" {
+    bucket = "boston-opencontext-tfstate-<AWS_ACCOUNT_ID>-us-west-2"
+    key = "terraform.tfstate"
+    region = "us-west-2"
+    dynamodb_table = "terraform-state-lock"
+    encrypt = true
+  }
+}
+```
+
+The actual `backend.tf` in this repo hardcodes the Code for Anchorage AWS account ID: Terraform cannot interpolate variables into a backend block, so the literal value has to live in the file. A DynamoDB table (`terraform-state-lock`) provides state locking. Forked deployments should run `scripts/setup-backend.sh` to create both the bucket and the lock table, update the account ID in `backend.tf`, then run `terraform init` in `terraform/aws/`.
+
+### 1.3 Reserved Lambda concurrency
+
+A new `lambda_reserved_concurrency` variable caps the number of concurrent Lambda invocations. Default is **10**, set in both staging and prod `.tfvars`.
+
+```hcl
+# terraform/aws/variables.tf
+variable "lambda_reserved_concurrency" {
+  default = 10
+}
+```
+
+This serves two purposes. The first is cost containment: a surprise traffic spike can't run the bill away. The second, more important one, is **protecting the upstream open-data portal**. Boston's CKAN portal is a shared civic resource; if a misbehaving client fans out into thousands of parallel SQL queries, reserved concurrency bounds how much of that load we can relay. See [SECURITY.md](SECURITY.md#3-upstream-portal-protection) for the full threat model.
+
+Set to `-1` to disable the cap (fall back to the account-wide concurrency limit). Don't do this in prod without a reason.
+
+### 1.4 API Gateway quota raised, rate limits unchanged
+
+```
+api_quota_limit = 3000 # was 1000 upstream
+api_rate_limit = 5 # unchanged — sustained req/s
+api_burst_limit = 10 # unchanged — burst capacity (requests)
+```
+
+The daily quota was raised to 3000 after staging traffic showed legitimate per-connector usage (tool discovery + a handful of queries per conversation) could brush against 1000/day for a single user. The per-second rate is kept low deliberately — see [SECURITY.md §2](SECURITY.md#2-rate-limiting-and-body-size).
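
API Gateway throttling follows a token-bucket model, so the rate and burst settings above interact: the burst value is the bucket size and the rate is the refill speed. The sketch below simulates that model with the values from this section. It is a simplified illustration, not API Gateway's actual implementation, and `TokenBucket` is a hypothetical class.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, capped at `burst` tokens."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never above the burst capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(rate=5, burst=10)`, a client that sends 15 requests at the same instant gets 10 through; one second later only 5 more tokens are available.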
+
+### 1.5 Custom domain
+
+Prod now fronts the API Gateway with an ACM cert and the custom domain `boston-data.codeforanchorage.org`. Staging has no custom domain (`custom_domain = ""`) — use the raw API Gateway URL from `terraform output`.
+
+### 1.6 Cross-platform, 3.11-pinned packaging
+
+Both `scripts/deploy.sh` and `.github/workflows/release.yml` were updated so the Lambda ZIP matches the runtime regardless of the build host.
+
+- Detects `python3` or falls back to `python` (Windows build hosts).
+- Forces cp311 manylinux wheels on every dependency install:
+  ```bash
+  pip install -r requirements.txt -t ./package \
+    --platform manylinux2014_x86_64 \
+    --python-version 3.11 \
+    --implementation cp \
+    --abi cp311 \
+    --only-binary :all: \
+    --no-compile
+  ```
+  Without the pin, a build host running Python 3.14 will pull cp314 wheels that fail to import at Lambda cold start with a 502 `InternalServerErrorException`.
+- Builds the ZIP with Python's stdlib `zipfile` module instead of the `zip` binary, which isn't present on every runner (notably the staging CI image and Windows).
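
One way to confirm that the pin took effect is to look at the filenames of compiled extension modules in the package directory, since CPython-specific builds on Linux carry a version tag such as `cpython-311` in the filename. The sketch below is an illustration only: `mismatched_extensions` is a hypothetical helper, and the example filenames in the note are made up.

```python
import re

# Matches e.g. "_mod.cpython-311-x86_64-linux-gnu.so" and captures "311".
_TAG = re.compile(r"\.cpython-(\d+)[^/]*\.so$")


def mismatched_extensions(paths, expected="311"):
    """Return compiled-extension paths whose CPython tag differs from `expected`."""
    bad = []
    for path in paths:
        match = _TAG.search(path)
        # Files without a version tag (pure Python, abi3 builds) are skipped.
        if match and match.group(1) != expected:
            bad.append(path)
    return bad
```

Feeding it the member names of the built ZIP (or the results of walking `./package`) should return an empty list; a path like `pkg/_speedups.cpython-314-x86_64-linux-gnu.so` would be flagged.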
+
+### 1.7 `local_server.py` serves both `/` and `/mcp`
+
+The Claude Desktop stdio bridge posts to `/mcp`; some earlier testing tools post to `/`. The local dev server now accepts both so you can point Claude Desktop and MCP Inspector at the same endpoint without editing routes.
+
+### 1.8 Concrete Boston CKAN `config.yaml`
+
+Upstream `config.yaml` is a symlink to the DC ArcGIS example. This fork replaces it with a concrete CKAN config targeting `data.boston.gov`. ArcGIS is kept `enabled: false` in the file for reference (Boston's ArcGIS hub at `data-boston.hub.arcgis.com` returns 401 without auth; CKAN is the public entry point).
+
+```yaml
+plugins:
+  ckan:
+    enabled: true
+    base_url: "https://data.boston.gov/"
+    portal_url: "https://data.boston.gov/"
+    city_name: "Boston"
+    timeout: 120
+  arcgis:
+    enabled: false
+```
+
+---
+
+## 2. Operator reference
+
+### 2.1 First-time bootstrap
+
+```bash
+# 1. Create the state bucket + lock table (once per account/region)
+export AWS_REGION=us-west-2
+./scripts/setup-backend.sh
+
+# 2. Initialize Terraform against the S3 backend
+cd terraform/aws
+terraform init
+```
+
+### 2.2 Deploying changes
+
+The deploy script validates `config.yaml`, builds a cp311/manylinux Lambda ZIP, and runs `terraform apply`:
+
+```bash
+# Staging
+./scripts/deploy.sh --environment staging
+
+# Prod
+./scripts/deploy.sh --environment prod
+```
+
+Under the hood:
+
+1. Counts enabled plugins (must be exactly one — enforced by `core/validators.py`).
+2. Builds `lambda-deployment.zip` with dependencies forced to cp311 manylinux wheels.
+3. `terraform apply -var-file=<env>.tfvars` against `terraform/aws/`.
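
Step 1 of the list above, the exactly-one-plugin check, can be sketched on an already-parsed config dictionary. This is an illustration of the rule, not the code in `core/validators.py`; `enabled_plugins` and `check_single_plugin` are hypothetical names, and treating only a literal `true` as enabled is an assumption.

```python
def enabled_plugins(config):
    """Return the names of plugins whose `enabled` flag is true."""
    plugins = config.get("plugins") or {}
    return [
        name
        for name, settings in plugins.items()
        if isinstance(settings, dict) and settings.get("enabled") is True
    ]


def check_single_plugin(config):
    """Raise ValueError unless exactly one plugin is enabled; return its name."""
    enabled = enabled_plugins(config)
    if len(enabled) != 1:
        raise ValueError(
            f"expected exactly one enabled plugin, found {len(enabled)}: {enabled}"
        )
    return enabled[0]
```

For the Boston `config.yaml` (CKAN enabled, ArcGIS disabled) this returns `"ckan"`; enabling both plugins or neither raises.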
+
+### 2.3 Environment configuration
+
+| Variable                             | Staging                   | Prod                               |
+| ------------------------------------ | ------------------------- | ---------------------------------- |
+| `lambda_name`                        | `boston-ckan-mcp-staging` | `boston-opencontext-mcp-prod`      |
+| `aws_region`                         | `us-west-2`               | `us-west-2`                        |
+| `lambda_memory`                      | 512 MB                    | 512 MB                             |
+| `lambda_timeout`                     | 120 s                     | 120 s                              |
+| `lambda_reserved_concurrency`        | 10                        | 10                                 |
+| `api_quota_limit`                    | 3000 / day                | 3000 / day                         |
+| `api_rate_limit` / `api_burst_limit` | 5 req/s / burst of 10     | 5 req/s / burst of 10              |
+| `custom_domain`                      | *(none)*                  | `boston-data.codeforanchorage.org` |
+
+### 2.4 Getting the endpoint URL
+
+```bash
+cd terraform/aws
+terraform output -raw api_gateway_url # Custom domain on prod, execute-api URL on staging
+```
+
+### 2.5 Monitoring
+
+CloudWatch log group `/aws/lambda/<lambda_name>`, 14-day retention. Logs are JSON-structured (`logging.format: json` in `config.yaml`) and include a `request_id` field you can join against API Gateway access logs.
+
+```bash
+aws logs tail /aws/lambda/boston-opencontext-mcp-prod --follow --region us-west-2
+```
+
+### 2.6 Cost
+
+Expected steady-state cost at current quota is well under \$5/month: at 3000 requests/day × 30 days × 512 MB × ~1 s, Lambda runs roughly \$1–2/month. API Gateway REST API adds ~\$3.50 per million requests; at 100k/month that is ~\$0.35. Route 53 hosted zone + ACM cert are the fixed floor (~\$0.50/month).
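
The arithmetic behind the cost estimate can be reproduced with a few lines. The unit prices below are assumptions (published x86 list prices at the time of writing, free tier ignored), so check the current AWS pricing pages before relying on them; `monthly_cost` is a hypothetical helper.

```python
# Assumed list prices in USD; verify against the AWS pricing pages.
LAMBDA_PER_GB_SECOND = 0.0000166667
LAMBDA_PER_MILLION_REQUESTS = 0.20
APIGW_PER_MILLION_REQUESTS = 3.50


def monthly_cost(requests_per_day, memory_mb=512, seconds_per_request=1.0, days=30):
    """Return (lambda_usd, api_gateway_usd) for one month, ignoring the free tier."""
    requests = requests_per_day * days
    gb_seconds = requests * seconds_per_request * (memory_mb / 1024)
    lambda_usd = (
        gb_seconds * LAMBDA_PER_GB_SECOND
        + requests / 1_000_000 * LAMBDA_PER_MILLION_REQUESTS
    )
    api_gateway_usd = requests / 1_000_000 * APIGW_PER_MILLION_REQUESTS
    return lambda_usd, api_gateway_usd
```

Under these assumptions, `monthly_cost(3000)` gives about USD 0.77 for Lambda and about USD 0.32 for API Gateway, so the Lambda range quoted above leaves headroom for invocations that run longer than one second.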
+
+---
+
+## 3. Known limitations
+
+- **Single-region.** No cross-region failover (Lambda and API Gateway already run across multiple Availability Zones within us-west-2). Fine for a civic-data read proxy; not for critical services.
+- **Reserved concurrency is a fuse, not a queue.** Beyond 10 in-flight requests, API Gateway returns 429. Clients must retry with backoff.
+- **ArcGIS plugin is disabled.** Enabling it requires an authenticated portal; Boston's hub returns 401 without auth.
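
For the retry-with-backoff requirement in the list above, a client-side loop might look like the following. It is a minimal sketch, assuming throttled calls surface as HTTP 429 as described; `call_with_backoff` is a hypothetical helper and the delay values are arbitrary starting points.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out.

    `send` is any zero-argument callable that returns an HTTP status code.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter: wait in [0, base * 2**attempt].
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    return status
```

The jitter spreads retries from many clients over time, which matters here because a synchronized retry wave would hit the same 10-invocation cap again.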
