# AWS Deployment (Boston fork)

This document describes how the Boston-specific deployment of OpenContext is hosted on AWS, what changed relative to the upstream defaults, and how to operate the stack. It complements [DEPLOYMENT.md](DEPLOYMENT.md), which covers the upstream single-Lambda/API-Gateway architecture.

- **Public endpoint (prod):** `https://boston-data.codeforanchorage.org`
- **Upstream data source:** Boston CKAN portal at `https://data.boston.gov/`
- **Runtime:** AWS Lambda (Python 3.11) behind API Gateway, us-west-2

> **Design constraint:** this fork's top operational priority is **not overwhelming `data.boston.gov`**. It is a shared civic resource, not our infrastructure. Every defensive control in this fork (reserved Lambda concurrency, API Gateway rate limits and daily quota, an enforced `LIMIT` on SQL, clamped aggregation limits, and body-size caps) exists to keep this MCP server from becoming the noisiest client on that portal. See [SECURITY.md §1](SECURITY.md#1-protecting-the-upstream-data-portal) for the full rationale.

---

## 1. What changed in this fork

The upstream deployment assumes a single-region (us-east-1) Lambda with a standard rate-limited API Gateway in front of it. This fork makes the following operational changes:

### 1.1 Region moved to us-west-2

Terraform variables and the deploy script default to `us-west-2`:

- `terraform/aws/prod.tfvars`, `terraform/aws/staging.tfvars`: `aws_region = "us-west-2"`
- `config.yaml`: `aws.region: "us-west-2"`

The move is for co-location with other Code for Anchorage infrastructure and has no functional effect on the Lambda. Cost numbers in [DEPLOYMENT.md](DEPLOYMENT.md#cost-us-east-1) still apply; us-west-2 pricing is effectively identical for Lambda and API Gateway.

### 1.2 Terraform backend extracted and renamed

The upstream `main.tf` hard-coded an `opencontext-terraform-state` bucket in us-east-1. This fork moves the backend into its own file so that the bootstrap account, region, and bucket are explicit, and renames the bucket to follow the convention used by `scripts/setup-backend.sh`:

`terraform/aws/backend.tf` (new file):

```hcl
terraform {
  backend "s3" {
    bucket         = "boston-opencontext-tfstate-<AWS_ACCOUNT_ID>-us-west-2"
    key            = "terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
```

The actual `backend.tf` in this repo hardcodes the Code for Anchorage AWS account ID. Terraform cannot interpolate variables into a `backend` block, so the literal value has to live in the file. State locking uses a DynamoDB table (`terraform-state-lock`). Forked deployments should run `scripts/setup-backend.sh` to create both the bucket and the lock table, replace the account ID in `backend.tf` with their own, and then run `terraform init` in `terraform/aws/`.
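
Because the account ID has to be a literal, a fork could regenerate `backend.tf` from a small script instead of hand-editing it. The sketch below is hypothetical (neither `render_backend_tf` nor such a script exists in this repo); it assumes the bucket naming convention shown above, `boston-opencontext-tfstate-<ACCOUNT_ID>-<region>`, and only validates that the account ID is 12 digits.

```python
import re

# Hypothetical helper that renders terraform/aws/backend.tf for one account.
# Assumes the naming convention used by scripts/setup-backend.sh:
#   boston-opencontext-tfstate-<ACCOUNT_ID>-<region>
BACKEND_TEMPLATE = """terraform {{
  backend "s3" {{
    bucket         = "{bucket}"
    key            = "terraform.tfstate"
    region         = "{region}"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }}
}}
"""


def state_bucket_name(account_id: str, region: str) -> str:
    # AWS account IDs are always exactly 12 digits.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    return f"boston-opencontext-tfstate-{account_id}-{region}"


def render_backend_tf(account_id: str, region: str = "us-west-2") -> str:
    # Terraform can't interpolate variables into a backend block,
    # so the literal values are substituted here before Terraform sees them.
    return BACKEND_TEMPLATE.format(
        bucket=state_bucket_name(account_id, region), region=region
    )
```

Writing `render_backend_tf("<your account ID>")` to `terraform/aws/backend.tf` keeps the bucket name in lockstep with the convention instead of relying on a manual edit.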

### 1.3 Reserved Lambda concurrency

A new `lambda_reserved_concurrency` variable caps the number of concurrent Lambda invocations. Default is **10**, set in both the staging and prod `.tfvars`.

```hcl
# terraform/aws/variables.tf
variable "lambda_reserved_concurrency" {
  default = 10
}
```

This serves two purposes. The first is cost containment: a surprise traffic spike can't run up the bill. The second, and more important, is **protecting the upstream open-data portal**. Boston's CKAN portal is a shared civic resource; if a misbehaving client fans out into thousands of parallel SQL queries, reserved concurrency bounds how much of that load we can relay. See [SECURITY.md](SECURITY.md#3-upstream-portal-protection) for the full threat model.

Set to `-1` to disable the cap (fall back to the account-wide concurrency limit). Don't do this in prod without a reason.
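
One way to picture the cap is as a non-blocking semaphore sized to `lambda_reserved_concurrency`. The Python model below is a simplified illustration, not Lambda internals, and `ConcurrencyFuse` / `relay` are hypothetical names. Assuming each invocation relays one upstream call at a time, at most 10 calls to `data.boston.gov` can be outstanding, and any request beyond that is rejected immediately rather than queued.

```python
import threading
from typing import Callable, Tuple


class ConcurrencyFuse:
    """Toy model of reserved concurrency: fixed slots, acquired without waiting."""

    def __init__(self, limit: int = 10):
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Returns False at once when every slot is taken; nothing is queued.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def relay(fuse: ConcurrencyFuse, call_upstream: Callable[[], str]) -> Tuple[int, str]:
    # Only requests that win a slot ever reach the upstream portal.
    if not fuse.try_acquire():
        return 429, "throttled"
    try:
        return 200, call_upstream()
    finally:
        fuse.release()
```

The design point is the `blocking=False`: the cap sheds excess load instead of buffering it, so upstream pressure can never exceed the slot count.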

### 1.4 API Gateway quota raised, rate limits unchanged

```hcl
api_quota_limit = 3000 # was 1000 upstream
api_rate_limit  = 5    # unchanged; sustained req/s
api_burst_limit = 10   # unchanged; burst req/s
```

The daily quota was raised to 3000 after staging traffic showed that legitimate per-connector usage (tool discovery plus a handful of queries per conversation) could brush against 1000/day for a single user. The per-second rate is deliberately kept low; see [SECURITY.md §2](SECURITY.md#2-rate-limiting-and-body-size).
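
API Gateway throttles with a token-bucket algorithm: the bucket holds up to the burst limit and refills at the rate limit, and a usage-plan quota caps the total on top of that. The sketch below models the three settings to show how they interact. `UsagePlanModel` is a hypothetical name, and the model deliberately ignores per-key scoping, account-level limits, and the daily quota reset.

```python
class UsagePlanModel:
    """Toy model of a usage plan: token bucket (rate/burst) plus a quota."""

    def __init__(self, rate: float = 5.0, burst: int = 10, quota: int = 3000):
        self.rate = rate            # api_rate_limit: tokens refilled per second
        self.burst = burst          # api_burst_limit: bucket capacity
        self.quota = quota          # api_quota_limit: requests per day
        self.tokens = float(burst)  # bucket starts full
        self.used_today = 0
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.used_today >= self.quota or self.tokens < 1:
            return False  # API Gateway would answer 429 here
        self.tokens -= 1
        self.used_today += 1
        return True
```

Twelve simultaneous requests let 10 through on the burst; one second later only 5 more fit. Sustained at 5 req/s, a client would use up 3000 requests in 3000 / 5 = 600 s, so the quota, not the per-second rate, is what bounds daily load.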

### 1.5 Custom domain

Prod now fronts API Gateway with the custom domain `boston-data.codeforanchorage.org`, backed by an ACM certificate. Staging has no custom domain (`custom_domain = ""`); use the raw API Gateway URL from `terraform output`.

### 1.6 Cross-platform, 3.11-pinned packaging

Both `scripts/deploy.sh` and `.github/workflows/release.yml` were updated so the Lambda ZIP matches the runtime regardless of the build host.

- Detects `python3` or falls back to `python` (Windows build hosts).
- Forces cp311 manylinux wheels on every dependency install:
  ```bash
  pip install -r requirements.txt -t ./package \
    --platform manylinux2014_x86_64 \
    --python-version 3.11 \
    --implementation cp \
    --abi cp311 \
    --only-binary :all: \
    --no-compile
  ```
  Without the pin, a build host running Python 3.14 will pull cp314 wheels that fail to import at Lambda cold start, surfacing as a 502 `InternalServerErrorException`.
- Builds the ZIP with Python's stdlib `zipfile` module instead of the `zip` binary, which isn't present on every runner (notably the staging CI image and Windows).
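
The ZIP step can be done entirely with the standard library. The sketch below illustrates the idea and is not the code in `scripts/deploy.sh` or `release.yml`; `build_lambda_zip`, `find_abi_mismatches`, and the directory layout are assumptions. It also refuses to package compiled extensions whose filename tag names a CPython other than 3.11, which would catch the cp314 mismatch described above before it reaches Lambda.

```python
import os
import re
import zipfile
from pathlib import Path
from typing import Iterable, List

# CPython tag inside an extension filename, e.g.
# "_speedups.cpython-311-x86_64-linux-gnu.so" -> "311".
CPYTHON_TAG = re.compile(r"\.cpython-(\d+)-")


def find_abi_mismatches(root: Path, expected: str = "311") -> List[str]:
    """List extension modules under root built for a different CPython."""
    bad = []
    for path in root.rglob("*.so"):
        match = CPYTHON_TAG.search(path.name)
        if match and match.group(1) != expected:
            bad.append(path.relative_to(root).as_posix())
    return sorted(bad)


def build_lambda_zip(sources: Iterable[Path], out: Path) -> int:
    """Zip each source dir's contents at the archive root; return the file count."""
    sources = [Path(s) for s in sources]
    for src in sources:
        bad = find_abi_mismatches(src)
        if bad:
            raise RuntimeError(f"non-cp311 extension modules in {src}: {bad}")
    count = 0
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for src in sources:
            for dirpath, _dirnames, filenames in os.walk(src):
                for name in sorted(filenames):
                    full = Path(dirpath) / name
                    # Archive paths are relative to the source dir, so the
                    # handler and its dependencies land at the ZIP root.
                    zf.write(full, full.relative_to(src).as_posix())
                    count += 1
    return count
```

For example, `build_lambda_zip([Path("package"), Path("src")], Path("lambda-deployment.zip"))`, where `src` stands in for wherever the handler code lives.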

### 1.7 `local_server.py` serves both `/` and `/mcp`

The Claude Desktop stdio bridge posts to `/mcp`; some earlier testing tools post to `/`. The local dev server now accepts both, so you can point Claude Desktop and MCP Inspector at the same endpoint without editing routes.
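
A stdlib-only sketch of the dual-route behavior follows. It is not the actual `local_server.py` (its framework and handler names aren't shown here); `MCP_PATHS`, `McpHandler`, `handle_mcp_request`, and `make_server` are hypothetical, and the handler returns a placeholder JSON-RPC response. The point is that `/` and `/mcp` dispatch to the same function while any other path gets a 404.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical: both paths dispatch to the same MCP handler.
MCP_PATHS = {"/", "/mcp"}


def handle_mcp_request(payload: dict) -> dict:
    # Placeholder: a real server would dispatch on payload["method"].
    return {"jsonrpc": "2.0", "id": payload.get("id"), "result": {}}


class McpHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path not in MCP_PATHS:
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_mcp_request(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    return ThreadingHTTPServer(("127.0.0.1", port), McpHandler)
```

Run it with `make_server(8000).serve_forever()`; keeping the route set in one place means adding a third alias is a one-line change.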

### 1.8 Concrete Boston CKAN `config.yaml`

Upstream `config.yaml` is a symlink to the DC ArcGIS example. This fork replaces it with a concrete CKAN config targeting `data.boston.gov`. ArcGIS is kept `enabled: false` in the file for reference (Boston's ArcGIS hub at `data-boston.hub.arcgis.com` returns 401 without auth; CKAN is the public entry point).

```yaml
plugins:
  ckan:
    enabled: true
    base_url: "https://data.boston.gov/"
    portal_url: "https://data.boston.gov/"
    city_name: "Boston"
    timeout: 120
  arcgis:
    enabled: false
```
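
Presumably the plugin calls CKAN's Action API (`/api/3/action/...`) under `base_url`. As a rough illustration of the kind of request that reaches `data.boston.gov`, the sketch below builds a `package_search` URL (a standard CKAN action) with a clamped `rows` value, in the spirit of the fork's upstream-protection rule. `build_package_search_url` and the cap of 50 are hypothetical, not values taken from this repo's plugin.

```python
from urllib.parse import urlencode, urljoin

MAX_ROWS = 50  # hypothetical cap, chosen for illustration only


def build_package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    # Clamp rows so a single call can't request an unbounded page from the portal.
    rows = max(1, min(rows, MAX_ROWS))
    endpoint = urljoin(base_url, "api/3/action/package_search")
    return f"{endpoint}?{urlencode({'q': query, 'rows': rows})}"
```

The trailing slash on `base_url` matters here: `urljoin` appends to a URL ending in `/` but replaces the last path segment otherwise. Fetching the result with `urllib.request.urlopen(url, timeout=120)` would mirror the `timeout: 120` setting.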

---

## 2. Operator reference

### 2.1 First-time bootstrap

```bash
# 1. Create the state bucket + lock table (once per account/region)
export AWS_REGION=us-west-2
./scripts/setup-backend.sh

# Forks only: put your own account ID into terraform/aws/backend.tf before step 2

# 2. Initialize Terraform against the S3 backend
cd terraform/aws
terraform init
```
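
Before `terraform init`, it can help to confirm that the bootstrap actually produced both resources. The sketch below uses boto3's `head_bucket` and `describe_table` calls; the clients are passed in (for example `boto3.client("s3", region_name="us-west-2")`) so the function itself has no AWS dependency, and `check_backend` is a hypothetical name. The bucket and table names come from `backend.tf`.

```python
from typing import Any, List


def check_backend(s3: Any, dynamodb: Any, bucket: str,
                  table: str = "terraform-state-lock") -> List[str]:
    """Return a list of problems; an empty list means the backend looks ready."""
    problems = []
    try:
        s3.head_bucket(Bucket=bucket)
    except Exception as exc:  # botocore raises ClientError for 403/404
        problems.append(f"state bucket {bucket!r} not reachable: {exc}")
    try:
        status = dynamodb.describe_table(TableName=table)["Table"]["TableStatus"]
        if status != "ACTIVE":
            problems.append(f"lock table {table!r} is {status}, not ACTIVE")
    except Exception as exc:  # ResourceNotFoundException if the table is missing
        problems.append(f"lock table {table!r} not found: {exc}")
    return problems
```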

### 2.2 Deploying changes

The deploy script validates `config.yaml`, builds a cp311/manylinux Lambda ZIP, and runs `terraform apply`:

```bash
# Staging
./scripts/deploy.sh --environment staging

# Prod
./scripts/deploy.sh --environment prod
```

Under the hood:

1. Counts enabled plugins; exactly one must be enabled (enforced by `core/validators.py`).
2. Builds `lambda-deployment.zip` with dependencies forced to cp311 manylinux wheels.
3. Runs `terraform apply -var-file=<env>.tfvars` in `terraform/aws/`.
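
Step 1's rule fits in a few lines. This is a hypothetical stand-in for the check in `core/validators.py`, whose actual function names and messages aren't shown here; it works on the already-parsed `config.yaml` as a dict, so YAML loading is left out.

```python
from typing import Any, Dict


def enabled_plugin(config: Dict[str, Any]) -> str:
    """Return the single enabled plugin name, or raise ValueError."""
    plugins = config.get("plugins") or {}
    enabled = [name for name, opts in plugins.items()
               if isinstance(opts, dict) and opts.get("enabled") is True]
    if len(enabled) != 1:
        raise ValueError(
            f"exactly one plugin must be enabled, found {len(enabled)}: {enabled}"
        )
    return enabled[0]
```

With the Boston config, `ckan` is returned; flipping `arcgis` to `enabled: true` as well would fail the deploy before anything is built.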

### 2.3 Environment configuration

| Variable                             | Staging                   | Prod                               |
| ------------------------------------ | ------------------------- | ---------------------------------- |
| `lambda_name`                        | `boston-ckan-mcp-staging` | `boston-opencontext-mcp-prod`      |
| `aws_region`                         | `us-west-2`               | `us-west-2`                        |
| `lambda_memory`                      | 512 MB                    | 512 MB                             |
| `lambda_timeout`                     | 120 s                     | 120 s                              |
| `lambda_reserved_concurrency`        | 10                        | 10                                 |
| `api_quota_limit`                    | 3000 / day                | 3000 / day                         |
| `api_rate_limit` / `api_burst_limit` | 5 / 10 req/s              | 5 / 10 req/s                       |
| `custom_domain`                      | *(none)*                  | `boston-data.codeforanchorage.org` |

### 2.4 Getting the endpoint URL

```bash
cd terraform/aws
terraform output -raw api_gateway_url  # custom domain on prod, execute-api URL on staging
```
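
A quick smoke test against that URL is an MCP `tools/list` call, which is a JSON-RPC 2.0 request. The sketch below only builds the request: whether the deployed route is the bare URL or `<url>/mcp`, whether the server expects an `initialize` handshake first, and whether the usage plan requires an `x-api-key` header are assumptions to check against the server and Terraform code. `build_tools_list_request` is a hypothetical name.

```python
import json
import urllib.request


def build_tools_list_request(endpoint: str, request_id: int = 1) -> urllib.request.Request:
    # JSON-RPC 2.0 envelope for the MCP "tools/list" method.
    body = json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "method": "tools/list", "params": {}}).encode()
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response types.
            "Accept": "application/json, text/event-stream",
        },
    )
```

Send it with `urllib.request.urlopen(build_tools_list_request(url), timeout=30)`. Each smoke test is a real request and counts against the daily quota.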

### 2.5 Monitoring

Logs go to the CloudWatch log group `/aws/lambda/<lambda_name>` with 14-day retention. They are JSON-structured (`logging.format: json` in `config.yaml`) and include a `request_id` field you can join against API Gateway access logs.

```bash
aws logs tail /aws/lambda/boston-opencontext-mcp-prod --follow --region us-west-2
```
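
Because each application log line is a JSON object, pulling everything for one request out of `aws logs tail` output is a small filtering job. This sketch assumes the JSON object starts at the first `{` on each line (the tail output prefixes a timestamp and stream name) and that the field is literally named `request_id`, as stated above; `lines_for_request` is a hypothetical name.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def lines_for_request(lines: Iterable[str], request_id: str) -> Iterator[Dict[str, Any]]:
    """Yield parsed JSON log records whose request_id matches."""
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # Lambda START/END/REPORT lines and other plain text
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a structured log line
        if isinstance(record, dict) and record.get("request_id") == request_id:
            yield record
```

Feeding it `sys.stdin` lets you pipe the `aws logs tail` command above straight into it.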

### 2.6 Cost

Expected steady-state cost at the current quota is well under \$5/month. At 3000 requests/day × 30 days = 90,000 invocations, each at 512 MB for ~1 s, Lambda uses about 45,000 GB-s: roughly \$0.75/month in compute plus about \$0.02 in request charges, before any free-tier credit. API Gateway (REST API) adds \$3.50 per million requests; at ~100k requests/month that is ~\$0.35. The Route 53 hosted zone is the fixed floor (~\$0.50/month); the public ACM certificate itself is free.
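
The estimate can be reproduced with a small calculator so it is easy to re-run when the quota or memory size changes. The prices are assumed list prices (x86 Lambda at \$0.0000166667 per GB-s and \$0.20 per million requests, API Gateway REST at \$3.50 per million requests, \$0.50 per Route 53 hosted zone); the free tier is ignored, and `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(requests_per_day: int = 3000, days: int = 30,
                 memory_mb: int = 512, avg_duration_s: float = 1.0) -> dict:
    # Assumed list prices in USD; free tier deliberately ignored.
    LAMBDA_PER_GB_S = 0.0000166667
    LAMBDA_PER_MILLION_REQ = 0.20
    APIGW_REST_PER_MILLION = 3.50
    ROUTE53_ZONE_PER_MONTH = 0.50

    requests = requests_per_day * days
    gb_seconds = requests * (memory_mb / 1024) * avg_duration_s
    costs = {
        "lambda_compute": gb_seconds * LAMBDA_PER_GB_S,
        "lambda_requests": requests / 1e6 * LAMBDA_PER_MILLION_REQ,
        "api_gateway": requests / 1e6 * APIGW_REST_PER_MILLION,
        "route53_zone": ROUTE53_ZONE_PER_MONTH,
    }
    costs["total"] = sum(costs.values())
    return costs
```

Changing `memory_mb` or `avg_duration_s` only moves the Lambda compute line, while raising the quota scales every per-request line linearly.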

---

## 3. Known limitations

- **Single-region.** Lambda already runs across multiple Availability Zones within us-west-2, but there is no cross-region failover. Fine for a civic-data read proxy; not for critical services.
- **Reserved concurrency is a fuse, not a queue.** Beyond 10 in-flight requests, API Gateway returns 429. Clients must retry with backoff.
- **ArcGIS plugin is disabled.** Enabling it requires an authenticated portal; Boston's hub returns 401 without auth.
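
Since the cap rejects rather than queues, clients need to retry. A minimal sketch of exponential backoff with full jitter follows; `send` stands in for whatever performs one HTTP call and returns its status code, `call_with_backoff` is a hypothetical name, and the attempt count and delays are illustrative defaults, not values the server expects.

```python
import random
import time
from typing import Callable, Optional


def call_with_backoff(send: Callable[[], int], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> int:
    """Retry while the server answers 429; return the final status code."""
    rng = rng or random.Random()
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Full jitter: wait a random time up to the capped exponential delay.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(rng.uniform(0, delay))
        status = send()
    return status
```

Jitter spreads retries from many clients over time, so a burst of 429s does not turn into a synchronized second burst against the same 10 slots.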