Merged
18 commits
97bb2e4
Fix indentation bug in flatten_variables_from_household
MaxGhenis Apr 17, 2026
560a6f6
Tighten rate limit on public /calculate_demo endpoint
MaxGhenis Apr 17, 2026
0192258
Fix sqlite URI and protect analytics DB from wipe on init
MaxGhenis Apr 17, 2026
d5a8af1
Restrict CORS to PolicyEngine origins by default
MaxGhenis Apr 17, 2026
5249862
Replace invalid ConnectionError kwargs with GCPError
MaxGhenis Apr 17, 2026
a9289b8
Stop coercing "0"/"1" env vars to booleans
MaxGhenis Apr 17, 2026
a5c1dc2
Verify JWT signature before trusting sub claim in analytics
MaxGhenis Apr 17, 2026
57bf8fb
Re-raise tracer failures instead of implicit None return
MaxGhenis Apr 17, 2026
060ac52
Validate /calculate payloads and cap axes scans
MaxGhenis Apr 17, 2026
664896a
Time-bound and lazy-load the Auth0 JWKS fetch
MaxGhenis Apr 17, 2026
fc0112e
Use dpath.search instead of deprecated dpath.util.search
MaxGhenis Apr 17, 2026
a21e427
Add changelog entry for bug-audit batch
MaxGhenis Apr 17, 2026
8781260
Anchor default CORS regex with $ to block suffix-match bypass
MaxGhenis Apr 17, 2026
55c3ab2
Cache JWKS successes only so the lazy retry actually retries
MaxGhenis Apr 17, 2026
b3e2cd8
Align analytics error path to NULL client_id
MaxGhenis Apr 17, 2026
51176b2
Drop backward-looking "previously..." comments
MaxGhenis Apr 17, 2026
559569d
Require people on HouseholdModel and cap JSON body size
MaxGhenis Apr 17, 2026
bc4d117
Cover zero-period variable in flatten_variables_from_household
MaxGhenis Apr 17, 2026
16 changes: 15 additions & 1 deletion .env-example
@@ -1,2 +1,16 @@
FLASK_DEBUG=1
CACHE_REDIS_HOST=redis

# Optional: wipe the local sqlite analytics DB on startup. Only
# consulted when FLASK_DEBUG=1 and analytics is enabled. Default off
# so captured debug data is not lost across restarts.
# RESET_ANALYTICS=1

# Optional: comma-separated list of origins (or regex patterns) allowed
# by CORS. If unset, the default allowlist is:
# - https://policyengine.org
# - https://*.policyengine.org (anchored regex)
# - http://localhost[:port] (any port)
# - http://127.0.0.1[:port]
# Example override:
# CORS_ALLOWED_ORIGINS=https://foo.example.com,https://bar.example.com
14 changes: 14 additions & 0 deletions changelog_entry.yaml
@@ -0,0 +1,14 @@
- bump: patch
  changes:
    fixed:
      - Flatten every (entity, variable, period) triple in flatten_variables_from_household (#1462).
      - Tighten /calculate_demo rate limit from 1/second to 1/10 seconds (#1463).
      - Stop unconditionally wiping the analytics SQLite DB and fix the sqlite:// URI (#1464).
      - Restrict CORS to PolicyEngine origins by default, anchored so attacker subdomains can't bypass (#1465).
      - Replace invalid ConnectionError(description=...) with a GCPError class (#1466).
      - Keep "0"/"1" env-var values as integers instead of collapsing to False/True (#1467).
      - Verify JWT signatures in the analytics decorator and drop datetime.utcnow (#1468).
      - Re-raise tracer failures in PolicyEngineCountry.calculate so the endpoint can return a real 500 (#1469).
      - Validate /calculate payloads and cap axes scans; add per-endpoint rate limit (#1470).
      - Time-bound and lazy-load the Auth0 JWKS fetch so a startup outage doesn't crash the API, caching only successes so the lazy retry actually retries (#1471).
      - Replace deprecated dpath.util.search with dpath.search (#1472).
33 changes: 33 additions & 0 deletions config/README.md
@@ -270,6 +270,39 @@ The following endpoints remain unprotected:
- When enabled, all protected endpoints validate JWT tokens against Auth0's JWKS
- The Auth0 domain and audience must match the configured values

## CORS Configuration

Browsers enforce CORS against the API. The default allowlist accepts:

- `https://policyengine.org`
- Any `https://*.policyengine.org` host (anchored regex)
- `http://localhost` on any port (dev servers)
- `http://127.0.0.1` on any port

Override with `CORS_ALLOWED_ORIGINS` (comma-separated strings or
regexes) or `cors.allowed_origins` in YAML:

```bash
CORS_ALLOWED_ORIGINS=https://app.example.com,https://admin.example.com
```

```yaml
cors:
  allowed_origins:
    - https://app.example.com
    - 'https://.*\.example\.com$'
```

Always terminate regex patterns with `$`. Flask-CORS matches with
`re.match`, which anchors only at the start of the string, so an
unanchored pattern like `https://.*\.example\.com` would also accept
`https://app.example.com.attacker.com`.
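
The suffix-match bypass can be reproduced with the standard library alone. The sketch below assumes only that the pattern is applied with `re.match`, as stated above; `origin_allowed` is a hypothetical helper (not part of Flask-CORS) and the hostnames are illustrative.

```python
import re

# Hypothetical patterns and origins, chosen only for illustration.
UNANCHORED = r"https://.*\.example\.com"
ANCHORED = r"https://.*\.example\.com$"


def origin_allowed(pattern: str, origin: str) -> bool:
    """Return True if `pattern` matches from the start of `origin`.

    re.match anchors at the beginning of the string only, so anything
    after the matched prefix is ignored unless the pattern ends in `$`.
    """
    return re.match(pattern, origin) is not None


legit = "https://app.example.com"
hostile = "https://app.example.com.attacker.com"

print(origin_allowed(UNANCHORED, legit))    # True
print(origin_allowed(UNANCHORED, hostile))  # True: only the prefix is checked
print(origin_allowed(ANCHORED, legit))      # True
print(origin_allowed(ANCHORED, hostile))    # False: `$` rejects the extra suffix
```

A quick check like this is worth running against any custom pattern before adding it to `CORS_ALLOWED_ORIGINS`.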

## Analytics reset (debug only)

`RESET_ANALYTICS=1` (or `analytics.reset: true` in YAML) wipes the
local SQLite analytics DB on startup. This is **only** consulted when
`FLASK_DEBUG=1`; production never resets the analytics DB.
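
As a rough illustration of the rule above, the reset decision could be expressed as follows. This is a sketch, not the project's code: `should_reset_analytics` is a hypothetical name, the exact-string check on `FLASK_DEBUG` is an assumption, and `config_reset` stands in for the `analytics.reset` YAML value.

```python
import os

# Values of RESET_ANALYTICS treated as "on" (compared case-insensitively).
TRUTHY = ("1", "true", "yes")


def should_reset_analytics(env=None, config_reset=False):
    """Decide whether the local SQLite analytics DB should be wiped.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise in tests.
    """
    env = os.environ if env is None else env
    # Assumption: debug mode is signalled by FLASK_DEBUG being exactly "1".
    if env.get("FLASK_DEBUG") != "1":
        return False
    requested = env.get("RESET_ANALYTICS", "").lower() in TRUTHY
    return requested or bool(config_reset)


print(should_reset_analytics({"FLASK_DEBUG": "1", "RESET_ANALYTICS": "1"}))  # True
print(should_reset_analytics({"FLASK_DEBUG": "0", "RESET_ANALYTICS": "1"}))  # False
print(should_reset_analytics({"FLASK_DEBUG": "1"}))                          # False
```

Defaulting to "no reset" means captured debug data survives restarts unless someone opts in explicitly.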

## Usage Examples

### Production Deployment (Current)
8 changes: 8 additions & 0 deletions config/default.yaml
@@ -46,3 +46,11 @@ ai:
  enabled: false
  anthropic:
    api_key: "" # Override with ANTHROPIC_API_KEY

# CORS configuration
cors:
  # List of allowed origins (strings or regex patterns). If left null
  # the API defaults to PolicyEngine production domains. Override with
  # CORS_ALLOWED_ORIGINS (comma-separated) in environments that serve
  # additional frontends.
  allowed_origins: null
54 changes: 52 additions & 2 deletions policyengine_household_api/api.py
@@ -24,6 +24,7 @@
from policyengine_household_api.decorators.analytics import (
    log_analytics_if_enabled,
)
from policyengine_household_api.utils.config_loader import get_config_value

# Endpoints
from .endpoints import (
@@ -40,7 +41,52 @@

app = application = flask.Flask(__name__)

-CORS(app)
# Reject absurdly large request bodies before any view runs. 10 MiB is
# well above the largest legitimate household payload we have seen
# (axes scans push a few hundred KiB) while still capping the memory a
# single attacker can force us to allocate. Overridable via the
# ``MAX_CONTENT_LENGTH`` env var (bytes).
app.config["MAX_CONTENT_LENGTH"] = int(
    os.getenv("MAX_CONTENT_LENGTH", 10 * 1024 * 1024)
)


def _resolve_cors_origins():
    """
    Resolve the CORS allowed origins list.

    Priority:
    1. CORS_ALLOWED_ORIGINS env var (comma-separated list)
    2. config value "cors.allowed_origins" (list or comma string)
    3. Safe default: the PolicyEngine production domains plus localhost

    Use regex patterns so that wildcard subdomains work with
    Flask-CORS's `origins` kwarg.
    """
    raw = os.getenv("CORS_ALLOWED_ORIGINS") or get_config_value(
        "cors.allowed_origins", None
    )

    if raw is None:
        # Flask-CORS uses re.match, which is a prefix match; anchor with
        # ``$`` so a hostile host like
        # ``app.policyengine.org.attacker.com`` cannot satisfy the
        # wildcard pattern. Include ``localhost:*`` so local dev servers
        # can hit the API without extra setup.
        origins = [
            "https://policyengine.org",
            r"https://.*\.policyengine\.org$",
            r"http://localhost(:[0-9]+)?$",
            r"http://127\.0\.0\.1(:[0-9]+)?$",
        ]
    elif isinstance(raw, str):
        origins = [o.strip() for o in raw.split(",") if o.strip()]
    else:
        origins = list(raw)

    return origins


CORS(app, origins=_resolve_cors_origins())

# Use in-memory storage for rate limiting
# Note that this provides limits per-instance;
@@ -59,6 +105,7 @@

@app.route("/<country_id>/calculate", methods=["POST"])
@require_auth_if_enabled()
@limiter.limit("60 per minute")
@log_analytics_if_enabled
def calculate(country_id):
    return get_calculate(country_id)
@@ -84,8 +131,11 @@ def readiness_check():
)


# Note: `/calculate_demo` is intentionally public (documented in
# config/README.md). It is guarded by a conservative rate limit rather
# than JWT authentication.
@app.route("/<country_id>/calculate_demo", methods=["POST"])
-@limiter.limit("1 per second")
@limiter.limit("1 per 10 seconds")
def calculate_demo(country_id):
    return get_calculate(country_id)

100 changes: 98 additions & 2 deletions policyengine_household_api/auth/validation.py
@@ -1,18 +1,114 @@
import json
import logging
import time
from threading import Lock
from urllib.request import urlopen

from authlib.oauth2.rfc7523 import JWTBearerTokenValidator
from authlib.jose.rfc7517.jwk import JsonWebKey

logger = logging.getLogger(__name__)

JWKS_FETCH_TIMEOUT = 10 # seconds
# Minimum wait between back-to-back lazy retries after a failure.
# Keeps us from hammering Auth0 when it is actively degraded.
JWKS_RETRY_INTERVAL_SECONDS = 30


# Module-level cache of successful JWKS fetches, keyed by issuer. Only
# successes are cached so that a transient failure is retried on the
# next authenticated request (``lru_cache`` would have memoised the
# ``None`` return, making the "lazy retry" dead code).
_jwks_cache: dict = {}
# Records the monotonic timestamp of the most recent *failed* fetch
# per-issuer so we can rate-limit retries without caching the failure
# itself.
_jwks_last_failure: dict = {}
_jwks_lock = Lock()


def _fetch_jwks_uncached(issuer: str):
    """Fetch the JWKS for an Auth0 issuer, bypassing the cache.

    Returns an authlib key set on success, ``None`` on failure. Errors
    are logged rather than raised so that a transient Auth0 outage
    doesn't crash the process at import time.
    """
    jwks_url = f"{issuer}.well-known/jwks.json"
    try:
        with urlopen(jwks_url, timeout=JWKS_FETCH_TIMEOUT) as response:
            return JsonWebKey.import_key_set(json.loads(response.read()))
    except Exception as e:
        logger.warning(f"Failed to fetch JWKS from {jwks_url}: {e}")
        return None


def _fetch_jwks(issuer: str):
    """Fetch JWKS, caching only successful results.

    On failure we record the time but do not memoise the ``None``; a
    later call will retry (subject to ``JWKS_RETRY_INTERVAL_SECONDS``
    backoff) so that the validator self-heals once Auth0 recovers.
    """
    with _jwks_lock:
        cached = _jwks_cache.get(issuer)
        if cached is not None:
            return cached
        last_failure = _jwks_last_failure.get(issuer)
        if (
            last_failure is not None
            and time.monotonic() - last_failure < JWKS_RETRY_INTERVAL_SECONDS
        ):
            # Too soon after the last failure; don't hammer Auth0.
            return None

    # Fetch outside the lock so a slow network call doesn't block
    # other threads that might be serving requests with a cached key.
    key_set = _fetch_jwks_uncached(issuer)

    with _jwks_lock:
        if key_set is not None:
            _jwks_cache[issuer] = key_set
            _jwks_last_failure.pop(issuer, None)
        else:
            _jwks_last_failure[issuer] = time.monotonic()
    return key_set


def _clear_jwks_cache():
    """Test helper: wipe the success/failure caches."""
    with _jwks_lock:
        _jwks_cache.clear()
        _jwks_last_failure.clear()


class Auth0JWTBearerTokenValidator(JWTBearerTokenValidator):
    def __init__(self, domain, audience):
        issuer = f"https://{domain}/"
-        jsonurl = urlopen(f"{issuer}.well-known/jwks.json")
-        public_key = JsonWebKey.import_key_set(json.loads(jsonurl.read()))

        public_key = _fetch_jwks(issuer)
        if public_key is None:
            # Retry on next token validation rather than failing hard
            # at construction time. A missing key set means token
            # validation will fail cleanly inside authlib.
            logger.warning(
                "JWKS unavailable at construction; will retry on first "
                "token validation."
            )

        super(Auth0JWTBearerTokenValidator, self).__init__(public_key)
        self._issuer = issuer
        self.claims_options = {
            "exp": {"essential": True},
            "aud": {"essential": True, "value": audience},
            "iss": {"essential": True, "value": issuer},
        }

    def authenticate_token(self, token_string):
        # Lazy-refresh the JWKS if the initial fetch failed. Because
        # ``_fetch_jwks`` only caches successes, this call will retry
        # the network fetch (subject to a short backoff) until Auth0
        # responds.
        if self.public_key is None:
            self.public_key = _fetch_jwks(self._issuer)
        return super().authenticate_token(token_string)
11 changes: 8 additions & 3 deletions policyengine_household_api/country.py
@@ -1,4 +1,5 @@
import importlib
import logging
from flask import Response
import json
from policyengine_core.taxbenefitsystems import TaxBenefitSystem
@@ -432,8 +433,12 @@ def calculate(

            return household, None

-        except Exception as e:
-            print(f"Error computing tracer output: {e}")
        except Exception:
            # Re-raise so endpoints/household.py (which unpacks
            # ``(result, computation_tree_uuid)``) can surface a real
            # 500 instead of a TypeError on ``None`` unpacking.
            logging.exception("Tracer failed while computing household")
            raise


def create_policy_reform(policy_data: dict) -> dict:
@@ -478,7 +483,7 @@ def apply(self):


def get_requested_computations(household: dict):
-    requested_computations = dpath.util.search(
    requested_computations = dpath.search(
        household,
        "*/*/*/*",
        afilter=lambda t: t is None,
14 changes: 12 additions & 2 deletions policyengine_household_api/data/analytics_setup.py
@@ -47,11 +47,21 @@ def initialize_analytics_db_if_enabled(app):
        db_url = (
            REPO / "policyengine_household_api" / "data" / "policyengine.db"
        )
-        if Path(db_url).exists():
        # Only wipe the analytics DB when explicitly requested via
        # RESET_ANALYTICS=1 (or the ``analytics.reset`` config flag).
        should_reset = os.getenv("RESET_ANALYTICS", "").lower() in (
            "1",
            "true",
            "yes",
        ) or get_config_value("analytics.reset", False)
        if should_reset and Path(db_url).exists():
            Path(db_url).unlink()
        if not Path(db_url).exists():
            Path(db_url).touch()
-        app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:////" + str(db_url)
        # sqlite: absolute paths require exactly three slashes plus the
        # leading "/" from the absolute path (=> "sqlite:////tmp/x.db").
        # db_url here is already absolute, so use an f-string.
        app.config["SQLALCHEMY_DATABASE_URI"] = f"sqlite:///{db_url}"
    else:
        app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+pymysql://"
        app.config["SQLALCHEMY_ENGINE_OPTIONS"] = {"creator": getconn}