Merged
16 changes: 1 addition & 15 deletions .env-example
@@ -1,16 +1,2 @@
 FLASK_DEBUG=1
-CACHE_REDIS_HOST=redis

-# Optional: wipe the local sqlite analytics DB on startup. Only
-# consulted when FLASK_DEBUG=1 and analytics is enabled. Default off
-# so captured debug data is not lost across restarts.
-# RESET_ANALYTICS=1

-# Optional: comma-separated list of origins (or regex patterns) allowed
-# by CORS. If unset, the default allowlist is:
-# - https://policyengine.org
-# - https://*.policyengine.org (anchored regex)
-# - http://localhost[:port] (any port)
-# - http://127.0.0.1[:port]
-# Example override:
-# CORS_ALLOWED_ORIGINS=https://foo.example.com,https://bar.example.com
+CACHE_REDIS_HOST=redis
24 changes: 0 additions & 24 deletions CHANGELOG.md
@@ -5,28 +5,6 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.13.15] - 2026-04-22 22:45:54

### Changed

- Update PolicyEngine US to 1.663.0

## [0.13.14] - 2026-04-17 16:45:19

### Fixed

- Flatten every (entity, variable, period) triple in flatten_variables_from_household (#1462).
- Tighten /calculate_demo rate limit from 1/second to 1/10 seconds (#1463).
- Stop unconditionally wiping the analytics SQLite DB and fix the sqlite:// URI (#1464).
- Restrict CORS to PolicyEngine origins by default, anchored so attacker subdomains can't bypass (#1465).
- Replace invalid ConnectionError(description=...) with a GCPError class (#1466).
- Keep "0"/"1" env-var values as integers instead of collapsing to False/True (#1467).
- Verify JWT signatures in the analytics decorator and drop datetime.utcnow (#1468).
- Re-raise tracer failures in PolicyEngineCountry.calculate so the endpoint can return a real 500 (#1469).
- Validate /calculate payloads and cap axes scans; add per-endpoint rate limit (#1470).
- Time-bound and lazy-load the Auth0 JWKS fetch so a startup outage doesn't crash the API, caching only successes so the lazy retry actually retries (#1471).
- Replace deprecated dpath.util.search with dpath.search (#1472).

## [0.13.13] - 2026-04-12 01:07:14

### Added
@@ -1701,8 +1679,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0



[0.13.15]: https://github.com/PolicyEngine/policyengine-household-api/compare/0.13.14...0.13.15
[0.13.14]: https://github.com/PolicyEngine/policyengine-household-api/compare/0.13.13...0.13.14
[0.13.13]: https://github.com/PolicyEngine/policyengine-household-api/compare/0.13.12...0.13.13
[0.13.12]: https://github.com/PolicyEngine/policyengine-household-api/compare/0.13.11...0.13.12
[0.13.11]: https://github.com/PolicyEngine/policyengine-household-api/compare/0.13.10...0.13.11
27 changes: 0 additions & 27 deletions changelog.yaml
@@ -1417,30 +1417,3 @@
    added:
    - Return PolicyEngine bundle metadata from household calculate responses.
  date: 2026-04-12 01:07:14
- bump: patch
  changes:
    fixed:
    - Flatten every (entity, variable, period) triple in flatten_variables_from_household
      (#1462).
    - Tighten /calculate_demo rate limit from 1/second to 1/10 seconds (#1463).
    - Stop unconditionally wiping the analytics SQLite DB and fix the sqlite:// URI
      (#1464).
    - Restrict CORS to PolicyEngine origins by default, anchored so attacker subdomains
      can't bypass (#1465).
    - Replace invalid ConnectionError(description=...) with a GCPError class (#1466).
    - Keep "0"/"1" env-var values as integers instead of collapsing to False/True
      (#1467).
    - Verify JWT signatures in the analytics decorator and drop datetime.utcnow (#1468).
    - Re-raise tracer failures in PolicyEngineCountry.calculate so the endpoint can
      return a real 500 (#1469).
    - Validate /calculate payloads and cap axes scans; add per-endpoint rate limit
      (#1470).
    - Time-bound and lazy-load the Auth0 JWKS fetch so a startup outage doesn't crash
      the API, caching only successes so the lazy retry actually retries (#1471).
    - Replace deprecated dpath.util.search with dpath.search (#1472).
  date: 2026-04-17 16:45:19
- bump: patch
  changes:
    changed:
    - Update PolicyEngine US to 1.663.0
  date: 2026-04-22 22:45:54
5 changes: 5 additions & 0 deletions changelog_entry.yaml
@@ -0,0 +1,5 @@
+- bump: patch
+  changes:
+    changed:
+    - Restore the household API codebase to the 0.13.13 baseline
+    - Pin policyengine_core to <=3.23.6 and urllib3 to <=1.26.20
33 changes: 0 additions & 33 deletions config/README.md
@@ -270,39 +270,6 @@ The following endpoints remain unprotected:
- When enabled, all protected endpoints validate JWT tokens against Auth0's JWKS
- The Auth0 domain and audience must match the configured values

## CORS Configuration

Browsers enforce CORS against the API. The default allowlist accepts:

- `https://policyengine.org`
- Any `https://*.policyengine.org` host (anchored regex)
- `http://localhost` on any port (dev servers)
- `http://127.0.0.1` on any port

Override with `CORS_ALLOWED_ORIGINS` (comma-separated strings or
regexes) or `cors.allowed_origins` in YAML:

```bash
CORS_ALLOWED_ORIGINS=https://app.example.com,https://admin.example.com
```

```yaml
cors:
  allowed_origins:
    - https://app.example.com
    - 'https://.*\.example\.com$'
```

Always terminate regex patterns with `$`. Flask-CORS matches with
`re.match`, which anchors only at the start of the string, so an
unanchored pattern like `https://.*\.example\.com` would accept
`https://app.example.com.attacker.com`.
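
The anchoring rule can be checked with the standard library alone. The following is a minimal sketch, assuming (as the paragraph above states) that origins are tested with `re.match`; the helper `origin_allowed` and the sample origins are hypothetical and are not part of Flask-CORS or this codebase.

```python
import re

# Patterns from the text above: one unanchored, one terminated with `$`.
UNANCHORED = r"https://.*\.example\.com"
ANCHORED = r"https://.*\.example\.com$"


def origin_allowed(pattern: str, origin: str) -> bool:
    """Hypothetical helper: test `origin` against `pattern` using re.match,
    which matches from the start of the string but not necessarily to its end."""
    return re.match(pattern, origin) is not None


# The hostile origin carries a subdomain label because the pattern
# requires a literal "." before "example".
hostile = "https://app.example.com.attacker.com"
legit = "https://app.example.com"

print(origin_allowed(UNANCHORED, hostile))  # prefix match succeeds
print(origin_allowed(ANCHORED, hostile))    # `$` forces a match to the end
print(origin_allowed(ANCHORED, legit))
```

The unanchored pattern accepts the hostile origin because `re.match` stops checking once the pattern is satisfied; adding `$` requires the origin to end at `.example.com`.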

## Analytics reset (debug only)

`RESET_ANALYTICS=1` (or `analytics.reset: true` in YAML) wipes the
local SQLite analytics DB on startup. This is **only** consulted when
`FLASK_DEBUG=1`; production never resets the analytics DB.

## Usage Examples

### Production Deployment (Current)
8 changes: 0 additions & 8 deletions config/default.yaml
@@ -46,11 +46,3 @@ ai:
enabled: false
anthropic:
api_key: "" # Override with ANTHROPIC_API_KEY

# CORS configuration
cors:
  # List of allowed origins (strings or regex patterns). If left null
  # the API defaults to PolicyEngine production domains. Override with
  # CORS_ALLOWED_ORIGINS (comma-separated) in environments that serve
  # additional frontends.
  allowed_origins: null
54 changes: 2 additions & 52 deletions policyengine_household_api/api.py
@@ -24,7 +24,6 @@
from policyengine_household_api.decorators.analytics import (
    log_analytics_if_enabled,
)
from policyengine_household_api.utils.config_loader import get_config_value

# Endpoints
from .endpoints import (
@@ -41,52 +40,7 @@

 app = application = flask.Flask(__name__)

-# Reject absurdly large request bodies before any view runs. 10 MiB is
-# well above the largest legitimate household payload we have seen
-# (axes scans push a few hundred KiB) while still capping the memory a
-# single attacker can force us to allocate. Overridable via the
-# ``MAX_CONTENT_LENGTH`` env var (bytes).
-app.config["MAX_CONTENT_LENGTH"] = int(
-    os.getenv("MAX_CONTENT_LENGTH", 10 * 1024 * 1024)
-)


-def _resolve_cors_origins():
-    """
-    Resolve the CORS allowed origins list.

-    Priority:
-    1. CORS_ALLOWED_ORIGINS env var (comma-separated list)
-    2. config value "cors.allowed_origins" (list or comma string)
-    3. Safe default: the PolicyEngine production domains

-    Use regex patterns so that wildcard subdomains work with
-    Flask-CORS's `origins` kwarg.
-    """
-    raw = os.getenv("CORS_ALLOWED_ORIGINS") or get_config_value(
-        "cors.allowed_origins", None
-    )

-    if raw is None:
-        # Flask-CORS uses re.match, which is a prefix match; anchor with
-        # ``$`` so a hostile host like ``policyengine.org.attacker.com``
-        # cannot satisfy the wildcard pattern. Include ``localhost:*``
-        # so local dev servers can hit the API without extra setup.
-        origins = [
-            "https://policyengine.org",
-            r"https://.*\.policyengine\.org$",
-            r"http://localhost(:[0-9]+)?$",
-            r"http://127\.0\.0\.1(:[0-9]+)?$",
-        ]
-    elif isinstance(raw, str):
-        origins = [o.strip() for o in raw.split(",") if o.strip()]
-    else:
-        origins = list(raw)

-    return origins


-CORS(app, origins=_resolve_cors_origins())
+CORS(app)

 # Use in-memory storage for rate limiting
 # Note that this provides limits per-instance;
@@ -105,7 +59,6 @@ def _resolve_cors_origins():

@app.route("/<country_id>/calculate", methods=["POST"])
@require_auth_if_enabled()
@limiter.limit("60 per minute")
@log_analytics_if_enabled
def calculate(country_id):
    return get_calculate(country_id)
@@ -131,11 +84,8 @@ def readiness_check():
 )


-# Note: `/calculate_demo` is intentionally public (documented in
-# config/README.md). It is guarded by a conservative rate limit rather
-# than JWT authentication.
 @app.route("/<country_id>/calculate_demo", methods=["POST"])
-@limiter.limit("1 per 10 seconds")
+@limiter.limit("1 per second")
 def calculate_demo(country_id):
     return get_calculate(country_id)

100 changes: 2 additions & 98 deletions policyengine_household_api/auth/validation.py
@@ -1,114 +1,18 @@
 import json
-import logging
-import time
-from threading import Lock
 from urllib.request import urlopen

 from authlib.oauth2.rfc7523 import JWTBearerTokenValidator
 from authlib.jose.rfc7517.jwk import JsonWebKey

-logger = logging.getLogger(__name__)

-JWKS_FETCH_TIMEOUT = 10 # seconds
-# Minimum wait between back-to-back lazy retries after a failure.
-# Keeps us from hammering Auth0 when it is actively degraded.
-JWKS_RETRY_INTERVAL_SECONDS = 30


-# Module-level cache of successful JWKS fetches, keyed by issuer. Only
-# successes are cached so that a transient failure is retried on the
-# next authenticated request (``lru_cache`` would have memoised the
-# ``None`` return, making the "lazy retry" dead code).
-_jwks_cache: dict = {}
-# Records the monotonic timestamp of the most recent *failed* fetch
-# per-issuer so we can rate-limit retries without caching the failure
-# itself.
-_jwks_last_failure: dict = {}
-_jwks_lock = Lock()


-def _fetch_jwks_uncached(issuer: str):
-    """Fetch the JWKS for an Auth0 issuer, bypassing the cache.

-    Returns an authlib key set on success, ``None`` on failure. Errors
-    are logged rather than raised so that a transient Auth0 outage
-    doesn't crash the process at import time.
-    """
-    jwks_url = f"{issuer}.well-known/jwks.json"
-    try:
-        with urlopen(jwks_url, timeout=JWKS_FETCH_TIMEOUT) as response:
-            return JsonWebKey.import_key_set(json.loads(response.read()))
-    except Exception as e:
-        logger.warning(f"Failed to fetch JWKS from {jwks_url}: {e}")
-        return None


-def _fetch_jwks(issuer: str):
-    """Fetch JWKS, caching only successful results.

-    On failure we record the time but do not memoise the ``None`` — a
-    later call will retry (subject to ``JWKS_RETRY_INTERVAL_SECONDS``
-    backoff) so that the validator self-heals once Auth0 recovers.
-    """
-    with _jwks_lock:
-        cached = _jwks_cache.get(issuer)
-        if cached is not None:
-            return cached
-        last_failure = _jwks_last_failure.get(issuer)
-        if (
-            last_failure is not None
-            and time.monotonic() - last_failure < JWKS_RETRY_INTERVAL_SECONDS
-        ):
-            # Too soon after the last failure — don't hammer Auth0.
-            return None

-    # Fetch outside the lock so a slow network call doesn't block
-    # other threads that might be serving requests with a cached key.
-    key_set = _fetch_jwks_uncached(issuer)

-    with _jwks_lock:
-        if key_set is not None:
-            _jwks_cache[issuer] = key_set
-            _jwks_last_failure.pop(issuer, None)
-        else:
-            _jwks_last_failure[issuer] = time.monotonic()
-    return key_set


-def _clear_jwks_cache():
-    """Test helper: wipe the success/failure caches."""
-    with _jwks_lock:
-        _jwks_cache.clear()
-        _jwks_last_failure.clear()


 class Auth0JWTBearerTokenValidator(JWTBearerTokenValidator):
     def __init__(self, domain, audience):
         issuer = f"https://{domain}/"

-        public_key = _fetch_jwks(issuer)
-        if public_key is None:
-            # Retry on next token validation rather than failing hard
-            # at construction time. A missing key set means token
-            # validation will fail cleanly inside authlib.
-            logger.warning(
-                "JWKS unavailable at construction; will retry on first "
-                "token validation."
-            )

+        jsonurl = urlopen(f"{issuer}.well-known/jwks.json")
+        public_key = JsonWebKey.import_key_set(json.loads(jsonurl.read()))
         super(Auth0JWTBearerTokenValidator, self).__init__(public_key)
-        self._issuer = issuer
         self.claims_options = {
             "exp": {"essential": True},
             "aud": {"essential": True, "value": audience},
             "iss": {"essential": True, "value": issuer},
         }

-    def authenticate_token(self, token_string):
-        # Lazy-refresh the JWKS if the initial fetch failed. Because
-        # ``_fetch_jwks`` only caches successes, this call will retry
-        # the network fetch (subject to a short backoff) until Auth0
-        # responds.
-        if self.public_key is None:
-            self.public_key = _fetch_jwks(self._issuer)
-        return super().authenticate_token(token_string)
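
The comments in the removed `_fetch_jwks` code contrast success-only caching with `lru_cache`, which would memoise a `None` result. The difference can be sketched in isolation as follows. This is an illustrative sketch only: `flaky_fetch`, `fetch_memoised`, `fetch_success_only`, and the issuer URL are hypothetical stand-ins, and the lock and retry backoff from the removed code are deliberately left out.

```python
from functools import lru_cache

calls = {"n": 0}


def flaky_fetch(issuer: str):
    """Hypothetical fetcher: fails (returns None) on the first call only."""
    calls["n"] += 1
    return None if calls["n"] == 1 else {"keys": [issuer]}


@lru_cache(maxsize=None)
def fetch_memoised(issuer: str):
    # lru_cache stores whatever is returned, including a None failure.
    return flaky_fetch(issuer)


_success_cache: dict = {}


def fetch_success_only(issuer: str):
    # Cache only non-None results so a failure is retried on the next call.
    cached = _success_cache.get(issuer)
    if cached is not None:
        return cached
    result = flaky_fetch(issuer)
    if result is not None:
        _success_cache[issuer] = result
    return result


issuer = "https://example.invalid/"  # placeholder issuer, never contacted

first = fetch_memoised(issuer)   # failure is memoised
second = fetch_memoised(issuer)  # served from the cache, no retry happens

calls["n"] = 0                   # reset the fake fetcher
a = fetch_success_only(issuer)   # first attempt fails
b = fetch_success_only(issuer)   # retried, succeeds and is cached
```

With the memoised version the fetcher is never called a second time, so the validator would stay without keys until the process restarts; the success-only cache lets a later call recover.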
2 changes: 1 addition & 1 deletion policyengine_household_api/constants.py
@@ -6,7 +6,7 @@
 POST = "POST"
 UPDATE = "UPDATE"
 LIST = "LIST"
-VERSION = "0.13.15"
+VERSION = "0.13.13"
 COUNTRIES = ("uk", "us", "ca", "ng", "il")
 COUNTRY_PACKAGE_NAMES = (
     "policyengine_uk",
11 changes: 3 additions & 8 deletions policyengine_household_api/country.py
@@ -1,5 +1,4 @@
import importlib
import logging
from flask import Response
import json
from policyengine_core.taxbenefitsystems import TaxBenefitSystem
@@ -433,12 +432,8 @@ def calculate(

             return household, None

-        except Exception:
-            # Re-raise so endpoints/household.py (which unpacks
-            # ``(result, computation_tree_uuid)``) can surface a real
-            # 500 instead of a TypeError on ``None`` unpacking.
-            logging.exception("Tracer failed while computing household")
-            raise
+        except Exception as e:
+            print(f"Error computing tracer output: {e}")


 def create_policy_reform(policy_data: dict) -> dict:
@@ -483,7 +478,7 @@ def apply(self):


 def get_requested_computations(household: dict):
-    requested_computations = dpath.search(
+    requested_computations = dpath.util.search(
         household,
         "*/*/*/*",
         afilter=lambda t: t is None,