Authenticated /calculate requests that included medical_out_of_pocket_expenses started crashing with VariableNotFoundError → HTTP 500 after @hua7450 merged the weekly policyengine_us 1.663.0 → 1.687.0 bump (#1491) on May 5 at 17:15 ET. @daphnehanse11 had authored PolicyEngine/policyengine-us#8178 ("Decompose SPM MOOP premiums") and @MaxGhenis had merged it on Apr 29; that PR removed the variable upstream in 1.673.0. Every output failed, not just the medical-expense ones. The breakage ran from the production deploy at 17:52 ET on May 5 to the warn-and-drop hotfix deploy at 13:46 ET on May 6.
CI caught the break: the MyFriendBen customer fixture test failed against 1.687.0 with the same HTTP 500. @hua7450 misread the signal — instead of treating it as a partner contract break and pausing for migration, @hua7450 updated the fixture in #1493 (self-merged, no review requested) so CI went green, then merged the dependency bump #1491 sixteen minutes later. Live partner payloads still sent medical_out_of_pocket_expenses.
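Timeline (ET)
Apr 29: @MaxGhenis merges PolicyEngine/policyengine-us#8178. policyengine_us 1.673.0 ships without medical_out_of_pocket_expenses.
Before merge: CI on the dependency bump #1491 fails test_my_friend_ben with HTTP 500 / VariableNotFoundError.
May 5, 16:59: @hua7450 self-merges #1493, migrating the MyFriendBen fixture to other_medical_expenses. The PR body itself notes that customers sending the old variable need the same migration.
May 5, 17:15: @hua7450 merges #1491 (policyengine_us 1.663.0 → 1.687.0); its CI now passes against the migrated fixture.
May 5, 17:34: @hua7450 posts the API update to #partner-amplifi, #partner-impactica, and #mfb-policy-engine, flagged "Major Changes (action required) 🚨". The post lands mid-deploy with no advance window.
May 5, 17:52: Production deploy completes. Requests carrying medical_out_of_pocket_expenses start returning HTTP 500.
May 6, 10:02: Paul Huntsberger (Amplifi) flags the break in #partner-amplifi.
May 6, 11:03: @hua7450 confirms in-thread that the change has been deployed and asks whether Amplifi sends medical_out_of_pocket_expenses.
May 6, 11:25: Paul confirms Amplifi's live requests use medical_out_of_pocket_expenses.
May 6, 13:46: Hotfix deploy completes. Affected requests return HTTP 200 with warnings.
May 6, 14:16: @hua7450 notifies MyFriendBen (Elliott) in #mfb-policy-engine that the hotfix is live.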
Total impact window: ~20 hours.
Root cause
@daphnehanse11's policyengine-us#8178 ("Decompose SPM MOOP premiums") decomposed medical_out_of_pocket_expenses into program-specific aggregates and deleted the umbrella variable in policyengine_us 1.673.0. The granular replacements (other_medical_expenses, health_insurance_premiums_without_medicare_part_b, etc.) had existed since September 2024, so the model change itself was clean — but the removal broke the household API's input contract for partners.
The authenticated integration test (test_customer_inputs.py::test_my_friend_ben) caught it: the fixture posted medical_out_of_pocket_expenses, and after the bump the engine raised VariableNotFoundError → HTTP 500. The right move was to pause the release, notify partners, and ship a compatibility layer first. Instead @hua7450 migrated only the fixture in #1493 (self-merged, no review) and merged the dependency bump #1491 sixteen minutes later.
Fixture coverage also missed the actual contract surface. Of the four checked-in customer-input fixtures (MyFriendBen, Amplifi ×2, Impactica), only MyFriendBen's happened to include medical_out_of_pocket_expenses. Amplifi's fixture — a sample they sent us months earlier — didn't contain the variable, even though their live production payloads did. CI couldn't have caught the Amplifi break with the fixtures we had, regardless of what we did with the MyFriendBen test. @anth-volk confirmed this asymmetry post-incident by inspecting the checked-in fixtures.
A second flaw determined the blast radius: the API had no input validation before calculation, so any unknown variable flowed all the way to the engine and crashed the whole request. A partner asking for ten outputs lost all ten because one input was unsupported.
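The missing guard can be illustrated with a minimal sketch of pre-calculation validation. Everything here is an assumption made for illustration: the `KNOWN_VARIABLES` set, the `validate_household` and `respond` names, and the payload shape (entity plural, then entity ID, then variable, then period) are hypothetical stand-ins, not the API's actual code.

```python
from typing import Any

# Hypothetical stand-in for the variable names the installed model exposes.
KNOWN_VARIABLES = {"age", "employment_income", "other_medical_expenses"}

# Keys assumed to describe entity structure rather than model variables.
STRUCTURAL_KEYS = {"members"}


def validate_household(household: dict[str, Any]) -> list[str]:
    """Return one error string per unsupported variable or axis name."""
    errors: list[str] = []
    for section, content in household.items():
        if section == "axes":
            # Assumed shape: a list of axis groups, each a list of {"name": ...}.
            for group in content:
                for axis in group:
                    name = axis.get("name")
                    if name not in KNOWN_VARIABLES:
                        errors.append(f"Unsupported axis variable: {name}")
            continue
        # Assumed shape: {entity_plural: {entity_id: {variable: {period: value}}}}.
        for variables in content.values():
            for name in variables:
                if name not in KNOWN_VARIABLES and name not in STRUCTURAL_KEYS:
                    errors.append(f"Unsupported variable: {name}")
    return errors


def respond(household: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    """Reject bad input before any calculation is attempted."""
    errors = validate_household(household)
    if errors:
        return 400, {"errors": errors}
    return 200, {"result": "calculation would run here"}
```

With a check like this in front of the engine, an unknown input produces a client error that names the offending variable instead of an unhandled exception deep in the compute layer.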
Resolution
Four PRs landed:
Warn and drop deprecated inputs instead of crashing #1494 (@hua7450 authored, @anth-volk merged; May 6 13:46 ET deploy). Added a deprecated-input registry and drop_deprecated_inputs helper. The API now strips allow-listed deprecated inputs and deprecated scan axes before calculation, returns HTTP 200 for the rest of the outputs, and includes structured warnings with the migration hint surfaced verbatim (e.g., "Removed in policyengine-us 1.673.0. Migrate non-premium spending to other_medical_expenses and premium spending to health_insurance_premiums."). This is an availability fix, not a value-preserving shim: outputs that depended on the dropped input fall back to defaults (often zero). For partners passing 0 for medical_out_of_pocket_expenses (MyFriendBen's case before #1493), it is a no-op. Partners passing non-zero values need to migrate the premium portion to health_insurance_premiums to recover those outputs.
Validate household variables and publish OpenAPI spec #1497 — @anth-volk authored and merged pre-calculation variable validation. Unsupported variables and axis names now return HTTP 400 with errors: string[] instead of reaching the compute layer. The PR also deep-copies household payloads in the deprecated-input drop path and publishes /calculate request, success, and error schemas under /specification.
Add calculate variable usage analytics #1499 — @anth-volk authored and merged privacy-safe variable-usage analytics for authenticated /calculate requests. It records variable name, source (input vs. axis), entity type, period granularity, and counts before validation and deprecated-input dropping, so we capture exactly what partners send. It excludes household values, entity IDs, member relationships, exact period keys, and request/response bodies, and adds Alembic migration infrastructure for the analytics DB.
Add scoped calculate analytics endpoint #1502 — @anth-volk authored and merged a scoped GET /analytics/calculate/requests endpoint behind a read:calculate-analytics Auth0 scope so we can inspect partner request shapes without storing any payloads.
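The warn-and-drop behavior from #1494 can be sketched roughly as follows. Only the helper name drop_deprecated_inputs and the hint text come from the description above; the registry shape, the function signature, the warning fields, and the payload layout are assumptions for illustration, not the merged implementation.

```python
import copy
from typing import Any

# Hypothetical registry shape: deprecated variable name -> migration hint.
DEPRECATED_INPUTS = {
    "medical_out_of_pocket_expenses": (
        "Removed in policyengine-us 1.673.0. Migrate non-premium spending to "
        "other_medical_expenses and premium spending to health_insurance_premiums."
    ),
}


def drop_deprecated_inputs(
    household: dict[str, Any],
) -> tuple[dict[str, Any], list[dict[str, str]]]:
    """Return (cleaned copy, warnings); the caller's payload is not mutated."""
    cleaned = copy.deepcopy(household)
    warnings: list[dict[str, str]] = []

    def warn(name: str, source: str) -> None:
        entry = {"variable": name, "source": source, "message": DEPRECATED_INPUTS[name]}
        if entry not in warnings:  # one warning per (variable, source) pair
            warnings.append(entry)

    # Strip deprecated inputs from every entity (assumed nested-dict layout).
    for section, content in cleaned.items():
        if section == "axes":
            continue
        for variables in content.values():
            for name in list(variables):
                if name in DEPRECATED_INPUTS:
                    del variables[name]
                    warn(name, "input")

    # Strip deprecated scan axes, dropping any axis group left empty.
    kept_groups = []
    for group in cleaned.get("axes", []):
        kept = []
        for axis in group:
            if axis.get("name") in DEPRECATED_INPUTS:
                warn(axis["name"], "axis")
            else:
                kept.append(axis)
        if kept:
            kept_groups.append(kept)
    if kept_groups:
        cleaned["axes"] = kept_groups
    else:
        cleaned.pop("axes", None)
    return cleaned, warnings
```

The deep copy mirrors the point made for #1497: the drop path works on its own copy, so anything that inspects the original request afterwards still sees what the partner actually sent.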
Still open
Mirror partner contract checks into policyengine_us PR CI so model authors see partner impact before merging variable removals or renames. A breaking-change detector in policyengine_us that flags any variable removal or rename against the household API's accepted input surface would catch this class of issue at the source repo.
Treat partner-fixture failures as a comms gate. A failing customer-input test should block release until affected partners have migration guidance and a window — unless a compatibility layer ships first.
Build partner-confirmed contract surfaces from the new variable-usage analytics. Once a few weeks of #1499 data accumulate, retire the stale-fixture-as-contract pattern: Amplifi's case showed a static fixture can miss variables a partner actually uses in production.
Deprecation playbook for every variable removal or rename: migration mapping, whether a value-preserving alias (or metadata-flagged deprecation) is feasible, warning behavior, grace window, partner-notification owner.
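Two of the open items fit together: an input surface derived from what partners actually send, and a detector that compares it against the variables a model release removes. The sketch below is one possible shape. The function names, the `(variable, source)` key, and the payload layout are hypothetical, and it reads variable names only, never values or entity IDs.

```python
from collections import Counter
from typing import Any, Iterable

# Keys assumed to describe entity structure rather than model variables.
STRUCTURAL_KEYS = {"members"}


def variable_usage(household: dict[str, Any]) -> Counter:
    """Count (variable, source) pairs; values and entity IDs are never read."""
    usage: Counter = Counter()
    for section, content in household.items():
        if section == "axes":
            for group in content:
                for axis in group:
                    if "name" in axis:
                        usage[(axis["name"], "axis")] += 1
            continue
        for variables in content.values():
            for name in variables:
                if name not in STRUCTURAL_KEYS:
                    usage[(name, "input")] += 1
    return usage


def breaking_removals(
    old_variables: Iterable[str],
    new_variables: Iterable[str],
    partner_surface: Iterable[str],
) -> list[str]:
    """Variables present in the old model, absent in the new, and used by partners."""
    removed = set(old_variables) - set(new_variables)
    return sorted(removed & set(partner_surface))
```

A rename shows up here as a removal of the old name, which is the behavior a partner observes. A CI job in policyengine_us could fail, or require a deprecation-playbook entry, whenever `breaking_removals` returns a non-empty list.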