Commit 42a530c

Merge pull request #523 from PolicyEngine/maria/constraints_check_loading_function
Database consistency checks for target loading
2 parents ac37f1f + 8a00e84 commit 42a530c

13 files changed

Lines changed: 1105 additions & 42 deletions

changelog_entry.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+- bump: minor
+  changes:
+    added:
+    - field_valid_values table in database as a source of truth for fields that have semantic meaning external to the database hierarchy (variable, constraint_variable, period, operation, active).
+    - Event listeners that raise an error if inconsistent operations or parent-child relationships are attempted to be inserted into the database.
+    - Source field in target table.

policyengine_us_data/db/DATABASE_GUIDE.md

Lines changed: 31 additions & 0 deletions
@@ -82,7 +82,9 @@ make database
 - `variable`: PolicyEngine US variable name (e.g., `eitc`, `income_tax`)
 - `period`: Year
 - `value`: Numerical value
+- `source`: Data source identifier (optional, e.g., `IRS SOI`, `Census ACS S0101`)
 - `active`: Boolean flag
+- Unique constraint on `(variable, period, stratum_id, reform_id)` prevents duplicate targets regardless of source
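
The effect of the unique constraint can be sketched with an in-memory SQLite database. This is a minimal illustration, not the schema from `create_database_tables.py`: the column types, the `NOT NULL DEFAULT 0` on `reform_id`, and the target values are assumptions made for the example. Only the column names and the constrained tuple come from the guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified targets table. Only the UNIQUE tuple is taken
# from the guide; the types and defaults are assumptions.
conn.execute(
    """
    CREATE TABLE targets (
        target_id INTEGER PRIMARY KEY,
        variable TEXT NOT NULL,
        period INTEGER NOT NULL,
        stratum_id INTEGER NOT NULL,
        reform_id INTEGER NOT NULL DEFAULT 0,
        value REAL,
        source TEXT,
        active INTEGER NOT NULL DEFAULT 1,
        UNIQUE (variable, period, stratum_id, reform_id)
    )
    """
)

insert_sql = (
    "INSERT INTO targets "
    "(variable, period, stratum_id, reform_id, value, source) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)

# First target for (eitc, 2023, stratum 1, reform 0); the value is made up.
conn.execute(insert_sql, ("eitc", 2023, 1, 0, 1.0e9, "IRS SOI"))

# Same key with a different source: the constraint ignores `source`,
# so this row is rejected.
try:
    conn.execute(insert_sql, ("eitc", 2023, 1, 0, 2.0e9, "Census ACS S0101"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM targets").fetchone()[0]
```

Because `source` is not part of the key, a second estimate of the same quantity from a different source has to replace or update the existing row rather than sit beside it.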

 ### SQL Views

@@ -102,6 +104,35 @@ WHERE domain_variable = 'snap' AND geo_level = 'state';
 -- active, geo_level, geographic_id, domain_variable
 ```

+### Field Valid Values and Validation
+
+**field_valid_values** - Lookup table that defines which values are allowed for specific fields. SQL triggers on `stratum_constraints` and `targets` check incoming rows against this table and reject invalid values with `RAISE(ABORT)`.
+
+Validated fields:
+
+| Field | Table | Examples | How populated |
+|-------|-------|----------|---------------|
+| `operation` | stratum_constraints | `==`, `>`, `<=` | Static list in `create_field_valid_values.py` |
+| `constraint_variable` | stratum_constraints | `age`, `state_fips` | Dynamic from `policyengine-us` variables + extras |
+| `variable` | targets | `eitc`, `person_count` | Dynamic from `policyengine-us` variables |
+| `active` | targets | `0`, `1` | Static list |
+| `period` | targets | `2022`, `2023`, `2024`, `2025` | Static list |
+| `source` | targets | `IRS SOI`, `Census ACS S0101` | Static list (see below) |
+
+**Adding new values**: If you introduce a new data source, time period, or constraint operation, you must register it in `create_field_valid_values.py` before any ETL script can use it. Otherwise the SQL trigger will reject the row at insert time. PolicyEngine variables are registered automatically at database creation time.
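
The lookup-table-plus-trigger pattern described above can be sketched as follows. This is a hedged illustration, not the trigger shipped in this commit: the `field_valid_values` column names (`field_name`, `valid_value`), the trigger name, the error message, and the unregistered year `1999` are all assumptions. Only the table names, the `period` field, and the use of `RAISE(ABORT)` come from the guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical layout of the lookup table (column names are assumed).
    CREATE TABLE field_valid_values (
        field_name TEXT NOT NULL,
        valid_value TEXT NOT NULL,
        PRIMARY KEY (field_name, valid_value)
    );

    CREATE TABLE targets (
        target_id INTEGER PRIMARY KEY,
        variable TEXT NOT NULL,
        period INTEGER NOT NULL,
        value REAL
    );

    -- Reject any row whose period is not registered in the lookup table.
    CREATE TRIGGER validate_targets_period_insert
    BEFORE INSERT ON targets
    WHEN NOT EXISTS (
        SELECT 1 FROM field_valid_values
        WHERE field_name = 'period'
          AND valid_value = CAST(NEW.period AS TEXT)
    )
    BEGIN
        SELECT RAISE(ABORT, 'Invalid period for targets');
    END;

    INSERT INTO field_valid_values (field_name, valid_value)
    VALUES ('period', '2023'), ('period', '2024');
    """
)

insert_sql = "INSERT INTO targets (variable, period, value) VALUES (?, ?, ?)"

# A registered period passes the trigger.
conn.execute(insert_sql, ("eitc", 2023, 1.0))

# An unregistered period is rejected at insert time.
try:
    conn.execute(insert_sql, ("eitc", 1999, 1.0))
    rejected = False
    error_message = ""
except sqlite3.DatabaseError as exc:
    rejected = True
    error_message = str(exc)

# Registering the value first makes the same insert succeed.
conn.execute(
    "INSERT INTO field_valid_values (field_name, valid_value) VALUES (?, ?)",
    ("period", "1999"),
)
conn.execute(insert_sql, ("eitc", 1999, 1.0))

stored_periods = [
    row[0] for row in conn.execute("SELECT period FROM targets ORDER BY period")
]
```

In the repository the registration step would go through `create_field_valid_values.py` rather than an ad hoc `INSERT`; the sketch only shows why the order matters.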
+
+### Data Integrity Enforcement
+
+The database enforces consistency through three mechanisms in `create_database_tables.py`:
+
+**1. SQL trigger validation** - Before every INSERT/UPDATE on `targets` and `stratum_constraints`, triggers verify that field values exist in `field_valid_values`. Invalid values are rejected immediately. The `source` field is optional (NULL allowed), but if set, must match a registered value.
+
+**2. Constraint consistency** - A SQLAlchemy `before_insert`/`before_update` listener on `Stratum` calls `ensure_consistent_constraint_set()` to verify that a stratum's constraints are logically compatible (e.g., no contradictory bounds like `age > 50` and `age < 30` on the same stratum).
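
The body of `ensure_consistent_constraint_set()` is not shown in this part of the diff, so the following is only a sketch of one way to detect contradictory numeric bounds. The `Constraint` tuple and the `bounds_are_satisfiable` helper are hypothetical names introduced for illustration. The sketch treats values as real numbers and handles only the five comparison operators listed in the code.

```python
import math
from typing import Dict, Iterable, NamedTuple, Tuple

SUPPORTED_OPERATIONS = (">", ">=", "==", "<", "<=")


class Constraint(NamedTuple):
    """Hypothetical stand-in for one row of stratum_constraints."""

    variable: str
    operation: str
    value: float


def bounds_are_satisfiable(constraints: Iterable[Constraint]) -> bool:
    """Return True if every variable has at least one value meeting all its bounds."""
    # variable -> (lower, lower_is_strict, upper, upper_is_strict)
    intervals: Dict[str, Tuple[float, bool, float, bool]] = {}

    for c in constraints:
        if c.operation not in SUPPORTED_OPERATIONS:
            raise ValueError(f"Unsupported operation in this sketch: {c.operation}")

        lo, lo_strict, hi, hi_strict = intervals.get(
            c.variable, (-math.inf, False, math.inf, False)
        )

        # ">", ">=" and "==" can each tighten the lower bound.
        if c.operation in (">", ">=", "=="):
            strict = c.operation == ">"
            if c.value > lo or (c.value == lo and strict):
                lo, lo_strict = c.value, strict

        # "<", "<=" and "==" can each tighten the upper bound.
        if c.operation in ("<", "<=", "=="):
            strict = c.operation == "<"
            if c.value < hi or (c.value == hi and strict):
                hi, hi_strict = c.value, strict

        intervals[c.variable] = (lo, lo_strict, hi, hi_strict)

    for lo, lo_strict, hi, hi_strict in intervals.values():
        # Empty interval: bounds cross, or they meet at a point that a
        # strict bound excludes.
        if lo > hi or (lo == hi and (lo_strict or hi_strict)):
            return False
    return True


contradictory = [Constraint("age", ">", 50), Constraint("age", "<", 30)]
compatible = [Constraint("age", ">=", 18), Constraint("age", "<", 65)]
```

With these inputs, `bounds_are_satisfiable(contradictory)` is `False` and `bounds_are_satisfiable(compatible)` is `True`. A listener built on a check like this would raise an error instead of returning a boolean.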
+
+**3. Parent-child constraint inheritance** - A SQLAlchemy listener ensures child strata include all parent constraints. This prevents a child from claiming to be in a different geographic or demographic scope than its parent. Two cases:
+- **Geographic-to-geographic** (e.g., state to CD): Instead of requiring literal constraint duplication, the validator checks geographic containment. A CD's `congressional_district_geoid` must encode the parent's `state_fips` (i.e., `geoid // 100 == state_fips`). Geographic variables are compared as integers to handle zero-padding differences (`"1"` vs `"01"`).
+- **Demographic children** (e.g., state to age group): The child must include all parent constraints verbatim (e.g., a child under `state_fips == 6` must also have `state_fips == 6` in its own constraints).
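
The two cases above can be sketched as a single check. This is an illustrative approximation, not the listener from `create_database_tables.py`: the function name `child_respects_parent`, the tuple representation of constraints, and the `GEOGRAPHIC_VARIABLES` set are assumptions. The containment rule `geoid // 100 == state_fips` and the integer comparison of geographic values are taken from the guide.

```python
from typing import Iterable, Tuple

# (constraint_variable, operation, value), with value stored as text
ConstraintTuple = Tuple[str, str, str]

# Assumed set of variables that are compared as integers.
GEOGRAPHIC_VARIABLES = {"state_fips", "congressional_district_geoid"}


def _normalize(constraint: ConstraintTuple) -> tuple:
    """Compare geographic values as integers so "6" and "06" match."""
    variable, operation, value = constraint
    if variable in GEOGRAPHIC_VARIABLES:
        return (variable, operation, int(value))
    return (variable, operation, value)


def child_respects_parent(
    parent: Iterable[ConstraintTuple], child: Iterable[ConstraintTuple]
) -> bool:
    """Return True if the child stratum stays inside the parent's scope."""
    parent_set = {_normalize(c) for c in parent}
    child_set = {_normalize(c) for c in child}

    child_district_geoids = [
        value
        for variable, operation, value in child_set
        if variable == "congressional_district_geoid" and operation == "=="
    ]

    for constraint in parent_set:
        # Demographic case: the parent constraint is repeated verbatim.
        if constraint in child_set:
            continue

        # Geographic case: a district GEOID encodes its state FIPS code.
        variable, operation, value = constraint
        if (
            variable == "state_fips"
            and operation == "=="
            and any(geoid // 100 == value for geoid in child_district_geoids)
        ):
            continue

        return False
    return True


california = [("state_fips", "==", "6")]
ca_district = [("congressional_district_geoid", "==", "0612")]
tx_district = [("congressional_district_geoid", "==", "4801")]
ca_adults = [("state_fips", "==", "06"), ("age", ">", "17")]
orphan_adults = [("age", ">", "17")]
```

Here `ca_district` and `ca_adults` pass against `california`, while `tx_district` fails the containment test and `orphan_adults` fails because it drops the parent's `state_fips` constraint.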
+
 ## Key Concepts

 ### Stratum Domains (replacing stratum_group_id)
