
Commit 5117dfb

feat: Generic Entity Links (LIAM2-inspired) (#1363)
2 parents 8d26af3 + e53e5a9 commit 5117dfb

34 files changed

Lines changed: 4631 additions & 32 deletions

.benchmarks/Linux-CPython-3.11-64bit/0001_pr_vectorized.json

Lines changed: 725 additions & 0 deletions
Large diffs are not rendered by default.

.benchmarks/Linux-CPython-3.11-64bit/0002_master_loop.json

Lines changed: 725 additions & 0 deletions
Large diffs are not rendered by default.

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
```diff
@@ -1,5 +1,19 @@
 # Changelog
 
+## 44.3.0
+
+#### New Features
+
+- **Generic Entity Links (Phases 1-6)**: Introduced a generic entity-linking system, inspired by LIAM2, that is no longer limited to rigid hierarchies such as `Person -> Household`.
+- Added `Many2OneLink` and `One2ManyLink` classes for modelling arbitrary relationships between entities (e.g., `Person -> Employer`).
+- Added implicit links backed directly by the existing group membership arrays. They populate the new `population.links` property in `TaxBenefitSystem.instantiate_entities()`.
+- Relationships can be chained in Python: `person.mother.household.get("rent", period)`.
+- Vectorized, declarative aggregations (e.g., `households.persons.sum("salary", period, condition=is_female)`).
+
+#### Technical Changes
+
+- Fully backward compatible: existing projector syntax keeps working, because the modified `Population.__getattr__` resolves implicit links first and falls back to projectors.
+
 ## 44.2.2
 
 #### Bug fixes
```

PR_DESCRIPTION.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# Feature: Generic Entity Links (LIAM2-inspired)

## Context & Motivation

OpenFisca's entity model has historically been strictly hierarchical and bipartite: individuals belong to groups (households, families, tax units), and groups contain individuals. This structure works well for static tax-benefit systems, but it cannot directly express relationships that many real-world socioeconomic models need, such as:

- **Intra-entity relationships**: kinship graphs (person → mother, person → spouse).
- **Arbitrary inter-entity networks**: employment networks (person → employer), geographical mobility, or ad-hoc associations.
- **Deep chaining**: navigating several relationship hops (e.g., "the region of the household of the mother of the person").

To address this, we drew on [LIAM2's linking system](https://liam2.plan.be/) and adapted it to OpenFisca's architecture, in particular its `Role` semantics and vectorized execution.

## What we did

This PR introduces a generic, vectorized, and **fully backward-compatible** entity-linking system.

### 1. Core Link Classes (`openfisca_core/links`)

- **`Many2OneLink`**: maps each source member to exactly one target entity (e.g., `person.mother`, `person.employer`); several sources may share the same target. Supports fetching values (`.get()`) and dynamic chaining (`.mother.household.rent`).
- **`One2ManyLink`**: aggregates values from the *N* target members back to their *1* source entity. Supports vectorized aggregations (`sum`, `count`, `any`, `all`, `min`, `max`, `avg`), optionally filtered by `role` or by an arbitrary boolean `condition` mask.

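To make the two link directions concrete, here is a minimal NumPy sketch of the index arithmetic involved. It does not use the `Many2OneLink`/`One2ManyLink` classes: the helper functions and toy arrays are hypothetical, roles are plain strings rather than OpenFisca `Role` objects, and the sketch only illustrates the kind of vectorized operations described here (fancy indexing for many-to-one, `numpy.bincount` for one-to-many).

```python
import numpy

# Hypothetical toy data: 5 persons in 2 households (plain arrays, not OpenFisca objects).
household_id = numpy.array([0, 0, 1, 1, 1])  # person -> household index
role = numpy.array(["parent", "child", "parent", "parent", "child"])  # roles as plain strings
salary = numpy.array([3000.0, 0.0, 2500.0, 1500.0, 200.0])  # one value per person
rent = numpy.array([800.0, 650.0])  # one value per household
nb_households = len(rent)


def many2one_get(target_values, target_index):
    """Many-to-one: each source row reads the value of its single target."""
    return target_values[target_index]


def one2many_sum(values, source_index, nb_sources, mask=None):
    """One-to-many: sum member values per source entity.

    `mask` stands in for a `role` filter or a boolean `condition`.
    """
    if mask is not None:
        values = numpy.where(mask, values, 0.0)
    return numpy.bincount(source_index, weights=values, minlength=nb_sources)


def one2many_count(source_index, nb_sources, mask=None):
    """One-to-many: count (selected) members per source entity."""
    if mask is not None:
        source_index = source_index[mask]
    return numpy.bincount(source_index, minlength=nb_sources)


person_rent = many2one_get(rent, household_id)  # household rent seen by each person
total_salary = one2many_sum(salary, household_id, nb_households)
parent_salary = one2many_sum(salary, household_id, nb_households, mask=role == "parent")
nb_parents = one2many_count(household_id, nb_households, mask=role == "parent")
has_high_earner = one2many_count(household_id, nb_households, mask=salary > 2800) > 0
```

With this data, `person_rent` is `[800, 800, 650, 650, 650]` and `total_salary` is `[3000, 4200]`. Other aggregations follow the same pattern; for example, an average is a masked sum divided by a masked count.
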
### 2. Implicit Links & Backward Compatibility

A major design goal was to avoid breaking existing country packages (`openfisca-france`, `openfisca-tunisia`, etc.).

- Links are strictly **additive**.
- During `Simulation` initialization, OpenFisca reads the existing `GroupEntity` structure and automatically injects **implicit links**:
  - `ImplicitMany2OneLink`: adds `person.household`, backed directly by the existing `GroupPopulation.members_entity_id` array.
  - `ImplicitOne2ManyLink`: adds `household.persons`, a more concise alternative to the legacy aggregation calls.
- `Population.__getattr__` now checks `self.links` first and falls back to the legacy `get_projector_from_shortcut()` route only when no link matches, so existing attribute access keeps working unchanged.

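The lookup order described above can be illustrated with a toy class. This is not OpenFisca's `Population`: the string return values stand in for link and projector objects, and the legacy projector lookup is modelled as a method purely to keep the sketch self-contained.

```python
class Population:
    """Toy stand-in for a population with links (hypothetical, simplified)."""

    def __init__(self, links=None):
        self.links = links or {}

    def get_projector_from_shortcut(self, name):
        # Placeholder for the legacy projector lookup: returns a projector
        # (here just a string) or None when the shortcut is unknown.
        return f"projector:{name}" if name == "household" else None

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Guard against
        # infinite recursion if `links` has not been set yet.
        if name == "links":
            raise AttributeError(name)
        # 1. New behaviour: a registered link takes priority.
        if name in self.links:
            return self.links[name]
        # 2. Legacy behaviour: fall back to projector shortcuts.
        projector = self.get_projector_from_shortcut(name)
        if projector is not None:
            return projector
        raise AttributeError(f"{type(self).__name__} has no attribute {name!r}")


with_link = Population(links={"mother": "link:mother", "household": "link:household"})
legacy = Population()
```

Because `__getattr__` only runs after normal lookup fails, regular attributes and methods are unaffected: `with_link.household` resolves to the link, while `legacy.household` still reaches the projector route.
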
### 3. Syntax Sugar & Chaining

The new API allows natural, Pythonic data fetching:

```python
# The link classes live in openfisca_core/links (exact import path assumed).
from openfisca_core.links import Many2OneLink

# Assumed context: `sim` is a Simulation, `person_entity` is its person Entity,
# and `is_female` is a boolean array with one value per person.

# Old projector way (still works!):
rents = sim.persons.household("rent", "2024")

# New explicit link definition (e.g., for arbitrary networks):
mother_link = Many2OneLink(name="mother", link_field="mother_id", target_entity_key="person")
person_entity.add_link(mother_link)

# New chaining syntax:
mother_household_rents = sim.persons.mother.household.get("rent", "2024")

# New declarative aggregations:
female_salaries = sim.households.persons.sum("salary", "2024", condition=is_female)
```

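Chaining such as `sim.persons.mother.household` can be understood as composing index arrays: each many-to-one hop maps a row to an index in the next entity. The NumPy sketch below shows that composition on toy data. `chain_get` is a hypothetical helper, not part of the links API, and encoding a missing link as `-1` is an assumption made for the example.

```python
import numpy

# Hypothetical toy arrays (not OpenFisca objects).
mother_id = numpy.array([-1, 0, 0, -1, 3])  # person -> person index, -1 = no known mother
household_id = numpy.array([0, 0, 0, 1, 1])  # person -> household index
rent = numpy.array([800.0, 650.0])  # one value per household


def chain_get(values, *hops):
    """Resolve `values` through successive many-to-one hops.

    Each hop is an index array into the next entity. Composing them yields,
    for every source row, the index of the final target; rows whose chain
    hits a missing link (-1) get NaN.
    """
    index = numpy.arange(len(hops[0]))  # start from each source row
    valid = numpy.ones(len(index), dtype=bool)
    for hop in hops:
        next_index = hop[index]  # index is 0 for rows already invalid
        valid &= next_index >= 0
        index = numpy.where(valid, next_index, 0)
    return numpy.where(valid, values[index], numpy.nan)


# "The rent of the household of the mother of each person."
mother_household_rent = chain_get(rent, mother_id, household_id)
```

Here persons 1 and 2 get `800.0` (their mother is in household 0), person 4 gets `650.0`, and persons 0 and 3 get `nan`. Each extra hop costs one fancy-indexing pass, so long chains stay vectorized.
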
## Performance

Performance is a critical constraint for OpenFisca simulations, so we added `pytest-benchmark` tests for the new mechanics:

- `.get()` resolutions (many-to-one) perform on par with legacy projectors (about 118 μs on 15,000 entities).
- Aggregations such as `One2ManyLink.sum()` add a small setup overhead (under 1 ms); the aggregation itself runs as vectorized `numpy.bincount` and `numpy.maximum.at` operations.

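`numpy.bincount` covers sums and counts, but it cannot compute a per-group maximum. The sketch below shows how the `numpy.maximum.at` (and `numpy.minimum.at`) ufunc methods compute a per-entity max and min over members. The function names and the `-inf`/`+inf` fill for entities without members are illustrative choices, not necessarily what `One2ManyLink` does.

```python
import numpy


def one2many_max(values, source_index, nb_sources, mask=None):
    """Per-source maximum of member values (entities with no member keep -inf)."""
    if mask is not None:
        values, source_index = values[mask], source_index[mask]
    result = numpy.full(nb_sources, -numpy.inf)
    # Unbuffered: applies result[i] = max(result[i], v) for every (i, v) pair,
    # so repeated indices are all taken into account (unlike result[idx] = v).
    numpy.maximum.at(result, source_index, values)
    return result


def one2many_min(values, source_index, nb_sources, mask=None):
    """Per-source minimum of member values (entities with no member keep +inf)."""
    if mask is not None:
        values, source_index = values[mask], source_index[mask]
    result = numpy.full(nb_sources, numpy.inf)
    numpy.minimum.at(result, source_index, values)
    return result


household_id = numpy.array([0, 0, 1, 1, 1])  # person -> household index
salary = numpy.array([3000.0, 0.0, 2500.0, 1500.0, 200.0])

# Three households, the third one empty.
max_salary = one2many_max(salary, household_id, 3)
min_salary = one2many_min(salary, household_id, 3)
```

`max_salary` is `[3000, 2500, -inf]`: the empty third household keeps the fill value, which a real API would need to map to a documented default.
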
## Associated Documentation

We've added guides to help framework users model new relationships:

- `docs/implementation/links-api.md`: reference for creating and querying `Many2OneLink` and `One2ManyLink`.
- `docs/implementation/transition-guide.md`: migration guide showing how to adopt links gradually in place of legacy projectors.

## Testing

- 12 new tests covering unit mechanics, system integration, filtering, chaining, and the OpenFisca core lifecycle (`_resolve_links`).
- All 158 core tests and the existing Country Template tests continue to pass locally (`make test-code`).

benchmarks/README.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# Benchmarks

## How to run

```bash
# Run all benchmarks
make benchmark

# Run compute benchmarks only
.venv/bin/python -m pytest benchmarks/test_bench_compute.py -v --benchmark-sort=name

# Run memory benchmarks only
.venv/bin/python -m pytest benchmarks/test_bench_memory.py -v -s

# Save results for later comparison
.venv/bin/python -m pytest benchmarks/ --benchmark-save=my_baseline

# Compare with a saved baseline (saved runs are numbered, e.g. 0001_my_baseline)
.venv/bin/python -m pytest benchmarks/ --benchmark-compare=0001_my_baseline
```

## Benchmarks included

### Compute (`test_bench_compute.py`)

| Benchmark | What it measures | Sizes |
|---|---|---|
| `members_position` | GroupPopulation position assignment | 100 → 1M |
| `household_sum` | `household.sum(salary)` | 10K → 100K |
| `household_any` | `household.any(salary > 3000)` | 10K → 100K |
| `disposable_income` | Full variable cascade (~15 vars) | 100 → 100K |
| `income_tax` | `income_tax` calculation | 100 → 10K |
| `tbs_loading` | TaxBenefitSystem initialization | 1 |

### Memory (`test_bench_memory.py`)

| Benchmark | What it measures | Sizes |
|---|---|---|
| `members_position_memory` | Peak memory for position calc | 10K → 1M |
| `simulation_memory` | Peak memory for full simulation | 10K → 1M |
| `per_variable_memory` | Memory per variable per person | 10K → 100K |

benchmarks/conftest.py

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
```python
"""Shared fixtures for OpenFisca benchmarks."""

import numpy
import pytest


@pytest.fixture(params=[100, 10_000, 100_000, 1_000_000], ids=lambda n: f"N={n:_}")
def population_size(request):
    """Population sizes to benchmark."""
    return request.param


@pytest.fixture(params=[100, 10_000, 100_000], ids=lambda n: f"N={n:_}")
def simulation_size(request):
    """Population sizes for full simulation benchmarks (capped for speed)."""
    return request.param


@pytest.fixture
def rng():
    """Deterministic random number generator."""
    return numpy.random.default_rng(42)


@pytest.fixture
def make_group_population():
    """Factory to create a GroupPopulation with random entity assignment."""

    def _make(nb_persons, nb_entities=None):
        from openfisca_core.populations.group_population import GroupPopulation

        if nb_entities is None:
            nb_entities = max(1, nb_persons // 3)

        rng = numpy.random.default_rng(42)
        pop = GroupPopulation.__new__(GroupPopulation)
        pop._members_entity_id = rng.integers(0, nb_entities, size=nb_persons)
        pop._members_position = None
        pop._ordered_members_map = None
        return pop

    return _make


@pytest.fixture
def make_simulation():
    """Factory to create a Simulation with salary input."""

    def _make(nb_persons):
        from openfisca_country_template import CountryTaxBenefitSystem

        from openfisca_core.simulations import SimulationBuilder

        tbs = CountryTaxBenefitSystem()
        sim = SimulationBuilder().build_default_simulation(tbs, count=nb_persons)

        rng = numpy.random.default_rng(42)
        sim.set_input("salary", "2024-01", rng.uniform(1000, 5000, nb_persons))
        return sim

    return _make
```

benchmarks/test_bench_compute.py

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
```python
"""Compute time benchmarks for OpenFisca-Core.

Uses pytest-benchmark for statistically rigorous measurements.
Run with: pytest benchmarks/test_bench_compute.py -v --benchmark-sort=name
"""

import pytest

# ---------------------------------------------------------------------------
# S1: members_position (vectorized in this change)
# ---------------------------------------------------------------------------


class TestMembersPositionBench:
    """Benchmark GroupPopulation.members_position."""

    @pytest.mark.parametrize(
        "nb_persons,nb_entities",
        [
            pytest.param(100, 40, id="N=100"),
            pytest.param(10_000, 4_000, id="N=10K"),
            pytest.param(100_000, 40_000, id="N=100K"),
            pytest.param(1_000_000, 400_000, id="N=1M"),
        ],
    )
    def test_members_position(
        self, benchmark, nb_persons, nb_entities, make_group_population
    ):
        pop = make_group_population(nb_persons, nb_entities)

        def run():
            pop._members_position = None  # force recompute
            return pop.members_position

        result = benchmark.pedantic(run, iterations=3, rounds=5, warmup_rounds=1)
        assert len(result) == nb_persons


# ---------------------------------------------------------------------------
# S2: GroupPopulation aggregations (sum, any)
# ---------------------------------------------------------------------------


class TestGroupAggregationBench:
    """Benchmark household.sum() and household.any()."""

    @pytest.mark.parametrize(
        "nb_persons",
        [
            pytest.param(10_000, id="N=10K"),
            pytest.param(100_000, id="N=100K"),
        ],
    )
    def test_household_sum(self, benchmark, nb_persons, make_simulation):
        sim = make_simulation(nb_persons)

        def run():
            household = sim.populations["household"]
            salaries = household.members("salary", "2024-01")
            return household.sum(salaries)

        result = benchmark.pedantic(run, iterations=5, rounds=5, warmup_rounds=1)
        assert len(result) > 0

    @pytest.mark.parametrize(
        "nb_persons",
        [
            pytest.param(10_000, id="N=10K"),
            pytest.param(100_000, id="N=100K"),
        ],
    )
    def test_household_any(self, benchmark, nb_persons, make_simulation):
        sim = make_simulation(nb_persons)

        def run():
            household = sim.populations["household"]
            salaries = household.members("salary", "2024-01")
            return household.any(salaries > 3000)

        result = benchmark.pedantic(run, iterations=5, rounds=5, warmup_rounds=1)
        assert len(result) > 0


# ---------------------------------------------------------------------------
# S3: Full simulation (disposable_income)
# ---------------------------------------------------------------------------
# A Simulation caches every computed variable, so reusing one simulation
# across rounds would only time cache lookups. Each round therefore gets a
# fresh simulation from `setup` (not timed); pytest-benchmark requires
# iterations=1 when a setup function is used.


class TestFullSimulationBench:
    """Benchmark a full disposable_income calculation."""

    @pytest.mark.parametrize(
        "nb_persons",
        [
            pytest.param(100, id="N=100"),
            pytest.param(10_000, id="N=10K"),
            pytest.param(100_000, id="N=100K"),
        ],
    )
    def test_disposable_income(self, benchmark, nb_persons, make_simulation):
        def setup():
            return (make_simulation(nb_persons),), {}

        def run(sim):
            return sim.calculate("disposable_income", "2024-01")

        result = benchmark.pedantic(run, setup=setup, rounds=3, warmup_rounds=1)
        assert len(result) > 0

    @pytest.mark.parametrize(
        "nb_persons",
        [
            pytest.param(100, id="N=100"),
            pytest.param(10_000, id="N=10K"),
        ],
    )
    def test_income_tax(self, benchmark, nb_persons, make_simulation):
        def setup():
            return (make_simulation(nb_persons),), {}

        def run(sim):
            return sim.calculate("income_tax", "2024-01")

        result = benchmark.pedantic(run, setup=setup, rounds=5, warmup_rounds=1)
        assert len(result) > 0


# ---------------------------------------------------------------------------
# S4: TBS loading
# ---------------------------------------------------------------------------


class TestTBSLoadingBench:
    """Benchmark TaxBenefitSystem initialization."""

    def test_tbs_loading(self, benchmark):
        def run():
            from openfisca_country_template import CountryTaxBenefitSystem

            return CountryTaxBenefitSystem()

        result = benchmark.pedantic(run, iterations=1, rounds=3, warmup_rounds=1)
        assert result is not None
```
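The S1 benchmark times `GroupPopulation.members_position`, which gives each person its 0-based position among the members of its group, in order of appearance; the saved runs compare a vectorized version (`0001_pr_vectorized`) with a loop (`0002_master_loop`). As an illustration of how such a computation can be vectorized (a sketch, not necessarily the implementation in this commit), the code below uses a stable `argsort` plus `bincount`-based run offsets and checks it against a plain per-person counter loop. Both function names are hypothetical.

```python
import numpy


def members_position(members_entity_id):
    """0-based rank of each member within its group, in order of appearance."""
    ids = numpy.asarray(members_entity_id)
    # A stable sort groups members by entity while preserving their original order.
    order = numpy.argsort(ids, kind="stable")
    sorted_ids = ids[order]
    # Offset at which each entity's run starts in the sorted array.
    counts = numpy.bincount(sorted_ids)
    starts = numpy.cumsum(counts) - counts
    ranks_in_sorted = numpy.arange(len(ids)) - starts[sorted_ids]
    # Scatter the ranks back to the original member order.
    positions = numpy.empty_like(ids)
    positions[order] = ranks_in_sorted
    return positions


def members_position_loop(members_entity_id):
    """Reference implementation: one counter per entity, one step per person."""
    counters = {}
    positions = []
    for entity in members_entity_id:
        positions.append(counters.get(entity, 0))
        counters[entity] = counters.get(entity, 0) + 1
    return numpy.array(positions)


ids = numpy.array([1, 0, 1, 2, 0, 1])
positions = members_position(ids)
```

The stable sort is O(n log n) rather than the loop's O(n), but it runs entirely in compiled NumPy code, which is what typically dominates at 100K to 1M persons.
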
