Commit 2305ea5

Add Ledger API compatibility path (#54)
1 parent 5fa48f0 commit 2305ea5

10 files changed

Lines changed: 185 additions & 37 deletions

README.md

Lines changed: 48 additions & 36 deletions
@@ -1,17 +1,24 @@
-# Arch
+# PolicyEngine Ledger
 
-Arch is PolicyEngine's source-data foundation for social simulation. It captures
-source publications, preserves provenance, and represents published values as
-structured, queryable facts.
+PolicyEngine Ledger is the public name for the source-backed fact store
+currently implemented in the historical `arch` Python namespace and
+`PolicyEngine/arch-data` repository. New consumers should use the
+`policyengine_ledger` import path; existing `arch` imports remain supported
+during the rename.
 
-Arch may normalize structure: parse files, type values, declare units and
+Ledger is PolicyEngine's source-data foundation for social simulation. It
+captures source publications, preserves provenance, and represents published
+values as structured, queryable facts.
+
+Ledger may normalize structure: parse files, type values, declare units and
 scales, assign geography and period identifiers, and preserve lineage back to
-source artifacts. Arch does not choose among sources, reconcile inconsistent
+source artifacts. Ledger does not choose among sources, reconcile inconsistent
 sources, age values, impute missing data, select active calibration targets, or
 apply simulator-specific mappings.
 
-Microplex consumes Arch facts to build simulation datasets and Microplex
-Targets. Modeling choices live in Microplex, not Arch.
+Populace consumes Ledger facts to build simulation datasets and Populace
+targets. Thesis can consume the same facts as official observations. Modeling
+choices live in those consumers, not Ledger.
 
 ## Purpose
 
@@ -24,29 +31,29 @@ This repository provides:
 - **Normalization**: Low-assumption representation changes such as unit/scale
 conversion and source-published total/share arithmetic.
 - **Target inputs**: Source-published aggregates, projections, rates, counts,
-and metadata that Microplex may use to compose calibration targets.
+and metadata that Populace may use to compose calibration targets.
 - **Microdata**: Survey and administrative microdata ingestion for CPS, PUF,
 FRS, and related datasets.
 - **Jurisdiction loaders**: Source-specific ETL that emits the shared Arch
 schema.
 
-Arch facts are not PolicyEngine's assertion that a source claim is ultimately true.
+Ledger facts are not PolicyEngine's assertion that a source claim is ultimately true.
 They are source-backed claims with provenance.
 
 ## Boundary
 
 The load-bearing rule:
 
-> Arch may re-express a published value, but may not choose among, reconcile,
+> Ledger may re-express a published value, but may not choose among, reconcile,
 > age, impute, or transform published values in ways that change their meaning.
 
 | Layer | Owns | Examples |
 |-------|------|----------|
-| Arch Sources | Source artifacts and provenance | URLs, checksums, source files, parsed tables/cells |
-| Arch Facts | Structured source claims | SOI cells, ACS estimates, CPI values, CBO-published projections |
-| Arch Normalization | Representation changes | Unit scales, typed values, geography/date identifiers |
-| Arch Target Inputs | Source facts shaped for calibration | SOI EITC totals, CBO baselines, source-published growth factors |
-| Microplex Targets | Model-ready target sets | Source selection, reconciliation, aging, activation profiles |
+| Ledger Sources | Source artifacts and provenance | URLs, checksums, source files, parsed tables/cells |
+| Ledger Facts | Structured source claims | SOI cells, ACS estimates, CPI values, CBO-published projections |
+| Ledger Normalization | Representation changes | Unit scales, typed values, geography/date identifiers |
+| Ledger Target Inputs | Source facts shaped for calibration | SOI EITC totals, CBO baselines, source-published growth factors |
+| Populace Targets | Model-ready target sets | Source selection, reconciliation, aging, activation profiles |
 
 The storage split is documented in
 [`docs/storage-architecture.md`](docs/storage-architecture.md): `arch-raw`
@@ -56,7 +63,7 @@ mirrored from accepted builds.
 
 ## Repository Model
 
-Arch is global at the schema, validation, database, and build-harness layer.
+Ledger is global at the schema, validation, database, and build-harness layer.
 Jurisdiction packages are modular source packages that emit the same Arch
 objects.
 
@@ -72,6 +79,7 @@ Python distributions:
 policyengine-arch-uk
 
 Python imports:
+policyengine_ledger # New public API
 arch
 arch_us
 arch_uk
@@ -98,15 +106,17 @@ arch/
 │ ├── schema.py # SQLModel: Target, Stratum, StratumConstraint
 │ ├── supabase_client.py # Supabase client helpers
 │ └── etl_*.py # Source-specific ETL pipelines
-├── micro/ # Microplex consumers of Arch records
+├── micro/ # Legacy simulation consumer prototypes
 ├── calibration/ # Calibration target adapters and constraints
 ├── data/ # Cached data files
 └── docs/ # Architecture and source documentation
 ```
 
-New code should prefer `arch.sources`, `arch.facts`, `arch.normalization`,
-`arch.targets`, and `arch.microdata`. Microplex-specific target composition
-and calibration code belongs under `micro/`.
+New code should prefer `policyengine_ledger` for source-backed fact and target
+input consumers. Existing in-repo implementation code may continue using
+`arch.sources`, `arch.facts`, `arch.normalization`, `arch.targets`, and
+`arch.microdata` while the rename is phased in. Populace-specific target
+composition and calibration code belongs in Populace.
 
 ## Quick Start
 
@@ -137,6 +147,8 @@ JSON report with fact counts, QA counts, warnings, and validation errors:
 uv run python -m arch.harness validate-facts --fixture
 # Equivalent when the console script is installed:
 uv run arch validate-facts --fixture
+# Equivalent public command once installed:
+uv run ledger validate-facts --fixture
 ```
 
 To build a tiny source-backed fixture from the packaged IRS SOI Table 1.1
@@ -258,7 +270,7 @@ expected first-class constraints, row-backed filter/constraint evidence,
 concept alignment evidence, Axiom concept validation status, and stage-report
 validity.
 
-To build the downstream integration artifact Microplex should consume, merge
+To build the downstream integration artifact Populace can inspect, merge
 available source-package suites for a year into one bundle:
 
 ```bash
@@ -481,20 +493,20 @@ Target inputs use a three-table schema:
 - **stratum_constraints**: Rules defining each stratum.
 - **targets**: Source-published aggregate values linked to strata.
 
-These are inputs to Microplex target composition. Microplex owns the active,
+These are inputs to Populace target composition. Populace owns the active,
 reconciled, aged target sets used for calibration.
 
-## Arch Facts And Microplex Targets
+## Ledger Facts And Populace Targets
 
-Source facts should be structurally normalized before Microplex considers them
+Source facts should be structurally normalized before Populace considers them
 as calibration target candidates.
 Normalization is about representation, not modeling: units, scales, typed
 values, geography IDs, period IDs, and same-source arithmetic where the source
 publishes the total/share relationship.
 
 Inflation, aging, cross-source reconciliation, source selection, and target
-activation belong in Microplex Targets unless the source itself publishes the
-adjusted or projected series.
+activation belong in Populace unless the source itself publishes the adjusted
+or projected series.
 
 ```python
 from arch.facts import SourceFact
@@ -542,21 +554,21 @@ target_input = as_target(
 
 ## Boundaries
 
-- **Arch** owns source data, provenance, source facts, aggregate facts, and
+- **Ledger** owns source data, provenance, source facts, aggregate facts, and
 microdata ingestion.
-- **Microplex Targets** owns source selection, reconciliation, aging, imputation,
+- **Populace Targets** owns source selection, reconciliation, aging, imputation,
 active target sets, and calibration profiles.
-- **Microplex** owns simulation interfaces, entity modeling, weights, and
+- **Populace** owns simulation interfaces, entity modeling, weights, and
 calibration execution.
 - **Jurisdiction source packages** such as `arch-us` and `arch-uk` own
 source-specific parsers and specs that emit shared Arch records.
-- **Jurisdiction simulation packages** such as `microplex-us` own
-simulation-specific variable mappings and target recipes.
+- **Jurisdiction simulation packages** own simulation-specific variable
+mappings and target recipes.
 - **PolicyEngine** owns policy-facing tools and analysis workflows.
 
 ## Related Repositories
 
-- [microplex](https://github.com/PolicyEngine/microplex) - Core microsimulation
-abstractions and calibration interfaces.
-- [microplex-us](https://github.com/PolicyEngine/microplex-us) - US-specific
-simulation adapters and calibration profiles.
+- [populace](https://github.com/PolicyEngine/populace) - Simulation data builds,
+target selection, and calibration execution.
+- [thesis](https://github.com/PolicyEngine/thesis) - Public-facing official
+observations and analysis surfaces backed by Ledger facts.
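
The README change asks new consumers to import from `policyengine_ledger` while `arch` imports stay supported. A consumer that has to run against releases from both before and after this commit could, as one option, try the new path first and fall back to the old one. The sketch below is an assumption about consumer-side code, not something this commit adds; `import_first` is a hypothetical helper, and stdlib module names stand in for the real packages so the example runs anywhere.

```python
import importlib
from types import ModuleType


def import_first(*candidates: str) -> ModuleType:
    """Return the first importable module among ``candidates`` (hypothetical helper)."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; move on to the next candidate.
            continue
    raise ImportError(f"none of these modules could be imported: {candidates}")


# Stand-in names: the first does not exist, so the helper falls back to ``json``.
# A consumer would instead try: import_first("policyengine_ledger", "arch.core")
module = import_first("hypothetical_missing_module_xyz", "json")
print(module.__name__)
```

Once every supported release ships `policyengine_ledger`, a plain `from policyengine_ledger import ...` is simpler and the fallback can be dropped.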

policyengine_ledger/__init__.py

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+"""PolicyEngine Ledger public API.
+
+Ledger is the public name for PolicyEngine's source-backed fact store. The
+implementation still lives in the historical :mod:`arch` namespace while the
+repository rename is phased in; this package is the stable import path for new
+consumers such as Populace and Thesis.
+"""
+
+from arch.core import (
+    AggregateConstraint,
+    AggregateFact,
+    Aggregation,
+    EntityDimension,
+    GeographyDimension,
+    Measure,
+    PeriodDimension,
+    SourceProvenance,
+    SourceRecordLayout,
+    ValidationIssue,
+    ValidationReport,
+    build_aggregate_constraints,
+    build_fact_key,
+    build_label,
+    validate_fact,
+    validate_facts,
+)
+
+__all__ = [
+    "AggregateConstraint",
+    "AggregateFact",
+    "Aggregation",
+    "EntityDimension",
+    "GeographyDimension",
+    "Measure",
+    "PeriodDimension",
+    "SourceProvenance",
+    "SourceRecordLayout",
+    "ValidationIssue",
+    "ValidationReport",
+    "build_aggregate_constraints",
+    "build_fact_key",
+    "build_label",
+    "validate_fact",
+    "validate_facts",
+]

policyengine_ledger/cli.py

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+"""Ledger CLI compatibility entry point."""
+
+from arch.cli import main
+
+__all__ = ["main"]
+
+
+if __name__ == "__main__":
+    main()

policyengine_ledger/core.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+"""Ledger fact schema compatibility module.
+
+New consumers should import from :mod:`policyengine_ledger.core`; the objects
+are re-exported from :mod:`arch.core` until the historical namespace is retired.
+"""
+
+from arch.core import *  # noqa: F403
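
The compatibility modules in this commit are one-line star imports. A star import rebinds the same objects under the new module name and copies nothing, which is what lets the commit's tests assert `LedgerAggregateFact is ArchAggregateFact`. A minimal sketch of that behaviour follows. It uses two throwaway in-memory modules, `legacy_core` and `public_core`, which are hypothetical stand-ins and not the real `arch.core` or `policyengine_ledger.core`.

```python
import sys
import types

# Hypothetical stand-in for the historical namespace; not the real ``arch.core``.
legacy = types.ModuleType("legacy_core")
exec(
    "class AggregateFact:\n"
    "    pass\n"
    "\n"
    "_private_helper = object()\n"
    "__all__ = ['AggregateFact']\n",
    legacy.__dict__,
)
sys.modules["legacy_core"] = legacy

# Hypothetical stand-in for the new namespace: a one-line re-export shim.
public = types.ModuleType("public_core")
exec("from legacy_core import *", public.__dict__)

# The shim binds the very same class object, so isinstance checks behave the
# same whichever path a consumer imports from.
print(public.AggregateFact is legacy.AggregateFact)  # True

# Names outside __all__ are not re-exported by the star import.
print(hasattr(public, "_private_helper"))  # False
```

The second check matters for shims like these. A star import re-exports only what the source module lists in `__all__`, or its non-underscore names when `__all__` is absent. Whether `arch.core` defines `__all__` is not visible in this diff.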

policyengine_ledger/database.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+"""Ledger relational database compatibility module."""
+
+from arch.database import *  # noqa: F403

policyengine_ledger/facts.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+"""Ledger source-fact compatibility module."""
+
+from arch.facts import *  # noqa: F403
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+"""Ledger target-input helpers.
+
+Ledger owns source-backed facts and target-eligible source inputs. Consumers
+such as Populace decide which subset is active and how those facts map to model
+variables.
+"""
+
+from arch.targets import *  # noqa: F403
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+"""US poverty/nonfiler target coverage compatibility module."""
+
+from arch.targets.us_poverty import *  # noqa: F403

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -36,13 +36,14 @@ policyengine = [
 
 [project.scripts]
 arch = "arch.cli:main"
+ledger = "policyengine_ledger.cli:main"
 
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 
 [tool.hatch.build.targets.wheel]
-packages = ["arch", "db", "micro", "calibration", "packages"]
+packages = ["arch", "db", "micro", "calibration", "packages", "policyengine_ledger"]
 
 [tool.pytest.ini_options]
 testpaths = ["tests"]
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+from policyengine_ledger import (
+    AggregateFact,
+    Aggregation,
+    EntityDimension,
+    GeographyDimension,
+    Measure,
+    PeriodDimension,
+    SourceProvenance,
+    build_fact_key,
+    validate_fact,
+)
+from policyengine_ledger.targets.us_poverty import hard_target_package_aliases
+
+
+def test__given_ledger_import_path__then_it_reexports_arch_fact_schema() -> None:
+    # Given
+    fact = AggregateFact(
+        value=1,
+        period=PeriodDimension(type="calendar_year", value=2024),
+        geography=GeographyDimension(level="country", id="0100000US"),
+        entity=EntityDimension(name="person"),
+        measure=Measure(concept="test.people", unit="count"),
+        aggregation=Aggregation(method="count"),
+        source=SourceProvenance(
+            source_name="test",
+            source_table="Fixture",
+            vintage="2024",
+            extracted_at="2026-06-14",
+            extraction_method="unit test",
+        ),
+    )
+
+    # When
+    issues = validate_fact(fact)
+    key = build_fact_key(fact)
+
+    # Then
+    assert not issues
+    assert key.startswith("arch.fact.v1:")
+
+
+def test__given_ledger_facts_import_path__then_it_reexports_arch_facts() -> None:
+    # When
+    from arch.facts import AggregateFact as ArchAggregateFact
+    from policyengine_ledger.facts import AggregateFact as LedgerAggregateFact
+
+    # Then
+    assert LedgerAggregateFact is ArchAggregateFact
+
+
+def test__given_ledger_target_import_path__then_it_reexports_target_contracts() -> None:
+    # When
+    aliases = hard_target_package_aliases()
+
+    # Then
+    assert "soi-table-1-1" in aliases
+    assert "ssa-ssi-table-7b1-2024" in aliases
