-# Arch
+# PolicyEngine Ledger

-Arch is PolicyEngine's source-data foundation for social simulation. It captures
-source publications, preserves provenance, and represents published values as
-structured, queryable facts.
+PolicyEngine Ledger is the public name for the source-backed fact store
+currently implemented in the historical `arch` Python namespace and
+`PolicyEngine/arch-data` repository. New consumers should use the
+`policyengine_ledger` import path; existing `arch` imports remain supported
+during the rename.
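A common way to keep a legacy import path working during a package rename is to register the new name as an alias in `sys.modules`. The diff does not show how `policyengine_ledger` is actually wired to `arch`. In the sketch below, `legacy_arch_demo` and `ledger_demo` are hypothetical stand-ins so the example runs on its own.

```python
import importlib
import sys
import types


def alias_module(new_name: str, existing: types.ModuleType) -> types.ModuleType:
    """Register `existing` under `new_name` so both import paths resolve to it."""
    # The import system consults sys.modules before searching for files,
    # so a pre-registered entry is returned as-is.
    sys.modules[new_name] = existing
    return existing


# Hypothetical stand-in for the historical namespace.
legacy = types.ModuleType("legacy_arch_demo")
legacy.describe = lambda: "source-backed facts"
sys.modules["legacy_arch_demo"] = legacy

# Expose the same module object under a hypothetical new public name.
alias_module("ledger_demo", legacy)

new_api = importlib.import_module("ledger_demo")
old_api = importlib.import_module("legacy_arch_demo")
print(new_api is old_api)  # True
```

With aliasing, both names share one module object, so module-level state stays in sync. A package that instead re-exports names from its `__init__.py` gets a separate module object that only copies the names bound at import time.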

-Arch may normalize structure: parse files, type values, declare units and
+Ledger is PolicyEngine's source-data foundation for social simulation. It
+captures source publications, preserves provenance, and represents published
+values as structured, queryable facts.
+
+Ledger may normalize structure: parse files, type values, declare units and
 scales, assign geography and period identifiers, and preserve lineage back to
-source artifacts. Arch does not choose among sources, reconcile inconsistent
+source artifacts. Ledger does not choose among sources, reconcile inconsistent
 sources, age values, impute missing data, select active calibration targets, or
 apply simulator-specific mappings.

-Microplex consumes Arch facts to build simulation datasets and Microplex
-Targets. Modeling choices live in Microplex, not Arch.
+Populace consumes Ledger facts to build simulation datasets and Populace
+targets. Thesis can consume the same facts as official observations. Modeling
+choices live in those consumers, not Ledger.

 ## Purpose

@@ -24,29 +31,29 @@ This repository provides:
 - **Normalization**: Low-assumption representation changes such as unit/scale
   conversion and source-published total/share arithmetic.
 - **Target inputs**: Source-published aggregates, projections, rates, counts,
-  and metadata that Microplex may use to compose calibration targets.
+  and metadata that Populace may use to compose calibration targets.
 - **Microdata**: Survey and administrative microdata ingestion for CPS, PUF,
   FRS, and related datasets.
 - **Jurisdiction loaders**: Source-specific ETL that emits the shared Arch
   schema.
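Source-published total/share arithmetic applies when one publication gives both a total and a share of that total. Multiplying the two re-expresses published numbers without bringing in outside assumptions. The `Published` record and the `derive_from_share` helper below are hypothetical and are not Ledger's API. The numbers are illustrative, not published figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Published:
    """A value as a source published it (hypothetical record shape)."""
    value: float
    unit: str
    source_id: str
    lineage: tuple[str, ...] = ()


def derive_from_share(total: Published, share: Published) -> Published:
    """Multiply a published total by a published share of that total."""
    if total.source_id != share.source_id:
        # Mixing sources would be reconciliation, which belongs to consumers.
        raise ValueError("total and share must come from the same source")
    if share.unit != "share" or not 0 <= share.value <= 1:
        raise ValueError("share must be a fraction between 0 and 1")
    return Published(
        value=total.value * share.value,
        unit=total.unit,
        source_id=total.source_id,
        # Record the operation so the derived value traces back to its inputs.
        lineage=("total * share", *total.lineage, *share.lineage),
    )


total = Published(2_000_000, "returns", "demo-source")
share = Published(0.25, "share", "demo-source")
print(derive_from_share(total, share).value)  # 500000.0
```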

-Arch facts are not PolicyEngine's assertion that a source claim is ultimately true.
+Ledger facts are not PolicyEngine's assertion that a source claim is ultimately true.
 They are source-backed claims with provenance.

 ## Boundary

 The load-bearing rule:

-> Arch may re-express a published value, but may not choose among, reconcile,
+> Ledger may re-express a published value, but may not choose among, reconcile,
 > age, impute, or transform published values in ways that change their meaning.
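One way to read the load-bearing rule is as an allow-list. Representation changes stay in Ledger, and meaning-changing operations are routed to a consumer. The operation labels below are hypothetical names drawn from the lists in this README. The sketch only illustrates the rule and is not how Ledger enforces it.

```python
# Representation changes the README says Ledger may perform (hypothetical labels).
LEDGER_OPERATIONS = frozenset({
    "parse", "type_value", "declare_unit", "rescale",
    "assign_geography", "assign_period", "same_source_arithmetic",
})

# Meaning-changing operations the README assigns to consumers such as Populace.
CONSUMER_OPERATIONS = frozenset({
    "select_source", "reconcile", "age", "impute",
    "activate_target", "map_to_simulator",
})


def owner_of(operation: str) -> str:
    """Return which layer should own an operation, or raise if it is unclassified."""
    if operation in LEDGER_OPERATIONS:
        return "ledger"
    if operation in CONSUMER_OPERATIONS:
        return "consumer"
    raise KeyError(f"unclassified operation: {operation!r}")


print(owner_of("rescale"))  # ledger
print(owner_of("age"))      # consumer
```

Raising on unknown operations forces an explicit decision whenever a new kind of transformation appears, rather than letting it default into Ledger.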

 | Layer | Owns | Examples |
 |-------|------|----------|
-| Arch Sources | Source artifacts and provenance | URLs, checksums, source files, parsed tables/cells |
-| Arch Facts | Structured source claims | SOI cells, ACS estimates, CPI values, CBO-published projections |
-| Arch Normalization | Representation changes | Unit scales, typed values, geography/date identifiers |
-| Arch Target Inputs | Source facts shaped for calibration | SOI EITC totals, CBO baselines, source-published growth factors |
-| Microplex Targets | Model-ready target sets | Source selection, reconciliation, aging, activation profiles |
+| Ledger Sources | Source artifacts and provenance | URLs, checksums, source files, parsed tables/cells |
+| Ledger Facts | Structured source claims | SOI cells, ACS estimates, CPI values, CBO-published projections |
+| Ledger Normalization | Representation changes | Unit scales, typed values, geography/date identifiers |
+| Ledger Target Inputs | Source facts shaped for calibration | SOI EITC totals, CBO baselines, source-published growth factors |
+| Populace Targets | Model-ready target sets | Source selection, reconciliation, aging, activation profiles |

 The storage split is documented in
 [`docs/storage-architecture.md`](docs/storage-architecture.md): `arch-raw`
@@ -56,7 +63,7 @@ mirrored from accepted builds.

 ## Repository Model

-Arch is global at the schema, validation, database, and build-harness layer.
+Ledger is global at the schema, validation, database, and build-harness layer.
 Jurisdiction packages are modular source packages that emit the same Arch
 objects.
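The split between a global core and modular jurisdiction packages can be pictured as a shared record type plus a protocol that every loader implements. `Fact`, `JurisdictionLoader`, `DemoUSLoader`, `validate`, and `run` are hypothetical names used only for illustration. The real packages emit Arch objects whose fields are not shown in this diff.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Fact:
    """Shared record shape every jurisdiction package emits (hypothetical)."""
    jurisdiction: str
    variable: str
    period: int
    value: float
    source_url: str


class JurisdictionLoader(Protocol):
    """Interface a source package implements; the core never special-cases it."""
    jurisdiction: str

    def load(self, year: int) -> list[Fact]: ...


class DemoUSLoader:
    jurisdiction = "us"

    def load(self, year: int) -> list[Fact]:
        # A real loader would download and parse a source artifact here.
        return [Fact("us", "demo_count", year, 1.0, "https://example.org/demo")]


def validate(facts: list[Fact]) -> list[str]:
    """Global validation that applies to facts from any jurisdiction."""
    return [f"{f.variable}: missing provenance" for f in facts if not f.source_url]


def run(loaders: list[JurisdictionLoader], year: int) -> list[Fact]:
    facts = [fact for loader in loaders for fact in loader.load(year)]
    problems = validate(facts)
    if problems:
        raise ValueError("; ".join(problems))
    return facts


print(len(run([DemoUSLoader()], 2022)))  # 1
```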

@@ -72,6 +79,7 @@ Python distributions:
   policyengine-arch-uk

 Python imports:
+  policyengine_ledger  # New public API
   arch
   arch_us
   arch_uk
@@ -98,15 +106,17 @@ arch/
 │   ├── schema.py            # SQLModel: Target, Stratum, StratumConstraint
 │   ├── supabase_client.py   # Supabase client helpers
 │   └── etl_*.py             # Source-specific ETL pipelines
-├── micro/                   # Microplex consumers of Arch records
+├── micro/                   # Legacy simulation consumer prototypes
 ├── calibration/             # Calibration target adapters and constraints
 ├── data/                    # Cached data files
 └── docs/                    # Architecture and source documentation
 ```

-New code should prefer `arch.sources`, `arch.facts`, `arch.normalization`,
-`arch.targets`, and `arch.microdata`. Microplex-specific target composition
-and calibration code belongs under `micro/`.
+New code should prefer `policyengine_ledger` for source-backed fact and target
+input consumers. Existing in-repo implementation code may continue using
+`arch.sources`, `arch.facts`, `arch.normalization`, `arch.targets`, and
+`arch.microdata` while the rename is phased in. Populace-specific target
+composition and calibration code belongs in Populace.
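During the transition, a downstream consumer that has to run against both old and new installs could try the new import path first and fall back to the historical one. Below is a minimal sketch. `import_first` is a hypothetical helper, and the example call uses a standard-library name so it runs anywhere.

```python
import importlib
from types import ModuleType


def import_first(*names: str) -> ModuleType:
    """Import and return the first module in `names` that is installed."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Skip only when the candidate itself is missing; re-raise when an
            # installed package fails because one of its own dependencies is absent.
            if exc.name != name:
                raise
    raise ModuleNotFoundError(f"none of {names!r} could be imported")


# In a consumer this might read: ledger = import_first("policyengine_ledger", "arch")
json_module = import_first("surely_not_installed_pkg", "json")
print(json_module.__name__)  # json
```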

 ## Quick Start

@@ -137,6 +147,8 @@ JSON report with fact counts, QA counts, warnings, and validation errors:
 uv run python -m arch.harness validate-facts --fixture
 # Equivalent when the console script is installed:
 uv run arch validate-facts --fixture
+# Equivalent public command once installed:
+uv run ledger validate-facts --fixture
 ```
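The diff lists three equivalent ways to run fixture validation. A script meant to work both before and after the `ledger` console script ships could build its command from whichever entry point the environment declares. This is a sketch under that assumption. `validate_facts_command` is hypothetical, it relies on `importlib.metadata.entry_points(group=...)` (Python 3.10+), and it only builds the argument list without running anything.

```python
from importlib.metadata import entry_points


def validate_facts_command(installed_scripts=None) -> list[str]:
    """Build the fixture-validation command, preferring the public CLI name."""
    if installed_scripts is None:
        # Console scripts declared by distributions in the current environment.
        installed_scripts = {ep.name for ep in entry_points(group="console_scripts")}
    for script in ("ledger", "arch"):
        if script in installed_scripts:
            return ["uv", "run", script, "validate-facts", "--fixture"]
    # The module form works even when no console script is installed.
    return ["uv", "run", "python", "-m", "arch.harness", "validate-facts", "--fixture"]


print(validate_facts_command({"arch"}))
```

To execute the command, pass the result to `subprocess.run(cmd, check=True)`. The sketch checks declared console scripts rather than `PATH` because `uv run` resolves scripts from the project environment. That environment need not be on `PATH`, so the check is only meaningful when the script itself runs inside it.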

 To build a tiny source-backed fixture from the packaged IRS SOI Table 1.1
@@ -258,7 +270,7 @@ expected first-class constraints, row-backed filter/constraint evidence,
 concept alignment evidence, Axiom concept validation status, and stage-report
 validity.

-To build the downstream integration artifact Microplex should consume, merge
+To build the downstream integration artifact Populace can inspect, merge
 available source-package suites for a year into one bundle:

 ```bash
@@ -481,20 +493,20 @@ Target inputs use a three-table schema:
 - **stratum_constraints**: Rules defining each stratum.
 - **targets**: Source-published aggregate values linked to strata.

-These are inputs to Microplex target composition. Microplex owns the active,
+These are inputs to Populace target composition. Populace owns the active,
 reconciled, aged target sets used for calibration.
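The three tables can be related with plain dataclasses. A stratum is a named population, its constraints are conditions on record attributes, and a target attaches a source-published value to one stratum. The field names, the operator set, and the helpers are assumptions for illustration. They are not the SQLModel definitions in `schema.py`.

```python
import operator
from dataclasses import dataclass

# Comparison operators a constraint may use (an assumed, deliberately small set).
OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


@dataclass(frozen=True)
class Stratum:
    stratum_id: int
    description: str


@dataclass(frozen=True)
class StratumConstraint:
    stratum_id: int  # which stratum this rule helps define
    variable: str
    op: str
    value: float


@dataclass(frozen=True)
class Target:
    stratum_id: int  # the stratum the published aggregate describes
    variable: str
    value: float
    source: str


def constraints_for(target: Target, constraints: list[StratumConstraint]) -> list[StratumConstraint]:
    """Follow the target's stratum link to the rules that define its population."""
    return [c for c in constraints if c.stratum_id == target.stratum_id]


def in_stratum(record: dict, constraints: list[StratumConstraint]) -> bool:
    """A record belongs to a stratum when it satisfies every constraint."""
    return all(OPS[c.op](record[c.variable], c.value) for c in constraints)


stratum = Stratum(1, "returns with AGI from 50,000 up to 75,000")
constraints = [
    StratumConstraint(1, "agi", ">=", 50_000),
    StratumConstraint(1, "agi", "<", 75_000),
    StratumConstraint(2, "agi", ">=", 75_000),
]
# Placeholder value, not a published statistic.
target = Target(1, "return_count", 0.0, "placeholder")

rules = constraints_for(target, constraints)
print(in_stratum({"agi": 60_000}, rules))  # True
print(in_stratum({"agi": 80_000}, rules))  # False
```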

-## Arch Facts And Microplex Targets
+## Ledger Facts And Populace Targets

-Source facts should be structurally normalized before Microplex considers them
+Source facts should be structurally normalized before Populace considers them
 as calibration target candidates.
 Normalization is about representation, not modeling: units, scales, typed
 values, geography IDs, period IDs, and same-source arithmetic where the source
 publishes the total/share relationship.
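The representation-only steps listed above (typing, scaling, and geography and period identifiers) can be applied to a single raw table cell. The `normalize_cell` helper, the scale names, and the identifier formats are assumptions. Nothing in the sketch inflates, ages, or reconciles the value.

```python
from dataclasses import dataclass

# Multipliers for scales a source might declare in a table header (assumed).
SCALES = {"units": 1, "thousands": 1_000, "millions": 1_000_000}


@dataclass(frozen=True)
class NormalizedValue:
    value: float
    geography_id: str
    period_id: str
    raw: str  # original cell text, kept for lineage


def normalize_cell(raw: str, scale: str, geography_id: str, year: int) -> NormalizedValue:
    """Type and rescale a published cell without changing what it measures."""
    typed = float(raw.replace(",", ""))  # strip thousands separators, then type
    return NormalizedValue(
        value=typed * SCALES[scale],
        geography_id=geography_id,
        period_id=str(year),
        raw=raw,
    )


cell = normalize_cell("1,234.5", "thousands", "us", 2021)
print(cell.value)  # 1234500.0
```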

 Inflation, aging, cross-source reconciliation, source selection, and target
-activation belong in Microplex Targets unless the source itself publishes the
-adjusted or projected series.
+activation belong in Populace unless the source itself publishes the adjusted
+or projected series.

 ```python
 from arch.facts import SourceFact
@@ -542,21 +554,21 @@ target_input = as_target(

 ## Boundaries

-- **Arch** owns source data, provenance, source facts, aggregate facts, and
+- **Ledger** owns source data, provenance, source facts, aggregate facts, and
   microdata ingestion.
-- **Microplex Targets** owns source selection, reconciliation, aging, imputation,
+- **Populace Targets** owns source selection, reconciliation, aging, imputation,
   active target sets, and calibration profiles.
-- **Microplex** owns simulation interfaces, entity modeling, weights, and
+- **Populace** owns simulation interfaces, entity modeling, weights, and
   calibration execution.
 - **Jurisdiction source packages** such as `arch-us` and `arch-uk` own
   source-specific parsers and specs that emit shared Arch records.
-- **Jurisdiction simulation packages** such as `microplex-us` own
-  simulation-specific variable mappings and target recipes.
+- **Jurisdiction simulation packages** own simulation-specific variable
+  mappings and target recipes.
 - **PolicyEngine** owns policy-facing tools and analysis workflows.

 ## Related Repositories

-- [microplex](https://github.com/PolicyEngine/microplex) - Core microsimulation
-  abstractions and calibration interfaces.
-- [microplex-us](https://github.com/PolicyEngine/microplex-us) - US-specific
-  simulation adapters and calibration profiles.
+- [populace](https://github.com/PolicyEngine/populace) - Simulation data builds,
+  target selection, and calibration execution.
+- [thesis](https://github.com/PolicyEngine/thesis) - Public-facing official
+  observations and analysis surfaces backed by Ledger facts.