PolicyEngine
diff --git a/changelog.d/trace-case-study-for-aea.added.md b/changelog.d/trace-case-study-for-aea.added.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/trace-case-study.md b/docs/trace-case-study.md
Lines changed: 95 additions & 0 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 2 additions & 2 deletions
diff --git a/scripts/generate_trace_tros.py b/scripts/generate_trace_tros.py
Lines changed: 8 additions & 16 deletions
diff --git a/src/policyengine/cli.py b/src/policyengine/cli.py
Lines changed: 5 additions & 1 deletion
diff --git a/src/policyengine/core/tax_benefit_model_version.py b/src/policyengine/core/tax_benefit_model_version.py
Lines changed: 10 additions & 5 deletions
@@ -0,0 +1 @@
+Added `docs/trace-case-study.md`, a working draft describing the PolicyEngine TRACE use case for Lars Vilhuber (AEA Data Editor) and the TRACE project team. Covers which PolicyEngine surfaces warrant institutional certification, the precise claims a TRO lets us make, UK data as the strongest case, the three concrete workstreams (us-data build TROs, policyengine-api webapp-run TROs, policyengine-app "Cite this result" UI), and open questions we want feedback on.
@@ -0,0 +1,95 @@
+# PolicyEngine as a TRACE case study
+
+_Working draft, April 2026 — prepared after a 2026-04-21 meeting with Lars Vilhuber (AEA Data Editor), Tara Watson (Brookings), John Sabelhaus, Tim Clark, and Casper (TRACE project)._
+
+## What TRACE is for, in the PolicyEngine case
+
+TRACE (Transparency Certified) defines a standards-based vocabulary, TROv 0.1 at `https://w3id.org/trace/trov/0.1#`, for documenting analytical artifacts by content hash under a SHACL-validatable JSON-LD grammar. A Transparent Research Object (TRO) binds inputs, code, and outputs so that a reader who cannot re-run the analysis can still verify that a specific set of files produced a specific set of results.
+
+The question we walked into the meeting with was: where in the PolicyEngine stack does TRACE add real value?
+
+The answer we walked out with is narrower and cleaner than what we had been building toward. TRACE is not a feature of the `policyengine` Python package for researchers running simulations on their own hardware. For that use case, readers who want to check a paper's numbers can just `pip install` the same pins and rerun. TRACE in that loop is documentation, not credibility.
+
+TRACE matters in exactly the places where the reader cannot easily re-run the analysis:
+
+1. **The calibrated microdata build.** Each `enhanced_cps_YYYY.h5` that we publish to Hugging Face is derived from inputs that the public cannot all access directly (IRS-PUF requires agreeing to IRS's terms of use; the build itself takes hours on Modal with specific GPU configurations). Each release emits a TRO that binds the upstream input fingerprints, the build code, and the output h5 under canonical TROv 0.1. **This is live today** — us-data PR #746 shipped the emission — though cross-linking from the Hugging Face dataset card is still in flight.
+
+2. **Simulation runs through policyengine.org.** When a researcher uses the webapp to score a reform, we run the simulation on our infrastructure against our pinned calibrated data and return the result. A paper that cites that result is asking its readers to trust PolicyEngine's institutional attestation — not to trust that the researcher reproduced a Python pipeline faithfully on their own laptop. A TRO signed by PolicyEngine and served from our infrastructure would make that institutional attestation explicit and machine-verifiable. **This is not yet live** — backend emission is scoped in policyengine-api#3485, the "Cite this result" UI in policyengine-app#2830, both blocked on a pe.py v4 migration (api#3486, draft in #3487). This document describes the intended shape of the workflow, not its current state.
+
+## The claims a PolicyEngine TRO should let us make
+
+Before TRACE, a paper citing a PolicyEngine result could say: "PolicyEngine-US computed an EITC expansion impact of $X using `policyengine-us==1.653.3` and `policyengine-us-data==1.85.2`." The reader had to take it on faith that those versions, run on that reform, actually produced $X — or install the pins and try it themselves, which presumes the researcher's environment was not modified.
+
+A TRO emitted by policyengine.org would let the paper cite a URL instead. That URL would resolve to a JSON-LD document the reader can validate with a stock tool. The artifact set we are designing toward, pinned by SHA-256:
+
+- The **rules bundle**: wheel hashes for `policyengine` and `policyengine-us` at the version resolved at run time. (We do not pin transitive Python dependencies inside the TRO — TRACE has explicitly not built that in, and a verifier who wants to reconstruct the full environment can resolve the declared dependencies against a public index.)
+- The **calibrated microdata**: the `enhanced_cps_2024.h5` SHA-256 and the `DataReleaseManifest` that describes how it was built.
+- The **reform**: the full reform JSON submitted by the user, content-hashed.
+- The **inputs**: for a household-level simulation, the household JSON the user entered; for an economy-wide simulation, the configuration payload.
+- The **outputs**: a content-hashed `results.json` carrying the aggregate metrics the webapp displays. Whether to *also* bind a full per-household weighted simulation frame is an open design question (see below) — it would enable downstream custom splits without re-running the simulation, at a file-size and privacy-posture cost that varies by country.
+- The **institutional attestation**: CI/deploy run URL, git SHA, cloud region, timestamp, and a cryptographic signature. The signing mechanism is not yet settled (see open questions); options under consideration include a GCP workload-identity short-lived signature, a published keychain rooted in a DNS TXT record at policyengine.org, or a Sigstore-style transparency log.
+
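The SHA-256 pinning described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not PolicyEngine's emission code: the helper names (`sha256_of_file`, `sha256_of_json`, `verify_artifact`) and the sorted-key JSON serialisation are assumptions introduced here, and the source does not say which canonicalisation the reform or household JSON would actually use.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks.

    Chunked reads keep memory flat for large artifacts such as an h5 file.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of_json(payload: object) -> str:
    """Hash a JSON payload after serialising it deterministically.

    Assumption: keys are sorted and separators fixed so that two payloads
    with the same content but different key order hash identically.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Check a downloaded artifact against a digest recorded elsewhere."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A reader holding a TRO and a downloaded artifact would call `verify_artifact(path, digest_from_tro)`; how the digest is looked up inside the JSON-LD document is left out here because it depends on the TROv shapes.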
+Claims we believe such a TRO *should* support, in plain language:
+
+1. _These were the rules, this was the calibrated microdata, and these were the inputs that produced those outputs._ — This is the artifact-composition claim; TROv core supports it.
+2. _PolicyEngine as an institution ran this simulation; the researcher did not modify the code between our servers and their paper._ — This requires the institutional-attestation design to be nailed down. The service-account signature we envision is one implementation; it is not the only one.
+3. _Any future reader can recover the full per-household counterfactual frame for re-analysis, bounded only by what we legally can redistribute._ — This depends on the per-household-frame default-or-opt-in design question below.
+
+The per-household frame question deserves a specific flag: whether the webapp TRO binds the full per-household counterfactual frame by default, or only on request, is unsettled. Papers cite aggregates, while reviewers and follow-up work want distributions, state-level breakdowns, and variables the paper did not headline. A default-on full frame, however, has file-size and privacy-posture costs, especially in restricted-data countries. We intend to make the trade-off deliberately rather than defaulting to either extreme. Transcript note: this came up in the meeting (Sabelhaus on what the microdata contains beyond the summary, Max on whether the full frame belongs in a TRO); no consensus on "default-on" emerged.
+
+One framing point worth being careful about: what PolicyEngine provides is *institution-backed self-attestation*, not arm's-length third-party certification. The arm's-length property (that the verifier of a claim is structurally independent of the party being audited) is genuinely absent when PolicyEngine both runs the simulation and signs the TRO. What the TRO buys in that case is structured evidence that a reader (or a reviewer) can query, backed by institutional reputation, not cryptographic independence. That is a real step up from "trust me, I ran it", but we should not market it as more than it is.
+
+## UK data as a strong case for TRACE
+
+In our US work the underlying calibrated h5 is already public on Hugging Face, so a local rerun is in principle possible. That weakens the TRACE value proposition for the US: a reader motivated enough to verify could just `pip install` the pins and try it themselves. The TRO still buys institutional attestation (the researcher did not modify the code), but re-running is not materially blocked.
+
+In our UK work the underlying microdata is UK Data Service–licensed and cannot be redistributed. A researcher who wants to verify a UK PolicyEngine result cannot re-run it on their own machine on any reasonable timescale, because they cannot acquire the inputs easily. Institutional attestation is a particularly strong credibility path here, which is why the meeting flagged this kind of scenario as where TRACE adds the most value.
+
+One caveat worth naming explicitly: we are considering publishing a re-calibrated UK variant derived entirely from public-use inputs, which would partially lift the restriction. If that lands, the US and UK cases converge again. And the TRACE project's own plans for external-identifier pinning (UKDS study number + checksum, IRS-PUF agreement number + checksum) — not yet firmed up in TROv at time of writing — would provide an even cleaner mechanism for binding restricted-input provenance without redistribution.
+
+## What is explicitly NOT a TRACE case for us
+
+It is worth being equally clear about where TRACE does *not* add value for PolicyEngine, so we do not accidentally scope it there:
+
+- **A researcher running `policyengine.py` locally and emitting their own TRO.** Readers can `pip install` the same pins and rerun themselves. A TRO is bookkeeping, not a credibility upgrade. The TRO emission helpers in `policyengine.py` exist because they are reused by the two cases above, not because local emission is the flagship user experience.
+- **Tracing transitive Python dependencies.** TRACE has, per the meeting, explicitly not built this in, and we should not either. The code documents its declared dependencies; a verifier can resolve them against a public index.
+- **Anything that replaces plain version-and-vintage identification.** Much of what matters for reproducibility is just showing "they used that file with that version." That is documentation, not TRACE — and it is often enough on its own, especially for researchers running the Python package against public-use inputs.
+
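The "plain version-and-vintage identification" in the last bullet above needs nothing beyond the standard library. The sketch below compares installed package versions against a set of pins; `installed_version` and `pin_mismatches` are hypothetical helper names introduced for illustration, and the example pins are the ones in this PR's `pyproject.toml` hunk.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def pin_mismatches(
    pins: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin
    to a (pinned, found) pair. An empty result means the environment
    matches the pins exactly.
    """
    mismatches: dict[str, tuple[str, Optional[str]]] = {}
    for package, pinned in pins.items():
        found = lookup(package)
        if found != pinned:
            mismatches[package] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    pins = {"policyengine-us": "1.667.1", "policyengine-uk": "2.88.0"}
    for package, (pinned, found) in pin_mismatches(pins).items():
        print(f"{package}: pinned {pinned}, found {found}")
```

The `lookup` parameter exists only so the comparison can be exercised without installing anything; in normal use the default reads the live environment.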
+## Adjacent workstreams TRACE does not cover
+
+Several reproducibility commitments came up in the meeting that are TRACE-adjacent rather than TRACE-solved. Flagging them so they do not get lost:
+
+- **Preservation-grade archiving.** Hugging Face, where our calibrated h5 artifacts are hosted today, does not publish a preservation commitment comparable to Zenodo or a CLOCKSS / LOCKSS participant. For a TRO citation URL to be durable decades from now, the artifacts it pins need to live somewhere with an explicit long-term preservation policy. Zenodo as a secondary / mirror target is worth serious consideration.
+- **PolicyEngine-specific TRACE vocabulary contribution.** We already use `pe:*` extension fields; as we implement and find patterns that generalize (e.g., institution-backed self-attestation, microdata-build provenance, infrastructure-run attestation), contributing those upstream to TROv vocabulary design is in scope.
+- **Plain version-identification work outside TRACE.** Version badges, shareable permalinks that resolve to the same numbers, a "why did this number move?" diff view between release pairs. These are separate deliverables that are on our app roadmap; TRACE is not the right frame for them.
+
+Both external-identifier pinning and OS / compute-environment capture are on the TRACE roadmap and would help when they land. We will adopt them as they ship.
+
+## What PolicyEngine is building in response
+
+Three concrete workstreams, each tracked as a GitHub issue:
+
+- **`policyengine-us-data`**: each `enhanced_cps_YYYY.h5` release already emits a build TRO. We will verify these TROs are published alongside the h5 and cross-linked from the Hugging Face dataset card so they are discoverable. (us-data PR #746 shipped the emission; issue #808 addresses a parallel licensing-documentation correction.)
+- **`policyengine-api`**: emit a TRACE TRO for every webapp simulation run. The exact signing mechanism and persistence store are open design questions: service-account + GCS is the current strawman, but a Zenodo / Sigstore / DNS-rooted-keychain alternative is under consideration, especially for long-term durability. (Issue #3485; prerequisite v4 migration tracked in #3486, with a draft in #3487.)
+- **`policyengine-app`**: surface the TRO as a "Cite this result" action with a citation download panel, an always-visible rules-vs-data version badge so the "rules changed or data changed?" question is answerable at a glance, and shareable permalinks that resolve to the same numbers forever. (Issue #2830, blocked on the api work.)
+
+Documentation for researchers is being updated (household-api-docs PR #7) to put the webapp-run citation flow ahead of the local-Python-CLI flow, matching the framing that emerged in the meeting.
+
+## What TRACE gets from us as a case study
+
+A few things we think are worth surfacing to the TRACE project directly:
+
+1. **A use case that is infrastructure-certifying, not author-certifying.** The canonical TRACE scenario is a researcher bundling their code and data. Ours is a web service signing runs on behalf of researchers. The distinction matters for how institutional attestation gets represented in the vocabulary and for what SHACL shapes reject.
+2. **Microdata provenance as a first-class artifact class.** Our build pipeline takes hours on specialized hardware and draws on half a dozen upstream sources with varying access levels. The TROv concept of `ArtifactComposition` handles this well, but concrete experience with a working microsimulation build may be useful input as the vocabulary evolves.
+3. **A live stress test for `pe:*` extension discipline.** We have a working example of mapping institutionally-specific certification metadata (`pe:certifiedForModelVersion`, `pe:compatibilityBasis`, `pe:emittedIn`, `pe:ciRunUrl`, `pe:ciGitSha`) onto the TRACE core without polluting TROv shapes. If any of those generalize, we would contribute them upstream.
+
+We will keep notes as the implementation proceeds. The TRACE team is welcome to any of this material as part of their grant work.
+
+## Open questions
+
+- **Per-household frame as default or opt-in.** The meeting did not reach consensus on this; we flagged it as unsettled. Default-on has downstream-analysis utility but file-size and privacy-posture costs. Default-off makes TROs smaller but forces downstream researchers to rerun the simulation for any custom split. Design choice should be made deliberately with trade-offs listed, not defaulted to either extreme.
+- **Retention and addressing of webapp-run TROs.** These become permanent citations. Commitments are needed on durable URLs, content-addressing, migration policy for storage-provider changes, and whether we ever prune. Zenodo as a secondary / mirror target is worth serious consideration: Hugging Face does not publish a preservation commitment, and a TRO URL that 404s in 2040 on a third-party host we cannot repair is a worse outcome than one that 404s in a PolicyEngine-controlled bucket, where we can at least restore or redirect it.
+- **Signing key and key trust model.** A PolicyEngine service-account signature is straightforward to implement; the harder question is how a reader in 2040 verifies the signature belongs to PolicyEngine. Options include a published keychain rooted in a DNS TXT record, a Sigstore-style transparency log, or GCP workload-identity with short-lived signatures. Chain-of-trust design deserves more thought than "we sign it with a service account."
+- **Binding to the actual production runtime.** CI run URL + git SHA document how the container that ran the simulation was *built*. The TRO should additionally bind the running container image SHA, cloud region, and pod / function instance at execution time. Otherwise the TRO only attests to a build, not a run.
+
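The build-versus-run distinction in the last open question can be made concrete with a small record type. This is a sketch under stated assumptions: `RunAttestation` and all of its field names are hypothetical, chosen to mirror the items the bullet lists, and do not correspond to any existing `pe:*` field or TROv term.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunAttestation:
    """Hypothetical record separating build-time from run-time evidence."""

    # Build-time evidence: how the container was produced.
    ci_run_url: str
    git_sha: str
    # Run-time evidence: where and when the simulation actually executed.
    image_digest: Optional[str] = None
    cloud_region: Optional[str] = None
    instance_id: Optional[str] = None
    executed_at: Optional[str] = None  # ISO 8601 timestamp, by assumption

    def missing_runtime_fields(self) -> list[str]:
        """List the run-time fields that are empty or unset."""
        runtime = {
            "image_digest": self.image_digest,
            "cloud_region": self.cloud_region,
            "instance_id": self.instance_id,
            "executed_at": self.executed_at,
        }
        return [name for name, value in runtime.items() if not value]

    def attests_run(self) -> bool:
        """True only when every run-time field is present."""
        return not self.missing_runtime_fields()
```

An emitter could refuse to sign, or could mark the TRO as build-only, whenever `attests_run()` is false; which of the two is preferable is part of the open question.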
+Feedback welcomed from Lars, Tim, Casper, Tara, John — and anyone else reading.
@@ -46,7 +46,7 @@ uk = [
 ]
 us = [
     "policyengine_core>=3.25.0",
-    "policyengine-us==1.653.3",
+    "policyengine-us==1.667.1",
 ]
 dev = [
     "pytest",
@@ -61,7 +61,7 @@ dev = [
     "ruff>=0.9.0",
     "policyengine_core>=3.25.0",
     "policyengine-uk==2.88.0",
-    "policyengine-us==1.653.3",
+    "policyengine-us==1.667.1",
     "towncrier>=24.8.0",
     "mypy>=1.11.0",
     "pytest-cov>=5.0.0",
 
@@ -3,14 +3,9 @@
 Writes ``data/release_manifests/{country}.trace.tro.jsonld`` for each
 country whose bundled manifest ships in the wheel. Run this before
 releasing a new ``policyengine.py`` version so the packaged TRO
-matches the pinned bundle. Requires HTTPS access to the data release
-manifest (and ``HUGGING_FACE_TOKEN`` for private country data).
-
-If a country previously had a TRO on disk and the new run cannot
-regenerate it (e.g. a missing secret or an unreachable HF endpoint),
-the script exits non-zero so the release workflow blocks rather than
-silently shipping a stale/missing TRO. If no bundled release manifests
-are found at all, the script exits 0 with a notice (nothing to do).
+matches the pinned bundle. The richer data release manifest is included
+when available; otherwise the TRO still binds the certified dataset
+sha256 and URI pinned in the bundled release manifest.
 """
 
 from __future__ import annotations
@@ -47,14 +42,11 @@ def regenerate_all() -> tuple[list[Path], list[tuple[str, Path, str]]]:
         try:
             data_release_manifest = get_data_release_manifest(country_id)
         except DataReleaseManifestUnavailableError as exc:
-            if tro_path.exists():
-                regressions.append((country_id, tro_path, str(exc)))
-            else:
-                print(
-                    f"skipped {country_id}: {exc}",
-                    file=sys.stderr,
-                )
-            continue
+            data_release_manifest = None
+            print(
+                f"warning: {country_id}: {exc}; writing limited TRO",
+                file=sys.stderr,
+            )
         tro = build_trace_tro_from_release_bundle(
             country_manifest,
             data_release_manifest,
 
@@ -19,6 +19,7 @@
 from typing import Optional, Sequence
 
 from policyengine.provenance.manifest import (
+    DataReleaseManifestUnavailableError,
     get_data_release_manifest,
     get_release_manifest,
 )
@@ -69,7 +70,10 @@ def _parser() -> argparse.ArgumentParser:
 
 def _emit_bundle_tro(country_id: str, out: Optional[Path]) -> int:
     country_manifest = get_release_manifest(country_id)
-    data_release_manifest = get_data_release_manifest(country_id)
+    try:
+        data_release_manifest = get_data_release_manifest(country_id)
+    except DataReleaseManifestUnavailableError:
+        data_release_manifest = None
     tro = build_trace_tro_from_release_bundle(
         country_manifest,
         data_release_manifest,
 
@@ -7,6 +7,7 @@
 from policyengine.provenance.manifest import (
     CountryReleaseManifest,
     DataCertification,
+    DataReleaseManifestUnavailableError,
     PackageVersion,
     get_data_release_manifest,
 )
@@ -214,16 +215,20 @@ def release_bundle(self) -> dict[str, Optional[str]]:
     def trace_tro(self) -> dict:
         """Build a TRACE TRO for this certified bundle.
 
-        Fetches the published data release manifest so the TRO can pin
-        the exact dataset sha256. Requires a bundled release manifest.
+        Uses the published data release manifest when available. If it
+        has not been published, the TRO falls back to the certified
+        dataset sha256 and URI pinned in the bundled release manifest.
         """
         if self.release_manifest is None:
             raise ValueError(
                 "TRACE TRO export requires a bundled country release manifest."
             )
-        data_release_manifest = get_data_release_manifest(
-            self.release_manifest.country_id
-        )
+        try:
+            data_release_manifest = get_data_release_manifest(
+                self.release_manifest.country_id
+            )
+        except DataReleaseManifestUnavailableError:
+            data_release_manifest = None
         return build_trace_tro_from_release_bundle(
             self.release_manifest,
             data_release_manifest,
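All three Python hunks above apply the same control flow: catch `DataReleaseManifestUnavailableError`, continue with `data_release_manifest = None`, and let the TRO builder fall back to what the bundled manifest pins. The sketch below isolates that pattern with stand-ins. `ManifestUnavailable`, `load_optional_manifest`, `describe_tro_inputs`, and the `sha256` / `uri` keys are hypothetical names for illustration; the real behaviour of `build_trace_tro_from_release_bundle` is not shown in this diff.

```python
from typing import Callable, Optional


class ManifestUnavailable(Exception):
    """Stand-in for DataReleaseManifestUnavailableError."""


def load_optional_manifest(
    fetch: Callable[[str], dict], country_id: str
) -> Optional[dict]:
    """Return the data release manifest, or None if it is unavailable.

    Only the 'unavailable' error is swallowed; any other exception
    propagates so that real failures are not hidden.
    """
    try:
        return fetch(country_id)
    except ManifestUnavailable:
        return None


def describe_tro_inputs(bundled: dict, data_release: Optional[dict]) -> dict:
    """Hypothetical summary of what a full versus limited TRO would bind."""
    pinned = {
        "dataset_sha256": bundled["sha256"],
        "dataset_uri": bundled["uri"],
    }
    if data_release is None:
        # Limited TRO: only the digest and URI pinned in the bundle.
        return {"mode": "limited", **pinned}
    return {"mode": "full", **pinned, "data_release_manifest": data_release}
```

Catching the one named exception, as the diff does, keeps a missing or unpublished manifest from blocking emission while still surfacing network or parsing errors of other types.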