|
2 | 2 |
|
3 | 3 | | | | |
4 | 4 | |---|---| |
5 | | -| **Status** | Proposed (design doc only — no code, migrations, or tests in this PR) | |
6 | | -| **Supersedes / builds on** | 0001 — Generic scheduled scraping + corpus groups ([PR #1444](https://github.com/Open-Source-Legal/OpenContracts/pull/1444); doc lands at `./0001-scheduled-scraping-and-corpus-groups.md` once merged) | |
| 5 | +| **Status** | Partially implemented — this doc is the original design rationale + gap analysis. For the operator/author how-to (and the now-self-contained pack layout: in-pack providers + `source_hosts`), see the guide: [Authoring an Authority Pack](../../guides/authoring-authority-packs.md). | |
| 6 | +| **Supersedes / builds on** | 0001 — Generic scheduled scraping + corpus groups (PR #1444; the `0001-…` proposal doc is not yet written) | |
7 | 7 | | **Relates to** | PR #1305 (Bolivian-law contributor PR, closed/reference), the Authority architecture (PRs #1990 / #1997 / #2037), [`authority-console.md`](../authority-console.md), [`reference-web-enrichment.md`](../reference-web-enrichment.md) | |
8 | 8 | | **Author** | follow-up to #1305 / #1444 | |
9 | 9 |
|
@@ -37,6 +37,19 @@ rather than against a `scraping/` app that was never built. |
37 | 37 | > therefore ships taxonomy + curated content + personas (no live fetch, so no |
38 | 38 | > host-allowlist edit is needed yet). |
39 | 39 |
|
| 40 | +> **Update — packs are now self-contained (gaps 1 & 6 closed).** A pack may now |
| 41 | +> ship its scraper inside the pack (`<pack>/providers/*.py`, discovered by the |
| 42 | +> pipeline registry from in-tree packs and out-of-tree dirs on the |
| 43 | +> `AUTHORITY_PACK_PATHS` setting) and declare the hosts it fetches from in |
| 44 | +> `pack.yaml` (`source_hosts:`, merged into the SSRF allowlist at runtime). The |
| 45 | +> "one un-packable edit" of §3 (the hardcoded host allowlist) and the |
| 46 | +> "single hardcoded package" of gap 6 (§7) no longer hold: a fetching pack is
| 47 | +> portable as a directory, while its secrets still live in the `PipelineSettings` vault.
| 48 | +> See [Authoring an Authority Pack](../../guides/authoring-authority-packs.md) |
| 49 | +> (tests: `test_authority_pack_providers.py`, `test_authority_source_hosts.py`). |
| 50 | +> The remaining gaps (scheduled scraping, multi-corpus orchestration, |
| 51 | +> config-declarable `authority_type`/shape grammars) are unchanged. |
| 52 | +
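As an illustration of the `source_hosts` merge described in the update note, here is a minimal sketch, not the project's actual implementation. It assumes each `pack.yaml` has already been parsed into a plain mapping; `BASE_ALLOWED_HOSTS`, `merged_allowlist`, and `is_fetch_allowed` are hypothetical names introduced only for this example.

```python
from urllib.parse import urlparse

# Hypothetical baseline allowlist; the real hardcoded hosts are not shown in this doc.
BASE_ALLOWED_HOSTS = frozenset({"example.org"})


def merged_allowlist(base, packs):
    """Union the base allowlist with every pack's declared `source_hosts`."""
    hosts = {h.strip().lower() for h in base}
    for pack in packs:
        # `pack` stands in for a parsed pack.yaml mapping (assumption).
        # A pack without `source_hosts` does no live fetch and adds nothing.
        for host in pack.get("source_hosts") or []:
            hosts.add(str(host).strip().lower())
    return frozenset(hosts)


def is_fetch_allowed(url, allowlist):
    """Allow only http(s) URLs whose exact hostname is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return (parsed.hostname or "").lower() in allowlist


packs = [
    {"name": "demo-pack", "source_hosts": ["Laws.Example.net"]},
    {"name": "static-pack"},  # curated content only, no hosts declared
]
allowlist = merged_allowlist(BASE_ALLOWED_HOSTS, packs)
```

The sketch matches hostnames exactly and rejects non-HTTP schemes. A production SSRF guard would typically also validate the resolved IP address, which is out of scope here.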
|
40 | 53 | ## 1. Context — three artifacts, one intent |
41 | 54 |
|
42 | 55 | | Artifact | What it is | Status | |
|