Commit 80e132e

docs: govern historical artifacts
1 parent d7210e2 commit 80e132e

6 files changed

Lines changed: 34 additions & 3 deletions

AGENTS.md

Lines changed: 3 additions & 1 deletion
@@ -18,6 +18,7 @@ Start with `CURRENT_STATE.md` for the current audit summary, then read `docs/age
 - `scripts/render_heatmap.py`: source-to-raster rebuild script
 - `scripts/check_homepage.py`: homepage smoke test script
 - `docs/heatmap-provenance.md`: provenance and rebuild notes
+- `docs/historical-artifacts.md`: policy for retained historical snapshots
 
 ## Commands To Run Before And After Changes
 
@@ -56,7 +57,7 @@ Hand-authored:
 
 Generated or historical artifact:
 
-- `README.CRAWL.md`: historical crawl output with stale machine-local metadata; do not treat as canonical truth
+- `README.CRAWL.md`: historical crawl output with stale machine-local metadata; see `docs/historical-artifacts.md`
 
 ## Data, Citation, And Provenance Rules
 
@@ -77,6 +78,7 @@ Generated or historical artifact:
 - Do not hand-edit generated artifacts if a real generator is introduced later.
 - Prefer edits that preserve GitHub Pages compatibility.
 - Avoid deleting historical artifacts unless their role is documented and the removal is clearly safe.
+- Keep retained historical snapshots documented in `docs/historical-artifacts.md`.
 
 ## Known Project-Specific Traps

CURRENT_STATE.md

Lines changed: 2 additions & 1 deletion
@@ -15,6 +15,7 @@ This repository is a small GitHub Pages static site that publishes a single oral
 - `scripts/check_homepage.py` provides a non-UI smoke test for the rendered homepage.
 - The Jekyll config excludes `vendor/`, so the bundled gem tree is kept out of the public site build.
 - The validate workflow uses Node 24-compatible `actions/checkout@v6` and `actions/setup-python@v6`.
+- `README.CRAWL.md` is explicitly documented as a retained historical artifact in `docs/historical-artifacts.md`.
 - A repo-local validation command now checks required files, local links, key config values, and tracked OS junk.
 - The live GitHub Pages deployment returned HTTP 200 and served the expected page content in this session.
 
@@ -64,6 +65,6 @@ The pinned build path is now committed in-repo, so local build instructions are
 
 ## Immediate Next Moves
 
-1. Implement historical artifact governance in the next feature branch, focusing on the final treatment of `README.CRAWL.md` and any similar repo-only files.
+1. QA the historical artifact governance branch by confirming `README.CRAWL.md` stays out of the public site and the new policy doc stays discoverable.
 2. Keep `scripts/render_heatmap.py` and `Figure2-Teeth_v4.1.pdf` in sync if the figure changes.
 3. Run `python scripts/check_homepage.py` against `_site/index.html` after a local build if you need a homepage smoke check.
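
Reviewer note: the QA step in item 1 (confirming `README.CRAWL.md` stays out of the public site) could be checked mechanically after a local build. The following is a minimal sketch, not code from this commit. The helper name `leaked_artifacts` is hypothetical, and it assumes the build output lives in `_site/` and that a leaked Markdown file would appear either verbatim or rendered to `.html`.

```python
from pathlib import Path


def leaked_artifacts(site_dir: Path, artifact: str = "README.CRAWL.md") -> list[Path]:
    """Return any copies of the artifact found in the built site directory.

    Hypothetical helper: checks both the verbatim file name and the
    rendered-HTML name, since a static site build could emit either.
    """
    stem = Path(artifact).stem  # "README.CRAWL" for the default argument
    candidates = [site_dir / artifact, site_dir / f"{stem}.html"]
    return [path for path in candidates if path.exists()]


if __name__ == "__main__":
    leaks = leaked_artifacts(Path("_site"))
    if leaks:
        raise SystemExit(f"Historical artifact leaked into the site: {leaks}")
    print("README.CRAWL.md is not present in _site/")
```

An empty list means the artifact did not reach the build output. A missing `_site/` directory also yields an empty list, so this only makes sense after a successful build.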

README.md

Lines changed: 6 additions & 0 deletions
@@ -41,6 +41,12 @@ A 5-point color scale indicates the health status of each tooth, from healthy (b
 * `scripts/render_heatmap.py`: Rebuild script for the published heatmap PNG.
 * `scripts/check_homepage.py`: Smoke test for the rendered homepage HTML.
 * `docs/heatmap-provenance.md`: Source artifact and rebuild notes.
+* `docs/historical-artifacts.md`: Policy for retained historical snapshots.
+* `README.CRAWL.md`: Historical crawl snapshot retained for auditability only.
+
+## Historical Artifacts
+
+`README.CRAWL.md` is kept as a historical artifact, not as canonical documentation. It is excluded from the public site, and the repo-level policy for retained snapshots lives in [`docs/historical-artifacts.md`](docs/historical-artifacts.md).
 
 ## Validate The Repo

docs/agentic-overhaul/2026-05-audit.md

Lines changed: 2 additions & 1 deletion
@@ -14,6 +14,7 @@ Primary files:
 - `assets/css/style.scss.css`: site styling overrides
 - `oral-health-heatmap.png`: core visualization image
 - `README.CRAWL.md`: historical directory crawl artifact, not canonical documentation
+- `docs/historical-artifacts.md`: policy for retained historical snapshots
 
 Infrastructure added in this audit:
 
@@ -54,12 +55,12 @@ Scale: 0 = absent, 1 = fragile, 2 = basic, 3 = workable, 4 = strong, 5 = excelle
 - Local build tooling is now pinned with `Gemfile` and `.ruby-version`, and the build is wired into CI.
 - The heatmap source artifact is now committed as `Figure2-Teeth_v4.1.pdf`, with `scripts/render_heatmap.py` and `docs/heatmap-provenance.md` documenting the rebuild path.
 - The live GitHub Pages site returned `200 OK` and served the expected heatmap page in this session.
+- `README.CRAWL.md` remains at the repository root as a historical artifact, and `docs/historical-artifacts.md` records the policy for keeping it out of the public site.
 
 ## P1 Backlog
 
 - Add a provenance note describing what inputs and scripts produced the final heatmap, if those materials can be published safely.
 - Add a screenshot or rendered-page check in CI if a stable local build path is introduced.
-- Clarify whether `README.CRAWL.md` should remain as an artifact, move under `docs/`, or be removed.
 
 ## P2 Backlog

docs/historical-artifacts.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Historical Artifacts
+
+This repository intentionally retains `README.CRAWL.md` as a historical crawl snapshot.
+
+## Policy
+
+- `README.CRAWL.md` is not canonical documentation.
+- It stays excluded from the public GitHub Pages site through `_config.yml`.
+- The file remains at the repository root so future audits can compare it against current repository truth.
+- Any future historical snapshots should follow the same pattern: document their purpose, keep them out of the public site, and point agents to the policy doc instead of the artifact itself.
+
+## Current Scope
+
+- The only retained historical artifact is `README.CRAWL.md`.
+- When in doubt, use `CURRENT_STATE.md`, `README.md`, and `AGENTS.md` as the authoritative operational docs.
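
Reviewer note: the policy says the artifact is excluded through `_config.yml`, but the commit does not show that file. The sketch below illustrates one way the claim could be verified. It assumes the exclusion uses Jekyll's top-level `exclude:` list; the function name `excluded_entries` is hypothetical, and the hand-rolled parsing only covers simple block and inline lists. A real check would more likely use a YAML parser.

```python
def excluded_entries(config_text: str) -> list[str]:
    """Collect entries of a top-level `exclude:` list from Jekyll config text.

    Hypothetical, deliberately small parser: handles `exclude: [a, b]` and
    block lists of `- item` lines, and ignores everything else.
    """
    entries: list[str] = []
    in_exclude = False
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        is_list_item = line.lstrip().startswith("-")
        if not raw[0].isspace() and not is_list_item:
            # A new top-level key starts here; track whether it is `exclude:`.
            in_exclude = line.startswith("exclude:")
            inline = line[len("exclude:"):].strip() if in_exclude else ""
            if inline.startswith("[") and inline.endswith("]"):
                for item in inline[1:-1].split(","):
                    if item.strip():
                        entries.append(item.strip().strip("'\""))
            continue
        if in_exclude and is_list_item:
            entries.append(line.lstrip()[1:].strip().strip("'\""))
    return entries


# Illustrative config text only; the repository's actual _config.yml is not shown here.
sample = "title: Demo\nexclude:\n  - vendor/\n  - README.CRAWL.md\n"
assert "README.CRAWL.md" in excluded_entries(sample)
```

Keeping the parser dependency-free mirrors the standard-library style of the validation script, at the cost of not handling the full YAML grammar.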

scripts/validate_repo.py

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,7 @@
     ROOT / "AGENTS.md",
     ROOT / "docs" / "agentic-first-buildout-plan.md",
     ROOT / "docs" / "heatmap-provenance.md",
+    ROOT / "docs" / "historical-artifacts.md",
     ROOT / "docs" / "agentic-overhaul" / "2026-05-audit.md",
 ]
 REQUIRED_FILES = [
@@ -33,6 +34,7 @@
     ROOT / "scripts" / "check_homepage.py",
     ROOT / "docs" / "agentic-first-buildout-plan.md",
     ROOT / "docs" / "heatmap-provenance.md",
+    ROOT / "docs" / "historical-artifacts.md",
     ROOT / "CURRENT_STATE.md",
     ROOT / "AGENTS.md",
     ROOT / "docs" / "agentic-overhaul" / "2026-05-audit.md",
@@ -137,6 +139,10 @@ def check_readme(warnings: list[str]) -> None:
         warnings.append("README.md does not document the homepage smoke test command.")
     if "Figure2-Teeth_v4.1.pdf" not in readme:
         warnings.append("README.md does not mention the committed heatmap source PDF.")
+    if "docs/historical-artifacts.md" not in readme:
+        warnings.append("README.md does not document the historical artifact policy.")
+    if "README.CRAWL.md" not in readme:
+        warnings.append("README.md does not mention the retained historical crawl snapshot.")
     crawl = ROOT / "README.CRAWL.md"
     if crawl.exists():
         crawl_text = crawl.read_text(encoding="utf-8")
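
Reviewer note: the two checks added to `check_readme` repeat the same "substring missing, append warning" pattern as the checks around them. As the list grows, a table-driven form may be easier to maintain. The following is a sketch of that alternative, not the script's actual structure; `README_EXPECTATIONS` and `collect_readme_warnings` are hypothetical names, and only the two warning messages are taken from this diff.

```python
# Hypothetical table of (required substring, warning message) pairs.
# The messages are copied from the two checks added in this commit.
README_EXPECTATIONS = [
    (
        "docs/historical-artifacts.md",
        "README.md does not document the historical artifact policy.",
    ),
    (
        "README.CRAWL.md",
        "README.md does not mention the retained historical crawl snapshot.",
    ),
]


def collect_readme_warnings(readme: str) -> list[str]:
    """Return one warning per expected substring that is absent from the README text."""
    return [message for needle, message in README_EXPECTATIONS if needle not in readme]


if __name__ == "__main__":
    text = "See docs/historical-artifacts.md for the policy."
    for warning in collect_readme_warnings(text):
        print(warning)
```

With this shape, adding a new required mention is a one-line change to the table, and the function stays a pure text-to-warnings mapping that can be tested without touching the filesystem.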
