
Commit a3fdc46

Run the check for offline live links in the CI.
1 parent 75b68f9 commit a3fdc46

4 files changed

Lines changed: 31 additions & 10 deletions


.github/workflows/checks.yml

Lines changed: 10 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -58,4 +58,13 @@ jobs:
5858
--offline --include-fragments
5959
--index-files index.html
6060
--root-dir docs/_site-offline
61-
docs/_site-offline
61+
docs/_site-offline
62+
- name: Check for surviving live-site links in offline tree
63+
# Flags any https://docs.twinbasic.com/<path> reference left in
64+
# _site-offline/ HTML outside <code>/<pre> blocks. After offlinify
65+
# strips the jekyll-seo-tag block, anything surviving is a source
66+
# link that points at the live site instead of using a relative or
67+
# /tB/... permalink that resolves locally. The bare root URL
68+
# (https://docs.twinbasic.com[/]) is exempt -- intentional "go to
69+
# the live site" links are allowed.
70+
run: python scripts/check_offline_live_links.py
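
The step comment above describes the check in enough detail to sketch it. The following is a minimal illustration only, not the contents of `scripts/check_offline_live_links.py`: the names `LiveLinkFinder` and `find_live_links` are hypothetical, and scanning both attribute values and text nodes is an assumption about what counts as a "reference".

```python
import re
from html.parser import HTMLParser

LIVE_HOST = "https://docs.twinbasic.com"
# Matches the live host followed by a non-empty path. The bare root,
# with or without a trailing slash, does not match and is thus exempt.
LIVE_LINK = re.compile(re.escape(LIVE_HOST) + r"/[^\s\"'<>]+")


class LiveLinkFinder(HTMLParser):
    """Collects live-site URLs that appear outside <code>/<pre> blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside <code> or <pre>
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag in ("code", "pre"):
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        # Assumption: any attribute value (href, src, ...) may carry a link.
        for _name, value in attrs:
            if value:
                self.hits.extend(LIVE_LINK.findall(value))

    def handle_endtag(self, tag):
        if tag in ("code", "pre") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.hits.extend(LIVE_LINK.findall(data))


def find_live_links(html_text):
    """Return every non-exempt live-site URL found in one HTML document."""
    finder = LiveLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits
```

A CI wrapper would presumably walk `_site-offline/`, call something like `find_live_links` on each HTML file, and exit non-zero if any hits are returned; that wrapper is omitted here because its details are not shown in the diff.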

.github/workflows/jekyll-gh-pages.yml

Lines changed: 9 additions & 0 deletions
@@ -101,6 +101,15 @@ jobs:
101101
--index-files index.html
102102
--root-dir docs/_site-offline
103103
docs/_site-offline
104+
- name: Check for surviving live-site links in offline tree
105+
# Flags any https://docs.twinbasic.com/<path> reference left in
106+
# _site-offline/ HTML outside <code>/<pre> blocks. After offlinify
107+
# strips the jekyll-seo-tag block, anything surviving is a source
108+
# link that points at the live site instead of using a relative or
109+
# /tB/... permalink that resolves locally. The bare root URL
110+
# (https://docs.twinbasic.com[/]) is exempt -- intentional "go to
111+
# the live site" links are allowed.
112+
run: python scripts/check_offline_live_links.py
104113
- name: Upload Pages artifact
105114
uses: actions/upload-pages-artifact@v5
106115
with:

docs/Miscellaneous/Documentation Development.md

Lines changed: 1 addition & 1 deletion
@@ -201,7 +201,7 @@ To check that none of the internal links in the most recent documentation build
201201

202202
check.bat
203203

204-
This runs three checks: [Lychee](https://github.com/lycheeverse/lychee) in offline mode against `_site/` (the live tree), the same against `_site-offline/` (the file://-browsable mirror), and a small Python pass over `_site-offline/` that flags any surviving `https://docs.twinbasic.com/<path>` link --- the offline mirror should not navigate back to the live docs site.
204+
This runs three checks: `scripts/check_links.py` against `_site/` (the live tree, in offline mode), the same against `_site-offline/` (the file://-browsable mirror), and `scripts/check_offline_live_links.py` over `_site-offline/` that flags any surviving `https://docs.twinbasic.com/<path>` link --- the offline mirror should not navigate back to the live docs site. The same three checks run in CI on every pull request and on every push to `staging`.
205205

206206
### Building and Local Serving
207207

docs/_plugins/offlinify.md

Lines changed: 11 additions & 8 deletions
@@ -300,22 +300,25 @@ The offline build touches the following files:
300300
| `docs/_config.yml` | `also_build_offline: true` (default-on) and `exclude: [_site-offline]` (keeps Jekyll's watcher from rebuilding on the plugin's own output). |
301301
| `docs/build.bat` | Plain `bundle exec jekyll build` — produces `_site/`, `_site-offline/`, and (via `pdfify.rb`) `_site-pdf/` in one run. |
302302
| `docs/serve.bat` | `bundle exec jekyll serve` — watcher-friendly thanks to the exclude. |
303-
| `docs/check.bat` | Local link check (dev-side only; CI runs the two lychee passes directly). Three steps: lychee permissive on `_site/`, lychee strict on `_site-offline/`, and `scripts/check_offline_live_links.py` against `_site-offline/`. Exits non-zero on any failure. |
304-
| `scripts/check_offline_live_links.py` | Flags any `https://docs.twinbasic.com/<path>` reference that survived offlinify in `_site-offline/` HTML, outside `<code>` / `<pre>` blocks. Skips the bare root (`https://docs.twinbasic.com[/]`) since intentional "go to the live site" links are allowed. Caught locally by `check.bat`; not wired into CI. |
303+
| `docs/check.bat` | Local link check (CI runs the same three passes via the workflows). Three steps: `scripts/check_links.py` permissive on `_site/`, `scripts/check_links.py` strict on `_site-offline/`, and `scripts/check_offline_live_links.py` against `_site-offline/`. Exits non-zero on any failure. |
304+
| `scripts/check_offline_live_links.py` | Flags any `https://docs.twinbasic.com/<path>` reference that survived offlinify in `_site-offline/` HTML, outside `<code>` / `<pre>` blocks. Skips the bare root (`https://docs.twinbasic.com[/]`) since intentional "go to the live site" links are allowed. Run by `check.bat` locally and by both CI workflows after the offline link check. |
305305
| `docs/.gitignore` | `_site`, `_site-offline`, and `_site-pdf` all excluded from git. |
306-
| `.github/workflows/jekyll-gh-pages.yml` | CI workflow. Builds, runs lychee against both trees, deploys to Pages, and (on manual dispatch) packages `_site-offline/` as a release artifact. |
306+
| `.github/workflows/jekyll-gh-pages.yml` | Deploy workflow (push to `staging`, manual dispatch). Builds, runs lychee against `_site/`, runs `scripts/check_links.py` against `_site-offline/`, runs `scripts/check_offline_live_links.py` against `_site-offline/`, deploys to Pages, and (on manual dispatch) packages `_site-offline/` as a release artifact. |
307+
| `.github/workflows/checks.yml` | PR-gating workflow (pull-request to `main`, manual dispatch). Same three link-check steps as the deploy workflow; no deploy or release. |
307308

308309
## CI integration
309310

310311
`bundle exec jekyll build` in CI passes `--baseurl "${{ steps.pages.outputs.base_path }}"` from `actions/configure-pages`. For a Pages site with a custom domain (CNAME), base_path is empty. For a project page without a custom domain, it's `/repo-name`. Offlinify handles both cases — `normalize_baseurl` in `setup` produces the right prefix to strip.
311312

312-
The workflow has two lychee steps after the build:
313+
The workflow has three link-check steps after the build:
313314

314-
1. **Against `_site/`**, with `--fallback-extensions html` and a `--remap` that strips the base_path prefix. This mirrors what GitHub Pages does at request time — extensionless URLs like `/FAQ` get served as `/FAQ.html`. Without `--fallback-extensions html`, every pretty permalink would appear broken in this check.
315+
1. **Lychee against `_site/`**, with `--fallback-extensions html` and a `--remap` that strips the base_path prefix. This mirrors what GitHub Pages does at request time — extensionless URLs like `/FAQ` get served as `/FAQ.html`. Without `--fallback-extensions html`, every pretty permalink would appear broken in this check. Lychee (not `scripts/check_links.py`) handles the online tree because `--remap` isn't implemented in the Python checker; the offline tree below has all baseurl prefixes already stripped by offlinify and doesn't need it.
315316

316-
2. **Against `_site-offline/`**, strict — no extension fallback (`--index-files 'index.html'` only; the online check also accepts the bare directory via `,.`). Every link must resolve to a real file as written. This catches relative links in markdown sources whose permalink shape doesn't match the rendered filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not `Foo/index.html`) — the kind of breakage the online check above hides behind both the fallback and the bare-directory acceptance.
317+
2. **`scripts/check_links.py` against `_site-offline/`**, strict — no extension fallback (`--index-files index.html` only; the online check also accepts the bare directory via `,.`). Every link must resolve to a real file as written. This catches relative links in markdown sources whose permalink shape doesn't match the rendered filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not `Foo/index.html`) — the kind of breakage the online check above hides behind both the fallback and the bare-directory acceptance. The Python checker is roughly 25× faster than lychee on this workload and a bit stricter (catches missing `<script src>` targets and trailing slashes on file-shaped URLs).
317318

318-
Both checks set `fail: true`. Any unresolved link fails the build, blocks the Pages deploy, and blocks the release upload. After both lychee runs succeed and Pages is deployed, the release job (gated to manual dispatch only) downloads the offline-site workflow artifact, computes a tag like `docs-YYYY-MM-DD-HHMM` (UTC), and creates a GitHub release with `twinbasic-docs-offline.zip` attached via `softprops/action-gh-release@v2`.
319+
3. **`scripts/check_offline_live_links.py` against `_site-offline/`**, flagging any surviving `https://docs.twinbasic.com/<path>` reference outside `<code>` / `<pre>` blocks (the bare root is exempt — see [Failure modes: Surviving live-site links](#failure-modes)).
320+
321+
All three steps fail the build on the first non-zero exit, blocking the Pages deploy and the release upload. After they succeed and Pages is deployed, the release job (gated to manual dispatch only) downloads the offline-site workflow artifact, computes a tag like `docs-YYYY-MM-DD-HHMM` (UTC), and creates a GitHub release with `twinbasic-docs-offline.zip` attached via `softprops/action-gh-release@v2`.
319322

320323
## Failure modes
321324

@@ -331,7 +334,7 @@ The plugin surfaces several conditions in its summary log lines:
331334

332335
- **`_site-offline/` triggering `jekyll serve` rebuilds.** Was a problem; now handled by two things in combination: `exclude: [_site-offline]` in `_config.yml`, and the "clean contents but keep the directory" trick in the wipe step (which keeps all watcher events under `_site-offline/...` where the exclude matches).
333336

334-
- **Surviving live-site links.** The [SEO block stripping](#seo-block-stripping) pass removes the bulk of `https://docs.twinbasic.com` references each page contains (canonical link, OpenGraph URL, JSON-LD `url`). Anything left in `_site-offline/` is a source link that points at the live docs site -- usually a markdown author writing `https://docs.twinbasic.com/<path>` instead of a relative link or `/tB/...` permalink, which would silently navigate the offline reader back online. `scripts/check_offline_live_links.py` (run by `check.bat` after the offline lychee pass) flags these locally; the bare root `https://docs.twinbasic.com[/]` is exempt since intentional "go to the live site" links are allowed. CI does not run this check.
337+
- **Surviving live-site links.** The [SEO block stripping](#seo-block-stripping) pass removes the bulk of `https://docs.twinbasic.com` references each page contains (canonical link, OpenGraph URL, JSON-LD `url`). Anything left in `_site-offline/` is a source link that points at the live docs site -- usually a markdown author writing `https://docs.twinbasic.com/<path>` instead of a relative link or `/tB/...` permalink, which would silently navigate the offline reader back online. `scripts/check_offline_live_links.py` flags these; the bare root `https://docs.twinbasic.com[/]` is exempt since intentional "go to the live site" links are allowed. Run locally by `check.bat` and in CI by both workflows after the offline link check.
335338

336339
## Performance
337340

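The difference between the permissive `_site/` check and the strict `_site-offline/` check described under "CI integration" can be modelled with a small resolver. This is an illustrative sketch under stated assumptions, not the logic of lychee or of `scripts/check_links.py`: the function `resolves` and its parameters are hypothetical, and the way the `.html` fallback treats a trailing slash is a simplification.

```python
from pathlib import Path


def resolves(root, url_path, index_files=("index.html",), fallback_html=False):
    """Return True if url_path maps to something acceptable under root.

    index_files: names that make a directory URL valid; "." accepts the
                 bare directory itself (the permissive setting).
    fallback_html: also try url_path + ".html", mimicking how a pretty
                   permalink such as /FAQ is served as /FAQ.html.
    """
    target = Path(root) / url_path.lstrip("/")
    if target.is_file():
        return True
    if target.is_dir():
        return any(
            name == "." or (target / name).is_file() for name in index_files
        )
    if fallback_html:
        return target.with_name(target.name + ".html").is_file()
    return False
```

With only `FAQ.html` on disk, `resolves(root, "/FAQ")` fails in strict mode and passes once `fallback_html=True` is set, which is the class of mismatch the strict offline pass is meant to surface.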