Commit 68c04be

Merge pull request #143 from KubaO/staging
Speed up link checking by not using Lychee.
2 parents e07f396 + a3fdc46 commit 68c04be

18 files changed

Lines changed: 610 additions & 31 deletions

.github/workflows/checks.yml

Lines changed: 24 additions & 2 deletions
@@ -35,7 +35,7 @@ jobs:
 - name: Build with Jekyll
 run: bundle exec jekyll build
 working-directory: ./docs
-- name: Run Lychee
+- name: Check online links (lychee)
 uses: lycheeverse/lychee-action@v2
 with:
 args: >-
@@ -45,4 +45,26 @@ jobs:
 --root-dir ${{ github.workspace }}/docs/_site
 ./_site
 workingDirectory: ./docs
-fail: true
+fail: true
+- name: Set up Python for offline link check
+uses: actions/setup-python@v5
+with:
+python-version: '3.14'
+- name: Install Python deps
+run: pip install -r requirements.txt
+- name: Check offline links (check_links.py)
+run: >-
+python scripts/check_links.py
+--offline --include-fragments
+--index-files index.html
+--root-dir docs/_site-offline
+docs/_site-offline
+- name: Check for surviving live-site links in offline tree
+# Flags any https://docs.twinbasic.com/<path> reference left in
+# _site-offline/ HTML outside <code>/<pre> blocks. After offlinify
+# strips the jekyll-seo-tag block, anything surviving is a source
+# link that points at the live site instead of using a relative or
+# /tB/... permalink that resolves locally. The bare root URL
+# (https://docs.twinbasic.com[/]) is exempt -- intentional "go to
+# the live site" links are allowed.
+run: python scripts/check_offline_live_links.py
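
The live-site check described in the step comments above can be illustrated with a short Python sketch. This is not the contents of `scripts/check_offline_live_links.py`, which this commit view does not show; the names `LiveLinkScanner`, `find_live_links`, and `LIVE_LINK` are hypothetical, and scanning both attribute values and text is an assumption. The sketch only demonstrates the stated rule: flag `https://docs.twinbasic.com/<path>` outside `<code>`/`<pre>`, and exempt the bare root.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: the live host followed by a non-empty path. The bare root,
# with or without a trailing slash, does not match.
LIVE_LINK = re.compile(r"https://docs\.twinbasic\.com/[^\s\"'<>]+")


class LiveLinkScanner(HTMLParser):
    """Collects live-site URLs found outside <code> and <pre> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # greater than 0 while inside <code> or <pre>
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag in ("code", "pre"):
            self.skip_depth += 1
            return
        if self.skip_depth == 0:
            # Look at every attribute value (href, src, content, ...).
            for _name, value in attrs:
                if value:
                    self.hits.extend(LIVE_LINK.findall(value))

    def handle_endtag(self, tag):
        if tag in ("code", "pre") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.hits.extend(LIVE_LINK.findall(data))


def find_live_links(html_text):
    """Return every flagged live-site URL in one HTML document."""
    scanner = LiveLinkScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.hits
```

A real checker would presumably walk every `.html` file under `_site-offline/` and exit non-zero when any page yields hits; that part is left out here.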

.github/workflows/jekyll-gh-pages.yml

Lines changed: 33 additions & 16 deletions
@@ -57,7 +57,7 @@ jobs:
 env:
 JEKYLL_ENV: production
 PAGES_REPO_NWO: "${{ github.repository }}"
-- name: Run Lychee against the online tree
+- name: Check online links (lychee)
 uses: lycheeverse/lychee-action@v2
 with:
 # --remap matches the fully-resolved file URI (not the raw href), so the pattern
@@ -68,6 +68,11 @@ jobs:
 # `--fallback-extensions html` mirrors what GitHub Pages does at request time:
 # an extensionless URL like `/FAQ` is served as `/FAQ.html`. Without the flag
 # lychee would flag every pretty permalink on the site.
+#
+# Lychee, not the Python checker, handles the online tree here because the
+# `--remap` flag isn't implemented by scripts/check_links.py; the offline tree
+# below has all baseurl prefixes already stripped by the offlinify plugin and
+# so doesn't need it.
 args: >-
 --offline --include-fragments
 --fallback-extensions html
@@ -77,22 +82,34 @@ jobs:
 ./_site
 workingDirectory: ./docs
 fail: true
-- name: Run Lychee against the offline tree
-uses: lycheeverse/lychee-action@v2
+- name: Set up Python for offline link check
+uses: actions/setup-python@v5
 with:
-# Strict check on `_site-offline/`: every link must resolve to an actual file
-# under `file://`, with no extension fallback. Catches relative links in
-# markdown sources that point at a permalink that doesn't match the rendered
-# filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not
-# `Foo/index.html`) -- the kind of breakage the online check above hides
-# behind `--fallback-extensions html`.
-args: >-
---offline --include-fragments
---index-files 'index.html'
---root-dir ${{ github.workspace }}/docs/_site-offline
-./_site-offline
-workingDirectory: ./docs
-fail: true
+python-version: '3.14'
+- name: Install Python deps
+run: pip install -r requirements.txt
+- name: Check offline links (check_links.py)
+# Strict check on `_site-offline/`: every link must resolve to an actual file
+# under `file://`, with no extension fallback. Catches relative links in
+# markdown sources that point at a permalink that doesn't match the rendered
+# filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not
+# `Foo/index.html`) -- the kind of breakage the online check above hides
+# behind `--fallback-extensions html`.
+run: >-
+python scripts/check_links.py
+--offline --include-fragments
+--index-files index.html
+--root-dir docs/_site-offline
+docs/_site-offline
+- name: Check for surviving live-site links in offline tree
+# Flags any https://docs.twinbasic.com/<path> reference left in
+# _site-offline/ HTML outside <code>/<pre> blocks. After offlinify
+# strips the jekyll-seo-tag block, anything surviving is a source
+# link that points at the live site instead of using a relative or
+# /tB/... permalink that resolves locally. The bare root URL
+# (https://docs.twinbasic.com[/]) is exempt -- intentional "go to
+# the live site" links are allowed.
+run: python scripts/check_offline_live_links.py
 - name: Upload Pages artifact
 uses: actions/upload-pages-artifact@v5
 with:

docs/Miscellaneous/Documentation Development.md

Lines changed: 1 addition & 1 deletion
@@ -201,7 +201,7 @@ To check that none of the internal links in the most recent documentation build
 
 check.bat
 
-This runs three checks: [Lychee](https://github.com/lycheeverse/lychee) in offline mode against `_site/` (the live tree), the same against `_site-offline/` (the file://-browsable mirror), and a small Python pass over `_site-offline/` that flags any surviving `https://docs.twinbasic.com/<path>` link --- the offline mirror should not navigate back to the live docs site.
+This runs three checks: `scripts/check_links.py` against `_site/` (the live tree, in offline mode), the same against `_site-offline/` (the file://-browsable mirror), and `scripts/check_offline_live_links.py` over `_site-offline/` that flags any surviving `https://docs.twinbasic.com/<path>` link --- the offline mirror should not navigate back to the live docs site. The same three checks run in CI on every pull request and on every push to `staging`.
 
 ### Building and Local Serving
 

docs/_plugins/offlinify.md

Lines changed: 11 additions & 8 deletions
@@ -300,22 +300,25 @@ The offline build touches the following files:
 | `docs/_config.yml` | `also_build_offline: true` (default-on) and `exclude: [_site-offline]` (keeps Jekyll's watcher from rebuilding on the plugin's own output). |
 | `docs/build.bat` | Plain `bundle exec jekyll build` — produces `_site/`, `_site-offline/`, and (via `pdfify.rb`) `_site-pdf/` in one run. |
 | `docs/serve.bat` | `bundle exec jekyll serve` — watcher-friendly thanks to the exclude. |
-| `docs/check.bat` | Local link check (dev-side only; CI runs the two lychee passes directly). Three steps: lychee permissive on `_site/`, lychee strict on `_site-offline/`, and `scripts/check_offline_live_links.py` against `_site-offline/`. Exits non-zero on any failure. |
-| `scripts/check_offline_live_links.py` | Flags any `https://docs.twinbasic.com/<path>` reference that survived offlinify in `_site-offline/` HTML, outside `<code>` / `<pre>` blocks. Skips the bare root (`https://docs.twinbasic.com[/]`) since intentional "go to the live site" links are allowed. Caught locally by `check.bat`; not wired into CI. |
+| `docs/check.bat` | Local link check (CI runs the same three passes via the workflows). Three steps: `scripts/check_links.py` permissive on `_site/`, `scripts/check_links.py` strict on `_site-offline/`, and `scripts/check_offline_live_links.py` against `_site-offline/`. Exits non-zero on any failure. |
+| `scripts/check_offline_live_links.py` | Flags any `https://docs.twinbasic.com/<path>` reference that survived offlinify in `_site-offline/` HTML, outside `<code>` / `<pre>` blocks. Skips the bare root (`https://docs.twinbasic.com[/]`) since intentional "go to the live site" links are allowed. Run by `check.bat` locally and by both CI workflows after the offline link check. |
 | `docs/.gitignore` | `_site`, `_site-offline`, and `_site-pdf` all excluded from git. |
-| `.github/workflows/jekyll-gh-pages.yml` | CI workflow. Builds, runs lychee against both trees, deploys to Pages, and (on manual dispatch) packages `_site-offline/` as a release artifact. |
+| `.github/workflows/jekyll-gh-pages.yml` | Deploy workflow (push to `staging`, manual dispatch). Builds, runs lychee against `_site/`, runs `scripts/check_links.py` against `_site-offline/`, runs `scripts/check_offline_live_links.py` against `_site-offline/`, deploys to Pages, and (on manual dispatch) packages `_site-offline/` as a release artifact. |
+| `.github/workflows/checks.yml` | PR-gating workflow (pull-request to `main`, manual dispatch). Same three link-check steps as the deploy workflow; no deploy or release. |
 
 ## CI integration
 
 `bundle exec jekyll build` in CI passes `--baseurl "${{ steps.pages.outputs.base_path }}"` from `actions/configure-pages`. For a Pages site with a custom domain (CNAME), base_path is empty. For a project page without a custom domain, it's `/repo-name`. Offlinify handles both cases — `normalize_baseurl` in `setup` produces the right prefix to strip.
 
-The workflow has two lychee steps after the build:
+The workflow has three link-check steps after the build:
 
-1. **Against `_site/`**, with `--fallback-extensions html` and a `--remap` that strips the base_path prefix. This mirrors what GitHub Pages does at request time — extensionless URLs like `/FAQ` get served as `/FAQ.html`. Without `--fallback-extensions html`, every pretty permalink would appear broken in this check.
+1. **Lychee against `_site/`**, with `--fallback-extensions html` and a `--remap` that strips the base_path prefix. This mirrors what GitHub Pages does at request time — extensionless URLs like `/FAQ` get served as `/FAQ.html`. Without `--fallback-extensions html`, every pretty permalink would appear broken in this check. Lychee (not `scripts/check_links.py`) handles the online tree because `--remap` isn't implemented in the Python checker; the offline tree below has all baseurl prefixes already stripped by offlinify and doesn't need it.
 
-2. **Against `_site-offline/`**, strict — no extension fallback (`--index-files 'index.html'` only; the online check also accepts the bare directory via `,.`). Every link must resolve to a real file as written. This catches relative links in markdown sources whose permalink shape doesn't match the rendered filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not `Foo/index.html`) — the kind of breakage the online check above hides behind both the fallback and the bare-directory acceptance.
+2. **`scripts/check_links.py` against `_site-offline/`**, strict — no extension fallback (`--index-files index.html` only; the online check also accepts the bare directory via `,.`). Every link must resolve to a real file as written. This catches relative links in markdown sources whose permalink shape doesn't match the rendered filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not `Foo/index.html`) — the kind of breakage the online check above hides behind both the fallback and the bare-directory acceptance. The Python checker is roughly 25× faster than lychee on this workload and a bit stricter (catches missing `<script src>` targets and trailing slashes on file-shaped URLs).
 
-Both checks set `fail: true`. Any unresolved link fails the build, blocks the Pages deploy, and blocks the release upload. After both lychee runs succeed and Pages is deployed, the release job (gated to manual dispatch only) downloads the offline-site workflow artifact, computes a tag like `docs-YYYY-MM-DD-HHMM` (UTC), and creates a GitHub release with `twinbasic-docs-offline.zip` attached via `softprops/action-gh-release@v2`.
+3. **`scripts/check_offline_live_links.py` against `_site-offline/`**, flagging any surviving `https://docs.twinbasic.com/<path>` reference outside `<code>` / `<pre>` blocks (the bare root is exempt — see [Failure modes: Surviving live-site links](#failure-modes)).
+
+All three steps fail the build on the first non-zero exit, blocking the Pages deploy and the release upload. After they succeed and Pages is deployed, the release job (gated to manual dispatch only) downloads the offline-site workflow artifact, computes a tag like `docs-YYYY-MM-DD-HHMM` (UTC), and creates a GitHub release with `twinbasic-docs-offline.zip` attached via `softprops/action-gh-release@v2`.
 
 ## Failure modes
 
@@ -331,7 +334,7 @@ The plugin surfaces several conditions in its summary log lines:
 
 - **`_site-offline/` triggering `jekyll serve` rebuilds.** Was a problem; now handled by two things in combination: `exclude: [_site-offline]` in `_config.yml`, and the "clean contents but keep the directory" trick in the wipe step (which keeps all watcher events under `_site-offline/...` where the exclude matches).
 
-- **Surviving live-site links.** The [SEO block stripping](#seo-block-stripping) pass removes the bulk of `https://docs.twinbasic.com` references each page contains (canonical link, OpenGraph URL, JSON-LD `url`). Anything left in `_site-offline/` is a source link that points at the live docs site -- usually a markdown author writing `https://docs.twinbasic.com/<path>` instead of a relative link or `/tB/...` permalink, which would silently navigate the offline reader back online. `scripts/check_offline_live_links.py` (run by `check.bat` after the offline lychee pass) flags these locally; the bare root `https://docs.twinbasic.com[/]` is exempt since intentional "go to the live site" links are allowed. CI does not run this check.
+- **Surviving live-site links.** The [SEO block stripping](#seo-block-stripping) pass removes the bulk of `https://docs.twinbasic.com` references each page contains (canonical link, OpenGraph URL, JSON-LD `url`). Anything left in `_site-offline/` is a source link that points at the live docs site -- usually a markdown author writing `https://docs.twinbasic.com/<path>` instead of a relative link or `/tB/...` permalink, which would silently navigate the offline reader back online. `scripts/check_offline_live_links.py` flags these; the bare root `https://docs.twinbasic.com[/]` is exempt since intentional "go to the live site" links are allowed. Run locally by `check.bat` and in CI by both workflows after the offline link check.
 
 ## Performance
 
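
As a rough illustration of the permissive and strict checks described under "CI integration" in the hunk above, the following Python sketch resolves a root-relative link target against a built tree. It is not taken from `scripts/check_links.py` or lychee; the function `resolve_link` and its parameters are hypothetical stand-ins for the `--index-files` and `--fallback-extensions` options, and the sketch does not claim to reproduce every edge case of either tool.

```python
from pathlib import Path


def resolve_link(root, target, index_files=("index.html",), fallback_extensions=()):
    """Return the path a root-relative link target resolves to, or None.

    target              -- path relative to the site root, e.g. "FAQ" or "Foo/"
    index_files         -- names tried when the target is a directory; the
                           special entry "." accepts the bare directory
    fallback_extensions -- extensions tried when no file matches as written
    """
    root = Path(root)
    candidate = root / target.lstrip("/")

    if target.endswith("/") or candidate.is_dir():
        if not candidate.is_dir():
            # Trailing slash on something that is not a directory.
            return None
        for name in index_files:
            if name == ".":
                return candidate
            if (candidate / name).is_file():
                return candidate / name
        return None

    if candidate.is_file():
        return candidate
    # Permissive mode only: try the target with each fallback extension.
    for ext in fallback_extensions:
        with_ext = candidate.with_name(candidate.name + "." + ext)
        if with_ext.is_file():
            return with_ext
    return None
```

Under these assumptions, the online pass corresponds to `index_files=("index.html", ".")` with `fallback_extensions=("html",)`, and the offline pass to the defaults, so a link to `FAQ` resolves only in the former when the tree contains `FAQ.html`.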

docs/check.bat

Lines changed: 10 additions & 4 deletions
@@ -1,6 +1,12 @@
-@rem Use lychee to check the links in both build outputs, then scan
+@rem Run the Python-based link checker on both build outputs, then scan
 @rem _site-offline/ for live-site links that survived offlinify.
 @rem
+@rem Same arguments as lychee.bat -- only the executable differs. The Python
+@rem script is faster on this workload (~25x on Windows) and a bit stricter:
+@rem it flags <script src> targets that don't exist and rejects trailing
+@rem slashes on file-shaped URLs (e.g. `foo.html/`), both of which lychee
+@rem silently accepts. lychee.bat remains available as a second opinion.
+@rem
 @rem _site/ Online tree. `--fallback-extensions html` mirrors what
 @rem GitHub Pages does at request time: an extensionless
 @rem URL like /FAQ is served as /FAQ.html. Without the flag
@@ -23,9 +29,9 @@
 @rem script exits non-zero if any fails (earlier failures take precedence
 @rem in the reported code).
 @setlocal
-@set LYCHEE="%~dp0..\.claude\lychee.exe"
+@set CHECK=python "%~dp0..\scripts\check_links.py"
 @echo Checking _site/ (online) ...
-@%LYCHEE% --offline --include-fragments --fallback-extensions html --index-files "index.html,." --root-dir ".\_site" ".\_site" %*
+@%CHECK% --offline --include-fragments --fallback-extensions html --index-files "index.html,." --root-dir ".\_site" ".\_site" %*
 @set EXIT1=%ERRORLEVEL%
 @echo.
 @echo Checking _site-offline/ (offline) ...
@@ -34,7 +40,7 @@
 @rem above accepts `.` because GitHub Pages can serve an unstyled
 @rem directory listing or a 404 in that case; offline, there's no
 @rem such fallback, and the link is just broken.
-@%LYCHEE% --offline --include-fragments --index-files "index.html" --root-dir ".\_site-offline" ".\_site-offline" %*
+@%CHECK% --offline --include-fragments --index-files "index.html" --root-dir ".\_site-offline" ".\_site-offline" %*
 @set EXIT2=%ERRORLEVEL%
 @echo.
 @echo Checking _site-offline/ for live-site links ...

docs/lychee.bat

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+@rem Use lychee to check the links in both build outputs, then scan
+@rem _site-offline/ for live-site links that survived offlinify.
+@rem
+@rem _site/ Online tree. `--fallback-extensions html` mirrors what
+@rem GitHub Pages does at request time: an extensionless
+@rem URL like /FAQ is served as /FAQ.html. Without the flag
+@rem every pretty permalink would appear broken.
+@rem _site-offline/ Offline tree. No extension fallback -- every link must
+@rem resolve to an actual file under file://, since the
+@rem browser does no rewriting. Catches relative links in
+@rem markdown sources whose permalink shape doesn't match
+@rem the rendered filename (e.g. `[Foo](Foo/)` when Jekyll
+@rem wrote `Foo.html`, not `Foo/index.html`).
+@rem live-links Greps _site-offline/ HTML for any surviving
+@rem https://docs.twinbasic.com reference outside <code> /
+@rem <pre> blocks. After _plugins/offlinify.rb strips the
+@rem jekyll-seo-tag block from each page, none should
+@rem remain -- a hit means a source link goes to the live
+@rem site instead of the canonical /tB/... permalink.
+@rem See ../scripts/check_offline_live_links.py.
+@rem
+@rem All three checks always run so you see all errors in one pass; the
+@rem script exits non-zero if any fails (earlier failures take precedence
+@rem in the reported code).
+@setlocal
+@set LYCHEE="%~dp0..\.claude\lychee.exe"
+@echo Checking _site/ (online) ...
+@%LYCHEE% --offline --include-fragments --fallback-extensions html --index-files "index.html,." --root-dir ".\_site" ".\_site" %*
+@set EXIT1=%ERRORLEVEL%
+@echo.
+@echo Checking _site-offline/ (offline) ...
+@rem No `.` in --index-files: under file://, a bare directory URL
+@rem (`Foo/`) requires an actual index.html inside. The online check
+@rem above accepts `.` because GitHub Pages can serve an unstyled
+@rem directory listing or a 404 in that case; offline, there's no
+@rem such fallback, and the link is just broken.
+@%LYCHEE% --offline --include-fragments --index-files "index.html" --root-dir ".\_site-offline" ".\_site-offline" %*
+@set EXIT2=%ERRORLEVEL%
+@echo.
+@echo Checking _site-offline/ for live-site links ...
+@python "%~dp0..\scripts\check_offline_live_links.py"
+@set EXIT3=%ERRORLEVEL%
+@if %EXIT1% NEQ 0 exit /b %EXIT1%
+@if %EXIT2% NEQ 0 exit /b %EXIT2%
+@exit /b %EXIT3%
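
The exit-code handling at the end of the batch script above (run every check, then report the earliest failure) can be sketched in Python as follows. The helper names `run_all` and `first_failure` are hypothetical and appear nowhere in the repository; the sketch only mirrors the `EXIT1` / `EXIT2` / `EXIT3` logic.

```python
import subprocess
import sys


def run_all(commands):
    """Run every command, even after a failure, and collect the exit codes."""
    return [subprocess.run(cmd).returncode for cmd in commands]


def first_failure(codes):
    """Return the first non-zero exit code, or 0 if every check passed."""
    for code in codes:
        if code != 0:
            return code
    return 0


if __name__ == "__main__":
    # Placeholder commands standing in for the three checks.
    checks = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "pass"],
    ]
    sys.exit(first_failure(run_all(checks)))
```

Running all checks before exiting means one pass surfaces every class of broken link, at the cost of not stopping early on the first failure.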
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+// noop
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+body { color: black; }
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+<!DOCTYPE html>
+<title>no index here</title>
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+<!DOCTYPE html>
+<title>dir index</title>
+<h2 id="dir-anchor">Dir Anchor</h2>
