Commit b16393d

ci: add lychee dead-link detection workflows (#346)

* ci: add lychee dead-link detection workflows

Adds two workflows plus a shared lychee config:

- link-check-pr.yml runs on PRs that touch .md/.mdx files, scoped to the changed files only and limited to external http(s) URLs. Internal links are already validated by Docusaurus in build.yml. Non-blocking via continue-on-error during a tuning period.
- link-check-weekly.yml runs Mondays 12:00 UTC and on workflow_dispatch. Builds the site and crawls rendered HTML under build/docs, community, and faq. On failure, opens or updates a single rolling GitHub issue labeled link-rot.
- lychee.toml: shared cache, retries, and excludes for both jobs.

Signed-off-by: “Kevin” <kevlar_ksb@yahoo.com>

* test(ci): add canary broken link to verify lychee PR check

Temporary. This commit will be reverted in the next commit on this branch. Verifies that link-check-pr.yml correctly fails when a PR introduces an unreachable external URL.

Signed-off-by: “Kevin” <kevlar_ksb@yahoo.com>

* test(ci): remove canary broken link

Signed-off-by: “Kevin” <kevlar_ksb@yahoo.com>

* fix(ci): use step output for lychee exit code in weekly workflow

lycheeverse/lychee-action@v2 exposes exit_code as a step output, not an env var. The previous env.lychee_exit_code reference was always empty, and '' != '0' evaluates to true, which would have caused the tracking issue to be opened/updated on every weekly run regardless of whether any links were actually broken.

Signed-off-by: “Kevin” <kevlar_ksb@yahoo.com>

---------

Signed-off-by: “Kevin” <kevlar_ksb@yahoo.com>
1 parent d7f60eb commit b16393d

3 files changed

Lines changed: 165 additions & 0 deletions

.github/workflows/link-check-pr.yml

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
```yaml
# Pull-request link check.
#
# Scope: external HTTP(S) links in markdown files changed by the PR.
# Internal/relative links are validated by Docusaurus during build.yml
# (onBrokenLinks: 'throw', onBrokenMarkdownLinks: 'throw'); re-checking
# them here would be redundant and prone to false positives because
# lychee cannot replay Docusaurus's slug routing.
name: Link check (PR)

on:
  pull_request:
    paths:
      - '**/*.md'
      - '**/*.mdx'
      - 'lychee.toml'
      - '.github/workflows/link-check-pr.yml'

permissions:
  contents: read

jobs:
  lychee:
    runs-on: ubuntu-latest
    # Non-blocking on day one. Once the check is stable for a couple of
    # weeks, remove this line and mark "Link check (PR) / lychee" as a
    # required status check in branch protection.
    continue-on-error: true
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          persist-credentials: false

      - name: Detect changed markdown files
        id: changed
        uses: tj-actions/changed-files@v45
        with:
          files: |
            **/*.md
            **/*.mdx

      - name: Restore lychee cache
        if: steps.changed.outputs.any_changed == 'true'
        uses: actions/cache@v4
        with:
          path: .lycheecache
          key: lychee-pr-${{ github.run_id }}
          restore-keys: lychee-pr-

      - name: Run lychee on changed files
        if: steps.changed.outputs.any_changed == 'true'
        uses: lycheeverse/lychee-action@v2
        with:
          # --scheme http --scheme https restricts lychee to absolute
          # HTTP(S) URLs only (i.e., external links). See header comment.
          args: --no-progress --scheme http --scheme https --config lychee.toml ${{ steps.changed.outputs.all_changed_files }}
          fail: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

.github/workflows/link-check-weekly.yml

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
```yaml
# Weekly site-wide link check.
#
# Builds the Docusaurus site and runs lychee against the rendered HTML
# under build/docs, build/community, and build/faq (blog and changelog
# are intentionally excluded). On failure, opens or updates a single
# rolling GitHub issue labeled "link-rot" so findings are tracked over
# time without spamming new issues each week.
name: Link check (weekly)

on:
  schedule:
    - cron: '0 12 * * 1' # Mondays 12:00 UTC
  workflow_dispatch:

permissions:
  contents: read
  issues: write

jobs:
  lychee:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          persist-credentials: false

      - name: Use Node.js lts/jod (v22)
        uses: actions/setup-node@v4
        with:
          node-version: lts/jod
          cache: 'npm'

      - name: Install and build
        run: |
          npm ci
          npm run build

      - name: Restore lychee cache
        uses: actions/cache@v4
        with:
          path: .lycheecache
          key: lychee-weekly-${{ github.run_id }}
          restore-keys: lychee-weekly-

      - name: Run lychee on built docs, community, and faq
        id: lychee
        uses: lycheeverse/lychee-action@v2
        with:
          args: >-
            --no-progress
            --config lychee.toml
            --base ./build
            './build/docs/**/*.html'
            './build/community/**/*.html'
            './build/faq/**/*.html'
          # Do not fail the job so the next step can open/update the
          # tracking issue with the report.
          fail: false
          output: ./lychee-report.md
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Create or update tracking issue
        # lychee-action@v2 exposes exit_code as a STEP OUTPUT, not an env var.
        # Using env.lychee_exit_code here would always be empty, and
        # '' != '0' evaluates to true in GitHub Actions expressions, so the
        # issue would be created on every run.
        if: steps.lychee.outputs.exit_code != '0'
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: 'Link check report'
          content-filepath: ./lychee-report.md
          labels: |
            link-rot
            automated
```
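
The workflow header says a failure opens or updates a single rolling link-rot issue. As committed, however, the final step calls peter-evans/create-issue-from-file without its issue-number input. Without that input, the action opens a new issue on every failing run.

The sketch below shows one way to close that gap. It assumes the GitHub CLI (gh, preinstalled on GitHub-hosted Ubuntu runners) and the job's existing issues: write permission. It looks up the newest open link-rot issue and passes its number to the action. The step id find-issue, its number output, the ISSUE_NUMBER variable, and the optional close-on-success step are hypothetical additions, not part of this commit:

```yaml
      # Hypothetical step: find the newest open issue labeled link-rot.
      # Writes an empty output when no such issue exists.
      - name: Find open link-rot issue
        id: find-issue
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          number=$(gh issue list \
            --repo "$GITHUB_REPOSITORY" \
            --label link-rot \
            --state open \
            --limit 1 \
            --json number \
            --jq '.[0].number // empty')
          echo "number=${number}" >> "$GITHUB_OUTPUT"

      - name: Create or update tracking issue
        if: steps.lychee.outputs.exit_code != '0'
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: 'Link check report'
          content-filepath: ./lychee-report.md
          # Update the existing issue when the lookup found one.
          issue-number: ${{ steps.find-issue.outputs.number }}
          labels: |
            link-rot
            automated

      # Optional, also hypothetical: close the rolling issue after a clean run.
      - name: Close tracking issue when links are healthy
        if: steps.lychee.outputs.exit_code == '0' && steps.find-issue.outputs.number != ''
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ISSUE_NUMBER: ${{ steps.find-issue.outputs.number }}
        run: |
          gh issue close "$ISSUE_NUMBER" \
            --repo "$GITHUB_REPOSITORY" \
            --comment "Weekly link check passed; closing."
```

When no open issue matches, the lookup step writes an empty output. The action should then fall back to creating a new issue, so the first failing run still opens one and later failing runs update it. Using gh here avoids adding another third-party action; a dedicated "find last issue" action would work equally well.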

lychee.toml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
```toml
# Lychee link checker config.
# Shared by .github/workflows/link-check-pr.yml and link-check-weekly.yml.
# Docs: https://lychee.cli.rs/usage/config/

cache = true
max_cache_age = "1d"
max_retries = 3
retry_wait_time = 2
timeout = 20
max_concurrency = 16

# Treat rate-limit/partial responses as success so transient throttling
# from external sites is not reported as link rot.
accept = [200, 203, 206, 429]

# Skip URLs that routinely 403 bots regardless of how politely we crawl.
exclude = [
  "^http(s)?://localhost",
  "^http(s)?://127\\.0\\.0\\.1",
  "^http(s)?://0\\.0\\.0\\.0",
  "linkedin\\.com",
]

# Never traverse these paths when expanding inputs.
exclude_path = [
  "node_modules",
  ".docusaurus",
  ".preview-pages",
]
```
