_plugins/offlinify.rb produces a file://-browsable copy of the rendered site. The plugin hooks into Jekyll's render pipeline per page: it captures each rendered page.output in memory, rewrites every page-to-page link to a page-relative path with an explicit file extension, and writes the result straight to _site-offline/<rel>, without re-reading the rendered file from _site/. After Jekyll's WRITE phase, a final hook copies static files (images, fonts, the just-the-docs.js theme asset), patches the two just-the-docs JS functions that break under file://, and rewires the lunr search index to load from a <script src=> tag instead of an XHR call. The result is a fully self-contained tree that opens cleanly when you double-click index.html on disk, with no HTTP server required.
This file sits in _plugins/ for two reasons: it lives next to the code it documents, and Jekyll's _plugins/ folder is plugin-only territory, so this Markdown never gets rendered into the public site.
Three things in a stock Jekyll/just-the-docs build assume an HTTP server is in front of the files:
- Root-absolute URLs. Every `href` and `src` in the rendered HTML starts with `/`, e.g. `/assets/css/just-the-docs-combined.css`. Under `file://` a leading slash resolves against the filesystem root, not the site root, so the asset never loads.
- Extensionless permalinks. The site uses `permalink:` frontmatter like `/tB/Core/Const`, which Jekyll writes to `_site/tB/Core/Const.html`. The HTML refers to it as `/tB/Core/Const` and the server is expected to map that to `Const.html`. Browsers do no such mapping under `file://`.
- just-the-docs JS. `navLink()` matches the active nav entry by string-comparing `document.location.pathname` against link `href` attribute values; under `file://` the pathname is a filesystem path that no link matches, so the sidebar collapses on every navigation. `initSearch()` fires an `XMLHttpRequest` for `/assets/js/search-data.json`; browsers block XHR requests to `file://` resources.
Pure Jekyll can't fix any of these. relative_url is site-relative, not page-relative — it has no access to the source page's URL when rendering a link, so it can't decide how many ../s to prepend. Per-page permalink: frontmatter overrides any global URL-shape change. And the upstream theme's JS is out of our hands. The fix has to come after render.
Activated by also_build_offline: true (the default in _config.yml). The plugin registers four hooks that fire across the build:
```ruby
Jekyll::Hooks.register :site, :pre_render do |site|
  next unless site.config["also_build_offline"]
  Offlinify.setup(site)
end

Jekyll::Hooks.register :pages, :post_render do |page|
  Offlinify.process_page(page)
end

Jekyll::Hooks.register :documents, :post_render do |doc|
  Offlinify.process_page(doc)
end

Jekyll::Hooks.register :site, :post_write do |site|
  next unless site.config["also_build_offline"]
  Offlinify.finish(site)
end
```

Offlinify.setup reads from site.config and the in-memory site.pages + site.static_files + site.documents, wipes <site.dest>-offline/, and seeds per-build state on the module's @state ivar. The per-page hooks transform page.output and write straight to _site-offline/. Offlinify.finish reads static files from site.dest (_site/, now fully written by Jekyll's WRITE phase) and copies them across, then patches just-the-docs.js and writes search-data.js.
One Jekyll invocation produces _site/, _site-offline/ (this plugin), and _site-pdf/ (via pdfify.rb). Flip the flag to false if you only want the online site.
Incremental mode (--incremental) is not supported. The per-page write model writes only the changed pages while the pre_render hook still wipes the offline tree — net result: an incomplete offline build. setup detects the flag and emits a warning instead of running. Use plain jekyll build for the offline tree.
Three phases, four hook callbacks.
Fires at :site, :pre_render — once at the start of the build, after Jekyll has read the source tree and generated all pages (including jekyll-redirect-from stubs and SCSS-derived CSS pages) but before any page renders.
- Bail on `--incremental`. Set `@state = nil` and emit a warning. The per-page write model would leave the offline tree incomplete: the wipe runs unconditionally, but only changed pages re-fire their post_render hook.
- Wipe the output directory's contents. The directory itself is preserved across builds: recreating it makes Jekyll's watcher report a bare `_site-offline` change event (no trailing slash, since the directory is momentarily absent at notification time) that the YAML exclude entry `_site-offline` doesn't match (jekyll-watch auto-appends a trailing slash to directory excludes, turning the rule into the regex `_site-offline/`), and the result is an infinite rebuild loop on `jekyll serve`.
- Build `site_paths` from Jekyll's in-memory model (`site.pages + site.static_files + site.documents`). Each item's `destination(site.dest)` gives the absolute file path Jekyll will write; converting to a site-rooted forward-slash form yields keys like `/tB/Core/Const.html`. The keys are decoded: filesystem names like `Form Designer.html` go in literally, not `Form%20Designer.html`. Resolution in `compute_relative` is then an O(1) `Set#include?` probe per candidate, instead of 2-3 `File.file?` syscalls each (very slow on Windows).
- Normalise `baseurl`. Read `site.config["baseurl"]`, strip trailing slashes, prepend a leading slash if missing. The result matches the prefix `relative_url` actually emits in the rendered HTML, e.g. `/twinBASIC-docs` on a GitHub Pages project site. Used during URL resolution to strip the prefix before probing `site_paths`.
- Seed state on the module's `@state` ivar so the per-page and finish hooks can pick up where setup left off: caches (`seg_cache`, `result_cache`), counters, normalised baseurl, exclude patterns, dest paths, cumulative timer. Cleared at the end of `finish` so a fresh build starts clean.
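Two of the setup steps above are small enough to sketch directly. This is a minimal illustration rather than the plugin's code: `normalize_baseurl` is named in this document but its body here is an assumption, and `wipe_contents` is a hypothetical helper name.

```ruby
require "fileutils"

# Assumed body for normalize_baseurl: strip trailing slashes, then make
# sure a non-empty value starts with a leading slash.
def normalize_baseurl(raw)
  base = raw.to_s.strip.sub(%r{/+\z}, "")
  return "" if base.empty?
  base.start_with?("/") ? base : "/#{base}"
end

# Hypothetical helper: empty a directory without removing the directory
# itself, so a file watcher never sees the directory disappear.
def wipe_contents(dir)
  FileUtils.mkdir_p(dir)
  Dir.children(dir).each { |entry| FileUtils.rm_rf(File.join(dir, entry)) }
end
```

With these definitions, `normalize_baseurl("twinBASIC-docs/")` yields `/twinBASIC-docs`, and both `nil` and `"/"` normalise to the empty string.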
Fires at :pages, :post_render and :documents, :post_render — once per page after Jekyll renders it. page.output is the final HTML/CSS/etc bytes; the plugin transforms it and writes the result to _site-offline/<rel>. Jekyll's WRITE phase writes the same page.output to _site/<rel> a moment later, so the online and offline files come from the same in-memory string — no re-read.
For each page:
- Compute `rel` from `page.destination(@state[:dest])` via a plain string slice (`dest_path[(@state[:dest_root_fs].length + 1)..]`) rather than a Pathname round-trip. `Pathname#relative_path_from` is roughly 2 ms per call on Windows and would dominate per-page cost on a 1000+ page build.
- Check `offline_exclude` (see Exclude list). Matched files increment the `excluded_files` counter and skip the write.
- Detect jekyll-redirect-from stubs by class-name string check (`page.class.name == "JekyllRedirectFrom::RedirectPage"`). The stubs are tiny HTML files whose meta-refresh, canonical link, `<script>location=`, and fallback `<a>` all reference an absolute `https://<site.url>/<path>` URL produced by `absolute_url`. Online these redirect to the canonical page; offline they would require network access and land on the live site rather than the local file, defeating the offline scenario. Rewrite each `<site.url><path>` occurrence to its resolved page-relative form via the same `compute_relative` the main HTML pass uses, then write the stub. Counted under `rewritten_redirects` in the summary log line. Some source pages (notably `Miscellaneous/Documentation Development.md`) intentionally link via `redirect_from` URLs as a stable-URL pattern, so the rewritten stubs let those source links navigate locally instead of failing. The class-name string check is used rather than `is_a?` so the plugin still loads if jekyll-redirect-from is removed. If `site.url` is unset (empty) the stub is written verbatim; the path-portion targets still resolve under the offline link check the same way the main HTML pass's link targets do.
- Dispatch on output extension:
  - `.html`: dup `page.output`, strip the jekyll-seo-tag block (see SEO block stripping), scan for code-block ranges, run the combined HTML URL rewrite (see HTML URL rewriting), inject the search-setup script tags, write.
  - `.css`: dup `page.output`, run the `url()` rewrite (see CSS `url()` rewriting), write.
  - Anything else (XML feeds, JSON, etc.): write `page.output` verbatim.
- Accumulate self-time into `@state[:cumulative_ms]`. The reported total at the end is just Offlinify's CPU time across all hook invocations, not the wall-clock between pre_render and post_write (which would include Jekyll's render and write phases between our hooks).
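The string-slice step and the extension dispatch can be sketched as two tiny functions. Both names (`rel_from_dest`, `action_for`) and the returned symbols are hypothetical; the real `process_page` does the work inline against `@state`.

```ruby
# Hypothetical helper: derive the site-relative path by slicing off the
# destination root plus one separator, instead of using Pathname.
def rel_from_dest(dest_path, dest_root)
  dest_path[(dest_root.length + 1)..]
end

# Hypothetical helper: choose the per-page action from the output extension.
def action_for(rel)
  case File.extname(rel).downcase
  when ".html" then :rewrite_html
  when ".css"  then :rewrite_css
  else              :write_verbatim
  end
end
```

The slice assumes `dest_path` really does start with `dest_root` followed by one separator, which holds for paths Jekyll derives from the same destination root.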
Fires at :site, :post_write — once after Jekyll's WRITE phase has populated _site/.
- Copy static files (`site.static_files`) from `_site/` to `_site-offline/`. Static files don't fire `:pages, :post_render`, so they're handled here. The `offline_exclude` check runs again for each.
- Patch `assets/js/just-the-docs.js` in `_site-offline/`. Replace the `navLink()` and `initSearch()` function bodies with offline-friendly versions.
- Generate `assets/js/search-data.js`. Read the `search-data.json` that Phase 2 wrote (the jekyll-search Page object renders the JSON, which `process_page` captures and writes verbatim), wrap in `window.SEARCH_DATA = {...};`, write next to the JSON.
- Log the summary. Three or four lines under the `Offlinify:` topic prefix, ending with `Offlinifier ran in Xms.` (cumulative self-time, not wall-clock).
- Clear `@state` so a subsequent build starts with no leftover counters or caches.
The jekyll-seo-tag plugin emits a ~900-byte block at the top of every page's <head>, bracketed by <!-- Begin Jekyll SEO tag vX.Y.Z --> and <!-- End Jekyll SEO tag --> comments. Inside live a <title>, a generator tag, OpenGraph and Twitter Card meta, a <link rel="canonical"> pointing at the live site, and a JSON-LD structured-data <script>. All of it exists for search-engine crawlers and social-media link previewers that never see _site-offline/.
The whole block is stripped, except the <title> (the browser tab label, the only thing in the block a local reader actually uses). The bracketing comments go away too. On the current ~830-page site, the strip saves roughly 750 KB across the offline tree and removes the https://docs.twinbasic.com references each page would otherwise carry in its <head> (the canonical link, the OpenGraph URL, and the JSON-LD "url" field all sit inside the SEO block).
Runs first in the .html branch of process_page so the URL rewrite isn't doing work on URLs we're about to delete, and so the code-block scan's byte offsets are valid against the post-strip content.
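A minimal sketch of the strip, assuming the bracketing comments look exactly as described above. The constant and method names are hypothetical, and the plugin's actual pattern may differ.

```ruby
# Matches the whole jekyll-seo-tag block, including the bracketing comments.
SEO_BLOCK = /<!-- Begin Jekyll SEO tag v[^ ]* -->.*?<!-- End Jekyll SEO tag -->[ \t]*\n?/m

# Replace the block with just its <title> element (or nothing if absent).
def strip_seo_block(html)
  html.sub(SEO_BLOCK) do |block|
    title = block[%r{<title>.*?</title>}m]
    title ? "#{title}\n" : ""
  end
end
```

Because `sub` replaces only the first match, a page without the block passes through unchanged.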
A single combined regex matches both absolute and page-relative URLs in href/src attributes:
```
\b(href|src)=(["'])(\/(?!\/)[^"']*|(?![#/]|[a-zA-Z][a-zA-Z0-9+.\-]*:)[^"']+)\2
```
The third capture (the URL) has two alternatives:
- Absolute (`\/(?!\/)[^"']*`): starts with a single `/`, not `//` (protocol-relative). Produced by `relative_url`. Goes through `compute_relative`.
- Page-relative (`(?![#/]|[a-zA-Z][a-zA-Z0-9+.\-]*:)[^"']+`): does not start with `#` (fragment-only, leave alone), `/` (handled by the first alternative), or a `scheme:` prefix (`http:`, `mailto:`, `tel:`, `javascript:`, etc.). Comes from markdown sources verbatim (`[Description](Attributes#description)`-style); Jekyll passes these through without applying `relative_url`, so they reach the rendered HTML without a baseurl prefix. Goes through `compute_rel_url`.
The two alternatives are disjoint at the start of the URL, so a single gsub handles both. Inside the block, dispatch on raw.start_with?("/"). (An earlier two-regex design ran two full gsubs and re-scanned the file for code-block ranges between them; combining them halved the per-file regex work — see Performance.)
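The single-pass dispatch can be sketched as follows. The regex is the one quoted above; `rewrite_urls` and its two callable parameters are hypothetical stand-ins for the plugin's callback, which calls `compute_relative` and `compute_rel_url` directly.

```ruby
# The combined pattern, written with %r{} so its slashes need no escaping.
URL_ATTR = %r{\b(href|src)=(["'])(/(?!/)[^"']*|(?![#/]|[a-zA-Z][a-zA-Z0-9+.\-]*:)[^"']+)\2}

# `absolute` and `relative` are assumed callables: each takes the raw URL
# and returns the rewritten URL, or nil to leave the match untouched.
def rewrite_urls(html, absolute:, relative:)
  html.gsub(URL_ATTR) do |whole|
    attr, quote, raw = $1, $2, $3
    resolver = raw.start_with?("/") ? absolute : relative
    rewritten = resolver.call(raw)
    rewritten ? "#{attr}=#{quote}#{rewritten}#{quote}" : whole
  end
end
```

Fragment-only links, `scheme:` URLs, and protocol-relative `//host/...` URLs never reach either callable, because the pattern does not match them in the first place.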
For each absolute-URL match, the steps are:
- Split off query/fragment. `#section` and `?foo=bar` are preserved verbatim onto the rewritten URL.
- Percent-decode the path. `/Tutorials/CustomControls/Form%20Designer` becomes `/Tutorials/CustomControls/Form Designer` so it can be compared against the literal filesystem-derived keys in `site_paths`.
- Strip the baseurl prefix. If `baseurl` is `/twinBASIC-docs` and the URL is `/twinBASIC-docs/tB/Core/Const`, the path becomes `/tB/Core/Const`. Two forms are handled: an exact match (`/twinBASIC-docs` → `/`) and a normal subpath (`/twinBASIC-docs/foo` → `/foo`).
- Probe three candidates against `site_paths`. In priority order:
  - `<path>` as-is, e.g. `/assets/css/just-the-docs-combined.css` matches its own file.
  - `<path>.html`, e.g. `/FAQ` → `/FAQ.html`. Only tried if the path has no extension and doesn't end with `/`.
  - `<path>/index.html`, e.g. `/Tutorials/CEF/` → `/Tutorials/CEF/index.html`.

  First hit wins. A miss means the URL stays as-is and the unresolved counter increments (reported in the build summary).
- Compute the page-relative URL. Find the longest common prefix between the source file's directory segments (computed once per file by `file_dir_segs_from_rel`) and the target's path segments (cached globally by `seg_cache`). Emit `"../" * (depth - common) + encoded_segs[common..].join("/")`. Re-encode only path segments that contain reserved characters; URL-safe segments pass through verbatim and share strings between the decoded and encoded arrays.
- Reattach the query/fragment tail.
Worked example: from _site-offline/tB/Core/Const.html, the input URL is /twinBASIC-docs/Tutorials/CustomControls/Form%20Designer#section.
```
raw            = "/twinBASIC-docs/Tutorials/CustomControls/Form%20Designer#section"
path/sep/tail  = "/twinBASIC-docs/Tutorials/CustomControls/Form%20Designer" / "#" / "section"
decoded        = "/twinBASIC-docs/Tutorials/CustomControls/Form Designer"
after strip    = "/Tutorials/CustomControls/Form Designer"
candidates     = ["/Tutorials/CustomControls/Form Designer",
                  "/Tutorials/CustomControls/Form Designer.html",
                  "/Tutorials/CustomControls/Form Designer/index.html"]
matched        = "/Tutorials/CustomControls/Form Designer.html"
file_segs      = ["tB", "Core"]
target_segs    = ["Tutorials", "CustomControls", "Form Designer.html"]   (decoded)
encoded_segs   = ["Tutorials", "CustomControls", "Form%20Designer.html"]
common         = 0
ascend         = "../../"
descend        = "Tutorials/CustomControls/Form%20Designer.html"
result         = "../../Tutorials/CustomControls/Form%20Designer.html#section"
```
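The worked example can be reproduced with a short, cache-free sketch of the six steps. The signature is hypothetical, `percent_decode` is an assumed helper, and unlike the plugin this version passes every segment through `ERB::Util.url_encode` (URL-safe segments come back unchanged) instead of re-encoding selectively.

```ruby
require "erb"
require "set"

# Assumed helper: decode runs of %XX escapes as UTF-8 bytes.
def percent_decode(path)
  path.gsub(/(?:%\h\h)+/) { |run| [run.delete("%")].pack("H*").force_encoding(Encoding::UTF_8) }
end

# Simplified stand-in for compute_relative; returns nil when unresolved.
def compute_relative(raw, file_segs, site_paths, baseurl)
  path, sep, tail = raw.partition(/[?#]/)           # 1. split off query/fragment
  path = percent_decode(path)                       # 2. percent-decode
  if !baseurl.empty? && path == baseurl             # 3. strip the baseurl prefix
    path = "/"
  elsif !baseurl.empty? && path.start_with?("#{baseurl}/")
    path = path[baseurl.length..]
  end
  candidates = [path]                               # 4. probe in priority order
  candidates << "#{path}.html" if File.extname(path).empty? && !path.end_with?("/")
  candidates << "#{path.chomp('/')}/index.html"
  hit = candidates.find { |c| site_paths.include?(c) }
  return nil unless hit

  target = hit.split("/").reject(&:empty?)          # 5. page-relative path
  common = 0
  common += 1 while common < file_segs.length && common < target.length - 1 &&
                    file_segs[common] == target[common]
  ascend = "../" * (file_segs.length - common)
  descend = target[common..].map { |seg| ERB::Util.url_encode(seg) }.join("/")
  "#{ascend}#{descend}#{sep}#{tail}"                # 6. reattach the tail
end
```

Called with the inputs from the worked example (`file_segs = ["tB", "Core"]`, baseurl `/twinBASIC-docs`, and a `site_paths` set containing `/Tutorials/CustomControls/Form Designer.html`), it returns `../../Tutorials/CustomControls/Form%20Designer.html#section`.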
For each page-relative match (e.g. Attributes#description in Const.html), the steps are:
- Normalise the relative path against the current page's directory segments. `..` pops the stack, `.` and consecutive slashes are skipped, anything else is pushed. The result is an absolute site path (`/tB/Core/Attributes` for the `Attributes` example, starting from `tB/Core/Const.html`).
- Probe the same three candidates as the absolute path.
- Append the matching suffix to the original relative URL. Crucially, the output is the original raw plus the suffix that worked, not a freshly computed relative path. From the `Attributes#description` example: the path is already correctly relative to the current page (same directory), the only fix needed is `.html`. So `Attributes` → `Attributes.html` and the original `#description` tail is reattached, giving `Attributes.html#description`.
If the original is already correct (e.g. href="foo.html" where foo.html exists), the probe of <path> matches and the suffix is empty — the URL is left untouched and the match doesn't contribute to the "changed" count. If no candidate matches, the URL is left as-is and the unresolved counter is incremented.
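A compact sketch of the page-relative path, under the same caveats as before: the signature is hypothetical, caching is omitted, and percent-decoding is skipped for brevity.

```ruby
require "set"

# Simplified stand-in for compute_rel_url; returns nil when unresolved.
def compute_rel_url(raw, file_segs, site_paths)
  path, sep, tail = raw.partition(/[?#]/)
  return nil if path.empty?

  # 1. Normalise against the current page's directory segments.
  stack = file_segs.dup
  path.split("/").each do |seg|
    next if seg.empty? || seg == "."
    seg == ".." ? stack.pop : stack.push(seg)
  end
  abs = "/#{stack.join('/')}"
  dir_like = path.end_with?("/")

  # 2. Probe candidates, remembering the suffix each one adds to `path`.
  candidates = [[dir_like ? "#{abs}/" : abs, ""]]
  candidates << ["#{abs}.html", ".html"] if File.extname(path).empty? && !dir_like
  candidates << ["#{abs}/index.html", dir_like ? "index.html" : "/index.html"]
  hit = candidates.find { |site_path, _suffix| site_paths.include?(site_path) }
  return nil unless hit

  # 3. Original path plus the suffix that worked, tail reattached.
  "#{path}#{hit[1]}#{sep}#{tail}"
end
```

An already-correct link such as `Const.html` matches the first candidate with an empty suffix and comes back unchanged, which is what keeps it out of the "changed" count.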
Before the rewrite regex runs, the file's content is scanned once for <code>…</code> and <pre>…</pre> blocks. The byte ranges of their bodies are passed to the regex callback, which returns the match verbatim when the match offset falls inside any range. The skip has two consequences:
- Example URLs in tutorial code samples (e.g. `<script src="/script.js">` displayed verbatim in a CEF page) are not rewritten and don't count toward the "unresolved" counter. The unresolved counter is now a real bug signal: anything it reports is either a broken source link or an upstream-theme change.
- Rouge's syntax highlighter HTML-escapes `<` and `>` inside code but leaves `"` alone, so `src="/foo"` survives literally inside `<code>` bodies and would otherwise match the URL regex. The code-block skip is what makes this invisible.
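The scan-then-skip idea can be sketched like this. The pattern and both method names are assumptions, and the sketch records character offsets (what `MatchData#begin` returns) where the plugin is described as tracking byte ranges.

```ruby
# Assumed pattern: a <code> or <pre> element and its lazily matched body.
CODE_BLOCK = %r{<(code|pre)\b[^>]*>(.*?)</\1>}mi

# Collect the offset ranges of <code>/<pre> bodies in one scan.
def code_ranges(html)
  ranges = []
  html.scan(CODE_BLOCK) do
    m = Regexp.last_match
    ranges << (m.begin(2)...m.end(2))
  end
  ranges
end

# True when a match starting at `offset` lies inside any recorded body.
def inside_code?(ranges, offset)
  ranges.any? { |range| range.cover?(offset) }
end
```

A rewrite callback would compare its own match offset against these ranges and return the match verbatim on a hit. A `<pre><code>…</code></pre>` pair yields a single range, because the outer `<pre>` body already contains the inner element.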
Two <script> elements are inserted right before the existing <script src="...just-the-docs.js"> tag in each rendered HTML:
```html
<script>window.OFFLINE_SITE_ROOT="../../";</script>
<script src="../../assets/js/search-data.js"></script>
```

- `window.OFFLINE_SITE_ROOT` is the per-page relative prefix from the page's directory to the offline site root. Computed from the same `file_segs` the URL rewriter uses: empty string at root, `"../../"` at depth 2, etc. The patched `initSearch()` reads this to convert search-result URLs into page-relative paths.
- `<script src="...search-data.js">` loads the lunr index data into `window.SEARCH_DATA`. Loaded as a classic script tag, which browsers allow under `file://` (the same-origin restriction is on `fetch`/XHR, not script execution).
Both run in source order before just-the-docs.js, so the globals are populated before the document-ready callback fires initSearch().
The injection finds the just-the-docs.js script tag via a regex that captures the relative-path prefix in the existing tag's src attribute (e.g. ../../assets/js/). The same prefix is reused for the new search-data.js reference. This works because the HTML URL rewriting pass has already converted the just-the-docs.js src from root-absolute to page-relative form by the time the injection runs.
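A minimal sketch of the injection, assuming the theme's tag has already been rewritten to a page-relative `src`. The constant and method names are hypothetical, and the plugin's actual regex may be looser or stricter.

```ruby
# Assumed pattern: the opening just-the-docs.js script tag, capturing the
# page-relative prefix (zero or more "../" followed by "assets/js/").
JTD_TAG = %r{<script\b[^>]*\bsrc=(["'])((?:\.\./)*assets/js/)just-the-docs\.js\1}

# Insert the two setup tags immediately before the matched tag.
def inject_search_setup(html, file_segs)
  html.sub(JTD_TAG) do |tag|
    prefix = Regexp.last_match(2)
    root = "../" * file_segs.length
    setup = "<script>window.OFFLINE_SITE_ROOT=\"#{root}\";</script>\n" \
            "<script src=\"#{prefix}search-data.js\"></script>\n"
    setup + tag
  end
end
```

For a page at the site root, `file_segs` is empty, so `OFFLINE_SITE_ROOT` becomes the empty string and the captured prefix is just `assets/js/`.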
The just-the-docs theme ships background-image: url("/favicon.png") for the site logo. Without rewriting, this would fail under file://.
The regex url\(\s*(["']?)(\/(?!\/)[^"'()\s]*)\1\s*\) matches url(...) references whose URL starts with a single slash, optionally wrapped in quotes. The rewrite uses the same compute_relative as the HTML absolute-URL path.
In the CSS file the source dir is _site-offline/assets/css/ so the rewrite emits url("../../favicon.png").
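The CSS pass follows the same shape as the HTML pass. In this sketch the regex is the one quoted above, while `rewrite_css_urls` and its `resolver` callable are hypothetical stand-ins for the call into `compute_relative`.

```ruby
# The url(...) pattern, written with %r{} so its slashes need no escaping.
CSS_URL = %r{url\(\s*(["']?)(/(?!/)[^"'()\s]*)\1\s*\)}

# `resolver` is an assumed callable: raw URL in, rewritten URL or nil out.
def rewrite_css_urls(css, resolver)
  css.gsub(CSS_URL) do |whole|
    quote, raw = $1, $2
    rewritten = resolver.call(raw)
    rewritten ? "url(#{quote}#{rewritten}#{quote})" : whole
  end
end
```

Protocol-relative references and `data:` URIs do not start with a single slash, so they fall outside the pattern and are left alone.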
Both patches go into _site-offline/assets/js/just-the-docs.js. Each is a full function-body replacement matched by a regex anchored on the upstream function signature and a stable trailer. A miss emits a warning that points at the constant to update — the early-warning signal that just-the-docs has shipped a new version of the function.
navLink() patch. The upstream version matches the active nav entry by string-comparing document.location.pathname against link href attribute values. Under file://, pathname is the document's filesystem path (/D:/.../Const.html) and the nav href attributes are page-relative (Const.html). No selector matches, so no nav-list-item gets class="active" and the sidebar appears collapsed on every navigation.
The patched version compares the link's resolved .href DOM property (an absolute URL the browser produced from the relative attribute) against window.location.href:
```js
function navLink() {
  var here = window.location.href.split('#')[0].split('?')[0];
  var links = document.getElementById('site-nav').querySelectorAll('a.nav-list-link');
  for (var i = 0; i < links.length; i++) {
    if (links[i].href === here) return links[i];
  }
  return null;
}
```

Works in both online (https://...) and offline (file:///...) contexts.
initSearch() patch. The upstream version fires an XMLHttpRequest for /assets/js/search-data.json and builds a lunr index from the response. Browsers block XHR requests to file:// resources, so the request fails silently in request.onerror and the search box is non-functional.
The patched version reads window.SEARCH_DATA directly (preloaded by the per-page <script src="search-data.js"> tag), rewrites each doc.url from a root-absolute permalink (/tB/Core/Const) to a page-relative path (<OFFLINE_SITE_ROOT>tB/Core/Const.html), then builds the lunr index and hands it to searchLoaded(index, docs). The URL transformation mirrors the rules in the Ruby compute_relative: trailing slash → index.html, no extension → .html, #fragment preserved. searchLoaded is left unchanged — it just reads the now-modified doc.url values as click targets.
A subtle but important detail: the patched code reads doc.relUrl, not doc.url, as the source of the rewrite. search-data.json contains both fields: url has the baseurl prefix (since relative_url produced it), relUrl does not. By using relUrl we avoid having to also strip a baseurl prefix that varies between deployments.
After the per-file walk, build_search_data_js! reads _site-offline/assets/js/search-data.json and writes a sibling search-data.js containing:
```js
window.SEARCH_DATA = { ...the JSON contents... };
```

The JSON contents are wrapped in a single assignment (a `window.SEARCH_DATA = ` prefix and a trailing `;`); the structure is otherwise unchanged. The .json file is left in place: the offline build no longer uses it, but removing it has no benefit, and keeping it leaves the offline tree closer to the online layout.
If search-data.json doesn't exist (e.g. someone has set search_enabled: false in a custom config overlay), the step is a no-op. The per-page script injection still inserts the <script src="...search-data.js"> tag; under file:// the load simply fails (the browser reports a missing file rather than an HTTP 404) and the patched initSearch() will log a console message and return early.
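The wrapping step amounts to a few lines of file I/O. This sketch uses a hypothetical `build_search_data_js` (the plugin's method is `build_search_data_js!`, whose real signature is not shown in this document) and returns a boolean purely so the no-op case is visible.

```ruby
# Wrap search-data.json in a global assignment and write it as a sibling
# search-data.js. Returns false (and writes nothing) if the JSON is absent.
def build_search_data_js(json_path)
  return false unless File.file?(json_path)

  js_path = json_path.sub(/\.json\z/, ".js")
  data = File.binread(json_path).strip
  File.open(js_path, "wb") { |f| f.write("window.SEARCH_DATA = ", data, ";\n") }
  true
end
```

The original .json file is only read, never modified or deleted, matching the behaviour described above.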
Some files Jekyll writes to _site/ make sense on a live HTTP-served deployment but are pointless under file://:
- `CNAME` is GitHub Pages' custom-domain config.
- `sitemap.xml` and `robots.txt` are for search-engine crawlers.
- `redirects.json` is jekyll-redirect-from's machine-readable output.
- `*.bat` are Windows build helpers Jekyll picks up from the source directory and copies into `_site/` because it doesn't know they aren't content.
The offline copy drops these. The list lives in _config.yml as offline_exclude:, so editing the policy doesn't require touching the plugin:
```yaml
offline_exclude:
  - CNAME
  - robots.txt
  - sitemap.xml
  - redirects.json
  - "*.bat"
```

Patterns are File.fnmatch-style with File::FNM_PATHNAME, matched against each file's site-rooted forward-slash path. `*` does not cross directory separators, so `*.bat` catches only top-level .bat files; use `**/*.bat` to match at any depth. Specific paths like `subdir/foo.txt` also work and match exactly.
A missing or empty offline_exclude entry skips the pattern check entirely.
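The matching rule can be sketched with `File.fnmatch` directly. `excluded?` is a hypothetical helper name, and `rel` is assumed to be the site-rooted path without a leading slash.

```ruby
# True when `rel` matches any exclude pattern. A nil or empty pattern list
# skips the check entirely.
def excluded?(rel, patterns)
  return false if patterns.nil? || patterns.empty?

  patterns.any? { |pattern| File.fnmatch(pattern, rel, File::FNM_PATHNAME) }
end
```

With `File::FNM_PATHNAME` set, `*.bat` matches `build.bat` but not `tools/build.bat`, while `**/*.bat` does match the nested file.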
The exclude check runs in two places:
- Inside `build_site_paths` in `setup`, so URL-resolution candidates can't point at an excluded target (a stray `<a href="/sitemap.xml">` in the source would simply fail to resolve, instead of resolving to a now-missing file).
- Inside `process_page` and the static-file loop in `finish`, where the write is skipped so the file never appears in `_site-offline/`.
In addition to the pattern-based excludes, jekyll-redirect-from stubs get their absolute URLs rewritten to page-relative form rather than being excluded (detected by the JekyllRedirectFrom::RedirectPage class-name check in process_page). The stubs contain only a meta-refresh / canonical link / <script>location= / fallback <a>, all referencing https://<site.url>/<path>. Left alone, following one offline would require network access and land on the live site. Each <site.url><path> occurrence is run through the same compute_relative the main HTML pass uses and replaced with the resolved relative path, so the stub navigates locally instead. The rewritten stubs are reachable from the offline tree, which matters for source pages (notably Miscellaneous/Documentation Development.md) that intentionally link via redirect_from URLs as a stable-URL pattern. Counted under rewritten_redirects in the summary log line, distinct from the pattern-matched excluded_files.
The summary log line reports both counts: … rewrote N redirect stub(s) … excluded M file(s) ….
Three caches keep the per-match work to a single Hash lookup once warmed up:
- `site_paths` (`Set` of strings). Built once in `setup` from `site.pages + site.static_files + site.documents`. Every file path that Jekyll will write, keyed by its site-rooted forward-slash form (`/tB/Core/Const.html`). Used by `compute_relative` and `compute_rel_url` to probe candidate paths.
- `seg_cache` (`Hash` of `site_path` → `[decoded_segs, encoded_segs]`). Lazily populated. For each unique target site path that the URL rewriter resolves to, this holds the decoded path segments (used for LCP comparison against filesystem-derived `file_segs`) and the URL-encoded segments (joined for the output URL). Most segments are URL-safe and share strings between the two arrays.
- `result_cache` (`Hash` of `"#{file_dir}\x00#{raw}"` → `final_rel_url` or `nil`). The big win. Subsumes step 1 (raw → site_path) and step 2 (site_path → page-relative URL) so each unique `(file_dir, raw)` pair is computed exactly once across the build. Every page shares its nav and aux-nav with every other page, so those links resolve once on the first page and hit cache on every subsequent page. Without this cache the offlinify pass takes ~7× longer.
The cache is shared between the absolute-URL and page-relative-URL dispatches inside the combined HTML pass — the raw shapes are disjoint (absolute starts with /, relative doesn't), so there's no collision. The \x00 separator between file_dir and raw prevents path-name collisions inside the cache key.
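The `result_cache` behaviour, including cached `nil` results for unresolved links, can be illustrated with a small wrapper class. The class and its `computed` counter are hypothetical; the plugin keeps a plain Hash in `@state`.

```ruby
class ResultCache
  attr_reader :computed

  def initialize
    @store = {}
    @computed = 0
  end

  # Returns the cached value for (file_dir, raw), running the block only on
  # first use. Uses key? rather than ||= so that cached nil results
  # (unresolved links) also count as hits.
  def fetch(file_dir, raw)
    key = "#{file_dir}\x00#{raw}"
    return @store[key] if @store.key?(key)

    @computed += 1
    @store[key] = yield
  end
end
```

The NUL separator cannot appear in a directory name or a URL taken from HTML, so two different `(file_dir, raw)` pairs cannot concatenate to the same key.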
The offline build touches the following files:
| Path | Role |
|---|---|
| `docs/_plugins/offlinify.rb` | The plugin. Hooks `:site, :pre_render` (setup), `:pages, :post_render` + `:documents, :post_render` (per-page write), `:site, :post_write` (static files, JS patches, search-data.js). |
| `docs/_plugins/offlinify.md` | This file. |
| `docs/_config.yml` | `also_build_offline: true` (default-on) and `exclude: [_site-offline]` (keeps Jekyll's watcher from rebuilding on the plugin's own output). |
| `docs/build.bat` | Plain `bundle exec jekyll build`; produces `_site/`, `_site-offline/`, and (via pdfify.rb) `_site-pdf/` in one run. |
| `docs/serve.bat` | `bundle exec jekyll serve`; watcher-friendly thanks to the exclude. |
| `docs/check.bat` | Local link check (CI runs the same three passes via the workflows). Three steps: `scripts/check_links.py` permissive on `_site/`, `scripts/check_links.py` strict on `_site-offline/`, and `scripts/check_offline_live_links.py` against `_site-offline/`. Exits non-zero on any failure. |
| `scripts/check_offline_live_links.py` | Flags any `https://docs.twinbasic.com/<path>` reference that survived offlinify in `_site-offline/` HTML, outside `<code>` / `<pre>` blocks. Skips the bare root (`https://docs.twinbasic.com[/]`) since intentional "go to the live site" links are allowed. Run by `check.bat` locally and by both CI workflows after the offline link check. |
| `docs/.gitignore` | `_site`, `_site-offline`, and `_site-pdf` all excluded from git. |
| `.github/workflows/jekyll-gh-pages.yml` | Deploy workflow (push to staging, manual dispatch). Builds, runs lychee against `_site/`, runs `scripts/check_links.py` against `_site-offline/`, runs `scripts/check_offline_live_links.py` against `_site-offline/`, deploys to Pages, and (on manual dispatch) packages `_site-offline/` as a release artifact. |
| `.github/workflows/checks.yml` | PR-gating workflow (pull-request to main, manual dispatch). Same three link-check steps as the deploy workflow; no deploy or release. |
bundle exec jekyll build in CI passes --baseurl "${{ steps.pages.outputs.base_path }}" from actions/configure-pages. For a Pages site with a custom domain (CNAME), base_path is empty. For a project page without a custom domain, it's /repo-name. Offlinify handles both cases — normalize_baseurl in setup produces the right prefix to strip.
The workflow has three link-check steps after the build:
- Lychee against `_site/`, with `--fallback-extensions html` and a `--remap` that strips the base_path prefix. This mirrors what GitHub Pages does at request time: extensionless URLs like `/FAQ` get served as `/FAQ.html`. Without `--fallback-extensions html`, every pretty permalink would appear broken in this check. Lychee (not `scripts/check_links.py`) handles the online tree because `--remap` isn't implemented in the Python checker; the offline tree below has all baseurl prefixes already stripped by offlinify and doesn't need it.
- `scripts/check_links.py` against `_site-offline/`, strict: no extension fallback (`--index-files index.html` only; the online check also accepts the bare directory via `,.`). Every link must resolve to a real file as written. This catches relative links in markdown sources whose permalink shape doesn't match the rendered filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not `Foo/index.html`), the kind of breakage the online check above hides behind both the fallback and the bare-directory acceptance. The Python checker is roughly 25× faster than lychee on this workload and a bit stricter (catches missing `<script src>` targets and trailing slashes on file-shaped URLs).
- `scripts/check_offline_live_links.py` against `_site-offline/`, flagging any surviving `https://docs.twinbasic.com/<path>` reference outside `<code>` / `<pre>` blocks (the bare root is exempt; see Failure modes: Surviving live-site links).
All three steps fail the build on the first non-zero exit, blocking the Pages deploy and the release upload. After they succeed and Pages is deployed, the release job (gated to manual dispatch only) downloads the offline-site workflow artifact, computes a tag like docs-YYYY-MM-DD-HHMM (UTC), and creates a GitHub release with twinbasic-docs-offline.zip attached via softprops/action-gh-release@v2.
The plugin surfaces several conditions in its summary log lines:
- Unresolved links. `rewrote 837 HTML and 4 CSS file(s), copied 516 asset(s) (N unresolved link(s) left as-is)`. Each match the regex picked up but couldn't resolve against `site_paths` increments the counter. The code-block skip keeps example URLs inside `<code>` / `<pre>` off this counter, so a non-zero value here is a real bug signal, usually a broken source link or an upstream-theme change that broke a regex.
- JS regex misses. `could not locate navLink() in assets/js/just-the-docs.js` (or the equivalent for `initSearch()`). The corresponding patch is skipped. Means just-the-docs has shipped a new version of the function and the regex constant needs updating. The plugin emits a warning pointing at the specific constant to update.
- Missing `search-data.json`. Silent: the search-data.js generation step is a no-op. The per-page script tag injection still runs, so each page will request `search-data.js` and the browser will log a failed load. The patched `initSearch()` will hit its `window.SEARCH_DATA not found` branch and log a console message.
- Real broken links in markdown sources. Caught by the strict offline link check in CI (or by `check.bat` locally). These don't surface in the offlinify summary because the rewrite passes correctly identify them as unresolvable and leave them alone; that's the right behaviour, and the source markdown needs fixing. Source markdown linking at a `redirect_from` URL is reachable in the offline tree (the redirect stub is rewritten to navigate locally), but a stub that itself references a missing target falls back to the original `https://<site.url>/...` URL and the CI link checks will then surface it as broken, which is again the right outcome.
- `_site-offline/` triggering `jekyll serve` rebuilds. Was a problem; now handled by two things in combination: `exclude: [_site-offline]` in `_config.yml`, and the "clean contents but keep the directory" trick in the wipe step (which keeps all watcher events under `_site-offline/...` where the exclude matches).
- Surviving live-site links. The SEO block stripping pass removes the bulk of the `https://docs.twinbasic.com` references each page contains (canonical link, OpenGraph URL, JSON-LD `url`). Anything left in `_site-offline/` is a source link that points at the live docs site, usually a markdown author writing `https://docs.twinbasic.com/<path>` instead of a relative link or `/tB/...` permalink, which would silently navigate the offline reader back online. `scripts/check_offline_live_links.py` flags these; the bare root `https://docs.twinbasic.com[/]` is exempt since intentional "go to the live site" links are allowed. Run locally by `check.bat` and in CI by both workflows after the offline link check.
The optimization story is captured in the commit history. Briefly:
- Naïve first version (per-file `File.file?` probes for each candidate): ~30 s.
- + `site_paths` Set (O(1) lookup): down to ~10 s, before further work.
- + `result_cache`, `seg_cache`, manual LCP (replaced `Pathname#relative_path_from` per match with a string-segment comparison): down to ~7 s as the site grew past 800 pages.
- + combined HTML regex (single gsub matching both absolute and page-relative URLs in one pass, eliminating the second full file scan and the interim re-scan of code-block ranges that used to sit between two separate passes): down to ~4 s. Roughly 40% off the HTML walk.
- + per-page hook architecture (`:pages, :post_render` consumes `page.output` in memory rather than re-reading the rendered HTML from `_site/` at `:site, :post_write`): the per-file `File.binread` is eliminated. Cumulative self-time across hooks is ~5-6 s on the current ~830-page site, dominated by per-page Jekyll hook dispatch overhead and the per-page `File.binwrite`. The ~290 jekyll-redirect-from stubs go through a much cheaper code path than the main HTML pass (a single regex over a few hundred bytes, no code-block scan, no search-setup injection) so they're a small slice of the total.
The remaining cumulative time is mostly `File.binwrite` across ~830 HTML files (Windows file I/O on NTFS is the dominant cost) plus the regex pass over the SCSS-compiled `just-the-docs-combined.css`.
The static-file copy in `finish` adds an additional ~200 ms of `FileUtils.cp` for the binary assets (images, fonts, etc.) that don't need rewriting.
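
Two of the steps above, the `site_paths` Set lookup and the string-segment comparison that replaced `Pathname.relative_path_from`, can be illustrated with a minimal sketch. It assumes that "LCP" means the longest common prefix of the two segment arrays and that the candidates probed are the raw URL and the raw URL plus `.html`. The helper names `relative_from_segments` and `resolve` are hypothetical, and the plugin's real `compute_relative` also handles the baseurl and the caches.

```ruby
require "set"

# Hypothetical helper: relative path from the directory of the current page
# (from_dir_segs) to a target file (target_segs). Both are arrays of path
# segments relative to the site root; the last target segment is a file name.
def relative_from_segments(from_dir_segs, target_segs)
  # Length of the longest common directory prefix. The file name itself is
  # never counted as a shared directory, hence the "- 1".
  limit = [from_dir_segs.length, target_segs.length - 1].min
  common = 0
  common += 1 while common < limit && from_dir_segs[common] == target_segs[common]

  # One ".." per directory left behind, then the rest of the target path.
  ups = [".."] * (from_dir_segs.length - common)
  (ups + target_segs[common..]).join("/")
end

# Hypothetical resolver: probe the Set instead of the filesystem.
# The candidate list (raw, raw + ".html") is an assumption.
def resolve(raw, from_dir_segs, site_paths)
  hit = [raw, "#{raw}.html"].find { |candidate| site_paths.include?(candidate) }
  return raw unless hit # unresolvable: leave the link alone

  relative_from_segments(from_dir_segs, hit.split("/").reject(&:empty?))
end
```

Called from a page in `tB/Gui/`, `resolve("/tB/Core/Const", %w[tB Gui], site_paths)` would produce `../Core/Const.html` when the Set contains `/tB/Core/Const.html`. Every probe is a hash lookup, so no `File.file?` call is needed per link.
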
- Source-only broken links, where the markdown points at a permalink shape that doesn't match the rendered filename, can't be fixed by the plugin: `compute_rel_url` correctly identifies the target as nonexistent and leaves the link unchanged. The strict lychee step in CI surfaces these as real errors so they get fixed at the source.
- `<a href>` values inside `<code>` blocks were not distinguishable from real links at the regex level; example URLs in tutorial code samples surfaced as false-positive entries in the unresolved counter. The code-block skip now suppresses both the rewrite and the counter increment. Worth keeping an eye on if the upstream syntax highlighter (Rouge) ever switches away from wrapping highlighted code in `<code>`/`<pre>`.
- The search index is hefty. `search-data.js` is ~2.8 MB (mostly text content for every page on the site, pretty-printed). It's loaded fresh on every page navigation under `file://` since browsers don't cache aggressively across `file://` documents. The size is acceptable on SSDs but could be a couple-second delay on spinning disks. Minifying the JSON before wrapping would save ~30-40%; the plugin currently doesn't.
- The plugin is regex-based, not AST-based. This is fast and has no external dependencies, but means we rely on stable shapes for the just-the-docs.js function signatures. A warning is emitted on a regex miss, which is the early-warning signal that the upstream theme has changed.
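
The "warn on a regex miss" behaviour in the last item can be pictured with a small sketch. It is not the plugin's `patch_jtd_js!`. The helper name `patch_function!` is hypothetical, and the pattern assumes that each target is declared as `function name() {` and closed by a `}` at the start of a line, which may not match the real theme source.

```ruby
# Hypothetical sketch: replace the body of one JS function in place and
# report whether the expected shape was found.
def patch_function!(js, name, new_body)
  # Assumed shape: "function name() {" ... first "}" at the start of a line.
  pattern = /function #{Regexp.escape(name)}\(\)\s*\{.*?\n\}/m

  # The block form avoids backslash interpretation in the replacement text.
  # String#sub! returns nil when nothing matched.
  replaced = js.sub!(pattern) { "function #{name}() {\n#{new_body}\n}" }

  if replaced.nil?
    warn "offlinify: #{name}() not found in just-the-docs.js; upstream theme may have changed"
  end
  !replaced.nil?
end
```

Because `String#sub!` returns `nil` on a miss, the miss is detected without a second scan of the file, and the build can continue with the unpatched function while the warning points at the cause.
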
In source order in offlinify.rb:
- `setup(site)`: `:site, :pre_render` hook entry. Builds `site_paths` from the in-memory page set, wipes the offline tree, seeds per-build state on `@state`. Bails out with a warning if `--incremental` is set.
- `normalize_baseurl(raw_baseurl)`: helper for `setup`. Coerces the configured baseurl to either empty string or `/segment...` with no trailing slash, matching the form `relative_url` actually prepends.
- `build_site_paths(site, exclude_patterns)`: helper for `setup`. Iterates `site.pages + site.static_files + site.documents` and builds the URL Set from each item's `destination(site.dest)`, decoded and forward-slash-normalised.
- `wipe_out_dest_contents(out_dest)`: helper for `setup`. Removes the offline tree contents while leaving the directory itself in place (see Phase 1).
- `process_page(page)`: `:pages, :post_render` and `:documents, :post_render` hook entry. Transforms `page.output` and writes the offline copy. Dispatches on output extension and on page class (jekyll-redirect-from stubs get a dedicated branch that rewrites their absolute `<site.url>/<path>` URLs to page-relative form).
- `finish(site)`: `:site, :post_write` hook entry. Copies static files from `_site/` to `_site-offline/`, patches just-the-docs.js, generates `search-data.js`, logs the summary, clears `@state`.
- `rewrite_html!(content, file_dir, file_segs, site_paths, seg_cache, result_cache, baseurl, code_ranges)`: the combined HTML pass. One `gsub` per file over `HTML_COMBINED_RE`, dispatching on `raw.start_with?("/")`: absolute URLs go through `compute_relative`, page-relative URLs through `compute_rel_url`. Single cache lookup per match.
- `rewrite_css!(content, file_dir, file_segs, site_paths, seg_cache, result_cache, baseurl)`: the CSS pass. One `gsub` per file over `CSS_URL_RE`, dispatched to `compute_relative` (CSS only carries absolute URLs in this codebase). No code-block handling, since CSS has no equivalent concept.
- `inject_search_setup!(content, file_segs)`: the second HTML transformation. Single regex substitution per file: finds the just-the-docs.js script tag and prepends the two new ones.
- `strip_seo!(content)`: removes the jekyll-seo-tag plugin's output block from a page's `<head>`, keeping only the `<title>` tag. Runs first in the `.html` branch of `process_page` so the URL rewrite and code-block scan see the post-strip content.
- `compute_relative(raw, file_segs, site_paths, seg_cache, baseurl)`: the absolute-URL resolver. Strip baseurl, probe candidates, compute LCP, return final URL.
- `compute_rel_url(raw, file_segs, site_paths)`: the page-relative-URL resolver. Normalise against the current page's dir, probe candidates, return original raw plus matching suffix.
- `patch_jtd_js!(out_dest)`: does the `navLink()` and `initSearch()` body substitutions.
- `build_search_data_js!(out_dest)`: generates `search-data.js` from `search-data.json`.
Together these are ~280 lines of Ruby plus inline JS replacement strings. The rest of the file is doc comments.
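
As an illustration of how a single `gsub` can honour pre-computed code-block ranges, the way `rewrite_html!` takes a `code_ranges` argument, here is a minimal sketch. The constants `CODE_BLOCK_RE` and `HREF_RE` and the helpers `code_ranges` and `rewrite_links` are hypothetical stand-ins. The real `HTML_COMBINED_RE` and the resolver dispatch are not reproduced, and the caller's block stands in for the URL resolver.

```ruby
# Hypothetical patterns; the plugin's HTML_COMBINED_RE is not shown here.
CODE_BLOCK_RE = %r{<(code|pre)\b[^>]*>.*?</\1>}mi
HREF_RE = /(href|src)="([^"]*)"/

# Character-offset ranges covered by <code>/<pre> elements.
def code_ranges(html)
  ranges = []
  html.scan(CODE_BLOCK_RE) do
    m = Regexp.last_match
    ranges << (m.begin(0)...m.end(0))
  end
  ranges
end

# One gsub over the page. Matches that start inside a code range are
# returned unchanged; all others are rewritten by the caller's block.
def rewrite_links(html, ranges)
  html.gsub(HREF_RE) do
    m = Regexp.last_match
    next m[0] if ranges.any? { |r| r.cover?(m.begin(0)) }

    %(#{m[1]}="#{yield m[2]}")
  end
end
```

The linear `ranges.any?` scan keeps the sketch short. With many code blocks per page, a binary search over the sorted ranges would be the natural refinement.
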