Detailed implementation plan for the seventh phase of the tbdocs builder.
Read this together with PLAN.md (the architecture overview),
PLAN-1.md (DISCOVER), PLAN-2.md (COMPUTE),
PLAN-3.md (RENDER), PLAN-4.md (TEMPLATE),
PLAN-5.md (WRITE ONLINE), and PLAN-6.md
(AUXILIARIES). The canonical Jekyll reference is
docs/_plugins/offlinify.rb (the
~1,460-line Ruby implementation) and its companion writeup
docs/_plugins/offlinify.md.
The WRITE OFFLINE phase has one job: take the rendered _site/ tree
that Phases 5+6 just produced and mirror it into _site-offline/,
rewriting every link so the tree opens cleanly under file:// with no
HTTP server. Two consequences of the project's URL shape make this
necessary: every href / src in the rendered HTML is root-absolute
(/assets/css/...), and pages use extensionless permalinks
(/tB/Core/Const). Under file:// a leading slash resolves against
the filesystem root, not the site root, and browsers don't auto-append
.html. The fix has to run after render.
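
The resolution problem can be reproduced with Node's WHATWG `URL` class, which follows the same rules a browser applies to `file://` pages. This is a minimal sketch; the POSIX-style base path is an illustrative assumption, not the project's real location.

```javascript
// Resolve links the way a browser would for a page opened from disk.
// The base path below is hypothetical and only serves the illustration.
const pageUrl = "file:///home/user/_site-offline/tB/Core/Const.html";

// Root-absolute: the leading slash discards the whole site directory.
const rootAbsolute = new URL("/assets/css/print.css", pageUrl).href;

// Page-relative: two ".." hops climb from tB/Core/ back to the site root.
const pageRelative = new URL("../../assets/css/print.css", pageUrl).href;

console.log(rootAbsolute); // file:///assets/css/print.css
console.log(pageRelative); // file:///home/user/_site-offline/assets/css/print.css
```

The first result points at the filesystem root, which is why every root-absolute link has to become page-relative in the offline tree.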
What Phase 7 does NOT do:
- Render markdown, compute nav, wrap chrome, write the online tree, or produce the sitemap / robots / search-data / redirect stubs (Phases 1-6 already did).
- Assemble or write book.html for the PDF tree (Phase 8).
- Modify _site/ in any way -- the online tree is read-only input here; transformations land in _site-offline/ only.
- Run the offline link check (check_links.mjs --forbid ... -- a post-build harness, not a phase).

Target: ~400-600 ms wall time on the current Windows dev machine
for the full offline mirror, processing ~1,130 input files
(837 pages + 290 redirect stubs + 4 CSS + 234 static files + 7 theme
assets + 1 search-data.json). The Jekyll equivalent (offlinify.rb)
runs ~2.65 s on the same machine, dominated by Ruby's per-page hook
plumbing and the gsub callback rate. The JS port targets a ~5× gain
by collapsing the per-page Ruby hook overhead and processing files in
bulk through Node's libuv concurrency.
Implementation landed in builder/offline.mjs (~620
lines including doc comments + the new derive* exports for the diff
tools). The verify harness (verify-phase7.mjs, retired in Phase 10)
runs end-to-end on the production tree and all 30+ acceptance checks
pass. Byte parity vs Jekyll's docs/_site-offline/ is exact for every
HTML page (829 match + 8 accepted-divergence pages whose Phase 3/4
divergences propagate through), every redirect stub, every CSS file,
the patched just-the-docs.js, and the search-data.js wrap.
Phase 7 wall-time on the dev machine: ~870-1090 ms (below the 1500 ms soft cap, above the 800 ms target). The HTML pass dominates; the nav-block caching deferral from §7.D7 stayed deferred because the total stayed under the soft cap. See §10's measured-timings table for the per-substep breakdown.
Phase 7 surfaced one finding worth recording at the top: the URL
resolver's sitePaths Set needs to include the redirect-stub
destinations Phase 6 emits, not just pages + staticFiles + theme assets.
Without them, a page-relative link from inside a source markdown file
to a target whose only on-disk presence is a redirect stub (e.g.
tB/Core/LBound.html, the stub redirecting to
tB/Modules/Information/LBound.html) resolves to nothing and the
rewrite leaves the link as the original bare LBound. The fix
threads the redirect-stub list through buildSitePaths; see §6.1.
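
The effect of the fix can be sketched with a toy version of the Set. Both function names and the probe order (exact file, then `.html`, then `/index.html`) are assumptions made for this illustration; the real buildSitePaths also handles theme assets and exclude patterns.

```javascript
// Hypothetical, simplified stand-in for buildSitePaths: it collects only
// destPath / destRel strings and ignores theme assets and excludes.
function buildSitePathsSketch(pages, staticFiles, stubs = []) {
  const paths = new Set();
  for (const p of pages) paths.add(p.destPath);
  for (const f of staticFiles) paths.add(f.destRel);
  for (const s of stubs) paths.add(s.destPath); // the fix: stubs are probe targets too
  return paths;
}

// Assumed probe order: exact file, then ".html", then "/index.html".
function probeSketch(sitePaths, sitePath) {
  const candidates = [sitePath, `${sitePath}.html`, `${sitePath}/index.html`];
  return candidates.find((c) => sitePaths.has(c)) ?? null;
}

const pages = [{ destPath: "tB/Modules/Information/LBound.html" }];
const stubs = [{ destPath: "tB/Core/LBound.html" }];

const withoutStubs = buildSitePathsSketch(pages, []);
const withStubs = buildSitePathsSketch(pages, [], stubs);

console.log(probeSketch(withoutStubs, "tB/Core/LBound")); // null
console.log(probeSketch(withStubs, "tB/Core/LBound"));    // tB/Core/LBound.html
```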
The { pages, staticFiles, site, destRoot } object the orchestrator
carries after Phase 6, plus the auxiliary outputs Phase 6 returned.
Phase 7 reads:
| Field | Why Phase 7 reads it |
|---|---|
| page.html | The full layouted HTML document. Source for the per-page rewrite. undefined for book.html (skipped, per §7.D5). |
| page.destPath | Output path relative to destRoot. Drives the offline-tree destination (<offlineRoot>/<destPath>) and seeds the site_paths Set (§6.1) so URL resolution knows which files exist. |
| page.permalink | Read indirectly via the site_paths Set (page resolution targets <destPath> already, derived from permalink in Phase 1). Not consumed directly. |
| staticFile.srcPath | Source path to copy from -- avoids re-reading _site/. |
| staticFile.destRel | Destination path relative to destRoot / offlineRoot. Drives both copy destination and site_paths membership. |
| site.config.url | Origin of the absolute redirect-target URLs (https://docs.twinbasic.com). Phase 7 strips this prefix from redirect stubs and rewrites the path to page-relative. |
| site.config.baseurl | Currently empty. Phase 7 strips it from any relative_url-shaped path before resolution (§6.4). Honoured for forward-compat with GitHub Pages project sites. |
| site.config.offline_exclude | Glob patterns for files to skip during the mirror (CNAME, robots.txt, sitemap.xml, book.html). Honoured via File.fnmatch-equivalent FNM_PATHNAME semantics (§6.5). |
| destRoot | The _site/ root Phases 5+6 wrote to. Phase 7 derives offlineRoot = destRoot + "-offline" and writes there. |
| Phase 6's auxStats.search.json (NEW) | The full ~2.8 MB JSON string Phase 6 just built. Phase 7 wraps it as window.SEARCH_DATA = ...; for the offline tree. Avoids a redundant ~2.8 MB disk re-read. See §7.D8. |
| Phase 6's auxStats.redirects.stubs (NEW) | The { destPath, html, sourcePage, fromPath } list deriveRedirectStubs already produced. Phase 7 rewrites the absolute target URL in each stub's HTML and writes to the offline tree. Re-deriving would work too, but the bytes are already there. |

Phase 7 does NOT read page.frontmatter, page.rawContent,
page.renderedContent, page.navPath, page.breadcrumbs,
page.children, page.navLevels, page.seo*, site.navTree,
site.bookData, or site.buildInfo -- every per-page derivation
Phases 2-4 produced has already been baked into page.html. The
template-level state is invisible at the offline layer; only the byte
output matters.
The theme assets Phase 5 already copied to <destRoot>/assets/:
assets/css/
  just-the-docs-combined.css   ~288 KB   (compiled theme, custom colours baked in)
  just-the-docs-head-nav.css   ~287 B    (per-page nav-prefix override)
  print.css                    ~18 KB    (used by Phase 8's PDF tree too)
  rouge.css                    ~2.3 KB   (syntax-highlight scope-to-colour rules)
assets/js/
  just-the-docs.js             ~19.5 KB  (sidebar / search / copy-button runtime)
  theme-switch.js              ~1.2 KB   (dark-mode toggle)
  vendor/lunr.min.js           ~31 KB    (search runtime)
Phase 7 reads just-the-docs.js from _site/assets/js/ (already
copied by Phase 5) and writes the patched copy to
_site-offline/assets/js/just-the-docs.js. Other JS / CSS files
either copy verbatim (no patches) or undergo a CSS url() rewrite
(the just-the-docs-combined.css rule referencing /favicon.png).
offlineRoot = <destRoot>-offline. Phase 7 wipes its contents
(keeping the directory in place -- see §7.D2) at entry, then
populates from scratch. This mirrors Jekyll offlinify.rb's
wipe_out_dest_contents behaviour.
| Value | Default | Source |
|---|---|---|
| offlineRoot | <destRoot>-offline -- a sibling of destRoot with the -offline suffix. On the current dev machine: D:\OCP\wc\twinBASIC-documentation\docs\_site-new-offline. | Derived inside Phase 7; not a separate CLI flag. |
| dryRun | false | The orchestrator's existing --dry-run flag (Phase 5+6 honour it). When true, Phase 7 logs intended writes but produces no files. |

Phase 7 has no new CLI flags. The --no-offline opt-out (parallel to
Jekyll's also_build_offline: false) is a future addition; the
default-on behaviour matches Jekyll exactly (also_build_offline: true in _config.yml).
The orchestrator awaits Phase 5 (page writes) and Phase 6 (sitemap +
robots + redirects + search-data) before invoking Phase 7. Reading
from _site/ during Phase 7 (the just-the-docs.js asset is the only
case) is safe: those files are flushed to disk before Phase 7's first
read.
Phase 6's parallel Promise.all settles in <300 ms on the dev
machine, so this is a non-issue in practice -- Phase 7's setup pass
(building site_paths) runs at least that long anyway.
Phase 7 produces a fully populated <offlineRoot>/ directory on disk:
<offlineRoot>/ ~1,140 files
index.html URL-rewritten copy of <destRoot>/index.html
404.html URL-rewritten copy
Reference.html URL-rewritten copy
Reference/
Core/Const.html URL-rewritten copy
...
tB/
Core/Const.html URL-rewritten copy
...
Tutorials/CustomControls/Form Designer.html URL-rewritten copy
...
assets/
css/
just-the-docs-combined.css verbatim copy + url() rewrite for /favicon.png
just-the-docs-head-nav.css verbatim copy
print.css verbatim copy
rouge.css verbatim copy
js/
just-the-docs.js patched copy (navLink + initSearch bodies replaced)
theme-switch.js verbatim copy
vendor/lunr.min.js verbatim copy
search-data.json verbatim copy of Phase 6's output
search-data.js NEW: `window.SEARCH_DATA = {...the JSON...};`
images/
mmd/<hash>.svg verbatim copy
mmd/<hash>.mmd verbatim copy (mermaid source -- ships alongside)
Tutorials/.../Images/*.png verbatim copies (content images)
Features/Images/*.png verbatim copies
favicon.png verbatim copy
lib/*.mjs verbatim copies (PDF helpers, shipped offline too)
render-book.mjs verbatim copy
What's excluded from <offlineRoot>/ per the offline_exclude
config in _config.yml:
- CNAME -- GitHub Pages custom-domain config; pointless under file://.
- sitemap.xml -- crawler metadata.
- robots.txt -- crawler metadata.
- book.html -- never written to _site/ either (Phase 5 skips it; layout book-combined is Phase 8's territory).

What's added that wasn't in _site/:
- assets/js/search-data.js -- the JS-wrapped form of search-data.json (loaded via <script src=>, which works under file://; XHR for search-data.json does not).

What's patched (content changed vs _site/):
- Every .html page: every href / src attribute starting with / rewritten to a page-relative path. Every page-relative href (e.g. Attributes#description) gains the appropriate .html / /index.html suffix. SEO block stripped (the jekyll-seo-tag output that Phase 4's renderHeadSeo emits byte-for-byte). Two <script> tags injected before <script src="...just-the-docs.js">: window.OFFLINE_SITE_ROOT="..." and <script src=".../search-data.js">.
- Every CSS file: every url(/...) rewritten to a page-relative path (covers just-the-docs-combined.css's favicon reference).
- Every redirect-stub HTML: the four occurrences of the absolute <site.url>/<path> URL each rewritten to a page-relative path.
- assets/js/just-the-docs.js: navLink() and initSearch() function bodies replaced.

Filesystem mutations only. Phase 7 doesn't shell out, doesn't mutate any in-memory data structure beyond the per-build caches it allocates itself, doesn't network. The single visible effect is "the offline tree on disk now matches the intended output."
_site/ is the canonical online-deploy artifact and must keep its
root-absolute URLs (so GitHub Pages serves it correctly). The offline
tree is a derivative; producing it alongside the online tree -- both
from the same in-memory page set -- means we ship one build that
satisfies both deployments. The cost is ~25 MB of disk space (the
offline tree is roughly the same size as _site/) and the ~400 ms
Phase 7 wall time. Worth it.
One new file ships in Phase 7's first cut, with internal section boundaries that match Jekyll offlinify.rb's structure:
builder/
offline.mjs ~620 lines as shipped. Exports:
writeOffline(pages, staticFiles, site, destRoot, { auxStats })
-- the orchestrator entry point.
buildOfflineState(pages, staticFiles, site, destRoot,
{ stubs })
-- assembles the shared {sitePaths, caches, baseurl,
siteUrl, excludePatterns, destRoot} object.
deriveOfflinePage(page, state) -- per-page transform.
deriveOfflineRedirect(stub, state) -- per-stub transform.
deriveOfflineCss(cssIn, themeRel, state) -- per-CSS transform.
deriveOfflineJtdJs(srcBytes) -- JTD JS patches.
deriveOfflineSearchDataJs(jsonBytes) -- search-data.js wrap.
The derive* helpers are pure-compute (no I/O) and are the
surface the diff tools (`_diff.mjs --offline*`,
`_triage.mjs auditOffline*`) consume so they don't have to
re-implement the per-input transforms or shell out to the
full writeOffline. Internal sections:
§A Top-level orchestration (entry + dispatch loop)
§B Site-paths set + caches (§6.1, the URL resolver state)
§C URL resolution (compute_relative, compute_rel_url,
resolve_raw, build_segs, decode)
§D HTML rewrite pipeline (strip_seo, rewrite_html,
inject_search_setup)
§E CSS rewrite pipeline (rewrite_css)
§F Redirect-stub rewrite (rewrite_redirect_stub)
§G just-the-docs.js patches (navLink + initSearch replacements
+ search-data.js wrapper write)
§H Static-file pass (exclude filter + copy)
§I Pure-compute derive* (re-export surface for diff tools)
The Jekyll offlinify.rb is one ~1,460-line file. The JS port targets
~600 lines (the gsub boilerplate, Pathname workarounds, and
~280 lines of doc comments compress significantly in JS). The whole
file fits comfortably in a single module; reviewers can navigate by
section boundary instead of jumping between files.
The URL resolver (§C) is the densest piece -- ~150 lines covering five helpers and three caches -- and is a natural extraction target if Phase 7 ever grows past ~800 lines. Today, splitting forces a public API for the cache shapes that isn't worth the surface-area cost.
PLAN-5 (write.mjs, ~250 lines) followed the same "one module while
it's small" reasoning. Phase 7 follows it for the same reason.
See §2's "Why a wholly separate tree" rationale. The orchestrator
treats _site/ as immutable input from Phase 7's perspective; only
the just-the-docs.js asset gets read from _site/ (and that's a read,
not a write).
- mkdirRec, runLimited, writeFileMkdirp, WRITE_LIMIT from write.mjs (Phase 5). The offline pass writes ~1,130 files in parallel under the same concurrency cap; reusing the cap keeps resource pressure aligned across phases.
- safeWrite wrapper from write.mjs (the path-stamping error helper). Currently a private function inside write.mjs; Phase 7 needs it for the copy paths in §5.4 + §5.5. First-step refactor: promote to a module-level export (one-line change to write.mjs, zero behaviour impact, verify-phase5 stays green). Phase 7's per-file copies become safeWrite(dest, () => fs.copyFile(src, dest)) -- same shape Phase 5 uses for fs.writeFile.
- isUnderProject guard from write.mjs (the wipe-safety check). Same promotion-to-export treatment as safeWrite.
- absoluteUrl and stripHtml from seo.mjs (Phase 2). Phase 7 doesn't need absoluteUrl (every URL it produces is page-relative), and only marginally needs stripHtml (the SEO strip uses a fixed-pattern regex, not the generic HTML stripper). Re-export not required.
- deriveRedirectStubs from redirects.mjs (Phase 6). Phase 6's return value gains a stubs field carrying the same { destPath, html, sourcePage, fromPath } array its derivation produced, so Phase 7 can rewrite the absolute URLs in each stub without re-deriving. See §7.D8 for the orchestrator-side change.
- Phase 6's search-data JSON bytes. Same pattern -- the auxStats object Phase 6 returns gains a json field carrying the full search-data.json string. Phase 7 wraps it as window.SEARCH_DATA = ...; without re-reading from disk.

The Phase 6 return-shape extensions are non-breaking:
auxStats.redirects.written and auxStats.search.entries keep their
existing semantics; the new fields are additive.
{ pages, staticFiles, site, destRoot, auxStats } // after Phase 6
│
▼
[1] setupOfflineDest(offlineRoot) ← §5.1
(wipe-contents + recreate of <offlineRoot>/;
equivalent to Jekyll's wipe_out_dest_contents)
│
▼
[2] buildSitePaths(pages, staticFiles, destRoot, ← §6.1
excludePatterns, stubs)
(async; Set<string> of every site-rooted
forward-slash path the URL resolver will probe;
~1,140 entries -- pages + statics + redirect-stub
destPaths + theme assets walked from
<destRoot>/assets/.)
│
▼
[3] Read & patch the JS asset once:
patchJustTheDocsJs(<destRoot>/assets/js/just-the-docs.js,
<offlineRoot>/assets/js/just-the-docs.js)
writeSearchDataJs(<offlineRoot>/assets/js/search-data.js,
auxStats.search.json)
│
▼
[4] In parallel (runLimited fans out):
writeOfflinePages(pages, ...) ← §5.2 (~837 pages)
writeOfflineRedirects(auxStats.redirects.stubs, ...) ← §5.3 (~290 stubs)
copyOfflineStatics(staticFiles, ...) ← §5.4 (~234 files, minus exclude)
copyOfflineThemeAssets(themeFiles, ...) ← §5.5 (~7 files, CSS rewritten)
copyOfflineSearchData(auxStats.search.json, ...) ← §5.6 (verbatim JSON copy)
│
▼
[5] summarise(totals) ← §5.7
(HTML / CSS / redirect / asset counts;
unresolved counter; one log line.)
The five parallel substeps in step [4] write to disjoint destination
paths (pages → *.html files outside assets/; redirects → small
HTML files outside assets/; statics → mostly under Images/, lib/,
or top-level; theme assets → assets/css/ and assets/js/; search-
data → assets/js/), so they don't race. The shared mutable state
(URL-resolution caches in §C) lives behind the offline.mjs module's
private scope; each per-file rewrite is a pure transformation given
the immutable site-paths Set + the lazy caches.
The JS-asset patching in step [3] runs sequentially before the parallel
fan-out. Nothing later depends on it (the asset is read once here and
never again), but keeping it out of the per-file loop lets the warnings
("could not locate navLink() in just-the-docs.js") surface early in
the build output.
Each write surface uses runLimited with WRITE_LIMIT = 64 (the
Phase 5 cap). Five concurrent surfaces × 64 = 320 max in-flight
operations. libuv runs file-system requests on its thread pool
(4 threads by default) and queues the rest, so 320 pending requests
are not a problem. On the dev machine, no cap at all also works; the
64 cap protects constrained systems from EMFILE.
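
For readers unfamiliar with the pattern, a concurrency cap of this kind can be sketched as a small worker pool. This is a hypothetical stand-in, assuming the `(items, limit, fn)` call shape used in this plan; the real runLimited exported by write.mjs may differ in detail.

```javascript
// Hypothetical sketch of a runLimited-style helper (not the write.mjs code).
async function runLimitedSketch(items, limit, fn) {
  let next = 0;
  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      await fn(items[i], i);
    }
  }
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}

// Usage: 20 simulated writes under a cap of 4, tracking peak concurrency.
let active = 0;
let peak = 0;
const items = Array.from({ length: 20 }, (_, i) => i);
const done = runLimitedSketch(items, 4, async () => {
  active += 1;
  peak = Math.max(peak, active);
  await new Promise((resolve) => setTimeout(resolve, 5)); // stands in for a file write
  active -= 1;
}).then(() => {
  console.log(`peak concurrency: ${peak}`);
  return peak;
});
```

Because each worker awaits its current task before claiming another, the number of tasks in flight never exceeds the cap, whatever the size of the input list.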
Two reasons, mirroring PLAN-5 §4:
- Correctness. buildSitePaths is the URL resolver's source of truth. Per-file rewrites query the Set. Building it after writes start would race.
- Predictability. The wipe step deletes the previous offline tree. A failure here (locked file, permission error) surfaces cleanly without interleaving with per-file write errors.

The setup pass (wipe + buildSitePaths + JS patch + search-data.js write) is ~60 ms total. Sequencing it is cheap; parallelising it would save at most those ~60 ms while risking the failure modes above.
const OFFLINE_SUFFIX = "-offline";
const LIMIT = WRITE_LIMIT; // re-use Phase 5's cap

Two lines. Everything else (regex constants, JS-patch templates, cache containers) lives inline next to the functions that use them.
The entry point assembles a single deps object and threads it through
every substep. Centralised so the substep signatures stay narrow and
the cache lifetime is one build. The state-building half is factored
into the exported buildOfflineState so the diff tools can reuse it
without going through the writer-side I/O:
export async function writeOffline(pages, staticFiles, site, destRoot, { auxStats } = {}) {
  const stubs = auxStats?.redirects?.stubs ?? [];
  const state = await buildOfflineState(pages, staticFiles, site, destRoot, { stubs });
  const deps = {
    ...state,
    offlineRoot: destRoot + OFFLINE_SUFFIX,
    counters: {
      html: 0, css: 0, redirects: 0, statics: 0, assets: 0,
      excluded: 0, unresolved: 0,
    },
  };
  await setupOfflineDest(deps.offlineRoot);
  const jtdSrc = path.join(destRoot, "assets/js/just-the-docs.js");
  const jtdDest = path.join(deps.offlineRoot, "assets/js/just-the-docs.js");
  const jtdPatches = await patchJustTheDocsJs(jtdSrc, jtdDest);
  await writeSearchDataJs(
    path.join(deps.offlineRoot, "assets/js/search-data.js"),
    auxStats?.search?.json ?? null,
  );
  await Promise.all([
    writeOfflinePages(pages, deps),
    writeOfflineRedirects(stubs, deps),
    copyOfflineStatics(staticFiles, deps),
    copyOfflineThemeAssets(deps),
    copyOfflineSearchData(auxStats?.search?.json ?? null, deps),
  ]);
  return { ...deps.counters, jtdPatches };
}

export async function buildOfflineState(pages, staticFiles, site, destRoot, { stubs = [] } = {}) {
  const excludePatterns = Array.isArray(site.config?.offline_exclude)
    ? site.config.offline_exclude.map(String) : [];
  return {
    destRoot, // input tree root (Phase 5 wrote it; read-only here)
    sitePaths: await buildSitePaths(pages, staticFiles, destRoot, excludePatterns, stubs),
    caches: {
      rawResolution: new Map(), // raw → [sep, tail, sitePath]
      seg: new Map(), // sitePath → [decodedSegs, encodedSegs]
      result: new Map(), // fileDir → Map<raw, finalRelUrl|null>
    },
    baseurl: normalizeBaseurl(site.config?.baseurl),
    siteUrl: String(site.config?.url ?? "").replace(/\/+$/, ""),
    excludePatterns,
  };
}

The caches are fresh per build (not cross-build memoised); see
§7.D4. The split lets the diff tools (_diff.mjs --offline*,
_triage.mjs auditOffline*) build state, derive expected bytes for a
single input via one of the pure derive* helpers (§I), and
byte-compare against Jekyll's _site-offline/ -- without going
through the writer at all.
The fall-throughs on auxStats?.search?.json and
auxStats?.redirects?.stubs are defensive against a Phase 6 that
didn't ship the API extension from §7.D8 yet -- a degraded but still
correct build: pages and statics process, redirects and search-data
skip with null inputs.
Purpose. Ensure <offlineRoot>/ exists and is empty when Phase 7
begins writing. The orchestrator gates the whole writeOffline call
behind !dryRun, so this helper doesn't need its own dry-run branch.
Algorithm.
import { existsSync } from "node:fs"; // sync existence probe; one-off use

async function setupOfflineDest(offlineRoot) {
  if (!isUnderProject(offlineRoot)) {
    throw new Error(`refusing to clean ${offlineRoot}: not under the project tree`);
  }
  // Wipe contents, keep directory in place. See §7.D1.
  if (existsSync(offlineRoot)) {
    const entries = await fs.readdir(offlineRoot);
    await Promise.all(entries.map(name =>
      fs.rm(path.join(offlineRoot, name), { recursive: true, force: true }),
    ));
  } else {
    await fs.mkdir(offlineRoot, { recursive: true });
  }
}

The isUnderProject guard is the same shape as write.mjs's
isUnderProject (PLAN-5 §5.1) -- promoted to a module-level export
during Phase 7 implementation along with safeWrite.
Why "wipe contents, keep directory" rather than fs.rm of the
directory itself? See §7.D2 -- Jekyll's jekyll serve watcher pattern
would otherwise infinite-loop on the directory-recreated event.
tbdocs doesn't ship a watcher today, but the convention is cheap to
honour and removes one footgun if a watcher lands later.
Purpose. For each page with page.html !== undefined, apply the
HTML transformations (via the exported pure-compute deriveOfflinePage)
and write to <offlineRoot>/<page.destPath>.
Algorithm.
async function writeOfflinePages(pages, deps) {
  const { offlineRoot } = deps;
  const writable = pages.filter(p => p.html !== undefined);
  await runLimited(writable, LIMIT, async (page) => {
    const { html, misses } = deriveOfflinePage(page, deps);
    const dest = path.join(offlineRoot, page.destPath);
    await writeFileMkdirp(dest, html);
    deps.counters.html += 1;
    deps.counters.unresolved += misses;
  });
}

export function deriveOfflinePage(page, state) {
  const { sitePaths, caches, baseurl } = state;
  const fileDir = posixDirname(page.destPath);
  const fileSegs = fileDirSegsFromRel(page.destPath);
  let html = page.html;
  html = stripSeo(html);
  const { rewritten, misses } = rewriteHtml(html, fileDir, fileSegs, sitePaths, caches, baseurl);
  html = rewritten;
  html = injectSearchSetup(html, fileSegs);
  return { html, misses };
}

The pure-compute deriveOfflinePage is the surface _diff.mjs --offline=
and _triage.mjs auditOfflinePages consume to derive expected bytes
without writing to disk. The writer just wraps it with the I/O and
counter bookkeeping.
Each per-page rewrite is the same three-stage pipeline as Jekyll's
offlinify process_page's .html branch:
1. stripSeo -- delete the jekyll-seo-tag block, keep only its <title>. Defined in §6.2.
2. rewriteHtml -- one regex pass over <href|src>=... attributes, dispatched to computeRelative (absolute URLs) or computeRelUrl (page-relative URLs). Code-block content pre-empted by the combined regex's leading alternatives. Defined in §6.6.
3. injectSearchSetup -- inject two <script> tags before the <script src="...just-the-docs.js"> tag. Defined in §6.8.

Order matters: SEO strip first (saves the rewrite from doing work on
URLs about to be deleted), URL rewrite second (touches the
just-the-docs.js src to make it page-relative -- the script-injection
step then matches it as the anchor for the new <script> tags), inject
last (looks up the rewritten src to derive the relative prefix).
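
The first stage can be sketched as follows. This is a hypothetical version, assuming the block is delimited by the "Begin Jekyll SEO tag" / "End Jekyll SEO tag" comments that jekyll-seo-tag emits; the shipped stripSeo (§6.2) uses its own fixed pattern, and the sample page content is illustrative.

```javascript
// Hypothetical sketch of the SEO-strip stage (not the shipped stripSeo).
// Assumes the block sits between jekyll-seo-tag's Begin/End comments.
const SEO_BLOCK_RE =
  /<!-- Begin Jekyll SEO tag[^>]*-->([\s\S]*?)<!-- End Jekyll SEO tag -->\n?/;
const TITLE_RE = /<title>[\s\S]*?<\/title>/;

function stripSeoSketch(html) {
  return html.replace(SEO_BLOCK_RE, (block, inner) => {
    const title = inner.match(TITLE_RE);
    return title ? `${title[0]}\n` : ""; // keep only the <title>, drop the rest
  });
}

// Illustrative input: one absolute URL inside the block, one link outside it.
const sample = [
  "<head>",
  "<!-- Begin Jekyll SEO tag v2.8.0 -->",
  "<title>Const | Docs</title>",
  '<meta property="og:url" content="https://docs.twinbasic.com/tB/Core/Const" />',
  "<!-- End Jekyll SEO tag -->",
  '<link rel="stylesheet" href="/assets/css/print.css">',
  "</head>",
].join("\n");

const stripped = stripSeoSketch(sample);
console.log(stripped);
```

After the strip, the only URL left for the rewrite stage in this sample is the stylesheet href, which illustrates why the strip runs first.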
Why page.html (in-memory) rather than re-reading _site/<destPath>.
The bytes are already in memory from Phase 4. Re-reading saves
nothing and costs ~22 MB of disk I/O across all pages. The in-memory
copy is also guaranteed to match what _site/ holds (Phase 5 wrote
the same bytes).
Encoding. utf8. Match Phase 5's writePages.
Why book.html is skipped. page.html === undefined for book.html
(layout book-combined, Phase 8 territory). The filter at the top of
writeOfflinePages drops it. Also configured in offline_exclude for
defence in depth.
Purpose. For each Phase 6 redirect stub, rewrite the four
absolute-target-URL occurrences to page-relative form (via the
exported pure-compute deriveOfflineRedirect) and write to
<offlineRoot>/<destPath>.
Algorithm.
async function writeOfflineRedirects(stubs, deps) {
  const { offlineRoot } = deps;
  await runLimited(stubs, LIMIT, async (s) => {
    const html = deriveOfflineRedirect(s, deps);
    await writeFileMkdirp(path.join(offlineRoot, s.destPath), html);
    deps.counters.redirects += 1;
  });
}

export function deriveOfflineRedirect(stub, state) {
  const { sitePaths, caches, baseurl, siteUrl } = state;
  if (!siteUrl) return stub.html; // no site.url → write verbatim
  const siteUrlEsc = escapeRegExp(siteUrl);
  const prefixRe = new RegExp(`${siteUrlEsc}(/[^"' >]*)`, "g");
  const fileDir = posixDirname(stub.destPath);
  const fileSegs = fileDirSegsFromRel(stub.destPath);
  const pageCache = getPageCache(caches.result, fileDir);
  return stub.html.replace(prefixRe, (match, raw) => {
    let rel = pageCache.get(raw);
    if (rel === undefined) {
      rel = computeRelative(raw, fileSegs, sitePaths, caches, baseurl);
      pageCache.set(raw, rel);
    }
    return rel ?? match; // unresolved → leave the absolute URL verbatim
  });
}

The substitution pattern matches <site.url><path> -- e.g.
https://docs.twinbasic.com/tB/Modules/DateTime/Day becomes
../../../tB/Modules/DateTime/Day.html after probing for the actual
file under that path. Unresolved matches stay as the absolute URL --
the offline link check (with --forbid 'https://docs.twinbasic.com')
will then flag them as a source-side bug, which is the right
behaviour.
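
A worked example of the substitution, as a self-contained sketch. `resolveSketch` is a hypothetical stand-in for computeRelative: it ignores fragments, queries, baseurl and the caches, and it assumes the relative path climbs to the site root before descending. The stub location `Old/Section/DateTime/Day.html` is invented for the illustration.

```javascript
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical stand-in for computeRelative: returns a page-relative
// path, or null when no file exists for the site-rooted path.
function resolveSketch(raw, fileSegs, sitePaths) {
  const sitePath = raw.replace(/^\/+/, "");
  const hit = [sitePath, `${sitePath}.html`, `${sitePath}/index.html`]
    .find((candidate) => sitePaths.has(candidate));
  if (!hit) return null;
  return "../".repeat(fileSegs.length) + hit; // climb to the root, then descend
}

function rewriteStubSketch(stubHtml, stubDestPath, siteUrl, sitePaths) {
  const prefixRe = new RegExp(`${escapeRegExp(siteUrl)}(/[^"' >]*)`, "g");
  const fileSegs = stubDestPath.split("/").slice(0, -1); // directory segments only
  return stubHtml.replace(prefixRe, (match, raw) =>
    resolveSketch(raw, fileSegs, sitePaths) ?? match);
}

const sitePaths = new Set(["tB/Modules/DateTime/Day.html"]);
const stubHtml =
  '<link rel="canonical" href="https://docs.twinbasic.com/tB/Modules/DateTime/Day">' +
  '<meta http-equiv="refresh" content="0; url=https://docs.twinbasic.com/tB/Modules/DateTime/Day">' +
  '<a href="https://docs.twinbasic.com/No/Such/Page">Click here</a>';

const rewrittenStub = rewriteStubSketch(
  stubHtml, "Old/Section/DateTime/Day.html", "https://docs.twinbasic.com", sitePaths);
console.log(rewrittenStub);
```

The two resolvable occurrences become `../../../tB/Modules/DateTime/Day.html`, while the unresolved one keeps its absolute URL so the link check can flag it.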
Why rewrite at all rather than leaving the absolute URLs in place?
Following a stub offline would otherwise require network access and
land the reader on the live site. Some source pages (notably
Miscellaneous/Documentation Development.md) intentionally link via
redirect_from URLs as a stable-URL pattern, and those need to
navigate locally.
Re-uses caches across pages and redirects. The caches.result
nested map (file_dir → raw → final_rel_url) is shared between the
page pass and the redirect pass -- the redirect's path keys are
disjoint from the absolute-URL ones (redirects only carry one URL
shape, the <site.url>/<path> form, while the page pass sees both
/path absolutes and path relatives), so there's no collision.
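
The nested shape of caches.result can be illustrated with a get-or-create helper. This is a hypothetical sketch of what getPageCache might look like, inferred from how deriveOfflineRedirect calls it; the cached values are illustrative.

```javascript
// Hypothetical sketch of getPageCache: fetch or create the inner Map
// that holds resolved URLs for one output directory.
function getPageCacheSketch(resultCache, fileDir) {
  let pageCache = resultCache.get(fileDir);
  if (pageCache === undefined) {
    pageCache = new Map();
    resultCache.set(fileDir, pageCache);
  }
  return pageCache;
}

const result = new Map(); // fileDir → Map<raw, finalRelUrl | null>

// The page pass stores a hit and a miss for files under tB/Core/.
const fromPage = getPageCacheSketch(result, "tB/Core");
fromPage.set("/assets/css/print.css", "../../assets/css/print.css");
fromPage.set("/no/such/file", null); // a cached miss is null, not undefined

// A redirect stub in the same directory reuses the same inner Map.
const fromStub = getPageCacheSketch(result, "tB/Core");
console.log(fromStub.get("/assets/css/print.css")); // ../../assets/css/print.css
console.log(fromStub.get("/no/such/file"));         // null
console.log(fromStub.get("/never/probed"));         // undefined
```

Distinguishing `null` (probed, no file) from `undefined` (never probed) is what lets the `rel === undefined` check skip the resolver for known misses.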
Purpose. Copy each staticFile to <offlineRoot>/<destRel>,
honouring the offline_exclude patterns.
Algorithm.
async function copyOfflineStatics(staticFiles, deps) {
  const { offlineRoot, excludePatterns, counters } = deps;
  await runLimited(staticFiles, LIMIT, async (file) => {
    if (offlineExcluded(file.destRel, excludePatterns)) {
      counters.excluded += 1;
      return;
    }
    const dest = path.join(offlineRoot, file.destRel);
    await mkdirRec(path.dirname(dest));
    await safeWrite(dest, () => fs.copyFile(file.srcPath, dest));
    counters.statics += 1;
  });
}

The exclude check matches per-pattern via fnmatch-equivalent
behaviour (§6.5). On the current site, CNAME is the only static
file matched by an exclude pattern (the others -- robots.txt,
sitemap.xml, book.html -- aren't in staticFiles[], they're
emitted by Phase 5/6 substeps).
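
A rough illustration of FNM_PATHNAME-style matching, where wildcards never cross a `/`. Both function names are hypothetical, and the sketch handles only `*` and `?` (no bracket classes, no `**`), so it is narrower than the offlineExcluded helper described in §6.5.

```javascript
// Hypothetical sketch: translate a glob into a RegExp in which "*" and
// "?" never match the "/" separator (the FNM_PATHNAME rule).
function globToRegExpSketch(pattern) {
  let source = "";
  for (const ch of pattern) {
    if (ch === "*") source += "[^/]*";
    else if (ch === "?") source += "[^/]";
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // literal character
  }
  return new RegExp(`^${source}$`);
}

function offlineExcludedSketch(relPath, patterns) {
  return patterns.some((p) => globToRegExpSketch(p).test(relPath));
}

const patterns = ["CNAME", "robots.txt", "sitemap.xml", "book.html"];
console.log(offlineExcludedSketch("CNAME", patterns));                 // true
console.log(offlineExcludedSketch("Features/Images/a.png", patterns)); // false
console.log(offlineExcludedSketch("robots.txt", ["*.txt"]));           // true
console.log(offlineExcludedSketch("docs/robots.txt", ["*.txt"]));      // false
```

The last call returns false because `*` stops at the directory separator, so a top-level pattern cannot accidentally exclude a nested file.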
Why copy from staticFile.srcPath rather than _site/<destRel>.
The source-path copy avoids the _site/ round-trip and matches the
copy copyStaticFiles already does in Phase 5. Either source produces
byte-identical output -- _site/ was populated from srcPath
unchanged.
Purpose. Copy theme CSS/JS from _site/assets/ (or builder/assets/
-- see below) to _site-offline/assets/, rewriting CSS url()
references and skipping just-the-docs.js (handled separately in
step [3] of §4).
Algorithm.
async function copyOfflineThemeAssets(deps) {
  const { destRoot, offlineRoot, counters } = deps;
  const themeRoot = path.join(destRoot, "assets");
  if (!existsSync(themeRoot)) return;
  const themeEntries = await collectThemeFiles(themeRoot);
  await runLimited(themeEntries, LIMIT, async (e) => {
    if (e.isJtdJs) return; // step [3] already wrote the patched copy
    const dest = path.join(offlineRoot, "assets", e.relUnderAssets);
    if (e.isCss) {
      const cssIn = await fs.readFile(e.srcAbs, "utf8");
      const relRel = path.posix.join("assets", e.relUnderAssets);
      const { css, misses } = deriveOfflineCss(cssIn, relRel, deps);
      await writeFileMkdirp(dest, css);
      counters.css += 1;
      counters.unresolved += misses;
    } else {
      await mkdirRec(path.dirname(dest));
      await safeWrite(dest, () => fs.copyFile(e.srcAbs, dest));
      counters.assets += 1;
    }
  });
}

export function deriveOfflineCss(cssIn, themeRel, state) {
  const { sitePaths, caches, baseurl } = state;
  const fileDir = posixDirname(themeRel);
  const fileSegs = fileDirSegsFromRel(themeRel);
  const { rewritten, misses } = rewriteCss(cssIn, fileDir, fileSegs, sitePaths, caches, baseurl);
  return { css: rewritten, misses };
}

The four CSS files run through rewriteCss (§6.7); the two remaining JS
files (theme-switch.js and vendor/lunr.min.js) copy verbatim, and
just-the-docs.js is skipped because step [3] already wrote it. Reading from
destRoot/assets/ (rather than builder/assets/) keeps "the offline
tree mirrors what _site/ shipped" as the model -- if Phase 5 ever
gains a post-copy theme rewrite, the offline tree picks it up
automatically. The downside is one extra disk read per CSS file (~310
KB total); negligible.
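
The CSS pass can be pictured as a single regex replace over `url(...)` tokens. The following is a hypothetical sketch, not the rewriteCss from §6.7: the `resolve` callback stands in for the real resolver and its caches, and the pattern only handles root-absolute references.

```javascript
// Hypothetical sketch of a CSS url() rewrite. Matches url(/x), url("/x")
// and url('/x'), but skips protocol-relative references such as //cdn.
const CSS_URL_RE = /url\(\s*(["']?)(\/(?!\/)[^"')\s]*)\1\s*\)/g;

function rewriteCssSketch(css, resolve) {
  let misses = 0;
  const rewritten = css.replace(CSS_URL_RE, (match, quote, raw) => {
    const rel = resolve(raw);
    if (rel === null) {
      misses += 1;
      return match; // leave unresolved references untouched
    }
    return `url(${quote}${rel}${quote})`;
  });
  return { rewritten, misses };
}

// Stand-in resolver for a stylesheet living in assets/css/ (two levels deep).
const knownFiles = new Set(["favicon.png"]);
const resolveFromCssDir = (raw) =>
  knownFiles.has(raw.slice(1)) ? `../../${raw.slice(1)}` : null;

const cssOut = rewriteCssSketch(
  'a{background:url(/favicon.png)} b{background:url("/nope.png")}',
  resolveFromCssDir,
);
console.log(cssOut.rewritten);
console.log(cssOut.misses);
```

In this sample the favicon reference becomes `url(../../favicon.png)` and the unknown file is counted as one miss, mirroring how the unresolved counter is fed.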
Purpose. Copy search-data.json verbatim from _site/ to
_site-offline/ (kept in place for parity with the online tree --
nothing reads it offline, but its absence would surface as a missing
asset to anyone diffing the trees).
Algorithm. A single write of the in-memory JSON bytes (Phase 6 returned them; Phase 7 doesn't re-read from disk):
async function copyOfflineSearchData(jsonBytes, deps) {
  if (jsonBytes == null) return; // Phase 6 supplied no JSON: skip rather than fail the write
  const dest = path.join(deps.offlineRoot, "assets/js/search-data.json");
  await writeFileMkdirp(dest, jsonBytes);
  deps.counters.assets += 1;
}

The search-data.js wrapper is written separately by
writeSearchDataJs in step [3] of §4 (see §6.10) -- same JSON bytes,
different wrapper. Both writes use the same in-memory string, so
there's no risk of the two files diverging.
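
The wrap itself is a one-line string operation. The sketch below is hypothetical: the exact bytes deriveOfflineSearchDataJs emits (spacing, trailing newline) are assumptions here, and the sample entry is illustrative rather than taken from the real index.

```javascript
// Hypothetical sketch of the search-data.js wrap. The exact spacing and
// the trailing newline are assumptions, not the shipped byte layout.
function wrapSearchDataSketch(json) {
  return `window.SEARCH_DATA = ${json};\n`;
}

// Illustrative one-entry index; the real file is ~2.8 MB.
const json = JSON.stringify({ "0": { title: "Const", url: "/tB/Core/Const" } });
const wrapped = wrapSearchDataSketch(json);

// Evaluate the wrapper against a fake window object, the way a
// <script src="search-data.js"> tag would populate the real one.
const fakeWindow = {};
new Function("window", wrapped)(fakeWindow);

console.log(wrapped.trim());
console.log(fakeWindow.SEARCH_DATA["0"].title); // Const
```

Since JSON text is valid JavaScript expression syntax, the same string can be written verbatim as search-data.json and embedded unchanged in the wrapper.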
Purpose. One line summarising what Phase 7 did. Matches the
Phase 5+6 summary line in tbdocs.mjs. Actual shipped output:
Phase 1+2+3+4+5+6+7 done: 838 pages, 234 static files
wrote: 837 pages (1 skipped), 7 theme assets, 234 static files -> .../_site-new
aux: 290 redirect stubs, 836 sitemap entries, 2587 search-index entries
offline: 837 HTML, 4 CSS, 290 redirect stubs, 239 assets, 1 excluded (0 unresolved) -> .../_site-new-offline
discover=98ms nav=26ms seo=17ms book=9ms buildInfo=0ms render=1964ms template=565ms write=434ms auxiliaries=141ms offline=1092ms
Notes on the actual numbers vs the first-draft projection:
- 239 assets, not 240. Phase 5 emits 7 theme files; we patch one (just-the-docs.js) and CSS-rewrite four; the remaining two (the vendored lunr.min.js + theme-switch.js) are verbatim copies. Combined with the ~234 static-file copies + 1 search-data.json verbatim copy: 234 statics + 7 theme + 1 search-data + (-3 already accounted for as CSS rewrites) = ~239. The exact arithmetic depends on which counters bucket bumps for each file class; see §5.7.A below.
- 1 excluded, not 4. The 4-file offline_exclude list (CNAME / robots.txt / sitemap.xml / book.html) only matches files that actually appear in staticFiles[]. Of those, only CNAME is in the static-file inventory; robots.txt / sitemap.xml are Phase 6 output (not in staticFiles[], not written to _site-offline/), and book.html is skipped by Phase 5 / Phase 7 via the book-combined layout filter. So the exclude counter bumps once. Defence-in-depth vs the offline-exclude regex; the four-file pattern still does what it should.
- (0 unresolved). Phase 7 produces zero unresolved URLs on the production tree -- meaning every /foo-shape href, src, and url() resolves to an actual file in <offlineRoot>/. A nonzero value here would be the fast-feedback channel for a source-side bug.

When --profile-offline lands (deferred follow-up), the breakdown
rows would mirror the Jekyll offlinify.rb table -- per-pass setup /
strip_seo / rewrite_html / inject_search / write_html /
rewrite_css / rewrite_redirect / patch_jtd / search_data /
copy_static accumulators.
The eight counters returned in offlineStats map to the substeps as
follows:
| Counter | Bumped by | Source |
|---|---|---|
| html | writeOfflinePages | 1 per rewritten page. 837 on the current tree. |
| css | copyOfflineThemeAssets (the CSS branch) | 1 per rewritten CSS file. 4 on the current tree. |
| redirects | writeOfflineRedirects | 1 per stub. 290 on the current tree. |
| statics | copyOfflineStatics | 1 per static file copied. ~233 on the current tree (234 minus CNAME). |
| assets | copyOfflineThemeAssets (the non-CSS branch) + copyOfflineSearchData | 1 per theme-asset file copied verbatim (3: just-the-docs.js, theme-switch.js, vendor/lunr.min.js -- the JTD JS isn't actually counted here, see note) + 1 for search-data.json. ~6 on the current tree. |
| excluded | copyOfflineStatics | 1 per static file matched by offline_exclude. 1 on the current tree (CNAME). |
| unresolved | writeOfflinePages + copyOfflineThemeAssets (CSS) | Total URL-resolver misses across HTML and CSS rewrites. 0 on the current tree. |
| jtdPatches | patchJustTheDocsJs | The list of patch labels (["navLink()", "initSearch()"] on the current tree). |
The assets counter is the slightly noisy one in the table because
the JTD JS is written by patchJustTheDocsJs (step [3]) which doesn't
bump any counter; it just returns its patch list. The remaining
verbatim theme-asset copies + the search-data.json copy land in
assets.
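As a rough illustration of how the counters could feed the summary line, the sketch below formats an offlineStats-shaped object. The formatOfflineSummary helper is hypothetical, and reporting statics + assets as the single "assets" figure is an assumption made for this example (233 + 6 happens to reproduce the 239 in the log line above); the shipped formatter in tbdocs.mjs may bucket the counters differently.

```javascript
// Hypothetical helper for illustration; not taken from builder/offline.mjs.
function formatOfflineSummary(stats, destLabel) {
  // Assumption: the "assets" figure in the log line is statics + assets.
  const copied = stats.statics + stats.assets;
  return (
    `offline: ${stats.html} HTML, ${stats.css} CSS, ` +
    `${stats.redirects} redirect stubs, ${copied} assets, ` +
    `${stats.excluded} excluded (${stats.unresolved} unresolved) -> ${destLabel}`
  );
}

// Counter values taken from the table above.
const exampleStats = {
  html: 837, css: 4, redirects: 290, statics: 233, assets: 6,
  excluded: 1, unresolved: 0, jtdPatches: ["navLink()", "initSearch()"],
};
const summaryLine = formatOfflineSummary(exampleStats, ".../_site-new-offline");
```

jtdPatches is carried in the object but not printed, matching the note that patchJustTheDocsJs returns a label list rather than bumping a numeric counter.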
Purpose. Produce a Set<string> of every site-rooted forward-slash
path the URL resolver will probe. Built once per build, queried via
Set#has during per-file rewriting.
Algorithm.
async function buildSitePaths(pages, staticFiles, destRoot, excludePatterns, stubs = []) {
const paths = new Set();
for (const p of pages) {
if (p.frontmatter?.layout === "book-combined") continue;
const rel = p.destPath.replaceAll("\\", "/");
if (offlineExcluded(rel, excludePatterns)) continue;
paths.add("/" + rel);
}
for (const s of staticFiles) {
const rel = s.destRel.replaceAll("\\", "/");
if (offlineExcluded(rel, excludePatterns)) continue;
paths.add("/" + rel);
}
for (const stub of stubs) {
const rel = stub.destPath.replaceAll("\\", "/");
if (offlineExcluded(rel, excludePatterns)) continue;
paths.add("/" + rel);
}
const themeRoot = path.join(destRoot, "assets");
if (existsSync(themeRoot)) {
const themeFiles = await collectThemeFiles(themeRoot);
for (const f of themeFiles) {
const rel = "assets/" + f.relUnderAssets;
if (offlineExcluded(rel, excludePatterns)) continue;
paths.add("/" + rel);
}
}
paths.add("/assets/js/search-data.json");
return paths;
}
The function ships with three signature changes vs the initial spec, each driven by a finding during integration testing:
- Async + destRoot. Phase 5 copies theme assets (CSS, vendor JS) from builder/assets/ directly into <destRoot>/assets/; they don't pass through staticFiles[]. Without walking <destRoot>/assets/ here, the resolver doesn't know any theme asset is a valid target, and every <link href="/assets/css/..."> and <script src="/assets/js/..."> in the rendered HTML comes back as unresolved. The walk costs ~3-5 ms on the current asset tree.
- stubs parameter. Phase 6's deriveRedirectStubs returns the list of stub destinations (e.g. tB/Core/LBound.html, which redirects to tB/Modules/Information/LBound.html). The offline tree writes them out, so they exist on disk, but pages[] doesn't include them. Without adding them to the resolver's set, a page-relative link like LBound from tB/Core/Option.html (resolving to probe path /tB/Core/LBound) misses every candidate and stays as the bare LBound. Threading stubs through fixes it. The orchestrator passes auxStats.redirects.stubs; diff tools derive stubs locally via deriveRedirectStubs(pages, site).
- Page filter by layout, not by page.html. The initial spec said if (p.html === undefined) continue; -- skipping book.html. But the diff tools (_diff.mjs --offline-redirect, _triage.mjs auditOfflineRedirects) call buildOfflineState without running templatePhase, so every page has page.html === undefined and the Set would be empty. The reliable signal is page.frontmatter?.layout === "book-combined" -- the only case templatePhase would have skipped. Same semantic, broader callability.
The Set's keys mirror the filesystem-decoded form (e.g. /Tutorials/CustomControls/Form Designer.html, not the percent-encoded URL form
/Tutorials/CustomControls/Form%20Designer.html). URL resolution
percent-decodes the input path before probing, so the comparison
operates on decoded strings.
Performance. Set construction is O(n) over ~1,140 entries (~3-5 ms on the dev machine including the theme-tree walk). Per-lookup cost is O(1).
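A minimal sketch of why the decoded key form matters, assuming a two-entry stand-in for the real Set and the same per-byte decode helper the resolver uses (the sample paths are illustrative only):

```javascript
// Per-byte percent-decoder, matching the helper used by the URL resolver.
function decode(s) {
  return s.replace(/%([0-9A-Fa-f]{2})/g, (_, h) => String.fromCharCode(parseInt(h, 16)));
}

// Tiny stand-in for the Set that buildSitePaths returns.
const sitePaths = new Set([
  "/index.html",
  "/Tutorials/CustomControls/Form Designer.html",
]);

const encodedHref = "/Tutorials/CustomControls/Form%20Designer.html";
const hitEncoded = sitePaths.has(encodedHref);         // keys are decoded, so this misses
const hitDecoded = sitePaths.has(decode(encodedHref)); // decoded probe finds the file
```

Probing with the raw href would report the page as unresolved even though the file exists, which is why decoding happens before any Set#has call.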
Purpose. Remove the jekyll-seo-tag block from a page's <head>,
keeping only its <title> tag.
Algorithm.
const SEO_BLOCK_RE = /<!-- Begin Jekyll SEO tag.*?<!-- End Jekyll SEO tag -->/s;
const TITLE_RE = /<title>.*?<\/title>/s;
function stripSeo(html) {
if (!html.includes("<!-- Begin Jekyll SEO tag")) return html;
return html.replace(SEO_BLOCK_RE, (block) => {
const titleMatch = block.match(TITLE_RE);
return titleMatch ? titleMatch[0] : "";
});
}
The block bracketed by the <!-- Begin Jekyll SEO tag vX.Y.Z --> /
<!-- End Jekyll SEO tag --> comments is what Phase 4's
renderHeadSeo emits byte-for-byte (verified earlier: see the
template.mjs check around renderHeadSeo). Inside the
block live <title>, generator/OpenGraph/Twitter Card meta, the
<link rel="canonical"> (pointing at the live site), and a JSON-LD
structured-data <script>. The <title> is the only thing a local
reader uses (browser tab label); the rest exists for search-engine
crawlers and social-media previewers that never see _site-offline/.
Stripping the block saves ~750 KB across the offline tree (~900 B per
page × 837 pages) and removes all four
https://docs.twinbasic.com references each page would otherwise
contain (the fourth, the JSON-LD "url" field, sits inside the block
along with the other three).
Why it runs before rewriteHtml. The rewrite touches every
href/src attribute via regex; pruning ~900 bytes of soon-to-be-
deleted SEO content saves the rewrite from doing work it'll throw
away. Also keeps the <link rel="canonical" href="https://...">
absolute URL (which the rewrite would otherwise see and ignore --
it's not a /-leading path) from confusing any future "is this still
a live-site URL?" check.
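For illustration, here is stripSeo applied to a made-up head fragment. The sample markup (including the v2.8.0 version string) is invented for this sketch and is not actual renderHeadSeo output.

```javascript
const SEO_BLOCK_RE = /<!-- Begin Jekyll SEO tag.*?<!-- End Jekyll SEO tag -->/s;
const TITLE_RE = /<title>.*?<\/title>/s;

function stripSeo(html) {
  if (!html.includes("<!-- Begin Jekyll SEO tag")) return html;
  return html.replace(SEO_BLOCK_RE, (block) => {
    const titleMatch = block.match(TITLE_RE);
    return titleMatch ? titleMatch[0] : "";
  });
}

// Invented sample input: one SEO block followed by a stylesheet link.
const sampleHead = [
  "<head>",
  "<!-- Begin Jekyll SEO tag v2.8.0 -->",
  "<title>Const | twinBASIC</title>",
  '<link rel="canonical" href="https://docs.twinbasic.com/tB/Core/Const" />',
  "<!-- End Jekyll SEO tag -->",
  '<link rel="stylesheet" href="/assets/css/print.css">',
  "</head>",
].join("\n");

const stripped = stripSeo(sampleHead);
```

Only the title survives from the block; the stylesheet link outside the comment markers is left for rewriteHtml to handle.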
Purpose. Resolve an absolute URL (/tB/Core/Const) to a
page-relative URL (../Const.html) given the source file's directory
segments.
Algorithm. Port of Jekyll offlinify.rb compute_relative +
resolve_raw:
function computeRelative(raw, fileSegs, sitePaths, caches, baseurl) {
// File-dir-independent half: parse, decode, baseurl-strip, probe.
// Cached by `raw` alone -- the resolution is shared across every
// source file that emits the same URL.
let resolved = caches.rawResolution.get(raw);
if (resolved === undefined) {
resolved = resolveRaw(raw, sitePaths, baseurl);
caches.rawResolution.set(raw, resolved);
}
const [sep, tail, sitePath] = resolved;
if (sitePath === null) return null;
// File-dir-dependent half: LCP walk + relative-path build.
// Cached by `sitePath` -- segments are reused across every emit.
let segCacheEntry = caches.seg.get(sitePath);
if (segCacheEntry === undefined) {
segCacheEntry = buildSegs(sitePath);
caches.seg.set(sitePath, segCacheEntry);
}
const [decodedSegs, encodedSegs] = segCacheEntry;
let common = 0;
const fsLen = fileSegs.length;
const tsLen = decodedSegs.length;
while (common < fsLen && common < tsLen && fileSegs[common] === decodedSegs[common]) {
common++;
}
const ascend = "../".repeat(fsLen - common);
const descend = encodedSegs.slice(common).join("/");
let rel = ascend + descend;
if (rel === "") rel = "./";
return rel + sep + tail;
}
function resolveRaw(raw, sitePaths, baseurl) {
const hashIdx = raw.search(/[?#]/);
const path = hashIdx === -1 ? raw : raw.slice(0, hashIdx);
const sep = hashIdx === -1 ? "" : raw[hashIdx];
const tail = hashIdx === -1 ? "" : raw.slice(hashIdx + 1);
let fsPath = decode(path);
if (baseurl) {
if (fsPath === baseurl) fsPath = "/";
else if (fsPath.startsWith(baseurl + "/")) fsPath = fsPath.slice(baseurl.length);
}
let candidates;
if (fsPath.endsWith("/")) {
candidates = [fsPath, fsPath + "index.html"];
} else if (fsPath.includes(".")) {
candidates = [fsPath, fsPath + "/index.html"];
} else {
candidates = [fsPath, fsPath + ".html", fsPath + "/index.html"];
}
const sitePath = candidates.find(c => sitePaths.has(c)) ?? null;
return [sep, tail, sitePath];
}
The decode helper percent-decodes (%20 → space):
function decode(s) {
return s.replace(/%([0-9A-Fa-f]{2})/g, (_, h) => String.fromCharCode(parseInt(h, 16)));
}
The buildSegs helper produces the decoded/encoded segment pair, with
URL-safe segments sharing strings between the two arrays:
const PATH_SAFE_RE = /[^A-Za-z0-9\-_.~!$&'()*+,;=:@]/g;
function buildSegs(sitePath) {
const decoded = sitePath.slice(1).split("/");
const encoded = decoded.map(seg =>
PATH_SAFE_RE.test(seg)
? seg.replace(PATH_SAFE_RE, c => "%" + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0"))
: seg,
);
return [decoded, encoded];
}
Three-cache caching strategy (mirrors Jekyll offlinify.rb):
1. caches.rawResolution: Map<raw, [sep, tail, sitePath]>. File-dir-independent. Each unique URL resolves once across the whole build.
2. caches.seg: Map<sitePath, [decodedSegs, encodedSegs]>. Computed once per unique target file.
3. caches.result: Map<file_dir, Map<raw, finalRelUrl>>. End-to-end cache. Composed of caches 1+2 plus the per-source-dir LCP walk. The inner Map is hoisted once per file-dir at the start of each rewrite so per-match cost is one map lookup.
Without cache 3, Jekyll's offlinify pass takes ~7× longer. The nested
shape (rather than a composite-key Map<"file_dir\x00raw", url>)
avoids the per-match string allocation; cumulative saving ~280 ms on
Jekyll's HTML walk.
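The file-dir-dependent half of the resolver is easier to follow in isolation. The sketch below is a simplified version of the longest-common-prefix walk, with caching and percent-encoding left out; relativeFrom is a name chosen for this example, not a function in the builder.

```javascript
// Simplified LCP walk: no caches and no percent-encoding of segments.
function relativeFrom(fileSegs, targetSegs) {
  let common = 0;
  while (
    common < fileSegs.length &&
    common < targetSegs.length &&
    fileSegs[common] === targetSegs[common]
  ) {
    common++;
  }
  // Climb out of the non-shared source directories, then descend to the target.
  const ascend = "../".repeat(fileSegs.length - common);
  const descend = targetSegs.slice(common).join("/");
  return ascend + descend || "./";
}

const sameDir = relativeFrom(["tB", "Core"], ["tB", "Core", "Const.html"]);
const cousin = relativeFrom(["tB", "Modules", "Information"], ["tB", "Core", "Const.html"]);
const fromRoot = relativeFrom([], ["assets", "css", "print.css"]);
```

Here sameDir is "Const.html", cousin is "../../Core/Const.html", and fromRoot is "assets/css/print.css". The result depends on both the source directory and the target, which is why the end-to-end cache is keyed per file-dir first.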
Purpose. Resolve a page-relative URL (Attributes#description)
against the current file's directory, probing the filesystem for
candidate extensions.
Algorithm. Port of Jekyll offlinify.rb compute_rel_url:
function computeRelUrl(raw, fileSegs, sitePaths) {
const hashIdx = raw.search(/[?#]/);
const path = hashIdx === -1 ? raw : raw.slice(0, hashIdx);
const sep = hashIdx === -1 ? "" : raw[hashIdx];
const tail = hashIdx === -1 ? "" : raw.slice(hashIdx + 1);
if (path === "") return null;
const decoded = decode(path);
const trailingSlash = decoded.endsWith("/");
const stack = [...fileSegs];
for (const seg of decoded.split("/")) {
if (seg === "" || seg === ".") continue;
if (seg === "..") stack.pop();
else stack.push(seg);
}
let probePath = "/" + stack.join("/");
if (trailingSlash && !probePath.endsWith("/")) probePath += "/";
let candidates;
if (probePath.endsWith("/")) {
candidates = [["", probePath], ["index.html", probePath + "index.html"]];
} else if (probePath.includes(".")) {
candidates = [["", probePath], ["/index.html", probePath + "/index.html"]];
} else {
candidates = [["", probePath], [".html", probePath + ".html"], ["/index.html", probePath + "/index.html"]];
}
for (const [suffix, full] of candidates) {
if (sitePaths.has(full)) return path + suffix + sep + tail;
}
return null;
}
Critical difference from computeRelative. This returns the
original path with the matching suffix appended, followed by the
original ?query / #fragment tail (not a freshly computed relative
path). The path is already correctly relative to the source
file; the only fix needed is the suffix (.html / /index.html /
none). Returning a freshly-computed path would break the source's
intent on cases where the relative form encodes nuance (e.g. a
sibling reference within the same folder).
When path is empty (i.e. the URL was just ?query or #fragment), the
early return null guards against probing. Fragment-only URLs never
reach this function in the first place: the outer regex in
rewriteHtml excludes them via the (?![#/]...) lookahead.
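A worked example may help here. The block below is a condensed copy of computeRelUrl (the candidate list is folded into a suffix array, behaviour otherwise unchanged) run against a two-entry stand-in Set; the sample paths are illustrative.

```javascript
// Per-byte percent-decoder, as in the resolver above.
function decode(s) {
  return s.replace(/%([0-9A-Fa-f]{2})/g, (_, h) => String.fromCharCode(parseInt(h, 16)));
}

function computeRelUrl(raw, fileSegs, sitePaths) {
  const cut = raw.search(/[?#]/);
  const path = cut === -1 ? raw : raw.slice(0, cut);
  const sep = cut === -1 ? "" : raw[cut];
  const tail = cut === -1 ? "" : raw.slice(cut + 1);
  if (path === "") return null;
  const decoded = decode(path);
  // Resolve "." / ".." against the source file's directory.
  const stack = [...fileSegs];
  for (const seg of decoded.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") stack.pop();
    else stack.push(seg);
  }
  let probePath = "/" + stack.join("/");
  if (decoded.endsWith("/") && !probePath.endsWith("/")) probePath += "/";
  let suffixes;
  if (probePath.endsWith("/")) suffixes = ["", "index.html"];
  else if (probePath.includes(".")) suffixes = ["", "/index.html"];
  else suffixes = ["", ".html", "/index.html"];
  for (const suffix of suffixes) {
    // Keep the author's relative path; only the suffix is added.
    if (sitePaths.has(probePath + suffix)) return path + suffix + sep + tail;
  }
  return null;
}

const relSitePaths = new Set(["/tB/Core/LBound.html", "/tB/Modules/index.html"]);
const fromCore = ["tB", "Core"];
const sibling = computeRelUrl("LBound#syntax", fromCore, relSitePaths);
const folder = computeRelUrl("../Modules/", fromCore, relSitePaths);
const missing = computeRelUrl("Nope", fromCore, relSitePaths);
```

sibling comes back as "LBound.html#syntax" and folder as "../Modules/index.html"; in both cases the leading part of the URL is exactly what the source wrote. missing is null, which rewriteHtml counts as an unresolved URL.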
Purpose. Test a site-rooted forward-slash path against the
offline_exclude patterns. Matches Jekyll's
File.fnmatch(pattern, rel, File::FNM_PATHNAME) semantics: * does
NOT cross directory separators.
Algorithm.
function offlineExcluded(rel, patterns) {
if (!patterns.length) return false;
return patterns.some(pat => fnmatchPathname(pat, rel));
}
function fnmatchPathname(pattern, str) {
// Convert pattern to regex. `*` -> `[^/]*`, `?` -> `[^/]`,
// `**` -> `.*` (cross-segment), everything else literal.
// Mirrors Ruby File::FNM_PATHNAME.
let re = "^";
for (let i = 0; i < pattern.length; i++) {
const c = pattern[i];
if (c === "*") {
if (pattern[i + 1] === "*") { re += ".*"; i++; }
else { re += "[^/]*"; }
} else if (c === "?") {
re += "[^/]";
} else if (".+^$()|[]{}\\".includes(c)) {
re += "\\" + c;
} else {
re += c;
}
}
re += "$";
return new RegExp(re).test(str);
}
The current offline_exclude list (CNAME, robots.txt,
sitemap.xml, book.html) is all plain-string patterns, so the
regex compilation is essentially a literal-string check. The full
implementation also handles wildcard additions such as **/*.bat for
nested paths; as written, **/ requires at least one directory segment,
so a root-level run.bat would not match.
Why not use an npm package (e.g. minimatch)? The semantics are
narrow enough that ~25 lines of code suffice, and adding a dependency
for fnmatch alone is overkill. Mirrors the "no fs-extra / cpy"
rationale from PLAN-5 §3.
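To make the matching rules concrete, the sketch below runs the same fnmatchPathname conversion against a few sample patterns. The patterns and paths other than the four in offline_exclude are invented for the example.

```javascript
function fnmatchPathname(pattern, str) {
  // `*` -> `[^/]*`, `?` -> `[^/]`, `**` -> `.*`, everything else literal.
  let re = "^";
  for (let i = 0; i < pattern.length; i++) {
    const c = pattern[i];
    if (c === "*") {
      if (pattern[i + 1] === "*") { re += ".*"; i++; }
      else { re += "[^/]*"; }
    } else if (c === "?") {
      re += "[^/]";
    } else if (".+^$()|[]{}\\".includes(c)) {
      re += "\\" + c;
    } else {
      re += c;
    }
  }
  re += "$";
  return new RegExp(re).test(str);
}

function offlineExcluded(rel, patterns) {
  if (!patterns.length) return false;
  return patterns.some((pat) => fnmatchPathname(pat, rel));
}

const excludeList = ["CNAME", "robots.txt", "sitemap.xml", "book.html"];
const cnameExcluded = offlineExcluded("CNAME", excludeList);
const nestedRobots = offlineExcluded("docs/robots.txt", excludeList);
const starStaysInSegment = fnmatchPathname("*.txt", "docs/robots.txt");
const doubleStarCrosses = fnmatchPathname("**/*.bat", "scripts/win/build.bat");
const dotIsLiteral = fnmatchPathname("book.html", "bookXhtml");
```

cnameExcluded and doubleStarCrosses are true; nestedRobots, starStaysInSegment, and dotIsLiteral are false. The patterns are anchored at both ends, so a plain name like robots.txt only matches at the site root.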
Purpose. Single regex pass over HTML, rewriting every absolute
and page-relative URL in href / src attributes while skipping
content inside <code> / <pre> blocks.
Algorithm.
const HTML_COMBINED_RE = /<code\b[^>]*>[\s\S]*?<\/code>|<pre\b[^>]*>[\s\S]*?<\/pre>|\b(href|src)=(["'])(\/(?!\/)[^"']*|(?![#/]|[a-zA-Z][a-zA-Z0-9+.\-]*:)[^"']+)\2/g;
function rewriteHtml(html, fileDir, fileSegs, sitePaths, caches, baseurl) {
let misses = 0;
const pageCache = getPageCache(caches.result, fileDir);
const rewritten = html.replace(HTML_COMBINED_RE, (match, attrName, quote, rawUrl) => {
if (attrName === undefined) {
// <code> or <pre> block -- leave verbatim.
return match;
}
let rel = pageCache.get(rawUrl);
if (rel === undefined) {
rel = rawUrl.startsWith("/")
? computeRelative(rawUrl, fileSegs, sitePaths, caches, baseurl)
: computeRelUrl(rawUrl, fileSegs, sitePaths);
pageCache.set(rawUrl, rel);
}
if (rel === null) {
misses++;
return match;
}
if (rel === rawUrl) {
// File already correct at the relative path (e.g. `Foo.html` exists).
return match;
}
return `${attrName}=${quote}${rel}${quote}`;
});
return { rewritten, misses };
}
function getPageCache(resultCache, fileDir) {
let pageCache = resultCache.get(fileDir);
if (!pageCache) {
pageCache = new Map();
resultCache.set(fileDir, pageCache);
}
return pageCache;
}
The combined regex carries three top-level alternatives:
- <code\b[^>]*>[\s\S]*?</code> -- a <code> block. Matched atomically; the callback returns it verbatim because attrName is undefined on this branch.
- <pre\b[^>]*>[\s\S]*?</pre> -- same for <pre>.
- \b(href|src)=(["'])(URL)\2 -- a real href/src attribute. The URL alternation (/(?!/)[^"']*|(?![#/]|[scheme:])[^"']+) matches either:
  - an absolute path (/foo) that's not protocol-relative (//).
  - a page-relative path (Foo, Foo#frag) that doesn't start with # (fragment-only), / (absolute, handled above), or a URL scheme (http:, mailto:, etc.).
The disjoint URL shapes let one regex handle both cases; the
rawUrl.startsWith("/") test inside the callback dispatches to
computeRelative vs computeRelUrl.
Why fold the code-block skip into the regex rather than a
separate code_block_ranges precompute? Jekyll offlinify.rb measured
~800 ms saving from the fold (per its history). In tbdocs, the same
fold avoids a per-match offset_in_code_block? linear scan inside the
replace callback. Cleaner and faster.
Note on shiki output. Phase 3's syntax highlighter produces
<pre> / <code> blocks wrapping each highlighted code sample,
matching the same shape Rouge produces in Jekyll. The skip works
either way -- both highlighters HTML-escape < and > inside code
bodies but leave " alone, so src="/foo" inside a code sample
would match the href/src alternative without the skip. The atomic
consumption of the code-block alternatives is what makes the tutorial
code samples come through unrewritten.
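The effect of the atomic code-block alternatives can be seen with a stand-in resolver. In the sketch below, rewriteWith and the "REWRITTEN:" prefix are assumptions for the demo; only the regex is taken from the section above.

```javascript
const HTML_COMBINED_RE = /<code\b[^>]*>[\s\S]*?<\/code>|<pre\b[^>]*>[\s\S]*?<\/pre>|\b(href|src)=(["'])(\/(?!\/)[^"']*|(?![#/]|[a-zA-Z][a-zA-Z0-9+.\-]*:)[^"']+)\2/g;

// Hypothetical driver: applies `resolve` to every URL the regex captures.
function rewriteWith(html, resolve) {
  return html.replace(HTML_COMBINED_RE, (match, attrName, quote, rawUrl) => {
    if (attrName === undefined) return match; // <code> / <pre>: verbatim
    return `${attrName}=${quote}${resolve(rawUrl)}${quote}`;
  });
}

const sampleHtml =
  '<a href="/tB/Core/Const">Const</a>' +
  '<pre><code>&lt;img src="/foo"&gt;</code></pre>' +
  '<a href="#top">top</a>' +
  '<a href="https://example.com/x">ext</a>';

const rewrittenHtml = rewriteWith(sampleHtml, (url) => "REWRITTEN:" + url);
```

Only the first href changes. The src="/foo" inside the code sample is swallowed by the <pre> alternative, and the fragment-only and scheme-qualified links never match the attribute alternative at all.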
Nav-block caching (deferred). Jekyll's offlinify reaps ~1,900 ms of savings from a per-source-dir nav cache (the ~112 KB just-the-docs sidebar is identical across pages within a source dir). tbdocs's projected per-page rewrite cost is ~0.3 ms × 837 pages = ~250 ms even without the nav cache; the cache would bring it to ~50 ms. Phase 7's first cut does NOT implement the nav cache. If profiling shows the HTML pass dominating, the cache lands as a follow-up (added complexity vs ~200 ms saving -- not a clear win at the projected baseline). See §7.D7 for the deferral rationale.
Purpose. Same as rewriteHtml but for CSS url(...) references.
Algorithm.
const CSS_URL_RE = /url\(\s*(["']?)(\/(?!\/)[^"'()\s]*)\1\s*\)/g;
function rewriteCss(css, fileDir, fileSegs, sitePaths, caches, baseurl) {
let misses = 0;
const pageCache = getPageCache(caches.result, fileDir);
const rewritten = css.replace(CSS_URL_RE, (match, quote, rawUrl) => {
let rel = pageCache.get(rawUrl);
if (rel === undefined) {
rel = computeRelative(rawUrl, fileSegs, sitePaths, caches, baseurl);
pageCache.set(rawUrl, rel);
}
if (rel === null) {
misses++;
return match;
}
return `url(${quote}${rel}${quote})`;
});
return { rewritten, misses };
}
CSS has no code-block concept (no nested <pre> / <code>), and
url() references are always absolute on this site (the favicon
reference in just-the-docs-combined.css is the only current case).
No page-relative URL form in CSS.
The cache slot is shared with rewriteHtml's page cache -- same
(file_dir, raw) keys with the same absolute URL shape, so a CSS
file rewriting /favicon.png from assets/css/ gets a fresh cache
slot (file_dir = assets/css); a page rewriting the same URL from
tB/Core/ gets a different slot.
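A short sketch of which url() forms CSS_URL_RE picks up, using a stand-in resolver that simply prefixes "../.." (what a file in assets/css/ would need to reach the site root). The rewriteCssWith driver and the sample stylesheet are assumptions for illustration.

```javascript
const CSS_URL_RE = /url\(\s*(["']?)(\/(?!\/)[^"'()\s]*)\1\s*\)/g;

// Hypothetical driver: `resolve` returns the relative URL, or null on a miss.
function rewriteCssWith(css, resolve) {
  return css.replace(CSS_URL_RE, (match, quote, rawUrl) => {
    const rel = resolve(rawUrl);
    return rel === null ? match : `url(${quote}${rel}${quote})`;
  });
}

const sampleCss = [
  'a { background: url("/favicon.png"); }',
  "b { background: url( /img/x.svg ); }",
  "c { background: url(//cdn.example.com/y.png); }",
  "d { background: url(data:image/png;base64,AAAA); }",
].join("\n");

const rewrittenCss = rewriteCssWith(sampleCss, (u) => "../.." + u);
```

The quoted and unquoted root-absolute forms are rewritten (the unquoted one loses its inner padding); the protocol-relative and data: URLs are left untouched because they never match.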
Purpose. Inject two <script> tags right before the existing
<script src="...just-the-docs.js"> tag in each rendered HTML page.
Algorithm.
const JTD_SCRIPT_TAG_RE = /<script\s+src="([^"]*)just-the-docs\.js"/;
function injectSearchSetup(html, fileSegs) {
return html.replace(JTD_SCRIPT_TAG_RE, (match, prefix) => {
const siteRoot = fileSegs.length === 0 ? "" : "../".repeat(fileSegs.length);
return `<script>window.OFFLINE_SITE_ROOT="${siteRoot}";</script>\n` +
`<script src="${prefix}search-data.js"></script>\n` +
match;
});
}
Two scripts inserted:
- <script>window.OFFLINE_SITE_ROOT="../../";</script> -- the per-page relative prefix from the page's directory to the offline site root. Computed from fileSegs (the source page's depth). Empty string at root; "../../" at depth 2. The patched initSearch() reads this to convert search-result URLs into page-relative paths.
- <script src="<prefix>search-data.js"></script> -- loads the lunr index data into window.SEARCH_DATA. The <prefix> is captured from the existing just-the-docs.js script tag's src (e.g. ../../assets/js/); reusing it places search-data.js next to just-the-docs.js on disk.
Both run in source order before just-the-docs.js, so the globals
are populated before initSearch() fires inside the document-ready
callback.
Why find the just-the-docs.js script tag by regex rather than inserting at a fixed position? The script tag's relative path depth varies by page depth, and finding it by-name gives a stable anchor regardless of where the head sits in the template output. The same approach Jekyll's offlinify uses.
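Safe when the anchor is missing: if the page has no just-the-docs.js tag (e.g. a hypothetical layout-less page), the regex doesn't match and the page is left as-is. Note this is a no-op guarantee rather than idempotence: running the pass twice over an already-injected page would insert the two tags again.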
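For a concrete picture, here is the injection applied to a minimal depth-2 page. The sample markup is invented; the function body is the one shown above.

```javascript
const JTD_SCRIPT_TAG_RE = /<script\s+src="([^"]*)just-the-docs\.js"/;

function injectSearchSetup(html, fileSegs) {
  return html.replace(JTD_SCRIPT_TAG_RE, (match, prefix) => {
    const siteRoot = fileSegs.length === 0 ? "" : "../".repeat(fileSegs.length);
    return `<script>window.OFFLINE_SITE_ROOT="${siteRoot}";</script>\n` +
      `<script src="${prefix}search-data.js"></script>\n` +
      match;
  });
}

// Invented sample: a page at tB/Core/ whose script src was already rewritten.
const samplePage = '<head><script src="../../assets/js/just-the-docs.js"></script></head>';
const injectedPage = injectSearchSetup(samplePage, ["tB", "Core"]);

// A page without the anchor tag passes through unchanged.
const untouchedPage = injectSearchSetup("<head></head>", []);
```

The captured prefix ("../../assets/js/") is reused for search-data.js, so the two scripts always resolve from the same directory regardless of page depth.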
Purpose. Replace navLink() and initSearch() function bodies
in just-the-docs.js with offline-friendly versions. The disk-touching
side (read + write) is the writer; the byte-perfect-vs-Ruby transform
itself is exported as deriveOfflineJtdJs for the diff tools.
Algorithm.
const JTD_NAVLINK_RE = /function navLink\(\) \{[\s\S]*?return null; \/\/ avoids `undefined`\s*\}/;
const JTD_INITSEARCH_FN_RE = /function initSearch\(\) \{[\s\S]*?request\.send\(\);\s*\}/;
// IMPORTANT: the "Patched by _plugins/offlinify.rb" comment strings
// are kept verbatim from the Ruby Offlinify constants so the patched
// JS in <offlineRoot>/assets/js/just-the-docs.js is byte-identical to
// Jekyll's _site-offline/assets/js/just-the-docs.js. Don't rename to
// "offline.mjs" without first updating the byte-parity matrix in §10.
const JTD_NAVLINK_REPLACEMENT = `function navLink() {
// Patched by _plugins/offlinify.rb for file:// compatibility.
var here = window.location.href.split('#')[0].split('?')[0];
var links = document.getElementById('site-nav').querySelectorAll('a.nav-list-link');
for (var i = 0; i < links.length; i++) {
if (links[i].href === here) return links[i];
}
return null;
}`;
const JTD_INITSEARCH_FN_REPLACEMENT = `function initSearch() {
// Patched by _plugins/offlinify.rb for file:// compatibility.
var docs = window.SEARCH_DATA;
if (!docs) {
console.log('Offlinify: window.SEARCH_DATA not found; ensure search-data.js loads before just-the-docs.js');
return;
}
var siteRoot = window.OFFLINE_SITE_ROOT || '';
for (var i in docs) {
var rel = docs[i].relUrl;
if (typeof rel === 'string' && rel.charAt(0) === '/') {
var hash = '';
var hashIdx = rel.indexOf('#');
if (hashIdx !== -1) {
hash = rel.slice(hashIdx);
rel = rel.slice(0, hashIdx);
}
rel = rel.slice(1);
if (rel.endsWith('/')) {
rel = rel + 'index.html';
} else {
var lastSlash = rel.lastIndexOf('/');
var lastSeg = lastSlash === -1 ? rel : rel.slice(lastSlash + 1);
if (lastSeg.indexOf('.') === -1) rel = rel + '.html';
}
docs[i].url = siteRoot + rel + hash;
}
}
lunr.tokenizer.separator = /[\\s\\-\\/]+/;
var index = lunr(function(){
this.ref('id');
this.field('title', { boost: 200 });
this.field('content', { boost: 2 });
this.field('relUrl');
this.metadataWhitelist = ['position'];
for (var i in docs) {
this.add({
id: i,
title: docs[i].title,
content: docs[i].content,
relUrl: docs[i].relUrl
});
}
});
searchLoaded(index, docs);
}`;
async function patchJustTheDocsJs(srcPath, destPath) {
let src;
try { src = await fs.readFile(srcPath, "utf8"); }
catch (err) {
if (err.code === "ENOENT") {
console.warn(`offline: ${srcPath} not found; skipping JTD patch`);
return [];
}
throw err;
}
const { js, patches, warnings } = deriveOfflineJtdJs(src);
for (const w of warnings) console.warn(w);
await writeFileMkdirp(destPath, js);
return patches;
}
export function deriveOfflineJtdJs(src) {
let out = src;
const patches = [];
const warnings = [];
let next = out.replace(JTD_NAVLINK_RE, JTD_NAVLINK_REPLACEMENT);
if (next !== out) { patches.push("navLink()"); out = next; }
else warnings.push(
"offline: could not locate navLink() in just-the-docs.js -- " +
"nav-active detection will be broken under file://. Update " +
"JTD_NAVLINK_RE in builder/offline.mjs.",
);
next = out.replace(JTD_INITSEARCH_FN_RE, JTD_INITSEARCH_FN_REPLACEMENT);
if (next !== out) { patches.push("initSearch()"); out = next; }
else warnings.push(
"offline: could not locate initSearch() in just-the-docs.js -- " +
"offline search will not work. Update JTD_INITSEARCH_FN_RE in " +
"builder/offline.mjs.",
);
return { js: out, patches, warnings };
}
deriveOfflineJtdJs returns the patched bytes plus the list of patch
labels and any warnings -- the writer prints warnings; the diff tool
displays them inline. Both consume the same transform.
Two replacement function bodies, byte-for-byte identical to the
Jekyll offlinify.rb constants (verified by side-by-side review of
the JS string contents). The escape sequences for the lunr tokenizer
separator regex (/[\\s\\-\\/]+/) need an extra layer of escaping in
the JS source string compared to Ruby's heredoc; one source of
fragility worth noting in a comment.
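The escaping layer can be checked in isolation. This small sketch compares the doubled-backslash template literal with what actually lands in the emitted file, and shows what a single layer would produce; the variable names are chosen for the example.

```javascript
// In an ordinary template literal each "\\" collapses to one backslash.
const fromTemplate = `lunr.tokenizer.separator = /[\\s\\-\\/]+/;`;

// String.raw keeps backslashes exactly as typed: this is the emitted text.
const asEmitted = String.raw`lunr.tokenizer.separator = /[\s\-\/]+/;`;

const sameText = fromTemplate === asEmitted;

// With a single layer, "\s" is just "s" and the regex silently changes.
const singleLayer = `/[\s\-\/]+/`;
```

sameText is true, while singleLayer evaluates to "/[s-/]+/", a separator pattern that no longer matches whitespace. No error is raised in that case, which is what makes the escaping easy to break by accident.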
Why a regex-based replacement rather than parsing the JS AST?
Two patterns to substitute, both anchored on stable function shapes
in the upstream just-the-docs theme. A miss emits a warning
identifying the constant that needs updating -- the early-warning
signal that just-the-docs has shipped a new version of the function.
An AST-based replacement (via acorn or similar) adds a dependency
tree for ~50 lines of logic; the cost / benefit doesn't favour it.
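The URL conversion inside the patched initSearch() is the part most worth tracing by hand. The helper below lifts that loop body into a standalone function for illustration; toOfflineUrl is a hypothetical name, and unlike the patch (which leaves docs[i].url unset for non-root-absolute values) it returns such inputs unchanged.

```javascript
// Hypothetical standalone version of the per-entry rewrite in initSearch().
function toOfflineUrl(relUrl, siteRoot) {
  if (typeof relUrl !== "string" || relUrl.charAt(0) !== "/") return relUrl;
  let rel = relUrl;
  let hash = "";
  const hashIdx = rel.indexOf("#");
  if (hashIdx !== -1) {
    hash = rel.slice(hashIdx);
    rel = rel.slice(0, hashIdx);
  }
  rel = rel.slice(1); // drop the leading "/"
  if (rel.endsWith("/")) {
    rel += "index.html";
  } else {
    const lastSeg = rel.slice(rel.lastIndexOf("/") + 1);
    if (lastSeg.indexOf(".") === -1) rel += ".html"; // extensionless permalink
  }
  return siteRoot + rel + hash;
}

const pageHit = toOfflineUrl("/tB/Core/Const#syntax", "../../");
const folderHit = toOfflineUrl("/Tutorials/", "");
const fileHit = toOfflineUrl("/assets/x.pdf", "../");
```

pageHit is "../../tB/Core/Const.html#syntax", folderHit is "Tutorials/index.html", and fileHit is "../assets/x.pdf". The siteRoot argument plays the role of window.OFFLINE_SITE_ROOT.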
Purpose. Write search-data.js -- the JS-wrapped form of the
search index that <script src=> can load under file://. The wrap
itself is the exported pure-compute deriveOfflineSearchDataJs.
Algorithm.
async function writeSearchDataJs(destPath, jsonBytes) {
if (jsonBytes == null) return 0;
const js = deriveOfflineSearchDataJs(jsonBytes);
await writeFileMkdirp(destPath, js);
return js.length;
}
export function deriveOfflineSearchDataJs(jsonBytes) {
return `window.SEARCH_DATA = ${jsonBytes};\n`;
}
Single-line wrap; no transformation of the JSON itself. The
just-the-docs.js patched initSearch() reads window.SEARCH_DATA
directly, so the wrapper's form (var vs let vs direct assignment)
doesn't matter as long as the global lands.
Why use the in-memory JSON bytes rather than re-reading from
_site/assets/js/search-data.json? Phase 6's writeSearchData has
the bytes; passing them through to Phase 7 saves a ~2.8 MB disk read
(~10 ms on the dev machine). Phase 6 already returns { entries };
adding { entries, json } is a one-line change to the substep
signature.
The matching read-from-disk fallback (for a hypothetical future where Phase 6 doesn't return the bytes) would be:
const jsonBytes = await fs.readFile(
path.join(destRoot, "assets/js/search-data.json"), "utf8",
);
Costs nothing more than the read; mention in a comment as the alternative if the orchestrator API ever changes.
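A quick way to convince yourself the wrapper is sufficient is to evaluate it against a fake window object. The one-entry index below is made up for the sketch.

```javascript
function deriveOfflineSearchDataJs(jsonBytes) {
  return `window.SEARCH_DATA = ${jsonBytes};\n`;
}

// Invented single-entry index, serialised the way Phase 6 would hand it over.
const sampleJson = JSON.stringify({ "0": { title: "Const", relUrl: "/tB/Core/Const" } });
const wrappedJs = deriveOfflineSearchDataJs(sampleJson);

// Run the generated script with a stand-in for the browser's `window`.
const fakeWindow = {};
new Function("window", wrappedJs)(fakeWindow);
```

After the call, fakeWindow.SEARCH_DATA holds the parsed index, which is all the patched initSearch() needs. Valid JSON is also a valid JavaScript expression, so no escaping step is required between the two forms.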
Purpose. Compute a file's directory segments for use in the URL resolver's LCP walk.
Algorithm.
function fileDirSegsFromRel(rel) {
const normalised = rel.replaceAll("\\", "/");
const dir = path.posix.dirname(normalised);
if (dir === "." || dir === "") return [];
return dir.split("/");
}
Returns an empty array for files at the root (index.html →
fileSegs = []). For tB/Core/Const.html → ["tB", "Core"].
Purpose. Coerce site.config.baseurl into the exact prefix the
template's relative_url filter emits, so the URL resolver can
strip-match it character-for-character.
Algorithm. Port of Jekyll offlinify.rb normalize_baseurl:
function normalizeBaseurl(raw) {
let baseurl = String(raw ?? "").replace(/\/+$/, "");
if (baseurl && !baseurl.startsWith("/")) baseurl = "/" + baseurl;
return baseurl;
}
Strips any trailing slashes; prepends a leading slash if missing; an
empty / null / undefined input returns "". The current site has
baseurl unset (empty); the function is a no-op then. A future
deployment to a GitHub Pages project page (baseurl: /repo-name)
would feed the resolver the same /repo-name prefix the template
prepends to every URL.
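A few sample inputs show the normalisation rules in action (the repo-name and docs values are placeholders, not real configuration):

```javascript
function normalizeBaseurl(raw) {
  let baseurl = String(raw ?? "").replace(/\/+$/, "");
  if (baseurl && !baseurl.startsWith("/")) baseurl = "/" + baseurl;
  return baseurl;
}

const unset = normalizeBaseurl(undefined);      // current site: no baseurl
const bare = normalizeBaseurl("repo-name/");    // missing leading slash
const doubled = normalizeBaseurl("/docs//");    // extra trailing slashes
const slashOnly = normalizeBaseurl("/");        // collapses to empty
```

unset and slashOnly both come back as "", bare as "/repo-name", and doubled as "/docs". An empty result makes the baseurl branch in resolveRaw a no-op.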
Purpose. Escape regex metacharacters for safe interpolation into
a dynamically-constructed regex. Used in writeOfflineRedirects to
build the <site.url><path> matcher.
Algorithm.
function escapeRegExp(s) {
return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
The standard escape set, covering fourteen metacharacters. Sufficient for the current
site.config.url (https://docs.twinbasic.com -- only the dots
need escaping) and any future URL the implementer might configure.
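The sketch below shows why the escape matters for the site URL. The matcher built here is a hypothetical shape for illustration; the actual pattern in writeOfflineRedirects may differ.

```javascript
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const siteUrl = "https://docs.twinbasic.com";
const escapedUrl = escapeRegExp(siteUrl);

// Hypothetical matcher: site URL followed by a captured root-absolute path.
const stubMatcher = new RegExp(escapedUrl + "(/[^\"'<>\\s]*)");
const stubHit = "url=https://docs.twinbasic.com/tB/Modules/Information/LBound".match(stubMatcher);

// Without escaping, "." matches any character and a lookalike host slips through.
const lookalike = "https://docsXtwinbasicYcom";
const unescapedMatches = new RegExp(siteUrl).test(lookalike);
const escapedMatches = new RegExp(escapedUrl).test(lookalike);
```

stubHit[1] is "/tB/Modules/Information/LBound". unescapedMatches is true and escapedMatches is false, which is the difference the helper exists to guarantee.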
Purpose. Walk _site/assets/ recursively, classifying each file
for the offline theme-asset pass (§5.5).
Algorithm.
async function collectThemeFiles(themeRoot) {
const out = [];
async function walk(relPath) {
const dirents = await fs.readdir(
path.join(themeRoot, relPath), { withFileTypes: true },
);
for (const d of dirents) {
const childRel = relPath === "" ? d.name : path.posix.join(relPath, d.name);
if (d.isDirectory()) {
await walk(childRel);
} else if (d.isFile()) {
out.push({
relUnderAssets: childRel, // e.g. "css/print.css"
srcAbs: path.join(themeRoot, childRel),
isCss: childRel.endsWith(".css"),
isJtdJs: childRel === "js/just-the-docs.js",
});
}
}
}
await walk("");
return out;
}
isJtdJs is the flag that tells the per-file dispatch in §5.5 to
skip this entry -- step [3] of §4 already wrote the patched copy.
isCss selects the rewriteCss path; everything else is a verbatim
copy.
On the current site themeRoot (<destRoot>/assets) holds 7 files
(css/just-the-docs-combined.css, css/just-the-docs-head-nav.css,
css/print.css, css/rouge.css, js/just-the-docs.js,
js/theme-switch.js, js/vendor/lunr.min.js). The walk is ~5 ms.
A future addition under assets/ (e.g. a new vendored font under
assets/fonts/) shows up automatically -- the walker doesn't enumerate
a fixed list. Image assets that ship via staticFiles[] rather than
the theme tree (e.g. assets/images/mmd/*.svg) go through the
static-file pass (§5.4), not here.
Jekyll's offlinify.rb wipe_out_dest_contents removes everything
under _site-offline/ but keeps the directory in place. The rationale
(per the inline comment): if the directory itself disappears and is
re-created, jekyll serve's watcher reports a bare _site-offline
event (no trailing slash, since the directory is momentarily absent at
notification time) that the _site-offline/ exclude pattern doesn't
match. Result: infinite rebuild loop.
tbdocs doesn't ship a watcher today, but follows the same convention:
- Cheap to honour (one extra readdir + per-entry rm instead of one fs.rm of the parent).
- Removes the footgun if a watcher lands later.
- Matches the Jekyll behaviour exactly, so a side-by-side build of both produces the same _site-offline/ snapshot.
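A minimal sketch of the keep-the-directory wipe, under the assumption that it is implemented as one readdir plus a per-entry rm. The wipeDirContents name is hypothetical, and the fs API is passed in as a parameter so the example stays import-free; in practice the promises API from node:fs/promises (which provides both readdir and rm) would be passed.

```javascript
// Hypothetical helper: empties `dir` without removing `dir` itself.
async function wipeDirContents(dir, fsApi) {
  let names;
  try {
    names = await fsApi.readdir(dir);
  } catch (err) {
    if (err.code === "ENOENT") return 0; // nothing to wipe on a first build
    throw err;
  }
  // Remove each child (file or subtree); the parent directory stays in place.
  await Promise.all(
    names.map((name) => fsApi.rm(`${dir}/${name}`, { recursive: true, force: true })),
  );
  return names.length;
}
```

Because the parent is never deleted, a watcher keyed on the _site-offline/ pattern would never observe the bare-directory event described above.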
The orchestrator chains the phases in order: Phase 5 writes _site/,
Phase 6 adds sitemap/robots/redirects/search-data, Phase 7 mirrors
the whole tree to _site-offline/. The ordering matters for two
reasons:
- site_paths completeness. Phase 7's buildSitePaths derives the URL-resolution targets from pages[] + staticFiles[]. If Phase 6's outputs (search-data.json) aren't in the Set, a page linking to /assets/js/search-data.json would resolve to "not found" and the link would stay absolute. Defensive: §6.1 adds search-data.json to the Set explicitly.
- Redirect-stub byte source. Phase 7 transforms Phase 6's redirect-stub HTML to swap absolute target URLs for page-relative. The stubs are passed in-memory from Phase 6 (see §7.D8).
The orchestrator awaits Phase 6's Promise.all before invoking
Phase 7; the wait is implicit in the existing await Promise.all(...)
in tbdocs.mjs.
Phase 7 reads _site/assets/js/just-the-docs.js and (depending on
implementation choice) _site/assets/js/search-data.json. Both are
reads only -- Phase 7 never writes back to _site/. The online
deploy artifact stays canonical.
If the reads moved to all-in-memory (search-data via §7.D8, JTD via a
pre-read in Phase 5), Phase 7 wouldn't need to touch _site/ at all.
The current spec accepts the JTD read for simplicity; promotion to
all-in-memory is a follow-up.
The three caches (rawResolution, seg, result) are allocated at
the top of writeOffline and destroyed when it returns. No other
phase consumes them; no cross-build memoisation. Per-build cost is
~25 MB peak (Jekyll's measurement on the same site; JS's Map
representation is in the same ballpark).
If a future "incremental rebuild" mode lands, the caches would become per-rebuild rather than per-build. Not relevant today.
book.html has layout: book-combined (PLAN-4 §5.10). Phase 5 skips
its write (page.html === undefined). Phase 6's outputs don't touch
it (no redirect_from, no sitemap entry, no search entry -- per
PLAN-6 §7.D13). Phase 7 also doesn't touch it:
- The pages[] filter in writeOfflinePages skips page.html === undefined (same as Phase 5).
- The offline_exclude list includes book.html defensively, so even if it were in _site/, the static-copy pass would skip it.
Phase 8 (pdf.mjs) is the owner of book.html; it ships only to
_site-pdf/.
A URL the rewriter can't resolve (no matching site path) is left as the original absolute string. This is the same behaviour Jekyll offlinify uses, and the rationale is the same:
- Unresolved is a real bug signal. The offline link check (check_links.mjs --forbid 'https://docs.twinbasic.com') catches surviving absolute URLs; if the rewriter silently dropped or redirected such a URL, the bug would hide.
- No semantic guess. The rewriter has no way to know whether /some/missing/path was meant to point at some/missing/path.html, some/missing/path/index.html, or somewhere else entirely. Better to surface the failure than to guess.
The per-build unresolved counter (in the summary log line) is the
fast feedback channel: a non-zero value means one or more sources
need fixing.
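The leave-as-is behaviour can be sketched as a small wrapper around the resolver. This is a minimal illustration, not the code in offline.mjs: makeRewriter, its stats shape, and the resolve callback are hypothetical names chosen for the example.

```javascript
// Hypothetical sketch: names and shapes are illustrative, not offline.mjs's actual API.
function makeRewriter(resolve) {
  const stats = { rewritten: 0, unresolved: 0 };
  // resolve(url) returns a page-relative string, or null when no site path matches.
  function rewrite(rawUrl) {
    const rel = resolve(rawUrl);
    if (rel === null) {
      stats.unresolved += 1; // feeds the per-build counter in the summary line
      return rawUrl;         // keep the absolute URL so the link check can flag it
    }
    if (rel !== rawUrl) stats.rewritten += 1;
    return rel;
  }
  return { rewrite, stats };
}

// Sample data only: one known path, everything else is unresolved.
const known = new Map([["/tB/Core/Const", "../../tB/Core/Const.html"]]);
const { rewrite, stats } = makeRewriter((u) => known.get(u) ?? null);
console.log(rewrite("/tB/Core/Const"));
console.log(rewrite("/some/missing/path"));
console.log(stats);
```

The point of the shape is that the callback never guesses: a miss returns the input unchanged and only the counter records it.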
Jekyll offlinify reaps ~1,900 ms from a per-source-dir nav-block cache (the ~112 KB just-the-docs sidebar HTML is identical across pages and rewritten the same way per source directory). On Ruby, that cache turns ~600k of ~720k per-match callbacks into hash lookups.
tbdocs's projected baseline (no cache) is ~250 ms for the HTML pass (~0.3 ms per page × 837 pages, derived from cache-warmed Phase 6 timings). Adding the cache would bring it to ~50 ms -- a ~200 ms saving for ~80 lines of extra code (placeholder substitution, cache storage, splice-back logic).
The first cut doesn't include it. Two reasons:
- The total Phase 7 budget is ~480 ms (per §10's projection -- with the caveat that the projection is a guess; see §10). ~200 ms is meaningful but not transformative; if Phase 7 hits its budget without the cache, the added complexity isn't justified.
- The cache adds a subtle invariant (the placeholder string must not collide with content; the post-gsub splice must find the placeholder) that's harder to verify than the cache-less form.
Reconsider during Phase 7's first measured run, not as a follow-up. If the HTML pass exceeds 800 ms on the dev machine in the first real timing capture, the cache moves from "deferred follow-up" to "in-scope before merge." Eight hundred ms is the threshold at which Phase 7's total wall time risks blowing past the 1500 ms soft cap once the other substeps' actual costs land. Below that threshold the deferral stands. Track the threshold call in FUTURE-WORK.md when Phase 7 ships.
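For reference when the threshold call is made, the deferred cache could look roughly like the sketch below. It is an assumption-laden outline of the placeholder / splice-back idea described above, not a port of the Ruby code: the function name, the placeholder string, and the rewriteHtml callback are all hypothetical.

```javascript
// Hypothetical sketch of the deferred per-source-dir cache; not part of the first cut.
const NAV_PLACEHOLDER = "\u0000NAV_BLOCK\u0000";

function rewriteWithNavCacheSketch(html, navHtml, sourceDir, rewriteHtml, cache) {
  const at = html.indexOf(navHtml);
  // Invariant 1: the placeholder must not collide with page content.
  if (at === -1 || html.includes(NAV_PLACEHOLDER)) return rewriteHtml(html);
  let nav = cache.get(sourceDir);
  if (nav === undefined) {
    nav = rewriteHtml(navHtml); // paid once per source directory
    cache.set(sourceDir, nav);
  }
  const body = html.slice(0, at) + NAV_PLACEHOLDER + html.slice(at + navHtml.length);
  const rewritten = rewriteHtml(body);
  // Invariant 2: the splice must find the placeholder after the rewrite.
  const back = rewritten.indexOf(NAV_PLACEHOLDER);
  if (back === -1) throw new Error("nav placeholder lost during rewrite");
  return rewritten.slice(0, back) + nav + rewritten.slice(back + NAV_PLACEHOLDER.length);
}
```

Both invariants named in the second bullet show up as explicit checks here, which is most of the extra verification burden the deferral avoids.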
The default Phase 6 substep returns are { entries: N } (sitemap,
search) and { written: N } (redirects). Phase 7 needs more:
- The search-data.json byte string (to wrap as search-data.js).
- The redirect-stub { destPath, html } array (to rewrite per stub).
Recommended Phase 6 API extension (additive, non-breaking):
// Phase 6 substep signatures
writeSearchData(pages, site, destRoot) → { entries, json }
writeRedirects(pages, site, destRoot)  → { written, stubs }
writeSitemap(pages, site, destRoot)    → { entries, robots }   // unchanged

The orchestrator's auxStats shape then carries the new fields:

auxStats = {
  redirects: { written, stubs },
  sitemap:   { entries, robots },
  search:    { entries, json },
};

Phase 7 reads from auxStats.search.json and auxStats.redirects.stubs.
The verify-phase6 harness should remain green -- the existing checks
only look at entries / written counts, not the new fields.
The alternative -- have Phase 7 re-read _site/assets/js/search-data.json
and re-derive the redirect stubs via deriveRedirectStubs(pages, site)
-- would work too, with these costs:
- The search-data re-read costs ~10 ms (2.8 MB at SSD speed).
- The redirect-stub re-derive costs ~5 ms (290 stubs × ~17 µs each for the string-template substitution + collision check). Throws on collision a second time, redundantly. Surfaces the same error.
The in-memory path is preferred (cleaner, no duplicate work) but the re-read path is an acceptable fallback if Phase 6's API extension is deferred.
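One way to keep both paths available is a small selector that prefers the in-memory bytes and falls back to a re-read. The sketch below is illustrative only: pickSearchJson and the injected readFallback callback are hypothetical, and the real code may wire this differently.

```javascript
// Hypothetical helper: pickSearchJson and readFallback are illustrative names.
async function pickSearchJson(auxStats, readFallback) {
  const inMemory = auxStats?.search?.json;
  if (typeof inMemory === "string") {
    return { json: inMemory, source: "memory" }; // preferred: no duplicate work
  }
  // Fallback: re-read what Phase 6 wrote to _site/assets/js/search-data.json.
  return { json: await readFallback(), source: "disk" };
}
```

Injecting the reader keeps the selector free of filesystem calls, so the fallback can be dropped later without touching the callers.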
Jekyll offlinify gates a 4-thread write pool on Windows (Gem.win_platform?)
to overlap mkdir_p + binwrite with the next page's render and
rewrite. The pool saves ~530 ms on Windows -- on Linux ext4 the
overhead exceeds the saving.
tbdocs doesn't need it: Node's libuv pool already runs file writes
asynchronously (the same kernel-async path on Windows), and the
runLimited-based fan-out from write.mjs keeps the in-flight count
bounded. No threading code in JavaScript, no platform-gate logic, and
the per-file write cost is comparable.
If profiling on Windows shows write throughput is a bottleneck after
all, the runLimited cap can be tuned (current 64; bump to 128 or
256). Not needed at the projected baseline.
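For readers unfamiliar with the pattern, a bounded fan-out in the spirit of runLimited can be sketched as follows. This is not the helper from write.mjs (its internals are not shown in this plan); runLimitedSketch is a hypothetical stand-in that only mirrors the (items, limit, worker) call shape used elsewhere in this document.

```javascript
// Sketch of a bounded fan-out; the real runLimited in write.mjs may differ.
async function runLimitedSketch(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++; // claim an index synchronously, before any await
      results[i] = await worker(items[i], i);
    }
  }
  // Start at most `limit` lanes; each lane pulls work until the queue is empty.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

Raising the cap from 64 to 128 or 256 would then be a one-argument change at the call site.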
Phase 7 has internal section boundaries (orchestration, URL resolver,
HTML rewrite, CSS rewrite, redirect rewrite, JS patch, static copy)
but ~600 lines total -- comfortably one file. Mirrors write.mjs's
Phase 5 layout. See §3's "Why one module" subsection.
A future split into offline.mjs + offline-resolve.mjs would
extract §C's URL resolver into a standalone module with the three
caches as instance state. Reserve as a refactor target if §C ever
exceeds ~300 lines.
Phase 7's outer call from the orchestrator is one await writeOffline(...) after the existing await Promise.all([... auxiliaries ...]). Phase 7 internally fans out the five substeps
in §4 step [4] via its own Promise.all; this is independent of
Phase 6's fan-out.
The orchestrator's per-phase timing (t.lap("offline")) captures
the full Phase 7 wall time.
The SEO block in template.mjs is byte-identical to jekyll-seo-tag
v2.8.0's output for the current site config (verified by an earlier
template.mjs check). The stripSeo regex matches the <!-- Begin Jekyll SEO tag vX.Y.Z --> / <!-- End Jekyll SEO tag --> brackets
regardless of the version string -- so a future tbdocs change to the
version comment, or a future jekyll-seo-tag version bump, won't break
the strip.
If tbdocs ever stops emitting the bracketed comments (or emits a
different comment shape), the strip becomes a no-op. The page would
still build correctly -- just with the SEO block intact in the
offline copy. The acceptance check in §10 catches this (0 surviving
https://docs.twinbasic.com references per offline page).
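A version-agnostic strip along these lines might look like the sketch below. It is an approximation: the exact pattern and whitespace handling in offline.mjs are not reproduced here, stripSeoSketch is a hypothetical name, and the sample page title is made up.

```javascript
// Sketch only: matches the bracketing comments for any version string.
const SEO_BLOCK = /<!-- Begin Jekyll SEO tag v[\d.]+ -->([\s\S]*?)<!-- End Jekyll SEO tag -->/;

function stripSeoSketch(html) {
  return html.replace(SEO_BLOCK, (_, inner) => {
    // Keep only the <title> element from inside the block.
    const title = inner.match(/<title>[\s\S]*?<\/title>/);
    return title ? title[0] : "";
  });
}

// Sample input (illustrative content, not taken from the real site).
const seoSample = [
  "<head>",
  "<!-- Begin Jekyll SEO tag v2.8.0 -->",
  "<title>Const | Sample Site</title>",
  '<link rel="canonical" href="https://docs.twinbasic.com/tB/Core/Const" />',
  "<!-- End Jekyll SEO tag -->",
  "</head>",
].join("\n");
console.log(stripSeoSketch(seoSample));
```

If the bracketing comments are missing, replace() finds nothing and the input comes back unchanged, which is the no-op case described above.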
Two options for the source of the theme-asset copy:
- From _site/assets/ (recommended). What Phase 5 just copied. Tracks any future post-copy transformation Phase 5 might apply.
- From builder/assets/ (the source of truth). One disk read fewer per file (already paid by Phase 5).
Option 1 wins on the "what's offline mirrors what's online" model. The disk-read cost is negligible (~310 KB total across 7 files). Option 2 would be faster by ~5 ms but couples Phase 7 to Phase 5's upstream source rather than to its output -- a fragility if Phase 5 ever transforms the bytes between read and write.
Jekyll's also_build_offline: false skips the offline build entirely
(production deploy doesn't need the offline tree). tbdocs's first cut
doesn't expose this flag; the offline build always runs (~480 ms
cost). If the production deploy ever wants to skip it, add a
--no-offline CLI flag to parseArgs and gate the writeOffline
call.
Currently the _config.yml has also_build_offline: true; tbdocs
honours that as the default-and-only behaviour. Worth gating on the
config value when the flag lands (so the config file remains the
source of truth).
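When the flag lands, the gate could combine the CLI opt-out with the config default roughly as sketched here. The function name and the argv / config shapes are assumptions for illustration; tbdocs's own parseArgs is not shown in this plan.

```javascript
// Hypothetical: shouldBuildOffline and the argv/config shapes are illustrative.
function shouldBuildOffline(argv, config) {
  if (argv.includes("--no-offline")) return false; // explicit CLI opt-out wins
  return config.also_build_offline !== false;      // config stays the source of truth
}

console.log(shouldBuildOffline([], { also_build_offline: true }));
console.log(shouldBuildOffline(["--no-offline"], { also_build_offline: true }));
```

Treating a missing config key as "build" preserves today's always-on behaviour.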
Jekyll offlinify copies the PDF helper scripts (render-book.mjs + lib/*.mjs) as static files. tbdocs follows: the staticFiles[] list from Phase 1 includes them, and the offline static-copy pass copies all of them. No exclude pattern targets them.
The semantic justification: the offline reader might want to re-render the book PDF from the offline tree. The helpers are a few KB total; cost-free to ship.
| Case | Handling |
|---|---|
| URL with percent-encoded path (/Tutorials/CustomControls/Form%20Designer) | resolveRaw percent-decodes before probing; buildSegs re-encodes for the output URL. The single page with this shape (Form Designer.html) resolves to ../../Tutorials/CustomControls/Form%20Designer.html. |
| URL with #fragment (/tB/Core/Const#syntax) | resolveRaw splits the fragment, probes the path, reattaches the fragment to the output. |
| URL with ?query (rare; defensive) | Same treatment as #fragment. |
| URL with both ?query and #fragment (/foo?q#f) | The first [?#] match is the split; the rest is the tail. Reattached verbatim. |
| Protocol-relative URL (//cdn.example.com/foo) | Excluded by the regex's \/(?!\/) lookahead. Left untouched. |
| Absolute URL with scheme (https://example.com/foo) | Excluded by the page-relative alternative's (?![scheme:]) lookahead. Left untouched. |
| Fragment-only URL (#main-content) | Excluded by the page-relative alternative's (?![#/]) lookahead. Left untouched. |
| Empty URL (href="") | Excluded by the URL alternative requiring at least one character. Left untouched. |
| mailto:, tel:, javascript: schemes | Excluded by the scheme lookahead. Left untouched. |
| URL pointing at a file the resolver can't find | computeRelative returns null; the callback returns the unchanged match; the unresolved counter increments. |
| URL pointing at a file in the offline_exclude list | buildSitePaths skips excluded files when building the Set, so resolution returns null -- same handling as "not found". The link check will flag it if needed. |
| URL with .. segments (../foo) | Only valid as a page-relative URL. computeRelUrl's stack-based normalisation handles .. popping. |
| URL with consecutive slashes (/foo//bar) | computeRelUrl skips empty segments. Absolute URLs are left as-is by computeRelative (its resolveRaw doesn't normalise). |
| URL ending with / (/Tutorials/) | resolveRaw probes <path>/ and <path>/index.html. The directory form matches if index.html exists at the path. |
| URL ending with explicit extension (/foo.html) | resolveRaw probes <path> (matches the file directly). |
| URL with no extension and not in any candidate (/missing-page) | All three candidates miss; returns null. Unresolved counter increments. |
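The probe order in the table can be sketched as below. This is a simplified illustration, not the resolveRaw from offline.mjs: the function name, the return shape, and the exact candidate list for trailing-slash URLs are assumptions, and sitePaths is assumed to be a Set of site-relative file paths.

```javascript
// Hypothetical sketch of the probe order; sitePaths holds site-relative files.
function resolveRawSketch(url, sitePaths) {
  const cut = url.search(/[?#]/); // the first ? or # starts the tail
  const rawPath = cut === -1 ? url : url.slice(0, cut);
  const tail = cut === -1 ? "" : url.slice(cut);
  let path;
  try {
    path = decodeURIComponent(rawPath).replace(/^\//, "");
  } catch {
    return null; // malformed percent-encoding: treat as unresolved
  }
  const base = path.replace(/\/$/, "");
  const candidates = path === "" || path.endsWith("/")
    ? [base === "" ? "index.html" : `${base}/index.html`]
    : [path, `${path}.html`, `${path}/index.html`];
  const hit = candidates.find((c) => sitePaths.has(c));
  return hit === undefined ? null : { file: hit, tail }; // tail is reattached by the caller
}
```

Turning the matched file into a page-relative URL (and re-encoding the space) is a separate step and is left out of the sketch.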
| Case | Handling |
|---|---|
| <a href="/foo"> | Absolute path → computeRelative → <a href="../foo.html"> (depth-relative). |
| <a href="foo"> (no leading slash) | Page-relative → computeRelUrl → <a href="foo.html"> if foo.html exists in the same dir. |
| <a href="foo.html"> (already correct) | computeRelUrl probes <path> first, matches, returns path + "" (the empty suffix). The rewriter's rel === rawUrl check catches this; no change emitted. |
| <script src="...just-the-docs.js"> | Absolute path → rewritten to e.g. ../../assets/js/just-the-docs.js. The post-rewrite src is what injectSearchSetup matches to insert the new tags. |
| <img src="/Tutorials/Images/foo.png"> | Absolute path → computeRelative → <img src="../Tutorials/Images/foo.png">. |
| <a href="https://www.twinbasic.com"> | Excluded by the scheme lookahead. Left untouched. (The aux_links config emits these; they navigate to the live twinBASIC home, which is correct under file:// too.) |
| <code> block containing <a href="/script.js"> (literal source code) | The combined regex's <code> alternative consumes the block atomically. The literal href is left untouched and does NOT increment the unresolved counter. |
| <pre> block containing escaped HTML | Same <pre> skip. |
| Nested <code> inside <pre> | The first matching block (whichever opens first) is consumed; the inner block is part of the outer's body. The shape Phase 3 produces always has <pre><code>...</code></pre> (Shiki convention), so the <pre> skip consumes the whole thing. |
| Self-closing tags (e.g. <img /> with XHTML-style closure) | The regex doesn't require self-closure shape; the href="..." / src="..." attribute matches independently of how the tag closes. |
| Attribute value containing the matched quote (e.g. href='foo"bar') | The regex uses a backreferenced quote; this is the regex's only failure mode for in-attribute quotes. No content on this site triggers it. |
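The "consume the block atomically" trick relies on alternation order: a single global regex tries the block alternative first, so attributes inside it are never visited. The sketch below shows the idea with a deliberately simplified pattern; it is not the production regex from §6.6 (for example, it has no scheme or fragment lookaheads), and rewriteAttrsSketch is a hypothetical name.

```javascript
// Sketch only: the production pattern in offline.mjs is more selective.
const COMBINED = /<(pre|code)\b[^>]*>[\s\S]*?<\/\1>|\b(href|src)=(["'])(.*?)\3/gi;

function rewriteAttrsSketch(html, rewriteUrl) {
  return html.replace(COMBINED, (match, block, attr, quote, url) => {
    if (block !== undefined) return match; // <pre>/<code> block: emit unchanged
    return `${attr}=${quote}${rewriteUrl(url)}${quote}`;
  });
}
```

Because the outer <pre> match swallows a nested <code>, the <pre><code>...</code></pre> shape needs no special case.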
| Case | Handling |
|---|---|
| background-image: url("/favicon.png") | Rewritten to url("../../favicon.png") from assets/css/just-the-docs-combined.css. |
| background-image: url('/favicon.png') (single quotes) | Same handling; the regex captures either quote. |
| background-image: url(/favicon.png) (bare URL) | Same handling; the regex captures the optional quote as empty. |
| background-image: url(./relative.png) (relative URL) | Not matched: the \/(?!\/) pattern requires a leading slash. Left untouched. (No content uses this on the current site.) |
| Multiple url(...) references in one rule | Each matches independently. |
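A depth-based version of the url() rewrite is sketched below. It is a simplification: it derives the "../" prefix from the CSS file's own location rather than performing the LCP walk or the site-path lookup the real rewriter uses, and both the pattern and the function name are assumptions.

```javascript
// Sketch only: root-absolute url() references, with an optional matching quote.
const CSS_URL = /url\(\s*(["']?)(\/(?!\/)[^"')\s]*)\1\s*\)/g;

function rewriteCssUrlsSketch(css, cssRelPath) {
  // One "../" per directory between the CSS file and the site root.
  const depth = cssRelPath.split("/").length - 1;
  const up = "../".repeat(depth);
  return css.replace(CSS_URL, (_, quote, url) => `url(${quote}${up}${url.slice(1)}${quote})`);
}

console.log(
  rewriteCssUrlsSketch('a { background-image: url("/favicon.png") }', "assets/css/just-the-docs-combined.css")
);
```

Relative and protocol-relative references fall through untouched because the pattern insists on a single leading slash.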
| Case | Handling |
|---|---|
| Stub with the standard four absolute URL occurrences | Each occurrence rewrites to the same page-relative form. |
| Stub whose target page was excluded from site_paths (e.g. via a future config change) | The rewrite returns null; the absolute URL stays. The offline link check flags it. |
| Stub written when site.url is empty | The early return in writeOfflineRedirects writes the stub verbatim. The absolute URLs become bare <path> URLs that the link check probes against the local file -- works if the target file exists. |
| Stub whose destPath collides with a page (/FAQ.html) | Phase 6's deriveRedirectStubs already throws on this collision; Phase 7 never sees the conflict. |
| Case | Handling |
|---|---|
| CNAME | Matched by offline_exclude pattern CNAME. Skipped. |
| Image files (PNG, SVG, GIF) | Verbatim copy. |
| render-book.mjs, lib/*.mjs, lib/*.js | Verbatim copy. Shipped to offline tree for PDF re-rendering. |
| A future static file matching a **/*.bat pattern | Skipped by the exclude rule. |
| A static file whose destRel collides with a page-write | Phase 5's assertNoDestinationCollisions already throws; Phase 7 never sees the conflict. |
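The exclude matching can be approximated with a small glob-to-regex translation. The sketch below assumes pathname-style semantics (a single * never crosses a slash, and a leading **/ matches zero or more directories); whether the inline fnmatchPathname from §6.5 behaves exactly this way is an assumption, and both function names are hypothetical.

```javascript
// Hypothetical sketch of pathname-style glob matching for offline_exclude.
function globToRegExpSketch(pattern) {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (pattern.startsWith("**/", i)) {
      re += "(?:[^/]+/)*"; // zero or more whole directories
      i += 2;
    } else if (ch === "*") {
      re += "[^/]*";       // any run of characters within one segment
    } else if (ch === "?") {
      re += "[^/]";
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

function isExcludedSketch(destRel, patterns) {
  return patterns.some((p) => globToRegExpSketch(p).test(destRel));
}
```

With patterns such as CNAME and **/*.bat, a root-level check.bat and a nested scripts/win/run.bat would both be skipped, while docs/CNAME would not match the bare CNAME pattern.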
| Case | Handling |
|---|---|
| just-the-docs.js | Skipped in copyOfflineThemeAssets because step [3] of §4 already wrote the patched copy. |
| just-the-docs-combined.css | Rewritten (favicon URL → relative). |
| print.css, rouge.css, just-the-docs-head-nav.css | Verbatim copy (no url() references to rewrite). |
| theme-switch.js, vendor/lunr.min.js | Verbatim copy. |
| A future theme CSS with multiple url() refs | All rewritten in one pass. |
| Case | Handling |
|---|---|
| search-data.json Phase 6 produced | Copied verbatim (in-memory bytes via auxStats.search.json). |
| search-data.js Phase 7 generates | Wrapped from the same JSON bytes (window.SEARCH_DATA = ${json};). |
| Phase 6 ran with search_enabled: false and produced no JSON | auxStats.search.json is null/undefined; both copyOfflineSearchData and writeSearchDataJs early-return; the per-page injection still emits the <script src="search-data.js"> tag which 404s silently; the patched initSearch() logs the missing-SEARCH_DATA message. |
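The wrap itself is a one-liner; the sketch below shows it together with the early return for the search-disabled case. wrapSearchDataSketch is a hypothetical name, and the sample JSON only imitates the shape Phase 6 emits.

```javascript
// Sketch: wrap the JSON bytes without parsing them.
function wrapSearchDataSketch(json) {
  if (json == null) return null; // search disabled: nothing to wrap or write
  return `window.SEARCH_DATA = ${json};`;
}

// Sample bytes only; the real search-data.json is ~2.8 MB.
const sampleJson = '{"0":{"title":"Const"}\n}\n';
console.log(wrapSearchDataSketch(sampleJson));
```

Because the JSON is interpolated as-is, the .js file stays a byte-for-byte wrap of the .json file, which is what the parity check relies on.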
These belong in other phases or are out of scope. Listed so the implementer doesn't get tempted.
- PDF generation -- Phase 8 (book.mjs renderer + pdf.mjs) assembles book.html and writes the sparse _site-pdf/ tree. Phase 7 doesn't touch _site-pdf/.
- Live-link validation -- the offline link check (check_links.mjs --forbid 'https://docs.twinbasic.com') is a post-build harness, not a phase. Run it after Phase 7 via check.bat.
- Incremental rebuilds -- the orchestrator does full builds only. Phase 7 wipes _site-offline/ and re-populates from scratch every time. Adding incremental support would require tracking which input files changed and which output files need re-emission; not in scope.
- Watch-mode rebuilds -- tbdocs has no watcher. Jekyll's jekyll serve triggers offlinify on each rebuild; tbdocs would need an equivalent loop wrapper around node builder/tbdocs.mjs. Defer.
- Offline-tree compression / minification -- search-data.js is ~2.8 MB (the dominant offline-tree size). Minifying the JSON before wrapping would save ~30-40%. Out of scope for Phase 7's first cut; add as a follow-up if offline-tree size matters.
- Custom JS patches beyond navLink + initSearch -- if just-the-docs ships a new version with more file://-incompatible code, the patch set grows. Currently scoped to the two known issues; extend patchJustTheDocsJs if needed.
- A standalone _site-offline.zip package -- the GitHub release workflow produces this from _site-offline/. Phase 7 just writes the tree; the zip is built by softprops/action-gh-release@v2 in .github/workflows/jekyll-gh-pages.yml.
- After Phase 7 runs on the production tree:
  - <offlineRoot>/ exists and is non-empty.
  - All .html files Phase 5 wrote (minus book.html) have offline copies under <offlineRoot>/<destPath>.
  - All redirect stubs Phase 6 wrote have offline copies under <offlineRoot>/<destPath>.
  - All static files (minus offline_exclude matches) have offline copies under <offlineRoot>/<destRel>.
  - Theme assets (CSS / JS / SVG) are present under <offlineRoot>/assets/.
  - <offlineRoot>/assets/js/search-data.js exists; its contents parse as JavaScript and set window.SEARCH_DATA.
  - <offlineRoot>/CNAME, <offlineRoot>/sitemap.xml, <offlineRoot>/robots.txt, <offlineRoot>/book.html are absent (per offline_exclude).
- HTML page parity:
  - For 10 spot-checked pages (a mix of top-level, deep, redirect-target, and space-in-permalink): file content matches Jekyll offlinify's _site-offline/<destPath> byte-for-byte (or within an accepted-divergences allowance for documented diffs).
  - Every absolute URL in the original (e.g. /assets/css/foo.css) is replaced with the corresponding page-relative URL.
  - Every page-relative URL gains the correct .html / /index.html suffix.
  - The SEO block is stripped; only <title> remains from inside it.
  - Two <script> tags (OFFLINE_SITE_ROOT + search-data.js) are injected before the just-the-docs.js script tag.
  - The page parses as well-formed HTML.
- CSS file parity:
  - just-the-docs-combined.css's url("/favicon.png") is rewritten to url("../../favicon.png").
  - Other CSS files are copied verbatim.
- Redirect stub parity:
  - All ~290 stubs present.
  - For 5 spot-checked stubs: the four https://docs.twinbasic.com/<path> URLs are each rewritten to the same page-relative form.
  - Stubs whose target was unresolvable (none on the current site) would have the absolute URL preserved.
- just-the-docs.js patch:
  - <offlineRoot>/assets/js/just-the-docs.js contains the replacement navLink() body (queries links[i].href / window.location.href).
  - It contains the replacement initSearch() body (reads window.SEARCH_DATA, rewrites doc.url).
  - The patch summary log line includes both "navLink()" and "initSearch()".
- search-data.js generation:
  - <offlineRoot>/assets/js/search-data.js opens with window.SEARCH_DATA = {.
  - The JSON bytes match <offlineRoot>/assets/js/search-data.json (modulo the wrapper) byte-for-byte.
  - Load <offlineRoot>/index.html in a real browser via file://; the search box returns results.
- Cross-substep / functional checks:
  - <offlineRoot>/ has zero surviving href="/...", src="/...", or url(/...) references (after subtracting intentionally-allowed https://... external links).
  - <offlineRoot>/ has zero surviving https://docs.twinbasic.com references (the SEO strip + the redirect-stub rewrite together account for all known sources).
  - Run scripts/check_links.mjs --offline --include-fragments --index-files "index.html" --forbid 'https://docs.twinbasic.com' --root-dir "<offlineRoot>" "<offlineRoot>" (the same invocation check.bat uses); zero broken links, zero forbidden links.
- Performance check:
  - Total Phase 7 wall time under 800 ms on the dev machine. Soft cap: 1500 ms (verify-phase7.mjs warns if exceeded).
  - HTML pass (~837 pages) dominates; CSS / redirects / statics / patching are minor contributors.
verify-phase7.mjs (~350 lines) extends the verify-phase6.mjs pattern. It:
- Runs discover() through writeOffline() (Phases 1-7) into scratch destinations (docs/_site-verify/ + docs/_site-verify-offline/).
- Runs Phase 7 with timing capture.
- Asserts the items above. Where Jekyll offlinify output exists (in docs/_site-offline/), diff against it as the parity reference. Where it doesn't, assert structural properties.
- For the JS-patch case: read the patched <offlineRoot>/assets/js/just-the-docs.js, regex-grep for the replacement function signatures, assert both patches landed.
- For the absolute-URL-survival case: walk every .html in <offlineRoot>/, grep for href="/ and src="/ and url(/; assert zero hits (modulo any intentional external links the spot-check whitelists).
- For the search functionality: spawn a headless browser (Puppeteer or Playwright) pointed at file://<offlineRoot>/index.html, type into the search box, assert results show. OR (cheaper) skip the browser test and verify the search-data.js content matches by pattern; the cross-build offline link check is a stronger functional signal.
- Prints OK <check> / FAIL: <reason> per check, per-substep timings up front, and WARN if total Phase 7 wall-time exceeds 1500 ms.
- Cleans up docs/_site-verify/ + docs/_site-verify-offline/ and exits non-zero on any failure.
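The absolute-URL-survival check reduces to a per-file scan like the sketch below. The pattern and the function name are assumptions rather than the harness's actual code, and the sketch scans a string so that the file walk stays outside it.

```javascript
// Hypothetical check: flags root-absolute references that escaped the rewrite.
const SURVIVOR = /(?:href|src)=(["'])\/(?!\/)[^"']*\1|url\(\s*["']?\/(?!\/)/g;

function findSurvivorsSketch(html) {
  return html.match(SURVIVOR) ?? [];
}

const scanned = '<a href="../foo.html">ok</a><img src="/x.png"><a href="//cdn.example.com/x">cdn</a>';
console.log(findSurvivorsSketch(scanned));
```

A plain scan like this also hits literal href="/..." text inside <pre> / <code> blocks, which the rewriter leaves alone on purpose, so a real harness would need to skip or whitelist those.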
Total checks: projected ~30 (8 base structure + 5 HTML byte parity + 3 CSS + 3 redirects + 2 JS patch + 3 search-data + 5 cross-substep + 1 perf line).
| Output | Target | Notes |
|---|---|---|
| Each HTML page in <offlineRoot>/ | byte-identical to Jekyll offlinify | SEO strip + URL rewrite + script injection all deterministic. The only known divergence source is the SEO version string (v2.8.0 in Phase 4 vs whatever jekyll-seo-tag emits) -- both pass through the strip identically. |
| Each CSS file in <offlineRoot>/assets/css/ | byte-identical | url() rewrites match by LCP walk. |
| Each redirect stub | byte-identical | Four occurrences replaced by the same rewrite. |
| Patched just-the-docs.js | byte-identical | Replacement function bodies match Jekyll's offlinify constants character-for-character. |
| search-data.js | byte-identical wrap of search-data.json | Single-line window.SEARCH_DATA = ${json}; wrapper. |
| Verbatim static copies | byte-identical | fs.copyFile from the same source. |
node builder/tbdocs.mjs                  # one-line per-phase timings
cd builder && node verify-phase7.mjs     # ~30-check harness + timings

Measured wall time on the dev machine (Windows 10, over three runs of the verify harness, which captures Phase 7 as a single t.lap()):
| Substep | Target | Measured | Notes |
|---|---|---|---|
| Phase 7 total | <800 ms | 870-1090 ms | Above the 800 ms target, well under the 1500 ms soft cap. |
| Phase 7 soft cap | 1500 ms | -- | verify-phase7.mjs warns if exceeded. |
Per-substep timings are not separately captured in the first cut --
adding the --profile-offline instrumentation parallel to Jekyll's
tick(:time_*) accumulators is one of the deferred follow-ups (see
§13). The HTML pass is the dominant cost on inspection: ~6,500 regex
matches per page × ~837 pages, ~5.4M matches total, with the
per-match work bound by V8's regex engine + Map lookups.
Findings on the measured numbers:
- The HTML pass landed roughly in the projected 100-500 ms range (extrapolating from the total minus the ~60 ms setup + ~200 ms static-copy + ~100 ms theme-asset + ~50 ms redirect work). The honesty-note caveat from the first draft of this section (the projection was a guess at V8-vs-MRI scaling) held up: the actual number is in the projected band, not blowing past it.
- The nav-block caching deferral (§7.D7) stayed deferred -- the HTML pass never crossed the 800 ms in-scope threshold. The cache would still save ~200 ms, but the added complexity isn't justified at the current measured baseline.
- Phase 7's ~870-1090 ms runs roughly 2.4-3× faster than Jekyll offlinify's 2.65 s with-nav-cache baseline on the same machine. The saving comes from collapsing the per-page hook plumbing, the simpler SEO strip regex, and Node's libuv I/O rather than MRI's per-syscall GIL release. Less dramatic than the originally projected 5× but squarely in the "ship and move on" zone -- not worth a second optimisation pass before later phases land.
Cumulative dependencies after Phase 7:
{
  "dependencies": {
    "gray-matter": "^4.0",
    "fast-glob": "^3.3",
    "js-yaml": "^4.1",
    "markdown-it": "^14.0",
    "markdown-it-attrs": "^4.0",
    "shiki": "^1.0"
  }
}

New in Phase 7: nothing. The implementation uses only Node
stdlib (node:fs, node:path) plus the already-imported helpers
from write.mjs (mkdirRec, runLimited, writeFileMkdirp,
WRITE_LIMIT).
No regex engine beyond the V8 built-in. No glob library beyond §6.5's
inline fnmatchPathname. No HTML parser beyond the single combined
regex (§6.6) -- the same trade Jekyll offlinify makes.
The lunr dependency from PLAN.md's list remains unused; Phase 7
doesn't compile the index (the JSON is wrapped, not parsed).
<repo root>/
builder/
PLAN.md — architecture overview (Phase 7 status updated to "shipped" after landing)
PLAN-1.md — Phase 1 spec (shipped)
PLAN-2.md — Phase 2 spec (shipped)
PLAN-3.md — Phase 3 spec (shipped)
PLAN-4.md — Phase 4 spec (shipped)
PLAN-5.md — Phase 5 spec (shipped)
PLAN-6.md — Phase 6 spec (shipped)
PLAN-7.md — this file (Phase 7 spec, shipped)
FUTURE-WORK.md — append: nav-block cache deferral (§7.D7), --no-offline opt-out (§7.D14), --profile-offline (§10)
package.json — unchanged (no new deps)
discover.mjs — Phase 1
nav.mjs — Phase 2 nav
seo.mjs — Phase 2 SEO
book.mjs — Phase 2 book loader
build-info.mjs — Phase 2 build-info
render.mjs — Phase 3
highlight.mjs — Phase 3 highlight
template.mjs — Phase 4
compress.mjs — Phase 4 compress
write.mjs — Phase 5 (re-exports mkdirRec, runLimited, writeFileMkdirp, WRITE_LIMIT, safeWrite, isUnderProject for Phase 7)
paths.mjs — Phase 6 paths helper
redirects.mjs — Phase 6 (signature extended: returns { written, stubs })
sitemap.mjs — Phase 6 sitemap
search.mjs — Phase 6 (signature extended: returns { entries, json })
offline.mjs — NEW: writeOffline + buildOfflineState + deriveOffline{Page,Redirect,Css,JtdJs,SearchDataJs}
accepted-divergences.mjs — unchanged (the 8 propagated-accepted pages are pre-existing Phase 3/4 divergences)
tbdocs.mjs — orchestrator extended (writeOffline call after auxiliaries + summary line)
verify-phase1.mjs — Phase 1 harness (retired Phase 10)
verify-phase2.mjs — Phase 2 harness (retired Phase 10)
verify-phase3.mjs — Phase 3 harness (retired Phase 10)
verify-phase4.mjs — Phase 4 harness (retired Phase 10)
verify-phase5.mjs — Phase 5 harness (retired Phase 10)
verify-phase6.mjs — Phase 6 harness (retired Phase 10)
verify-phase7.mjs — NEW: §10 acceptance harness (30+ checks) (retired Phase 10)
_diff.mjs — first-divergence single-page diff (extended: --offline=, --offline-redirect=, --offline-css=, --offline-jtd, --offline-search)
_diff_all.mjs — per-bucket divergence audit (unchanged)
_triage.mjs — extended: auditOffline{Pages,Redirects,Css,Jtd,Search} with propagated-vs-offline-only classification
_sitemap_diff.mjs — unchanged
_spot.mjs — single-page output dump (unchanged)
# _offline_diff.mjs — NOT created; subsumed by _diff.mjs / _triage.mjs extensions (see §12.1)
docs/ — unchanged
WIP.md — extended: "Builder diff / triage / verify tools" subsection introducing the tools to future sessions
docs/.gitignore — extended: added _site-new-offline/
Phase 7 adds one substantive call to the orchestrator, plus a small extension to the Phase 6 fan-out to capture the in-memory bytes Phase 7 consumes:
import { writeOffline } from "./offline.mjs";

// ... existing main() body up through auxiliaries ...
let auxStats = null;
if (!dryRun) {
  const [redirectStats, sitemapStats, searchStats] = await Promise.all([
    writeRedirects(pages, site, destRoot),
    writeSitemap(pages, site, destRoot),
    writeSearchData(pages, site, destRoot),
  ]);
  auxStats = { redirects: redirectStats, sitemap: sitemapStats, search: searchStats };
}
t.lap("auxiliaries");

let offlineStats = null;
if (!dryRun) {
  offlineStats = await writeOffline(pages, staticFiles, site, destRoot, { auxStats });
}
t.lap("offline");

console.log(`Phase 1+2+3+4+5+6+7 done: ${pages.length} pages, ${staticFiles.length} static files`);
console.log(`  wrote: ${writeStats.pages.written} pages (${writeStats.pages.skipped} skipped), ` +
  `${writeStats.theme.copied} theme assets, ${writeStats.staticFiles.copied} static files ` +
  `-> ${destRoot}`);
if (auxStats) {
  console.log(`  aux: ${auxStats.redirects.written} redirect stubs, ` +
    `${auxStats.sitemap.entries} sitemap entries, ` +
    `${auxStats.search.entries} search-index entries`);
}
if (offlineStats) {
  console.log(`  offline: ${offlineStats.html} HTML, ${offlineStats.css} CSS, ` +
    `${offlineStats.redirects} redirect stubs, ` +
    `${offlineStats.statics + offlineStats.assets} assets, ` +
    `${offlineStats.excluded} excluded ` +
    `(${offlineStats.unresolved} unresolved) -> ${destRoot}-offline`);
}
console.log(t.summary());

--dry-run semantics: Phase 7 is guarded by if (!dryRun) matching
Phase 6's pattern. The dry-run path skips all writes; compute work
(buildSitePaths, plan derivation) could be split out for representative
timing if profiling demands.
Two substep return-shapes extend (PLAN-6 §3 → PLAN-7's §7.D8):
// builder/search.mjs
export async function writeSearchData(pages, site, destRoot) {
  const entries = deriveSearchEntries(pages, site);
  const body = entries.map(renderEntryString).join(",");
  const json = `{` + body + `\n}\n`;
  await writeFileMkdirp(path.join(destRoot, "assets/js/search-data.json"), json);
  return { entries: entries.length, json };     // ← + json
}

// builder/redirects.mjs
export async function writeRedirects(pages, site, destRoot) {
  const stubs = deriveRedirectStubs(pages, site);
  await runLimited(stubs, WRITE_LIMIT, async (s) => {
    await writeFileMkdirp(path.join(destRoot, s.destPath), s.html);
  });
  return { written: stubs.length, stubs };      // ← + stubs
}

Zero behaviour change to existing consumers (verify-phase6 reads
entries / written only); the new fields are additive.
Promoted to module-level exports during Phase 7 landing:
safeWrite (the path-stamping error helper) and isUnderProject
(the wipe-safety guard). One-line changes; verify-phase5 stayed
green.
The plan's original §12.1 called for a standalone _offline_diff.mjs
CLI for the per-file walk. As Phase 7 shipped, we instead extended
the existing _diff.mjs and _triage.mjs with offline-aware modes,
because those tools already have the discover → render → template
plumbing wired up and (with the derive* helpers from §I) can derive
expected offline bytes in-memory for any single target. One workflow,
one set of conventions, no parallel tree-walker.
The _offline_diff.mjs standalone script was not created.
_diff.mjs new modes (see _diff.mjs for the full
implementation):
| Mode | Compares |
|---|---|
| --offline=<srcRel> | Derived offline HTML vs _site-offline/<destPath>. |
| --offline-redirect=<fromPath> | Derived offline redirect stub vs _site-offline/<destPath>. |
| --offline-css=<themeRel> | Derived offline CSS (url() rewrite) vs _site-offline/<themeRel>. |
| --offline-jtd | Derived patched just-the-docs.js vs _site-offline/assets/js/just-the-docs.js. |
| --offline-search | Derived search-data.js wrap vs _site-offline/assets/js/search-data.js. |
Each mode prints MATCH or DIFFER + first divergence offset + ~200
chars of context, matching the existing _diff.mjs convention. The
diff tools resolve URLs against Jekyll's _site/assets/ (not the
verify-tree's) so the derive runs against the same source the Jekyll
offline tree was built from -- otherwise a Phase 5 theme-asset
divergence in builder/assets/ would show up as a false-positive
Phase 7 divergence.
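The MATCH / DIFFER report with a first-divergence offset can be sketched as follows. This is an illustration of the convention, not the code in _diff.mjs: the function name and return shape are hypothetical, and it compares JavaScript strings (UTF-16 code units) rather than raw bytes.

```javascript
// Hypothetical sketch of a first-divergence report with ~200 chars of context.
function firstDivergenceSketch(expected, actual, context = 200) {
  const limit = Math.min(expected.length, actual.length);
  let i = 0;
  while (i < limit && expected[i] === actual[i]) i++;
  if (i === limit && expected.length === actual.length) return { status: "MATCH" };
  const start = Math.max(0, i - Math.floor(context / 2)); // centre the window on the diff
  return {
    status: "DIFFER",
    offset: i,
    expected: expected.slice(start, start + context),
    actual: actual.slice(start, start + context),
  };
}
```

A length-only difference (one side is a prefix of the other) is reported at the offset where the shorter input ends.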
_triage.mjs new audit functions (added after the existing
sitemap / redirects / robots / search auxiliary audits):
- auditOfflinePages -- walks every page through deriveOfflinePage and classifies divergences as propagated-accepted (page in ACCEPTED_DIVERGENCE_PATHS; Phase 3/4 divergence flowing through), propagated-unaccepted (online _site/ also differs but not in the acceptance list -- Phase 3/4 bug propagating to Phase 7), or offline-only (online matches but offline doesn't -- Phase 7-specific bug). Offline-only divergences are further bucketed by the nature of the first diff: strip-seo, href-src-rewrite, script-inject, css-url-rewrite, other.
- auditOfflineRedirects -- per-stub byte compare.
- auditOfflineCss -- per-CSS byte compare.
- auditOfflineJtd -- single-file byte compare with patch sanity checks.
- auditOfflineSearch -- search-data.js byte compare, with smart detection of "the divergence is just a wrap of the already-audited search-data.json divergence" → reported as MATCH-with-note.

Each prints one summary line (MATCH or DIFFER + counts), so a clean build shows a 5-line block:
Offline pages: MATCH (829 match, 8 accepted)
Offline redirects: MATCH (290 stubs)
Offline CSS: MATCH (7 files)
Offline JTD JS: MATCH (2/2 patches: navLink(), initSearch())
Offline search-data.js: MATCH (propagates from search-data.json -- see Search index above)
When a divergence appears, the _triage.mjs summary line reports the
bucket counts; _diff.mjs --offline=<srcRel> is the follow-up for
inspecting a representative file.
The convention is documented in WIP.md's "Builder diff / triage / verify tools" subsection so future sessions discover these without having to grep the source.
The offline tree at <destRoot>-offline/ is functionally complete
after Phase 7:
- Every page is reachable from every other page under file://, with no URL left absolute.
- Every redirect navigates locally (no live-site detour).
- The sidebar nav highlights the current page correctly under file:// (the patched navLink() runs).
- The search box returns results under file:// (the patched initSearch() reads window.SEARCH_DATA from the preloaded search-data.js).
- All static assets (images, CSS, JS, vendor) load from the page-relative path the browser resolves correctly under file://.
- The CI link check (check.bat against _site-offline/) passes with --forbid 'https://docs.twinbasic.com' -- zero surviving live-site links.
- The GitHub release workflow can package <destRoot>-offline/ as twinbasic-docs-offline.zip and attach it to a release (per .github/workflows/jekyll-gh-pages.yml).
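The "zero surviving live-site links" criterion can be spot-checked with a small scan like the one below. This is a hypothetical stand-in for illustration, not check_links.mjs: findForbiddenLinks is an invented name, and the sketch only inspects quoted href / src attribute values with a regular expression rather than parsing the HTML.

```javascript
// Hypothetical sketch; NOT the project's check_links.mjs.
// Returns every quoted href/src value that starts with the forbidden prefix.
function findForbiddenLinks(html, forbiddenPrefix) {
  // Matches href="..." or src='...'; unquoted attribute values are ignored.
  const attr = /\b(?:href|src)\s*=\s*(["'])(.*?)\1/gi;
  const hits = [];
  for (const m of html.matchAll(attr)) {
    if (m[2].startsWith(forbiddenPrefix)) hits.push(m[2]);
  }
  return hits;
}

// Example: one surviving live-site link, two local links.
const sample =
  '<a href="https://docs.twinbasic.com/tB/Core/Const">Const</a>' +
  '<img src="../assets/img/logo.png">' +
  "<a href='index.html'>Home</a>";
const survivors = findForbiddenLinks(sample, 'https://docs.twinbasic.com');
```

An empty result for every file in the offline tree corresponds to the passing state described in the list; any non-empty result names the links that still point at the live site.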
The next session can implement Phase 8 (book.mjs renderer half +
pdf.mjs), which takes pages[] (Phase 1) + bookData (Phase 2) +
page.renderedContent (Phase 3) directly and produces the
_site-pdf/ tree. Phase 8 doesn't depend on Phase 7's outputs.
That clean handoff is the point of having an offline phase as a
standalone step -- it consumes only _site/ (Phases 5+6) and produces
only _site-offline/, with no entanglement to the PDF tree.
Five Phase 7 follow-ups have been moved to
FUTURE-WORK.md §B7-B11: per-source-dir nav-block
cache, --no-offline opt-out, --profile-offline instrumentation,
search-data minification, and an AST-based JTD JS patcher. Each
entry lists its trigger condition; none block any current work.