The build pipeline lives in builder/. The orchestrator is
tbdocs.mjs's runBuild(), which today is a mostly-linear sequence of
awaited async calls on the main thread, with a sprinkling of
cooperative concurrency (Promise.all barriers and one background
buildInfoPromise). There are no worker threads; every CPU-bound
phase blocks the main thread.
An earlier round of parallelization work (a CheckPool of persistent
link-checker workers and a bespoke render-worker.mjs driven by a
hand-coded message protocol) was reverted; this plan replaces both
with one task-graph abstraction.
| File | Role |
|---|---|
| tbdocs.mjs | Orchestrator: runBuild(), CLI parsing, summary output |
| template.mjs | templatePhase(), internal buildInit() |
| render.mjs | renderPhase(), createMarkdownIt(), buildLinkTables() |
| highlight.mjs | initHighlighter() -- Shiki WASM init |
| discover.mjs | discover() -- walk source tree, parse frontmatter |
| nav.mjs | computeNav() -- build nav tree from pages |
| seo.mjs | precomputeSeo() -- derive SEO titles/URLs |
| book.mjs | resolveBookChapters() -- resolve book chapter list |
| build-info.mjs | captureBuildInfo() -- git rev-parse/log |
| scss.mjs | compileScss() -- sass compilation |
| mermaid.mjs | regenerateMermaid() -- stale SVG regen via puppeteer |
| data.mjs | loadData() -- load _book.yml |
| write.mjs | writePhase() -- write pages + static files to _site/ |
| redirects.mjs | deriveRedirectStubs(), writeRedirects() |
| sitemap.mjs | deriveSitemapUrls(), writeSitemap() |
| search.mjs | writeSearchData() |
| offline.mjs | writeOffline() -- produce _site-offline/ |
| pdf.mjs | writePdf() -- produce _site-pdf/ |
| serve.mjs | runServe() -- dev server with watcher + rebuild |
There is no render-worker.mjs, no cpu-worker.mjs, no CheckPool,
no createRenderPool(). The build is single-threaded except for I/O.
runBuild() reads top-to-bottom; the only off-main-thread work is the
git shell-outs inside captureBuildInfo() (launched as a background
promise and awaited later). Approximate wall-clock numbers from a
recent clean build are noted in parentheses:
mermaid (~2 ms; ~150 ms when SVGs regenerate)
↓
scss (~700 ms — CPU-bound on main thread)
↓
load _config.yml + apply CLI overrides
↓
buildInfoPromise = captureBuildInfo() (background: git shell-outs)
↓
discover (~135 ms — fs traversal + frontmatter parse)
↓
nav (~8 ms)
↓
initHighlighter (~50–100 ms — Shiki WASM init; overlaps with buildInfo)
↓
buildLinkTables + createMarkdownIt + precomputeSeo + loadData
+ resolveBookChapters (~110 ms together — "markdown-init" + "seo" + "book")
↓
await buildInfoPromise (~0 ms; usually already settled)
↓
renderPhase (~2700 ms — CPU-bound; cooperative Promise.all over pages)
↓
templatePhase (~800 ms — CPU-bound; same shape)
↓
writePhase
├─ Promise.all { writePages | copyTheme | copyStaticFiles }
└─ writeGeneratedAssets (~625 ms total)
↓
Promise.all { writeRedirects, writeSitemap, writeSearchData } (~200 ms)
↓
writeOffline (~1100 ms)
↓
writePdf (~240 ms)
Wall-clock total is roughly 6.7 seconds. The dominant terms are
render (~2.7 s), writeOffline (~1.1 s), template (~0.8 s),
scss (~0.7 s), and write (~0.6 s).
Visible idle time on the main thread:
- scss and mermaid run before discover, one after the other. Neither depends on discover's output.
- buildInfo's git shell-outs already overlap with the discover/nav/markdown-init/seo chain. Everything else in that stretch still runs serially, even though discover → nav → markdown-init → seo → book is the only true dependency chain.
- writeOffline and writePdf are independent of each other. Both read _site/ (already written by writePhase) and write into separate output trees. They run sequentially today.
- renderPhase and templatePhase are CPU-bound and block the main thread completely. Promise.all(pages.map(...)) is cooperative concurrency only -- it interleaves on a single thread.
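The last point can be seen on a single thread. In the sketch below, two async "page renders" do only synchronous busy-work before their first await. busyWork and renderPage are hypothetical stand-ins, not builder code. The event log shows that Promise.all runs each CPU section to completion, back to back, with no interleaving.

```javascript
// Stand-in for one page's render: purely synchronous CPU work.
function busyWork(iterations) {
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc = (acc + i * i) % 1_000_003;
  return acc;
}

const events = [];

async function renderPage(name) {
  events.push(`${name}:start`);
  busyWork(1_000_000);
  events.push(`${name}:end`);
  await null; // the first yield point comes only after the CPU work
  events.push(`${name}:resumed`);
}

const allDone = Promise.all(["a", "b"].map((name) => renderPage(name)));
// Both CPU sections have already run, serially, before Promise.all settles.
console.log(events); // [ 'a:start', 'a:end', 'b:start', 'b:end' ]
```

Adding more awaits inside the CPU-bound code would only change where the interleaving happens. The total CPU time still lands on one thread.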
The two large structures that flow through the pipeline:
pages[] -- array of ~857 page objects. After discover, each has:
{ srcPath, srcRel, ext, frontmatter, rawContent, permalink, destPath,
layoutDefault, imageScope }
Later phases mutate in place, adding: navPath, breadcrumbs,
children, navLevels (nav); seoTitle, seoFullTitle,
seoCanonical, seoIsHome (seo); renderedContent (render);
html (template). The current pipeline relies on this in-place
enrichment -- every consumer assumes the same page object accumulates
fields as it flows through phases.
staticFiles[] -- array of ~214 static file descriptors:
{ srcPath, srcRel, destRel, size }
config -- the parsed _config.yml object. Small, ~30 keys.
Read-only after initial CLI override merges.
navTree -- array of NavNode objects (recursive tree). ~857
nodes total. Only consumed by buildInit() → renderSidebar().
The mutation-in-place pattern matters for the scheduler design: mutations performed on a worker's structured-clone copy do not reach the main-thread master unless explicitly merged. See §Page deltas below.
No new runtime dependencies. The scheduler, the worker pool, and the
worker dispatcher are all in-tree code -- ~150 LOC for the scheduler,
~50 LOC for WorkerPool, ~30 LOC for the worker dispatcher and its
handler table. A general-purpose pool library like piscina was
considered, but the project's use is narrow enough (fixed pool size,
one task per worker at a time, no recycling, no dynamic scaling, no
abort signals) that the dependency cost outweighs the saved code. An
added dependency also widens the supply-chain attack surface.
The build is a DAG of tasks. Each task has a unique string ID,
takes an input map { [predecessorId]: output }, produces an immutable
output, and declares which downstream tasks receive (slices of) that
output.
The scheduler lives on the main thread. It tracks task
dependencies, decides what's ready, and dispatches. The worker pool
(WorkerPool -- a ~50 LOC in-tree class wrapping
node:worker_threads) handles everything below the task-graph layer:
spawning workers at construction, named dispatch, idle/busy
bookkeeping, lifecycle.
Each task carries a runOnMain: true flag if it must execute on the
main thread -- for tasks that own the master pages[] merge, mutate
state in place, or do I/O that coordinates with main-thread state.
All other tasks run on a worker, dispatched by name.
┌─────────────────────────────────────────────────┐
│ Main thread (scheduler) │
│ │
│ tasks: Map<taskId, TaskDef> │
│ pending: Map<taskId, {expected, received}> │
│ ready: TaskDef[] │
│ results: Map<taskId, output> │
│ state: SharedState (§Shared state) │
│ │
│ on task complete: │
│ store result, run task.submit() to route │
│ output to downstream tasks, check newly │
│ ready, dispatch each to pool or main │
└────────────┬────────────────────────────────────┘
│
▼ WorkerPool ── named handlers in cpu-worker.mjs
Mutating a worker's local pages[] doesn't reach state.pages unless
the mutation is explicitly shipped back as a delta. The current code
mutates pages in place across nav / seo / render / template, so every
mutating step must be modeled deliberately. The cheap way out is:
keep small mutating steps on the main thread and only ship pages
across the boundary when there's a real CPU win to amortize the copy.
The split:
| Task | Placement | Why |
|---|---|---|
| config | M | Trivial fs read, no benefit to round-trip |
| discover | M | ~135 ms; its output becomes the master pages[] that later tasks mutate in place |
| nav | M | ~8 ms; mutates pages with navPath/navLevels/breadcrumbs/children |
| markdownInit | M | ~63 ms; produces an md instance (NOT serializable -- can't cross boundary) |
| seo | M | ~34 ms; mutates pages with seoXxx fields |
| loadData | M | Trivial fs read |
| resolveBookChapters | M | Mutates state.site.bookData._chapters with refs into state.pages -- identity-critical |
| buildInit | M | Tiny; consumes navTree, produces ~50 KB of html strings |
| deriveRedirects | M | Pure compute, a few ms |
| deriveSitemap | M | Pure compute, a few ms |
| dispatch | M | Slices state.pages into render chunks; no benefit on a worker |
| renderJoin | M | Pure barrier |
| write | M | Owns state.pages + state.staticFiles reads; I/O dominated |
| searchData | M | ~40 ms; owns pages read |
| writeAux | M | Owns pages + bookData reads |
| writeOffline | M | I/O dominated (~1.1 s); see §Post-write tasks for the workerize-or-not call |
| writePdf | M | I/O dominated (~240 ms); ditto |
| buildInfo | W | Free overlap with the main spine |
| scss | W | ~700 ms -- the biggest seed-task parallelism win |
| mermaid | W | ~2 ms when nothing is stale, ~150 ms when SVGs regen; runs concurrently with discover |
| render:i | W | The big win -- ~3.5 s of render + template CPU work fans out across N cores |
The worker side ships with four handlers (scss, mermaid, buildInfo,
render) plus the parentPort dispatcher (~15 LOC).
Everything else is plain main-thread code wrapped in a task envelope.
This keeps pages[] from crossing the worker boundary except for the
render fan-out (which only ships per-page slices, not the master).
The seo / nav / markdown-init mutations stay on the main-thread
master directly -- no delta merge needed.
[W] = pool worker; [M] = main thread (runOnMain: true).
Seeds (concurrent):
buildInfo [W] ──────────────────────────────────────────┐
scss [W] ──────────────────────────────────────────┤
mermaid [W] ──────────────────────────────────────────┤
prepDest [M] ──────────────────────────────────────────┤
│
Main spine (sequential, on M): │
config │
└─→ discover ──┬─→ deriveRedirects ─────────────────┐ │
├─→ deriveSitemap ─────────────────┤ │
└─→ nav │ │
└─→ markdownInit │ │
├─→ seo │ │
│ └─→ loadData │ │
│ └─→ resolveBookChapters
│ ↓ │ │
└─→ buildInit │ │ │
↓ │ │ │
└──────────┴─→ dispatch ◄── buildInfo, mermaid, deriveRedirects join here
│
Render fan-out (workers, concurrent): │
┌────────────────────────────────────────────────────┘
│
render:0 [W] render:1 [W] ... render:N-1 [W]
│
▼
renderJoin [M] ◄── waits for all render:i
│
Write fence: │ scss [W], mermaid [W], prepDest [M] join here too
▼
write [M] ◄── reads state.pages, state.staticFiles
│
(in parallel with write:) │
│
renderJoin + prepDest │
│ │
▼ │
searchData [M] │
│ │
└──────────────────────────────────┤
│
writeAux [M] ◄── derived redirects + sitemap join here too
│
▼
writeOffline [M]
(in parallel with write → ... → writeOffline:)
renderJoin + mermaid
│
▼
writePdf [M]
│ │
└─────────────┬─────────────┘
▼
done
Edges into dispatch: buildInit, resolveBookChapters,
buildInfo, mermaid, deriveRedirects.
Edges into write: renderJoin, scss, mermaid, prepDest.
Edges into searchData: renderJoin, prepDest.
Edges into writePdf: renderJoin, mermaid.
Edges into writeAux: write, searchData, deriveRedirects, deriveSitemap.
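Because the graph is declared statically (apart from the render fan-out), it can be sanity-checked before the scheduler starts. The sketch below is a hypothetical helper that is not part of the plan. It uses Kahn's algorithm to reject unknown predecessor IDs and cycles, and returns one valid execution order. GRAPH is assembled from the expected lists in the task defs, the edge lists above, and the diagram (writeOffline after writeAux). The runtime-registered render:i predecessors of renderJoin are modelled as a single edge from dispatch.

```javascript
// Kahn's algorithm over { taskId: [expected predecessor IDs] }.
function topoOrder(graph) {
  const indegree = new Map();
  const dependents = new Map();
  for (const id of Object.keys(graph)) {
    indegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const [id, expected] of Object.entries(graph)) {
    for (const dep of expected) {
      if (!indegree.has(dep)) throw new Error(`${id} expects unknown task ${dep}`);
      dependents.get(dep).push(id);
      indegree.set(id, indegree.get(id) + 1);
    }
  }
  // Seed tasks (no predecessors) are ready immediately.
  const ready = [...indegree].filter(([, n]) => n === 0).map(([id]) => id);
  const order = [];
  while (ready.length > 0) {
    const id = ready.shift();
    order.push(id);
    for (const next of dependents.get(id)) {
      indegree.set(next, indegree.get(next) - 1);
      if (indegree.get(next) === 0) ready.push(next);
    }
  }
  if (order.length !== indegree.size) {
    const stuck = [...indegree].filter(([, n]) => n > 0).map(([id]) => id);
    throw new Error(`cycle among: ${stuck.join(", ")}`);
  }
  return order;
}

const GRAPH = {
  config: [], buildInfo: [], scss: [], mermaid: [], prepDest: [],
  discover: ["config"],
  nav: ["discover"],
  deriveRedirects: ["discover"],
  deriveSitemap: ["discover"],
  markdownInit: ["nav"],
  buildInit: ["nav"],
  seo: ["markdownInit"],
  loadData: ["markdownInit"],
  resolveBookChapters: ["seo", "loadData"],
  dispatch: ["buildInit", "resolveBookChapters", "buildInfo", "mermaid", "deriveRedirects"],
  renderJoin: ["dispatch"], // really render:0..N-1, registered at runtime
  write: ["renderJoin", "scss", "mermaid", "prepDest"],
  searchData: ["renderJoin", "prepDest"],
  writeAux: ["write", "searchData", "deriveRedirects", "deriveSitemap"],
  writeOffline: ["writeAux"],
  writePdf: ["renderJoin", "mermaid"],
};

const order = topoOrder(GRAPH);
console.log(order.join(" → "));
```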
Three structural wins over the serial baseline:
- scss, mermaid, and buildInfo overlap with the main spine. The main spine (discover → nav → markdownInit → seo → loadData → resolveBookChapters + buildInit) takes ~250 ms total; scss takes ~700 ms. The overlap saves ~250 ms of scss from the critical path, not the full ~700 ms: after the spine finishes, ~450 ms of scss is still on the critical path until write runs.
- render:0..N-1 fans out across CPUs. Today's ~2.7 s of cooperative render plus ~0.8 s of template adds up to ~3.5 s of CPU work, all on one thread. Across N workers this compresses to ~3500 / N ms plus dispatch overhead (~50 ms). On a 4-core box that is ~875 ms wall-clock (saving ~2.6 s); on an 8-core box, ~440 ms (saving ~3 s).
- writeOffline and writePdf overlap on async I/O. Both stay runOnMain initially. They share the main thread for their CPU sections, but their await fs.writeFile-style I/O windows interleave. The gain is at most the shorter of the two (~240 ms). See §Post-write tasks for the case to workerize one of them later.
Mermaid → staticFiles ordering. Under the scheduler mermaid and
discover run in parallel, so freshly-emitted SVGs aren't in
state.staticFiles after discover. The mermaid task's execute()
(running on a worker) does the full fs.stat for each managed SVG
and returns a list of { srcPath, srcRel, destRel, size } descriptors;
the submit() does only a synchronous push into
state.staticFiles. Putting the stat in submit() would race with
downstream consumers since submit is called synchronously by the
scheduler and cannot await.
The render fan-out is the only place where pages cross the worker boundary. The pattern:
- Each render:i task receives a chunk of pages (the worker's clone).
- The worker mutates its local pages with renderedContent, html, and (when !skipOffline) offlineHtml + offlineMisses.
- The task returns a delta: an array of { destPath, renderedContent, html, offlineHtml, offlineMisses } records. These carry only the changed fields, keyed by destPath.
- render:i.submit() walks the delta on the main thread, looks up each page via state.pageByDest, and assigns the fields onto the master page object.
The full pages array never crosses back across the boundary; only the
output deltas do. state.pageByDest is built once in
discover.submit():
discover.submit(out, emit, state) {
state.pages = out.pages;
state.staticFiles = out.staticFiles;
state.site.config = out.config;
for (const p of out.pages) state.pageByDest.set(p.destPath, p);
emit("nav", out);
emit("deriveRedirects", out);
emit("deriveSitemap", out);
}

After this initial assignment, state.pages is mutated in place;
no task ever replaces it. state.pageByDest stays valid for the
whole build.
For tasks that run on [M] (nav, seo, etc.), the mutation is direct
on state.pages -- no delta needed.
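The round trip can be simulated on one thread. The sketch below uses structuredClone as a stand-in for the copy a worker receives; postMessage uses the same structured clone algorithm. The page objects and the "rendering" are made-up placeholders. The sketch shows three things:

- Mutations to the clone leave the master objects untouched.
- A destPath-keyed delta merge writes onto the original objects.
- A reference taken earlier, playing the role of a bookData._chapters entry, therefore sees the rendered content.

```javascript
// Main thread: master pages plus the destPath index built in discover.submit.
const master = [
  { destPath: "guide/index.html", rawContent: "Guide" },
  { destPath: "ref/index.html", rawContent: "Reference" },
];
const pageByDest = new Map(master.map((p) => [p.destPath, p]));
// Stand-in for a bookData._chapters entry: a reference taken before render.
const chapterRef = master[1];

// "Worker" side: structuredClone models what postMessage delivers.
const workerChunk = structuredClone(master);
for (const p of workerChunk) p.renderedContent = `<p>${p.rawContent}</p>`;
// Ship back only the new fields, keyed by destPath.
const delta = workerChunk.map((p) => ({
  destPath: p.destPath,
  renderedContent: p.renderedContent,
}));

// The worker's mutations never reached the master objects.
const beforeMerge = master[0].renderedContent; // undefined

// render:i.submit() equivalent: assign fields onto the master objects.
for (const r of delta) {
  const p = pageByDest.get(r.destPath);
  if (p) p.renderedContent = r.renderedContent;
}

console.log(chapterRef.renderedContent); // <p>Reference</p>
```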
Each task is a plain object:
- expected: array of predecessor task IDs. The scheduler runs the task only when every expected ID has submitted its output. An empty array means a seed task (dispatchable immediately).
- handler (optional, worker tasks): the worker dispatcher's named handler. Defaults to the task's own ID. Used so multiple task IDs can share one worker function (e.g. render:0, render:1, ... → "render").
- runOnMain: true (optional): execute on the main thread instead of dispatching to the pool. The execute() function receives (inputs, ctx, state) -- where state is the SharedState instance -- and may mutate it.
- execute(inputs, ctx [, state]): runs the task body. On a worker, it runs as the dispatch table's named handler. On main, the scheduler calls it directly from _run() and waits on the result. Returns an output value or a Promise of one.
- submit(output, emit [, state, scheduler]): runs synchronously on the main thread after execute resolves. Calls emit(targetTaskId, dataSlice) to route (slices of) the output to downstream tasks. May mutate state. May not perform async work -- see §Scheduler core. The optional scheduler arg is used only by tasks that dynamically register downstream tasks (see dispatch).
Representative task defs:
const TASKS = {
config: {
expected: [],
runOnMain: true,
async execute(_, ctx) {
const text = await fs.readFile(
path.join(ctx.srcRoot, "_config.yml"), "utf8");
const config = yaml.load(text);
if (ctx.opts.baseurl != null) config.baseurl = ctx.opts.baseurl;
if (ctx.opts.url != null) config.url = ctx.opts.url;
return { config };
},
submit(out, emit) { emit("discover", out); },
},
buildInfo: {
expected: [],
async execute() { return { buildInfo: await captureBuildInfo() }; },
submit(out, emit) { emit("dispatch", out); },
},
scss: {
expected: [],
async execute(_, ctx) { return { scssResult: await compileScss(ctx.srcRoot) }; },
submit(out, emit) { emit("write", out); },
},
mermaid: {
expected: [],
async execute(_, ctx) {
// The worker stats every managed SVG and returns full descriptors.
// Stat-in-submit on main would race with downstream readers.
const stats = await regenerateMermaid(ctx.srcRoot);
// stats.svgFiles: [{ srcPath, srcRel, destRel, size }, ...]
return { mermaidStats: stats };
},
submit(out, emit, state) {
const known = new Set(state.staticFiles.map((f) => f.srcRel));
for (const f of out.mermaidStats.svgFiles ?? []) {
if (!known.has(f.srcRel)) state.staticFiles.push(f);
}
emit("write", out);
emit("dispatch", out);
},
},
discover: {
expected: ["config"],
runOnMain: true,
async execute({ config: { config } }, ctx) {
const { pages, staticFiles } = await discover(
ctx.srcRoot, config.exclude ?? []);
return { pages, staticFiles, config };
},
submit(out, emit, state) {
state.pages = out.pages;
state.staticFiles = out.staticFiles;
state.site.config = out.config;
for (const p of out.pages) state.pageByDest.set(p.destPath, p);
emit("nav", out);
emit("deriveRedirects", out);
emit("deriveSitemap", out);
},
},
nav: {
expected: ["discover"],
runOnMain: true,
execute(_, ctx, state) {
const { navTree } = computeNav(state.pages, state.site.config);
state.site.navTree = navTree;
return {}; // mutates state in place
},
submit(_, emit) {
emit("markdownInit", {});
emit("buildInit", {});
},
},
buildInit: {
expected: ["nav"],
runOnMain: true,
execute(_, ctx, state) {
// buildInit() takes site.config + site.navTree; returns the
// ~50 KB of pre-rendered sidebar + header + svg-sprite HTML used
// by templatePhase.
return { initData: buildInitFn(state.site) };
},
submit(out, emit) { emit("dispatch", out); },
},
markdownInit: {
expected: ["nav"],
runOnMain: true,
async execute(_, ctx, state) {
// Main's own initHighlighter cache -- workers maintain theirs
// independently. Both call paths converge on the same Shiki
// initialisation work, but the singletons are per-thread.
const highlighter = await initHighlighter();
const linkTables = buildLinkTables(state.pages);
const baseurl = String(state.site.config.baseurl || "");
const staticFileSet = new Set(state.staticFiles.map(s => s.srcRel));
state.site.highlighter = highlighter; // write reads .themeCss from here
state.site.markdown = createMarkdownIt({
highlighter, linkTables, baseurl, staticFiles: staticFileSet,
});
// linkTables travels to render workers as a serialized payload.
state.site.linkTablesSerialized = serializeLinkTables(linkTables);
return {};
},
submit(_, emit) {
emit("seo", {});
emit("loadData", {});
},
},
seo: {
expected: ["markdownInit"],
runOnMain: true,
execute(_, ctx, state) {
const { seoSiteTitle, seoLogoUrl } = precomputeSeo(
state.pages, state.site.config, state.site.markdown);
state.site.seoSiteTitle = seoSiteTitle;
state.site.seoLogoUrl = seoLogoUrl;
return {};
},
submit(_, emit) { emit("resolveBookChapters", {}); },
},
loadData: {
expected: ["markdownInit"],
runOnMain: true,
async execute(_, ctx, state) {
const data = await loadData(ctx.srcRoot);
state.site.data = data;
state.site.bookData = data.book ?? null;
return {};
},
submit(_, emit) { emit("resolveBookChapters", {}); },
},
resolveBookChapters: {
expected: ["seo", "loadData"],
runOnMain: true,
execute(_, ctx, state) {
// Mutates state.site.bookData with _chapters arrays whose
// entries are refs into state.pages. Identity-critical:
// render:i.submit() merges renderedContent into those same
// page objects, so writePdf later sees the rendered bodies
// via bookData._chapters[i].renderedContent.
resolveBookChapters(state.site.bookData, state.pages);
return {};
},
submit(_, emit) { emit("dispatch", {}); },
},
deriveRedirects: {
expected: ["discover"],
runOnMain: true,
execute(_, ctx, state) {
// redirects.mjs's deriveRedirectStubs uses a layout-based
// filter (layout !== "book-combined") rather than checking
// page.html, so the derive can run before template.
return { stubs: deriveRedirectStubs(state.pages, state.site) };
},
submit(out, emit) {
emit("writeAux", out);
emit("dispatch", out);
},
},
deriveSitemap: {
expected: ["discover"],
runOnMain: true,
execute(_, ctx, state) {
return { urls: deriveSitemapUrls(state.pages, state.site) };
},
submit(out, emit) { emit("writeAux", out); },
},
dispatch: {
expected: ["buildInit", "resolveBookChapters", "buildInfo", "mermaid", "deriveRedirects"],
runOnMain: true,
execute({ buildInit: { initData }, buildInfo: { buildInfo }, deriveRedirects: { stubs } }, ctx, state) {
// Read pages directly from state.pages -- main-thread access,
// no need to ship them through the input map.
const chunks = chunkPages(state.pages, ctx.workerCount);
const excludePatterns = state.site.config.offline_exclude ?? [];
const skipOffline = /* from config / CLI opts */ false;
const sitePaths = buildSitePathsSync(
state.pages, state.staticFiles, excludePatterns, stubs,
enumerateVendoredThemeAssets());
state.sitePaths = sitePaths;
const shared = {
siteData: {
config: state.site.config,
seoSiteTitle: state.site.seoSiteTitle,
seoLogoUrl: state.site.seoLogoUrl,
},
initData, buildInfo,
linkTablesData: state.site.linkTablesSerialized,
staticFilesArr: state.staticFiles.map(f => f.srcRel),
baseurl: String(state.site.config.baseurl || ""),
sitePathsArr: [...sitePaths],
offlineExcludePatterns: excludePatterns,
skipOffline: Boolean(skipOffline),
};
// Pack the shared payload into a SharedArrayBuffer so each
// postMessage sends a SAB reference (shared memory) instead of
// structured-cloning ~310--330 KB per worker.
const sharedSAB = packShared(shared);
return { chunks, sharedSAB };
},
submit(out, emit, _state, scheduler) {
const N = out.chunks.length;
// Register the barrier with the dynamic predecessor count.
// write declares "renderJoin" statically; emit() looks up
// pending entries by id, not by source, so the static
// declaration is satisfied as soon as renderJoin submits.
scheduler.register("renderJoin", {
expected: Array.from({ length: N }, (_, i) => `render:${i}`),
runOnMain: true,
execute() { return {}; },
submit(_, emit) { emit("write", {}); },
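});
// With zero chunks there are no render:i predecessors to count down;
// seed the barrier directly so write is not left waiting forever.
if (N === 0) scheduler.seed("renderJoin", {});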
for (let i = 0; i < N; i++) {
const id = `render:${i}`;
scheduler.register(id, {
expected: [],
handler: "render",
submit(renderOut, emit, state) {
for (const r of renderOut) {
const p = state.pageByDest.get(r.destPath);
if (!p) continue;
p.renderedContent = r.renderedContent;
if (r.html !== undefined) p.html = r.html;
if (r.offlineHtml !== undefined) p.offlineHtml = r.offlineHtml;
if (r.offlineMisses !== undefined) p.offlineMisses = r.offlineMisses;
}
emit("renderJoin", renderOut);
},
});
scheduler.seed(id, {
sharedSAB: out.sharedSAB,
chunk: out.chunks[i],
});
}
},
},
};

chunkPages rounds up to keep all chunks non-empty when there are
fewer pages than workers (e.g. dry-run paths or future incremental
builds):
function chunkPages(pages, workers) {
const n = Math.min(workers, pages.length); // never more chunks than pages
if (n === 0) return [];
const size = Math.ceil(pages.length / n);
const chunks = [];
for (let i = 0; i < pages.length; i += size) chunks.push(pages.slice(i, i + size));
return chunks;
}

Two non-obvious bits in dispatch.submit:
- Dynamic registration. dispatch doesn't know N at definition time, so it calls scheduler.register(taskId, def) once per chunk, plus once for renderJoin.
- Why renderJoin exists at all. Each render:i.submit() already performs its page-delta merge and could emit directly to write. But write.expected is declared statically as ["renderJoin", "scss", "mermaid", "prepDest"], and mutating it from dispatch.submit to add the N dynamic render predecessors would be awkward. The barrier is the cleaner expression: register it once with the right count, and let write keep its static expected.
class SharedState {
pages = []; // master copy; mutated in place by [M] tasks and by render delta merges
staticFiles = []; // master copy; mermaid.submit appends new SVG descriptors
site = {}; // config, navTree, seoSiteTitle, seoLogoUrl, bookData, data, markdown, ...
pageByDest = new Map(); // destPath → page; built once in discover.submit
sitePaths = null; // assigned by dispatch.execute (buildSitePathsSync)
}

After the initial discover.submit assignment, state.pages is
never replaced -- only mutated in place. Every phase that adds
fields to pages does so on the same object identities, which is what
keeps bookData._chapters refs (set by resolveBookChapters)
pointing at the rendered pages by the time writePdf walks them.
Worker tasks receive structured-clone snapshots of whatever input they
need -- they cannot see the master and cannot mutate it. Their
submit() runs on the main thread, where it merges the worker's
output (a delta keyed by destPath for page mutations) into state.
This is the explicit form of what today's runBuild does implicitly
through closure mutation. Making it explicit lets serve.mjs re-use
the scheduler across rebuilds without leaking state, and gives
post-write tasks a clean read path.
The scheduler is a thin coordinator. The pool is constructed externally and passed in.
class Scheduler {
constructor({ pool, tasks }) {
this.pool = pool; // WorkerPool instance
this.tasks = new Map(Object.entries(tasks));
this.pending = new Map();
this.ready = [];
this.results = new Map();
this.timings = new Map();
this.state = new SharedState();
this.inFlight = 0;
[this._doneP, this._doneResolve, this._doneReject] = deferred();
for (const [id, def] of this.tasks) this._initPending(id, def);
}
_initPending(id, def) {
this.pending.set(id, { expected: def.expected.length, received: new Map() });
}
register(id, def) { this.tasks.set(id, def); this._initPending(id, def); }
// Seed a freshly-registered task directly (used by dispatch.submit
// to feed each render:i its chunk without going through emit()).
seed(id, inputs) {
const def = this.tasks.get(id);
this.pending.delete(id);
this.ready.push({ id, def, inputs });
this._flush();
}
emit(targetId, data, sourceId) {
const entry = this.pending.get(targetId);
if (!entry) throw new Error(`unknown or already-dispatched task: ${targetId}`);
entry.received.set(sourceId, data);
if (entry.received.size === entry.expected) {
this.pending.delete(targetId);
const def = this.tasks.get(targetId);
this.ready.push({ id: targetId, def, inputs: Object.fromEntries(entry.received) });
this._flush();
}
}
async start(ctx) {
this._ctx = ctx;
for (const [id, def] of this.tasks) {
if (def.expected.length === 0) this.ready.push({ id, def, inputs: {} });
}
this._flush();
return this._doneP;
}
_flush() {
while (this.ready.length > 0) this._run(this.ready.shift());
}
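_run(task) {
const start = Date.now();
this.inFlight++;
let p;
try {
// Promise.resolve(execute(...)) alone would let a synchronous throw
// from a non-async execute() escape _run() instead of reaching _onError.
p = task.def.runOnMain
? Promise.resolve(task.def.execute(task.inputs, this._ctx, this.state))
: this.pool.run({ inputs: task.inputs, ctx: this._ctx },
{ name: task.def.handler ?? task.id });
} catch (err) {
p = Promise.reject(err);
}
// .catch() rather than then()'s second argument, so an exception thrown
// by submit() inside _onDone is also reported through _onError.
p.then((output) => this._onDone(task, output, start))
.catch((err) => this._onError(task, err));
}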
_onDone(task, output, start) {
this.timings.set(task.id, { start, end: Date.now() });
this.results.set(task.id, output);
this.inFlight--;
// submit() is invoked synchronously. It must not return a Promise
// (or, if it does, must not race with the emits it makes). Async
// work belongs in execute().
task.def.submit(
output,
(tgt, data) => this.emit(tgt, data, task.id),
this.state,
this,
);
if (this.inFlight === 0 && this.ready.length === 0 && this.pending.size === 0) {
this._doneResolve(this.results);
}
}
_onError(task, err) {
this._doneReject(new Error(`task ${task.id} failed`, { cause: err }));
}
summary() {
return [...this.timings.entries()]
.sort((a, b) => a[1].start - b[1].start)
.map(([id, { start, end }]) => `${id}=${end - start}ms`)
.join(" ");
}
}
function deferred() {
let res, rej;
const p = new Promise((r1, r2) => { res = r1; rej = r2; });
return [p, res, rej];
}

The WorkerPool instance is constructed by runBuild() and injected
into the scheduler; the scheduler never sees worker_threads
directly.
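One subtlety in _run(): Promise.resolve(fn()) evaluates fn() before any promise exists. A runOnMain execute() that is not declared async and throws synchronously therefore raises an ordinary exception. That exception bypasses the rejection handler entirely. The sketch below demonstrates this and shows a small wrapper that routes both failure modes onto the rejection path. toPromise and badExecute are hypothetical names used only for illustration.

```javascript
// Turns both a synchronous throw and an async rejection from fn(...args)
// into a rejected promise.
function toPromise(fn, ...args) {
  try {
    return Promise.resolve(fn(...args));
  } catch (err) {
    return Promise.reject(err);
  }
}

// A non-async execute() that throws synchronously.
const badExecute = () => {
  throw new TypeError("execute failed");
};

// Plain Promise.resolve(fn()): the throw happens before any promise exists.
let escapedSync = false;
try {
  Promise.resolve(badExecute());
} catch {
  escapedSync = true;
}

// Wrapped: the error arrives on the rejection path like an async failure.
const routed = toPromise(badExecute);
routed.catch((err) => console.log("onError saw:", err.message));
```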
A minimal pool over node:worker_threads. One file, ~50 LOC. Spawns
size workers eagerly at construction (so WASM warmup overlaps with
seed-task work; see §Boot sequence), routes named tasks to whichever
worker is idle, queues the rest. No dynamic scaling, no recycling.
// builder/worker-pool.mjs
import { Worker } from "node:worker_threads";
export class WorkerPool {
constructor(size, workerUrl) {
this._workerUrl = workerUrl;
this._idle = []; // Worker[]
this._busy = new Map(); // Worker → { resolve, reject }
this._queue = []; // pending { message, transferList, resolve, reject }
this._workers = Array.from({ length: size }, () => this._spawn());
}
_spawn() {
const w = new Worker(this._workerUrl);
w.on("message", (msg) => {
const entry = this._busy.get(w);
if (!entry) return; // ignore late messages
this._busy.delete(w);
this._idle.push(w);
if (msg.error) {
const err = new Error(msg.error);
if (msg.stack) err.stack = msg.stack; // the unknown-task reply carries no stack
entry.reject(err);
} else {
entry.resolve(msg.result);
}
this._drain();
});
w.on("error", (err) => {
// Worker crash: reject the in-flight task. The dead worker
// stays in this._workers (won't respawn -- see §Worker death
// policy) so the pool degrades to size-1 for the rest of the
// run. For a one-shot build, the resulting task rejection
// aborts via the scheduler's _onError path.
const entry = this._busy.get(w);
if (entry) { this._busy.delete(w); entry.reject(err); }
});
this._idle.push(w);
return w;
}
run(payload, { name, transferList } = {}) {
return new Promise((resolve, reject) => {
this._queue.push({
message: { name, ...payload },
transferList,
resolve, reject,
});
this._drain();
});
}
_drain() {
while (this._queue.length && this._idle.length) {
const w = this._idle.shift();
const { message, transferList, resolve, reject } = this._queue.shift();
this._busy.set(w, { resolve, reject });
w.postMessage(message, transferList);
}
}
destroy() {
return Promise.all(this._workers.map(w => w.terminate()));
}
}

What we explicitly do not support, vs. a general-purpose pool: dynamic resizing, per-worker concurrency above 1, worker recycling after N tasks, abort signals, task-priority queues, utilization histograms. Each is real complexity we don't need.
If a worker crashes mid-task, w.on("error") rejects the in-flight
task and removes the worker from _busy. The crashed worker is NOT
respawned; _workers[] still lists it for destroy() (terminate is
idempotent on a dead worker), but it never returns to _idle. The
pool effectively shrinks by one for the remainder of the run.
For a one-shot runBuild, the rejected task surfaces through the
scheduler's _onError → _doneP.reject and runBuild aborts. Fine
as-is.
For serve mode, a crash permanently degrades the long-lived pool.
The current policy is "tell the user to restart serve"; respawn-on-
error is a follow-up if it ever happens in practice.
new Worker(url) costs ~50-100 ms per worker. We spawn
os.availableParallelism() of them at construction; they spawn
concurrently but Node's process model adds contention -- realistically
~100-200 ms before any task can postMessage to a free worker. This
is a one-shot cost in runBuild; for serve mode it amortizes to
zero across rebuilds. It should be weighed against Phase 1's expected
savings: the ~250 ms scss overlap is partly eaten by the ~100-200 ms
worker boot.
Single file with named handlers in a dispatch table. The pool sends
{ name, ...payload }; the worker routes to the right handler and
posts back { result } or { error, stack }. Four handlers total
(scss, mermaid, buildInfo, render) plus the ~15 LOC dispatcher.
```javascript
// builder/cpu-worker.mjs
import { parentPort } from "node:worker_threads";
import { initHighlighter } from "./highlight.mjs";
import { compileScss } from "./scss.mjs";
import { regenerateMermaid } from "./mermaid.mjs";
import { captureBuildInfo } from "./build-info.mjs";
import {
  createMarkdownIt,
  buildLinkTables,
  renderPhase,
} from "./render.mjs";
import { templatePhase } from "./template.mjs";
import { unpackShared } from "./sab-broadcast.mjs";

// Start WASM init immediately, do NOT await. The module finishes
// loading synchronously so the parentPort.on('message') dispatcher is
// installed before the pool sends any work. Only the `render` handler
// awaits highlighterP.
const highlighterP = initHighlighter();

const handlers = {
  async scss({ ctx }) {
    return { scssResult: await compileScss(ctx.srcRoot) };
  },
  async mermaid({ ctx }) {
    return { mermaidStats: await regenerateMermaid(ctx.srcRoot) };
  },
  async buildInfo() {
    return { buildInfo: await captureBuildInfo() };
  },
  async render({ inputs }) {
    const { sharedSAB, chunk } = inputs;
    const { siteData, initData, linkTablesData, staticFilesArr,
            baseurl, buildInfo } = unpackShared(sharedSAB);
    const highlighter = await highlighterP;
    const linkTables = reconstructLinkTables(linkTablesData);
    const staticFiles = new Set(staticFilesArr);
    const markdown = createMarkdownIt({ highlighter, linkTables, baseurl, staticFiles });
    const site = { ...siteData, markdown, buildInfo };
    await renderPhase(chunk, site);
    await templatePhase(chunk, site, initData);
    // book-combined pages have renderedContent but no html (Phase 8
    // handles them from renderedContent); send html: undefined for those.
    // offlineHtml and offlineMisses are undefined when skipOffline is true.
    return chunk.map(p => ({
      destPath: p.destPath,
      renderedContent: p.renderedContent,
      html: p.html,
      offlineHtml: p.offlineHtml,
      offlineMisses: p.offlineMisses,
    }));
  },
};

parentPort.on("message", async (msg) => {
  const { name, ...payload } = msg;
  const handler = handlers[name];
  if (!handler) {
    parentPort.postMessage({ error: `unknown task: ${name}` });
    return;
  }
  try {
    const result = await handler(payload);
    parentPort.postMessage({ result });
  } catch (err) {
    parentPort.postMessage({ error: err.message, stack: err.stack });
  }
});

// linkTables values are page objects in the main pipeline, but
// resolveLink() in the relative-links plugin only reads .permalink.
// The serialized form ships [key, permalink] pairs; we reconstruct
// minimal { permalink } stubs in the worker.
function reconstructLinkTables({ byPath, byUrl, byRedirect }) {
  const make = (pairs) => new Map(pairs.map(([k, pl]) => [k, { permalink: pl }]));
  return { byPath: make(byPath), byUrl: make(byUrl), byRedirect: make(byRedirect) };
}
```

The matching serializeLinkTables lives in render.mjs next to
buildLinkTables and is called from markdownInit.execute() on main:
```javascript
// builder/render.mjs
export function serializeLinkTables(lt) {
  const pairs = (m) => [...m.entries()].map(([k, p]) => [k, p.permalink]);
  return { byPath: pairs(lt.byPath), byUrl: pairs(lt.byUrl), byRedirect: pairs(lt.byRedirect) };
}
```

buildInit export from template.mjs. The markdownInit /
buildInit tasks both run on main; they call template.mjs's
internal buildInit() helper, which is currently file-local. Phase 0
of the migration re-exports it as buildInitFn (renamed to avoid
shadowing the task ID inside tbdocs.mjs). Today's templatePhase()
still calls the local function directly; the export adds no overhead.
Two independent initHighlighter() invocations:
- Main thread. markdownInit.execute() awaits initHighlighter() to build the shared markdown-it instance used by seo and (indirectly) by book.mjs's subtitle/intro rendering during assembleBook. The cached singleton lives in highlight.mjs's cached module-level variable on the main thread.
- Each worker. cpu-worker.mjs calls initHighlighter() at module scope without awaiting. Module evaluation finishes synchronously, so parentPort.on("message") is installed before the pool dispatches. Only the render handler awaits the promise. The scss, mermaid, and buildInfo handlers don't need a highlighter -- their workers can service tasks while Shiki is still loading.
The two contexts each have their own cached singleton. Total WASM init cost is paid once per thread (main + N workers), all in parallel, overlapping with worker spawn and the main spine.
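The "start now, await later" pattern works because the cache holds the promise rather than the resolved value. A minimal sketch follows, assuming a memoizing wrapper. memoizeInit and initDemoHighlighter are hypothetical, and the async loader is a stand-in for the real Shiki/WASM setup in highlight.mjs. Each thread evaluates its own copy of the module, so each thread gets its own cached promise, which is what the paragraph above describes.

```javascript
// Hypothetical per-thread memoized initializer; `load` stands in for the
// real Shiki/WASM setup. Not the actual highlight.mjs code.
function memoizeInit(load) {
  let cached = null; // one promise per module instance, i.e. per thread
  return function init() {
    // Cache the promise, not the resolved value, so callers that arrive
    // while the load is still in flight share the same init.
    // Note: a rejected load stays cached; this sketch does not retry.
    if (cached === null) cached = load();
    return cached;
  };
}

// Usage mirroring cpu-worker.mjs: start at module scope, await only where needed.
const initDemoHighlighter = memoizeInit(async () => ({ themeCss: "/* theme */" }));
const highlighterP = initDemoHighlighter(); // deliberately not awaited here
```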
Structured clone via postMessage. Negligible cost (< 1 ms).
Serialized once on main inside markdownInit.execute() to
linkTablesData ([key, permalink] pairs, ~50 KB). Shipped to each
render worker via dispatch's output; each worker reconstructs
minimal { permalink } stubs.
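To check the link-table path in isolation, the two helpers shown earlier can be exercised together. Here structuredClone stands in for the postMessage hop, since both use the structured clone algorithm. linkTablesRoundTrip is a hypothetical debugging aid, not part of the plan. The key point is that the rebuilt values are bare { permalink } stubs: everything else on the page object, such as rawContent, is dropped before crossing the boundary.

```javascript
// The two helpers as given in this plan, exercised together.
function serializeLinkTables(lt) {
  const pairs = (m) => [...m.entries()].map(([k, p]) => [k, p.permalink]);
  return { byPath: pairs(lt.byPath), byUrl: pairs(lt.byUrl), byRedirect: pairs(lt.byRedirect) };
}
function reconstructLinkTables({ byPath, byUrl, byRedirect }) {
  const make = (pairs) => new Map(pairs.map(([k, pl]) => [k, { permalink: pl }]));
  return { byPath: make(byPath), byUrl: make(byUrl), byRedirect: make(byRedirect) };
}

// Hypothetical check: every key survives the hop with the same permalink.
function linkTablesRoundTrip(lt) {
  const shipped = structuredClone(serializeLinkTables(lt)); // stands in for postMessage
  const rebuilt = reconstructLinkTables(shipped);
  for (const table of ["byPath", "byUrl", "byRedirect"]) {
    for (const [key, page] of lt[table]) {
      if (rebuilt[table].get(key)?.permalink !== page.permalink) {
        throw new Error(`${table}: permalink mismatch for ${key}`);
      }
    }
  }
  return rebuilt;
}
```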
The biggest single transfer. The dispatch output's chunks[i]
contains roughly pages.length / N page objects with rawContent
attached. On a 4-worker box: ~215 pages × ~4 KB raw + frontmatter =
~860 KB per chunk × 4 workers = ~3.4 MB total ship-out. The deltas
returned are ~30 KB per worker (just destPath + renderedContent +
html). Two crossings per chunk; one-way ~3.4 MB total at chunk send,
much smaller at delta return.
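The per-worker split can be sketched as contiguous slices whose sizes differ by at most one page. chunkPages is a hypothetical name, and contiguous slicing is an assumption: round-robin assignment would be an equally valid choice if long pages cluster together. With 857 pages and N = 4, the split is one chunk of 215 and three of 214. That matches the "~215 pages × ~4 KB ≈ 860 KB per chunk" estimate above.

```javascript
// Hypothetical sketch of the dispatch split: n contiguous chunks whose
// sizes differ by at most one, preserving page order.
function chunkPages(pages, n) {
  if (!Number.isInteger(n) || n < 1) throw new RangeError(`bad worker count: ${n}`);
  const base = Math.floor(pages.length / n);
  const extra = pages.length % n; // the first `extra` chunks get one more page
  const chunks = [];
  let start = 0;
  for (let k = 0; k < n; k++) {
    const size = base + (k < extra ? 1 : 0);
    chunks.push(pages.slice(start, start + size)); // may be empty if n > pages.length
    start += size;
  }
  return chunks;
}
```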
nav / seo / loadData / resolveBookChapters / buildInit run on main
against state.pages directly. No marshalling, no delta merge --
mutations are immediately visible to downstream main-thread tasks.
The render fan-out's shared payload (siteData + initData + linkTablesData + staticFilesArr + baseurl + buildInfo + sitePathsArr + offlineExcludePatterns + skipOffline, ~310--330 KB)
is JSON-serialized once on the main thread into a SharedArrayBuffer
via sab-broadcast.mjs's packShared(). Each render task receives
the SAB reference (shared memory, not cloned) alongside its per-worker
chunk. Workers call unpackShared() to deserialize independently and
in parallel; each builds a new Set(sitePathsArr) to drive the
inline offline URL rewrite. Measured saving at Phase 4 baseline
(~286 KB, pre-offline fields): ~8 ms per build (fan-out drops from
~19 ms to ~9 ms).
Three severity levels, plus how worker death maps onto them:
- Fatal (task throws): _onError rejects _doneP. The orchestrator catches, prints, exits 1. Matches today's behavior for nav integrity failures, unsupported layouts, redirect collisions. The pool's outstanding work is implicitly cancelled when the orchestrator calls pool.destroy() during shutdown.
- Degraded (task sets a flag): the task returns normally with a { failed: true } field in its output. Downstream tasks receive the output (write still needs scssResult even if compilation failed -- it just skips emitting the generated asset). After _doneP resolves, the orchestrator checks results for degraded flags and sets process.exitCode. Matches today's mermaid / scss behavior. Applies symmetrically to both seed tasks: a sass compile error sets scssResult.failed, a mermaid render error sets mermaidStats.failed.
- Setup skip (puppeteer / sass missing): task returns { setupSkipped: true }. Downstream tasks see existing-on-disk artifacts (mermaid: prior SVGs; scss: nothing emitted, but the theme tree's hand-extracted CSS still applies). Not an error.
- Worker death (worker crashes, OOM, native segfault): the pool's w.on("error") rejects the in-flight task; the rejection surfaces through _onError as Fatal above. The dead worker is not respawned (see §Worker death policy).
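The post-run check for degraded results might look roughly like the sketch below. findDegraded is a hypothetical helper. It assumes results is an object keyed by task ID with task outputs as values, and that the failed flag sits either on the output itself or one level down (as in scssResult.failed / mermaidStats.failed). setupSkipped is deliberately not treated as a failure, matching the third level above.

```javascript
// Hypothetical sketch: list task IDs whose output carries a degraded flag.
function findDegraded(results) {
  const isFailed = (v) => v !== null && typeof v === "object" && v.failed === true;
  const degraded = [];
  for (const [taskId, out] of Object.entries(results)) {
    if (out === null || typeof out !== "object") continue; // terminal tasks may return nothing
    // Check the output itself and its direct fields (e.g. out.scssResult).
    if (isFailed(out) || Object.values(out).some(isFailed)) degraded.push(taskId);
  }
  return degraded;
}

// Usage after the scheduler resolves (sketch):
// if (findDegraded(results).length > 0) process.exitCode = 1;
```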
The pool is constructed in serve.mjs's long-lived process and
re-used across rebuilds; only the Scheduler instance (and its
SharedState) is fresh per rebuild. Workers stay warm -- WASM, JIT,
module cache all survive. The worker spawn cost is paid once, at
runServe() startup.
```javascript
// serve.mjs (sketch)
const pool = new WorkerPool(os.availableParallelism(), CPU_WORKER_URL);
watcher.on("change", debounce(async () => {
  const scheduler = new Scheduler({ pool, tasks: TASKS });
  await scheduler.start(ctx);
}, 100));
```

Incremental invalidation (rebuild only changed tasks) is a much later phase; defer.
The link-checker passes (scripts/check_links.mjs) currently run
outside the build, via check.bat. The earlier CheckPool
worker_threads integration in tbdocs.mjs has been removed; this plan
inherits that decision and does not re-integrate link-checking
into the scheduler.
The scheduler design accommodates checks cleanly as runOnMain: true
tasks that delegate to a CheckPool instance passed in via ctx, so
the integration can be re-added as a follow-up phase if desired. The
shape would be:
```javascript
checkOnline: {
  expected: ["writeAux"], // _site/ must be fully written
  runOnMain: true,
  async execute(_, ctx) {
    const r = await ctx.checkPool.run(buildCheckArgv("online", ctx.destRoot, ...));
    return { name: "online", ...r };
  },
  submit() { /* terminal */ },
},
```

But the initial scheduler migration treats check.bat as the
canonical post-build verifier and lands without touching it.
The DAG nodes downstream of render and scss/mermaid. All
write-family tasks are runOnMain: true -- they own the master
pages[] and state.site reads; their CPU sections are short
relative to their I/O.
```javascript
writePdf: {
  expected: ["renderJoin", "mermaid"],
  runOnMain: true,
  // Sources CSS directly: tb-highlight.css from state.site.highlighter,
  // print.css from staticFiles. No dependency on write or _site/.
},
write: {
  expected: ["renderJoin", "scss", "mermaid", "prepDest"],
  runOnMain: true,
  async execute({ scss: { scssResult }, mermaid: { mermaidStats } }, ctx, state) {
    // render delta merges already happened in each render:i.submit().
    // mermaid.submit() already appended new SVG descriptors to
    // state.staticFiles synchronously.
    const generatedAssets = [];
    if (state.site.highlighter?.themeCss) generatedAssets.push(/* tb-highlight.css */);
    if (scssResult.compiled) generatedAssets.push(/* just-the-docs-combined.css */);
    return writePhase(state.pages, state.staticFiles, {
      destRoot: ctx.destRoot,
      generatedAssets,
      baseurl: String(state.site.config.baseurl || ""),
      dryRun: ctx.opts.dryRun,
    });
  },
  submit(out, emit) { emit("writeAux", out); },
},
searchData: {
  expected: ["renderJoin", "prepDest"],
  runOnMain: true,
  // Reads only in-memory renderedContent; writes search-data.json
  // into _site/ (needs prepDest). Runs in parallel with write.
  async execute(_, ctx, state) {
    return writeSearchData(state.pages, state.site, ctx.destRoot);
  },
  submit(out, emit) { emit("writeAux", out); },
},
writeAux: {
  expected: ["write", "searchData", "deriveRedirects", "deriveSitemap"],
  runOnMain: true,
  async execute({ deriveRedirects, deriveSitemap }, ctx, state) {
    // Return both writers' results so they reach writeOffline as auxStats.
    // Assumption: writeOffline's auxStats is this pair of results.
    return Promise.all([
      writeRedirects(state.pages, state.site, ctx.destRoot, deriveRedirects.stubs),
      writeSitemap (state.pages, state.site, ctx.destRoot, deriveSitemap.urls),
    ]);
  },
  submit(out, emit) {
    emit("writeOffline", out);
  },
},
writeOffline: {
  expected: ["writeAux"],
  runOnMain: true,
  // auxStats is writeAux's output, delivered under its task ID.
  async execute({ writeAux: auxStats }, ctx, state) {
    return writeOffline(state.pages, state.staticFiles, state.site,
      ctx.destRoot, {
        auxStats,
        precomputed: true,
        sitePaths: state.sitePaths,
      });
  },
  submit() { /* terminal */ },
},
// writePdf depends on renderJoin + mermaid (CSS sourced directly).
// It runs in parallel with write → searchData → writeAux → writeOffline.
```

Restoring the derive-time exports. redirects.mjs /
sitemap.mjs regain the precomputedStubs / precomputedUrls
passthrough parameters so the derive tasks' outputs can flow into the
write calls without re-deriving.
deriveRedirectStubs filter change. The current filter
(p.html !== undefined) is set after template runs. Under the
scheduler, deriveRedirects runs concurrently with the main spine
and well before render+template. Change the filter in redirects.mjs
to p.frontmatter.layout !== "book-combined" (the property that
determines whether html will be set; known after discover) so the
derive can run at any point after discover.
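Both changes can be illustrated on hypothetical page objects. The predicate names and resolveRedirectStubs are illustrative, and the stub format is an assumption; this is not the redirects.mjs code. Before template runs, the old filter matches nothing, while the new one already matches every page that will get html. After template runs, the two agree.

```javascript
// Hypothetical page objects; only frontmatter.layout and html matter here.
// Old filter: only meaningful after templatePhase() has set p.html.
const hadHtmlAfterTemplate = (p) => p.html !== undefined;
// New filter: needs only frontmatter, so it is meaningful right after discover().
const willHaveHtml = (p) => p.frontmatter.layout !== "book-combined";

// Passthrough shape (names illustrative): use the derive task's output when
// the scheduler supplies it; otherwise derive in place, as the serial
// pipeline does today.
function resolveRedirectStubs(pages, precomputedStubs, derive) {
  return precomputedStubs ?? derive(pages.filter(willHaveHtml));
}
```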
Workerizing writeOffline or writePdf (measured, declined). Both
phases have non-trivial CPU sections: writeOffline rewrites URLs
across all 856 HTML files; writePdf assembles book.html via
assembleBook. writePdf now depends only on renderJoin + mermaid
(CSS is sourced directly, not from _site/), so it runs in parallel
with the entire write → searchData → writeAux → writeOffline chain.
Cooperative async concurrency on the main thread interleaves their I/O
gaps. The structured-clone cost of shipping pages[] across the worker
boundary (~37–65 ms) would be pure overhead. See §Phase 3-follow-up
for the full measurement.
The scheduler records { start, end } per task. The summary is
formatted to match the current t.lap() output style:
```
config=1ms discover=98ms scss=1041ms mermaid=2ms buildInfo=8ms
nav=9ms buildInit=1ms markdownInit=63ms seo=34ms loadData=4ms
resolveBookChapters=7ms render:0=312ms render:1=298ms ...
write=542ms searchData=41ms writeAux=12ms
writeOffline=1210ms writePdf=352ms
```
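The formatting half of scheduler.summary() might look roughly like the sketch below. formatSummary is hypothetical. It assumes the recorded timings are a Map from task ID to { start, end } in milliseconds, iterated in the order the summary should print. Lines wrap at a fixed width, which approximates the t.lap() layout in the sample output above.

```javascript
// Hypothetical formatter: "id=NNms" entries joined by spaces, wrapped at `width`.
function formatSummary(timings, width = 72) {
  const entries = [...timings].map(
    ([id, { start, end }]) => `${id}=${Math.round(end - start)}ms`,
  );
  const lines = [];
  let line = "";
  for (const entry of entries) {
    if (line !== "" && line.length + 1 + entry.length > width) {
      lines.push(line); // current line is full; start a new one
      line = entry;
    } else {
      line = line === "" ? entry : `${line} ${entry}`;
    }
  }
  if (line !== "") lines.push(line);
  return lines.join("\n");
}
```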
```javascript
// tbdocs.mjs
import os from "node:os";
import path from "node:path";
import { WorkerPool } from "./worker-pool.mjs";
import { Scheduler } from "./scheduler.mjs";

const CPU_WORKER_URL = new URL("./cpu-worker.mjs", import.meta.url);

// TASKS: the task table (seeds, main spine, render fan-out, write family),
// defined elsewhere in tbdocs.mjs.
export async function runBuild(opts) {
  const srcRoot = path.resolve(process.cwd(), opts.src);
  const destRoot = path.resolve(opts.dest ?? path.join(srcRoot, "_site"));
  const workerCount = os.availableParallelism();
  const pool = new WorkerPool(workerCount, CPU_WORKER_URL);
  const scheduler = new Scheduler({ pool, tasks: TASKS });
  const ctx = { srcRoot, destRoot, opts, workerCount };
  try {
    const results = await scheduler.start(ctx);
    console.log(scheduler.summary());
    // ... existing summary output using results + scheduler.state ...
    return {
      pages: scheduler.state.pages,
      staticFiles: scheduler.state.staticFiles,
      site: scheduler.state.site,
      destRoot,
    };
  } finally {
    // Tear the workers down even when a fatal task error rejects start().
    await pool.destroy();
  }
}
```

The current code is the simple serial baseline -- there is no scaffolding to delete, only new pieces to add. Each phase keeps the build working end-to-end and produces byte-identical output.
Where to start. Implement the phases in order, beginning with Phase 0. Commit after each phase. The verification gate at the end of each phase block below is the done-signal -- if it doesn't pass cleanly, the phase isn't done.
Historical context (cited inline below). Some refactors restore
small carriers (function parameters, return-value fields) that
existed in an earlier WIP iteration of the parallelisation work --
commit 5736fee4 ("WIP dataflow parallelization work") -- and were
reverted along with the threading shape they served. Where this plan
refers to "restoring" something, git show 5736fee4 -- <file> shows
the prior form for reference; do not re-introduce the whole reverted
shape, only the small pieces noted.
To reduce session cost, the phases are labelled with the model that fits each phase's nature. Sonnet is preferred when the spec is precise enough that the implementation is a translation; Opus is preferred when correctness depends on reasoning about concurrency, data lifetime across the worker boundary, or low-level serialisation.
| Phase | Model | Why |
|---|---|---|
| 0. Skeleton + small refactors | Sonnet | Code is given verbatim in §Worker pool / §Scheduler core. The five refactors are precisely-bounded edits to existing modules. No design judgement needed. |
| 1. Seeds + main-thread spine | Sonnet | Each task body is a thin wrapper around an existing phase function. The scheduler core is copy-from-plan. All mutation is on the main thread (no cross-worker identity yet). |
| 2. Render fan-out | Opus | Cross-thread structured-clone semantics, module-scope initialisation order, dynamic task registration, per-page delta-merge identity. Debugging concurrency bugs needs depth. |
| 3. Post-write tasks | Sonnet | All runOnMain; thin wrappers around existing write functions. No new concurrency surface beyond the already-built scheduler. |
| 3-follow-up. Workerize writeOffline | Opus (decided) | Profiled: zero CPU contention; cooperative async overlap is already optimal. Declined — no implementation needed. |
| 4. SAB broadcast | Opus | JSON + SAB approach; measured ~55% fan-out reduction (~8 ms saving). |
Escalate to Opus mid-phase if a Sonnet session hits a debugging block it can't reason through.
After each phase, run:

```
build.bat && check.bat
```

The phase is done iff:
- build.bat exits 0, with no new warnings vs. the prior phase.
- The summary line shows pages.length at the current baseline (857 today; the drift guard in tbdocs.mjs warns if it slips below 836).
- All three check.bat passes are clean: online and offline return 0 issue(s), and the PDF pass's pre-existing broken-link count matches the baseline (8 today; unrelated to the scheduler work).
- The scheduler's timing summary shows the expected concurrency pattern for the phase (each phase block notes what to look for).
The site is the regression bar; there is no separate unit-test harness for builder/ to satisfy.
Suggested model: Sonnet.
Create three new modules:
- builder/worker-pool.mjs -- the WorkerPool class (~50 LOC, see §Worker pool).
- builder/scheduler.mjs -- the Scheduler class + SharedState (~150 LOC, see §Scheduler core).
- builder/cpu-worker.mjs -- the worker harness (parentPort message dispatcher + empty handlers map for now, ~15 LOC).
Refactors needed (each is small and lands cleanly under today's serial
runBuild):
- Re-export buildInit from template.mjs (currently file-local) as buildInitFn. Today's templatePhase() still calls the local function; the export is for the upcoming main-thread buildInit task.
- Export serializeLinkTables from render.mjs. Today nothing calls it; Phase 2 wires it up.
- Change the filter in redirects.mjs's deriveRedirectStubs from p.html !== undefined to p.frontmatter.layout !== "book-combined". Behaviour is identical under the current serial pipeline (template has already run by the time writeRedirects fires), but the new filter lets the derive step run before template under the scheduler.
- Restore the precomputedStubs / precomputedUrls passthrough parameters on writeRedirects / writeSitemap (these were in commit 5736fee4 and removed in the revert).
- Add svgFiles: [{ srcPath, srcRel, destRel, size }, ...] to regenerateMermaid's return value (this field was in 5736fee4 and removed; the stat happens inside the existing regenerateMermaid call).
The build still runs from the existing runBuild() exactly as today.
No new runtime dependencies.
Deliverable: new modules compile, refactors land under the serial pipeline, build output unchanged.
Verification. build.bat && check.bat clean. Output and timing
summary unchanged vs. before Phase 0 -- no scheduler is wired up
yet, so the build must look identical.
Suggested model: Sonnet.
Wire runBuild() to construct the pool, instantiate the scheduler,
and call scheduler.start(ctx). Port:
- Seeds: config, buildInfo, scss, mermaid, prepDest.
- Main-thread spine: discover, nav, markdownInit, seo, loadData, resolveBookChapters, buildInit, deriveRedirects, deriveSitemap.
The existing renderPhase → templatePhase → writePhase →
post-write code stays in runBuild() as a trailing block that
consumes scheduler.state after scheduler.start() resolves.
config runs on main as a runOnMain task; discover ditto for
identity reasons (it builds state.pages and state.pageByDest).
serve.mjs is untouched. serve.mjs imports runBuild from
tbdocs.mjs and calls it per change. As long as runBuild's
signature stays stable (it does), serve.mjs needs no changes
through Phases 0-3. The dev-server flow keeps working end-to-end
without scheduler-aware code in serve.mjs.
Expected savings: ~150 ms wall-clock. The main spine takes
~250 ms; scss (~700 ms) overlaps roughly half of it. Worker spawn
(~100-200 ms one-shot) eats a chunk of that. Honest end-to-end
estimate: 6.7 s → ~6.55 s. Most of the value here is structural --
the DAG is now explicit -- not raw wall-clock.
Verification. build.bat && check.bat clean. The timing summary
should show scss=...ms starting at t=0 alongside the main-thread
spine entries (discover, nav, ...), not after them. render /
template / write / writeOffline / writePdf still appear in
the summary at roughly their current durations -- they haven't been
moved to the scheduler yet.
Suggested model: Opus.
Wire up the render named handler in cpu-worker.mjs (renderPhase +
templatePhase over a chunk, returning per-page deltas). Add the dispatch task + dynamic render:0..N registration. Drop the serial renderPhase / templatePhase calls from runBuild().
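The per-page delta merge in each render:i.submit() is where identity matters. The deltas must land on the master page objects rather than on copies. The sketch below is a hedged example: mergeRenderDeltas is a hypothetical name, and it assumes state.pageByDest (built by discover) maps destPath to the master page object. The delta fields are the ones the render handler returns. Undefined fields are skipped, so book-combined pages keep html unset and offline fields stay absent when skipOffline is true.

```javascript
// Fields a render delta may carry (matching the render handler's return shape).
const DELTA_FIELDS = ["renderedContent", "html", "offlineHtml", "offlineMisses"];

// Hypothetical merge: write each delta onto the master page object in place,
// so every holder of that object (state.pages, pageByDest) sees the result.
function mergeRenderDeltas(deltas, pageByDest) {
  for (const delta of deltas) {
    const page = pageByDest.get(delta.destPath);
    if (page === undefined) {
      throw new Error(`render delta for unknown page: ${delta.destPath}`);
    }
    for (const field of DELTA_FIELDS) {
      // Skip undefined fields instead of writing explicit undefined values.
      if (delta[field] !== undefined) page[field] = delta[field];
    }
  }
}
```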
This is the largest single win: ~3.5 s of CPU compresses to
~3500 / N ms.
Expected savings: on 4 cores, ~2.6 s saved (6.55 s → ~4.0 s). On 8 cores, ~3 s saved (~3.6 s). Dispatch overhead is ~50 ms.
Verification. build.bat && check.bat clean. The timing summary
should show N render:i entries (one per worker) whose individual
durations sum to roughly today's combined render + template time,
not today's per-page renders. Wall-clock drop should match the
expected savings above within ±20%.
Suggested model: Sonnet.
Port write, searchData, writeAux, writeOffline, writePdf as
runOnMain tasks. writeOffline and writePdf run in parallel on
the main thread; the gain is the shorter of their two CPU sections
(~240 ms) plus interleaved I/O.
runBuild() shrinks to: pool construction + scheduler.start(ctx) +
summary output + pool.destroy().
Expected savings: ~240 ms (4.0 s → ~3.75 s on 4 cores).
Verification. build.bat && check.bat clean. writeOffline and
writePdf should overlap in the timing summary -- their start
timestamps should be within a few ms of each other.
Decision: do not workerize. Profiling shows zero CPU contention.
Two independent measurement runs confirmed that writeOffline and
writePdf already achieve perfect overlap via cooperative async
concurrency on the main thread. In both runs the combined wall-clock
equalled max(writeOffline, writePdf) — the "wasted (CPU contention)"
metric was 0 ms. writePdf's assembleBook synchronous section
(~150 ms) runs entirely inside writeOffline's I/O await gaps.
Structured-clone cost for shipping pages[] across the worker
boundary was measured at ~37–65 ms (depending on per-page HTML size),
which would be pure overhead against a 0 ms contention baseline.
Adding the offline handler to cpu-worker.mjs would also increase
worker spawn time (acorn import), complicate the worker's module
surface, and add a result-merge path — all for no measurable gain.
The overlap already saves ~260 ms vs. sequential execution (the full
duration of writePdf). No further action needed.
Suggested model: Opus.
For the render fan-out, serialize siteData + initData + linkTablesData + staticFilesArr + baseurl + buildInfo once into a SharedArrayBuffer
and pass it to all render tasks. The SAB is shared memory: each
worker deserializes its own copy from the same buffer, instead of the
main thread structured-cloning the ~286 KB shared payload 16 times
(once per worker on the 16-worker measurement box).
Implementation. Three files:
- builder/sab-broadcast.mjs (~15 LOC): packShared(obj) serializes an object to JSON, encodes it to UTF-8, and copies it into a SAB; unpackShared(sab) reverses the process.
- tbdocs.mjs dispatch.execute(): packs the shared payload into a SAB and returns { chunks, sharedSAB } instead of the flat fields.
- cpu-worker.mjs render handler: calls unpackShared(sharedSAB) to reconstruct the shared fields before rendering.
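The two functions described above can be sketched in a few lines. This is a hedged sketch of what sab-broadcast.mjs might contain, not the module itself; the export keywords are left off so the block runs standalone. Two design points follow from the JSON step. Maps and Sets do not survive JSON, which is consistent with the payload shipping staticFilesArr and sitePathsArr as arrays. The decode side copies the bytes out of shared memory first, because some TextDecoder implementations reject views backed by a SharedArrayBuffer.

```javascript
// Hypothetical sketch of packShared / unpackShared (real module exports them).
// The payload must be JSON-safe: Maps, Sets and undefined values are lost.
function packShared(obj) {
  const bytes = new TextEncoder().encode(JSON.stringify(obj)); // UTF-8
  const sab = new SharedArrayBuffer(bytes.byteLength);
  new Uint8Array(sab).set(bytes); // one copy into shared memory
  return sab; // posted to every worker by reference, not cloned
}

function unpackShared(sab) {
  // .slice() copies into a regular (non-shared) buffer before decoding.
  const bytes = new Uint8Array(sab).slice();
  return JSON.parse(new TextDecoder().decode(bytes));
}
```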
Measurements (16 workers, ~286 KB shared payload, 857 pages):
| | Run 1 | Run 2 | Run 3 | Median |
|---|---|---|---|---|
| Baseline (structured-clone) | 18.1 ms | 35.0 ms | 19.5 ms | ~19 ms |
| SAB broadcast | 9.2 ms | 7.3 ms | 10.2 ms | ~9 ms |
SAB packing cost (in dispatch.execute()): ~2 ms (visible as
dispatch=2ms vs. prior dispatch=0ms). Net saving: ~8 ms per build,
a ~55% reduction in fan-out overhead. The saving is modest in absolute
terms (~0.2% of a ~4 s build), but the implementation is small and the
pattern moves redundant serialization work off the main thread: each
worker deserializes from shared memory independently and in parallel,
instead of the main thread serializing 16 identical copies
sequentially.
Verification. build.bat && check.bat clean. dispatch now
shows ~2 ms (SAB packing) vs. 0 ms before.