Commit 53b8f25

Merge pull request #153 from KubaO/staging
Render speed-ups and performance investigations.
2 parents e0bed7a + 1b4fad6 commit 53b8f25

36 files changed

Lines changed: 9176 additions & 4141 deletions

WIP.md

Lines changed: 1 addition & 1 deletion
@@ -433,7 +433,7 @@ From `docs/`:
433433
- `check.bat` — link check (offline Lychee against `_site/`).
434434
- `book.bat` — renders the PDF from `_site-pdf/book.html` via `pagedjs-cli` into `_pdf/book.pdf`. Run `build.bat` first to populate `_site-pdf/`.
435435

436-
The HTML whitespace compression that wraps every page's render chain is handled by `_plugins/html-compress.rb` rather than the just-the-docs theme's `vendor/compress.html` Liquid layout — see [_plugins/html-compress.md](docs/_plugins/html-compress.md) for the full writeup. The Liquid layout's per-page cost in the profile was ~2.4s of Liquid filter dispatch (a `split: " " | join: " "` over the outside-of-`<pre>` content, lowering to a per-page Array allocation of every whitespace-delimited token across 837 pages — millions of small `String` objects). The layout is short-circuited via `compress_html.ignore.envs: all` in `_config.yml`; it then outputs a bare `{{ content }}` and the plugin takes over at `:pages, :post_render` / `:documents, :post_render` with `priority :high`, doing the same pre-block-protected whitespace collapse via `content.split(PRE_BLOCK_RE).each { |s| s.split(" ").join(" ") }` in C-implemented Ruby. The `priority :high` annotation places this hook before offlinify and pdfify (both `:normal`) so they see the compressed bytes. Pages whose layout chain doesn't reach `vendor/compress` are gated out via a `:site, :pre_render` precompute that walks `site.layouts[name].data["layout"]` for every layout key and marks the entire compress-reaching chain (default → table_wrappers → vendor/compress) -- jekyll-redirect-from stubs, the SCSS-derived CSS pages, `assets/js/zzzz-search-data.json`, and `book.html` (which uses the minimal `book-combined` layout that has no parent) all stay un-gated and pass through verbatim, matching exactly what the Liquid layout would have processed. Output is byte-identical to the layout-based version: a recursive `diff -rq` of `_site/` against a vendor/compress.html baseline reports zero differences across all ~840 HTML pages, 290 redirect stubs, every CSS / JSON / SVG / image asset. 
The plugin's correctness depended on two non-obvious details that broke an earlier cut -- the layout-chain walk has to compare against the layout *key* (`"vendor/compress"`) rather than `layout.name` (which carries the `.html` extension), and the per-segment `split(" ").join(" ")` strips trailing whitespace that the Liquid layout's *template* re-adds via its trailing-newline source character, so the plugin captures `content.end_with?("\n")` before the split and re-appends a `\n` after the join. Both regressions surfaced as nonzero `diff -rq` counts during development and are flagged in the plugin's header comment and [_plugins/html-compress.md](docs/_plugins/html-compress.md).
436+
The HTML whitespace compression that wraps every page's render chain is handled by `_plugins/html-compress.rb` rather than the just-the-docs theme's `vendor/compress.html` Liquid layout — see [_plugins/html-compress.md](docs/_plugins/html-compress.md) for the full writeup. The Liquid layout's per-page cost in the profile was ~2.4s of Liquid filter dispatch (a `split: " " | join: " "` over the outside-of-`<pre>` content, lowering to a per-page Array allocation of every whitespace-delimited token across 837 pages — millions of small `String` objects). The layout is short-circuited via `compress_html.ignore.envs: all` in `_config.yml`; it then outputs a bare `{{ content }}` and the plugin takes over at `:pages, :post_render` / `:documents, :post_render` with `priority :normal`, doing the same pre-block-protected whitespace collapse via `content.split(PRE_BLOCK_RE).each { |s| s.split(" ").join(" ") }` in C-implemented Ruby. The `:normal` priority is the *middle* tier of a three-level convention across the site's `:post_render` hooks: mutators (`book-href-rewrite`) run at `:high`, this cleanup pass at `:normal`, readers (`pdfify`, `offlinify`) at `:low`. The invariant "compress runs after every mutator and before every reader" therefore holds by construction; no downstream plugin has to be whitespace-aware. Pages whose layout chain doesn't reach `vendor/compress` are gated out via a `:site, :pre_render` precompute that walks `site.layouts[name].data["layout"]` for every layout key and marks the entire compress-reaching chain (default → table_wrappers → vendor/compress) -- jekyll-redirect-from stubs, the SCSS-derived CSS pages, and `assets/js/zzzz-search-data.json` all stay un-gated and pass through verbatim. 
`book.html` (which uses the minimal `book-combined` layout that has no parent) is *also* outside that chain but is explicitly added to the compress-eligible set at the end of the precompute, so the same whitespace collapse runs on it -- saves paged.js's render-time `WhiteSpaceFilter` ~37k DOM mutations (~28k `textContent` overwrites + ~9k `removeChild` calls) at the cost of ~480 ms once per Jekyll build. Output is byte-identical to the layout-based version: a recursive `diff -rq` of `_site/` against a vendor/compress.html baseline reports zero differences across all ~840 HTML pages, 290 redirect stubs, every CSS / JSON / SVG / image asset. The plugin's correctness depended on two non-obvious details that broke an earlier cut -- the layout-chain walk has to compare against the layout *key* (`"vendor/compress"`) rather than `layout.name` (which carries the `.html` extension), and the per-segment `split(" ").join(" ")` strips trailing whitespace that the Liquid layout's *template* re-adds via its trailing-newline source character, so the plugin captures `content.end_with?("\n")` before the split and re-appends a `\n` after the join. Both regressions surfaced as nonzero `diff -rq` counts during development and are flagged in the plugin's header comment and [_plugins/html-compress.md](docs/_plugins/html-compress.md).
437437

438438
### Profiling the build
439439

docs/_plugins/book-href-rewrite.rb

Lines changed: 5 additions & 1 deletion
@@ -372,7 +372,11 @@ def self.process(page)
372372
end
373373
end
374374

375-
Jekyll::Hooks.register :pages, :post_render do |page|
375+
# :high so this MUTATOR runs before html-compress (priority :normal).
376+
# Otherwise the landing-heading strip leaves a double-space run that
377+
# no downstream pass cleans up. See html-compress.rb's priority
378+
# convention comment for the full layering.
379+
Jekyll::Hooks.register :pages, :post_render, priority: :high do |page|
376380
next unless page.path == "book.html"
377381
BookHrefRewrite.process(page)
378382
end

docs/_plugins/html-compress.md

Lines changed: 28 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# HtmlCompress
22

3-
`_plugins/html-compress.rb` runs the HTML whitespace compression that wraps every page's render chain — the same job just-the-docs's vendor/compress.html Liquid layout was doing, but in Ruby instead of Liquid filters. Output is byte-identical to the layout-based version (verified by recursive diff of every file in `_site/` against a vendor/compress.html baseline). The Liquid layout is short-circuited to a `{{ content }}` passthrough via `compress_html.ignore.envs: all` in `_config.yml`; the plugin then runs at `:pages, :post_render` / `:documents, :post_render` with `priority :high`, so the compressed bytes are what offlinify and Jekyll's writer see.
3+
`_plugins/html-compress.rb` runs the HTML whitespace compression that wraps every page's render chain — the same job just-the-docs's vendor/compress.html Liquid layout was doing, but in Ruby instead of Liquid filters. Output is byte-identical to the layout-based version for the 837 vendor/compress-reaching pages (verified by recursive diff of every file in `_site/` against a vendor/compress.html baseline). The Liquid layout is short-circuited to a `{{ content }}` passthrough via `compress_html.ignore.envs: all` in `_config.yml`; the plugin then runs at `:pages, :post_render` / `:documents, :post_render` with `priority :normal` as the *cleanup* step in a three-tier `:high` → `:normal` → `:low` ordering (mutators → compress → readers — see [Hook priority convention](#hook-priority-convention) below). It also picks up one page the original layout didn't process, `book.html`, via an explicit `book-combined` addition to the compress-eligible set — see [book.html inclusion](#bookhtml-inclusion).
44

55
This file sits in `_plugins/` for the same reasons as `offlinify.md` and `pdfify.md`: it lives next to the code it documents, and Jekyll's `_plugins/` folder is plugin-only territory, so this Markdown never gets rendered into the public site.
66

@@ -74,7 +74,7 @@ page.md (layout: default)
7474
└── vendor/compress.html (no layout)
7575
```
7676

77-
Pages that don't use any of these layouts — jekyll-redirect-from stubs, the SCSS-derived CSS pages, `assets/js/zzzz-search-data.json`, `book.html` (which uses the minimal `book-combined` layout that has no parent) — were left untouched by the layout. The plugin has to match that gating, otherwise it would compress files that compress.html doesn't, breaking byte-identity.
77+
Pages that don't use any of these layouts — jekyll-redirect-from stubs, the SCSS-derived CSS pages, `assets/js/zzzz-search-data.json` — were left untouched by the layout. The plugin has to match that gating, otherwise it would compress files that compress.html doesn't, breaking byte-identity. `book.html` (which uses the minimal `book-combined` layout that has no parent) was originally in this list, but is now explicitly added to the compress-eligible set — see [book.html inclusion](#bookhtml-inclusion).
7878

7979
The gate is precomputed once at `:site, :pre_render`:
8080

@@ -114,20 +114,42 @@ Jekyll::Hooks.register :site, :pre_render do |site|
114114
HtmlCompress.precompute_compress_layouts!(site)
115115
end
116116

117-
Jekyll::Hooks.register :pages, :post_render, priority: :high do |page|
117+
Jekyll::Hooks.register :pages, :post_render, priority: :normal do |page|
118118
next unless page.output.is_a?(String)
119119
next unless HtmlCompress.compress?(page)
120120
HtmlCompress.compress!(page.output)
121121
end
122122

123-
Jekyll::Hooks.register :documents, :post_render, priority: :high do |doc|
123+
Jekyll::Hooks.register :documents, :post_render, priority: :normal do |doc|
124124
next unless doc.output.is_a?(String)
125125
next unless HtmlCompress.compress?(doc)
126126
HtmlCompress.compress!(doc.output)
127127
end
128128
```
129129

130-
The `priority: :high` is what places the plugin *before* `offlinify.rb` and `pdfify.rb` in the per-page render-hook order — both of those use the default `:normal` priority and rely on reading the final compressed `page.output`. Jekyll runs `:post_render` hooks in descending priority, so `:high` (30) fires before `:normal` (20). Without the priority annotation the order would be insertion-order across all `.rb` files in `_plugins/`, which is not a stable contract.
130+
## Hook priority convention
131+
132+
The `priority: :normal` is the middle tier of a three-level ordering for `:pages, :post_render` and `:documents, :post_render` hooks across the plugin set. Jekyll runs hooks in descending priority (`:high` (30) → `:normal` (20) → `:low` (10)), and the three tiers carry distinct roles:
133+
134+
| Tier | Role | Plugins |
135+
| --- | --- | --- |
136+
| `:high` (30) | **Mutators.** Modify `page.output` so the final bytes reflect this pass. | `book-href-rewrite` (chapter href rewrites + landing-heading strip on `book.html`). |
137+
| `:normal` (20) | **Compress.** The cleanup pass. Sandwiched between mutators and readers so any whitespace runs left behind by a mutator's `gsub` get collapsed before any reader captures the bytes. | `html-compress` (this plugin). |
138+
| `:low` (10) | **Readers.** Snapshot or consume `page.output` after the cleanup pass. | `pdfify` (captures `book.html` for the PDF pipeline), `offlinify` (per-page href / src rewrites + write to `_site-offline/`). |
139+
140+
The layering was originally implicit: the plugin sat at `:high` next to no other priority-annotated `:post_render` hooks. That worked until `book-href-rewrite` joined the set at default `:normal`. Its landing-heading strip ran *after* compress, removing `<h2>` blocks but leaving the (already-collapsed) single-space runs on either side adjacent — producing literal `> <` blobs in three chapter openings that paged.js's WhiteSpaceFilter then had to handle at render time. Promoting `book-href-rewrite` to `:high` and demoting compress to `:normal` makes the invariant "compress is the last cleanup step among mutators" hold by construction; demoting the readers to `:low` makes "readers see the final compressed output" hold by construction. Future plugins choose their tier by their role and the ordering composes automatically.
141+
142+
The full priority story is documented as a comment block above the `Jekyll::Hooks.register` calls in [`html-compress.rb`](html-compress.rb); each of the four affected plugins (this one, `book-href-rewrite`, `pdfify`, `offlinify`) carries a one-line note pointing back to that block.
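
The three-tier ordering can be illustrated without Jekyll. The sketch below is a hypothetical stand-in for the hook registry (the `HookRegistry` class, the hash-based `page`, and the three toy hook bodies are all assumptions made for illustration, not the plugins' real code). It only shows that sorting by descending priority, with registration order as the tie-breaker, makes the mutator run first, the compress pass second, and the reader last, regardless of the order in which the hooks were registered.

```ruby
# Hypothetical stand-in for a hook registry; not Jekyll's actual code.
PRIORITY = { low: 10, normal: 20, high: 30 }.freeze

class HookRegistry
  def initialize
    @hooks = []
  end

  # Record the hook with a sort key: higher priority first, then insertion order.
  def register(name, priority: :normal, &block)
    @hooks << { name: name, rank: [-PRIORITY.fetch(priority), @hooks.size], block: block }
  end

  # Run every hook in rank order and return the names in the order they ran.
  def trigger(page)
    @hooks.sort_by { |h| h[:rank] }.map do |h|
      h[:block].call(page)
      h[:name]
    end
  end
end

registry = HookRegistry.new
page = { output: "<p>intro</p> <h2>Landing</h2> <p>body</p>".dup, snapshot: nil }

# Registered deliberately out of order: reader, then compress, then mutator.
registry.register("pdfify", priority: :low) { |pg| pg[:snapshot] = pg[:output].dup }
registry.register("html-compress") { |pg| pg[:output].replace(pg[:output].split(" ").join(" ")) }
registry.register("book-href-rewrite", priority: :high) { |pg| pg[:output].sub!(%r{<h2>.*?</h2>}, "") }

order = registry.trigger(page)
puts order.inspect
puts page[:snapshot].inspect
```

In this toy run the heading strip leaves a two-space run behind, the compress pass collapses it, and only then does the reader take its snapshot, which is the invariant the table describes.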
143+
144+
## book.html inclusion
145+
146+
The layout-chain walk above only marks layouts that reach `vendor/compress`. `book.html` uses the minimal `book-combined` layout, which has no parent, so the walk never reaches it and the page was originally skipped (matching the layout's behaviour). After investigation of paged.js's per-render `WhiteSpaceFilter` work (see [`perf/README.md`](../../perf/README.md)) showed it doing ~37k DOM mutations at render time to handle whitespace text nodes that *would* have been collapsed if the page had been compressed at Jekyll build time, the precompute was extended to mark `book-combined` explicitly:
147+
148+
```ruby
149+
@compress_layouts << "book-combined" if site.layouts.key?("book-combined")
150+
```
151+
152+
at the end of `precompute_compress_layouts!`. Output: `book.html` now passes through `compress!` once per build (~480 ms of additional `String#split` work on the ~5.5 MB document), saving roughly the same wall-clock at paged.js render time (~28k `textContent` overwrites + ~9k `removeChild` calls eliminated). Net is approximately wall-clock-neutral for full builds, and a small net win for incremental Jekyll workflows that skip the PDF (`also_build_pdf: false`) — the compress cost is paid once per Jekyll build, the render saving is paid every PDF build, and decoupling the two is the structural improvement.
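
As a rough illustration of the gate, the sketch below walks parent links over a plain hash instead of `site.layouts`. The `LAYOUT_PARENTS` table, the `plain` layout, and the `compress_layouts` helper with its `target:` and `extra:` parameters are hypothetical names introduced here; the real precompute reads `site.layouts[name].data["layout"]` and may differ in detail. Note that the comparison is against the layout key (`"vendor/compress"`), as the text above requires.

```ruby
require "set"

# Hypothetical stand-in for site.layouts: layout key => parent layout key.
# A nil parent means the layout has no `layout:` front matter of its own.
LAYOUT_PARENTS = {
  "default" => "table_wrappers",
  "table_wrappers" => "vendor/compress",
  "vendor/compress" => nil,
  "book-combined" => nil,
  "plain" => nil
}.freeze

def compress_layouts(parents, target: "vendor/compress", extra: ["book-combined"])
  marked = Set.new
  parents.each_key do |start|
    chain = []
    cur = start
    # Follow parent links; the include? guard stops on an accidental cycle.
    while cur && !chain.include?(cur)
      chain << cur
      if cur == target
        marked.merge(chain) # every layout on the path reaches the target
        break
      end
      cur = parents[cur]
    end
  end
  # Explicit additions for layouts the walk cannot reach.
  extra.each { |name| marked << name if parents.key?(name) }
  marked
end

marked = compress_layouts(LAYOUT_PARENTS)
walk_only = compress_layouts(LAYOUT_PARENTS, extra: [])
puts marked.to_a.sort.inspect
puts walk_only.include?("book-combined")
```

Passing `extra: []` reproduces the original gating, where `book-combined` is skipped because its chain never reaches `vendor/compress`.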
131153

132154
## Verification
133155

@@ -157,6 +179,6 @@ In source order in [`html-compress.rb`](html-compress.rb):
157179

158180
- `precompute_compress_layouts!(site)` — `:site, :pre_render` entry. Walks every layout chain via `data["layout"]`, marks each layout on the path as compress-ending the moment the walk hits `vendor/compress`. Idempotent; the resulting `@compress_layouts` set persists across builds in `jekyll serve` and gets rebuilt fresh each `:pre_render`.
159181

160-
- `compress?(page)` — gate check. Returns `true` when the page's `data["layout"]` is in `@compress_layouts`. Pages without a layout (jekyll-redirect-from stubs, SCSS-derived CSS, JSON-via-page-rendering, `book.html` via `book-combined`) return `false` and skip the compression entirely.
182+
- `compress?(page)` — gate check. Returns `true` when the page's `data["layout"]` is in `@compress_layouts`. Pages without a layout (jekyll-redirect-from stubs, SCSS-derived CSS, JSON-via-page-rendering) return `false` and skip the compression entirely. `book.html` (which uses `book-combined`, a minimal layout with no parent) used to land here too; it is now explicitly added to the set by `precompute_compress_layouts!` — see [book.html inclusion](#bookhtml-inclusion).
161183

162184
- `compress!(content)` — the actual compression, in place. Captures the trailing-newline state, splits by `PRE_BLOCK_RE` with the capture group so pre bodies are preserved in the result array, runs `split(" ").join(" ")` on every outside-of-pre segment, joins, restores the trailing newline if needed, then mutates the input string via `String#replace`. The `replace` is what lets us hand back the same string object the caller passed in — Jekyll's writer reads `page.output` after `:post_render`, so in-place mutation is the cheapest way to update what gets written.
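
The steps listed for `compress!` can be sketched as follows. This is a minimal approximation under stated assumptions: the regular expression bound to `PRE_BLOCK_RE` here is a guess (the plugin's real pattern is not shown in this diff), and the method is written as a free function rather than a module method.

```ruby
# Assumed pattern; the plugin's actual PRE_BLOCK_RE may differ.
PRE_BLOCK_RE = %r{(<pre\b.*?</pre>)}mi

def compress!(content)
  # Remember the trailing newline, because the collapse below strips it.
  trailing_newline = content.end_with?("\n")

  # With a capture group, String#split keeps the <pre> blocks in the result
  # array at odd indices; even indices hold the text outside of <pre>.
  parts = content.split(PRE_BLOCK_RE).each_with_index.map do |segment, i|
    i.odd? ? segment : segment.split(" ").join(" ")
  end

  result = parts.join
  result << "\n" if trailing_newline && !result.end_with?("\n")

  # Mutate the caller's string so the same object carries the new bytes.
  content.replace(result)
end

html = "<p>a   b</p>\n\n<pre>  keep\n  this </pre>\n  <p>c</p>  \n".dup
returned = compress!(html)
puts html.inspect
```

Whitespace inside the `<pre>` body is untouched, while runs outside it (including those directly adjacent to the block) are collapsed or stripped by the per-segment `split(" ").join(" ")`.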

docs/_plugins/html-compress.rb

Lines changed: 38 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,12 @@ def self.precompute_compress_layouts!(site)
7272
cur_name = cur ? cur.data["layout"] : nil
7373
end
7474
end
75+
# book-combined is a minimal layout with no parent, so the walk
76+
# above doesn't reach it. Compressing its only consumer (book.html)
77+
# at Jekyll time saves paged.js's WhiteSpaceFilter ~37k DOM
78+
# mutations and ~300-400 ms once per render -- see
79+
# perf/README.md "WhiteSpaceFilter that wasn't" section.
80+
@compress_layouts << "book-combined" if site.layouts.key?("book-combined")
7581
end
7682

7783
# True when `page` (or document) uses a layout chain ending in
@@ -117,16 +123,43 @@ def self.compress!(content)
117123
HtmlCompress.precompute_compress_layouts!(site)
118124
end
119125

120-
# Run before offlinify (default :normal priority) so the offline-tree
121-
# rewrites see the compressed page.output, and before Jekyll's
122-
# `:site, :post_write` writes _site/ for the same reason.
123-
Jekyll::Hooks.register :pages, :post_render, priority: :high do |page|
126+
# Priority convention for :pages, :post_render hooks in this site:
127+
#
128+
# :high = MUTATORS. Plugins that modify page.output. Run first so
129+
# their mutations are visible to compress and downstream
130+
# readers. Examples: book-href-rewrite (landing heading
131+
# strip + in-book href rewrites).
132+
#
133+
# :normal = COMPRESS. This plugin. The cleanup pass, sandwiched
134+
# between mutators and readers so any whitespace runs left
135+
# behind by a mutator's gsub get collapsed before anyone
136+
# reads the final bytes.
137+
#
138+
# :low = READERS. Plugins that snapshot or consume page.output
139+
# after all mutations and the compress pass. Run last so
140+
# they see final output. Examples: pdfify (captures
141+
# book.html for the PDF pipeline), offlinify (rewrites
142+
# root-absolute hrefs and writes to _site-offline/).
143+
#
144+
# Without this layering, a mutator running after compress leaves
145+
# adjacent whitespace runs that no downstream pass collapses; a
146+
# reader running before compress captures uncompressed bytes. Both
147+
# regressions surfaced when book-href-rewrite (default :normal) ran
148+
# after html-compress (originally :high) -- its 3 landing-heading
149+
# strips left double-space artifacts that paged.js's WhiteSpaceFilter
150+
# had to handle at render time.
151+
#
152+
# Offlinify also runs at :site, :post_write (a later phase entirely),
153+
# where it always sees the final compressed bytes regardless of
154+
# per-page priority. The :low designation here governs its per-page
155+
# capture hook specifically.
156+
Jekyll::Hooks.register :pages, :post_render, priority: :normal do |page|
124157
next unless page.output.is_a?(String)
125158
next unless HtmlCompress.compress?(page)
126159
HtmlCompress.compress!(page.output)
127160
end
128161

129-
Jekyll::Hooks.register :documents, :post_render, priority: :high do |doc|
162+
Jekyll::Hooks.register :documents, :post_render, priority: :normal do |doc|
130163
next unless doc.output.is_a?(String)
131164
next unless HtmlCompress.compress?(doc)
132165
HtmlCompress.compress!(doc.output)

docs/_plugins/offlinify.rb

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1443,11 +1443,13 @@ def self.decode(path)
14431443
Offlinify.setup(site)
14441444
end
14451445

1446-
Jekyll::Hooks.register :pages, :post_render do |page|
1446+
# :low so these READERS see page.output after html-compress (:normal)
1447+
# has run. See html-compress.rb's priority convention.
1448+
Jekyll::Hooks.register :pages, :post_render, priority: :low do |page|
14471449
Offlinify.process_page(page)
14481450
end
14491451

1450-
Jekyll::Hooks.register :documents, :post_render do |doc|
1452+
Jekyll::Hooks.register :documents, :post_render, priority: :low do |doc|
14511453
Offlinify.process_page(doc)
14521454
end
14531455

docs/_plugins/pdfify.rb

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -287,7 +287,9 @@ def self.copy_file(src, dst)
287287
Pdfify.setup(site)
288288
end
289289

290-
Jekyll::Hooks.register :pages, :post_render do |page|
290+
# :low so this READER captures page.output after html-compress
291+
# (:normal) has run. See html-compress.rb's priority convention.
292+
Jekyll::Hooks.register :pages, :post_render, priority: :low do |page|
291293
Pdfify.maybe_capture(page)
292294
end
293295
