
Commit d6a888d

Fold book-chapter-body's replace chains into a Ruby filter.
1 parent 2ead4ea commit d6a888d

4 files changed

Lines changed: 252 additions & 221 deletions

File tree

WIP.md

Lines changed: 35 additions & 1 deletion
@@ -499,7 +499,41 @@ Ranked by estimated wall-clock saving on the current Windows machine:
499499
| `Liquid::Variable#render` total | 10.05 s | 8.96 s | -1.09 s |
500500

501501
The `BlockBody#render` / `Context#stack` / `Variable#render` drops reflect the `{%- assign -%}` / `{%- if -%}` blocks eliminated from head_seo.html (cut from ~85 lines of Liquid logic to ~20 lines of straight output). The 128 remaining `markdownify` calls come from `book.html`'s part subtitle/intro (~24) and `book-chapter-body.html`'s per-chapter `chapter.content | markdownify` (~100 chapters whose content doesn't start with `<`); both are candidates for a follow-up pass (see #3). The new `Jekyll::SeoPrecompute#absolute_url` adds 0.44 s for 846 calls, replacing 1,675 filter calls that totalled 0.40 s. That is essentially flat: the `absolute_url` filter already had its own per-build cache, so the swap is a wash on this axis. Output is byte-identical to baseline (`diff -rq` clean on all three of `_site/`, `_site-offline/`, `_site-pdf/`).
502-
3. **`book-chapter-body.html` heading-shift + anchor-prefix `replace` chain → Ruby pass.** ~36 k replaces fold into one string rewrite. Smaller win (~0.3 s) but probably wants to land alongside #2 since both touch the same render path.
502+
3. **`book-chapter-body.html` heading-shift + anchor-prefix `replace` chain → Ruby pass. [LANDED]** Replaced the per-chapter chain of 0-3 heading-shift cascades (12 replaces for the base shift, 10 for each additional shift), the 12-pattern whitespace span wrapping, the `src="<baseurl>/` strip, and the 13-replace anchor-id prefix pass with a single Liquid filter, `book_chapter_transform` (`_plugins/book-chapter-transform.rb`). The filter takes the body, the site baseurl, a precomputed `heading_shift_n` (0-3, derived in Liquid from `skip_base_heading_shift` / `is_sub_page` / `extra_heading_shift`), and the chapter anchor. It does all six passes in one method, with no intermediate string allocations beyond what the regex engine produces internally. The dead `p1_search` / `p1_replace` / ... whitespace-pattern declarations were also removed from `book.html`'s prologue.
503+
504+
The single-pass heading shift (one regex bumping each level by N, capping at `h7-stub` when the shifted level would exceed 6) is equivalent to N applications of the bottom-up cascade chain: each source heading lands at `level + N` or `h7-stub` regardless of how many sequential passes the chain ran. The cascade structure was an artifact of Liquid lacking a bump-by-N primitive, not a semantic requirement.
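
As a rough illustration of that equivalence, the sketch below computes the total shift from the three flags and compares a single-regex bump against repeated cascade passes. All method and constant names are hypothetical, and this is not the code in `_plugins/book-chapter-transform.rb`. The reference cascade also shifts `h1` on every pass, which simplifies the removed 1.6b / 1.9 chains (those started at `h2`).

```ruby
# Illustrative sketch only; names are assumptions, not the plugin's API.

# Total shift: each of the three Liquid flags contributes one level.
def heading_shift_n(skip_base:, sub_page:, extra:)
  (skip_base ? 0 : 1) + (sub_page ? 1 : 0) + (extra ? 1 : 0)
end

# Matches the start of an opening or closing h1..h6 tag.
HEADING_TAG = %r{<(/?)h([1-6])(?=[\s>])}

# Single pass: bump every heading by n, capping at h7-stub.
def shift_headings(html, n)
  return html if n.zero?

  html.gsub(HEADING_TAG) do
    slash = Regexp.last_match(1)
    level = Regexp.last_match(2).to_i + n
    tag = level > 6 ? "h7-stub" : "h#{level}"
    "<#{slash}#{tag}"
  end
end

# One pass of a bottom-up cascade (h6 first, h1 last), kept as a
# reference to compare against the single-pass version.
def cascade_once(html)
  6.downto(1).reduce(html) do |acc, k|
    to = k == 6 ? "h7-stub" : "h#{k + 1}"
    acc.gsub("<h#{k}", "<#{to}").gsub("</h#{k}>", "</#{to}>")
  end
end

sample = '<h1 id="a">Title</h1><h5>Deep</h5>'
n = heading_shift_n(skip_base: false, sub_page: true, extra: false)
puts shift_headings(sample, n) == cascade_once(cascade_once(sample))
```

The cascade has to run from `h6` down to `h1` so that a freshly promoted heading is not promoted again in the same pass. The single-pass version avoids that ordering constraint because each tag is rewritten exactly once.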
505+
506+
Ruby-prof effect (post-SEO baseline vs post-chapter-transform):
507+
508+
| Metric | Before | After | Delta |
509+
|---|---|---|---|
510+
| Total instrumented wall | 36.90 s | 34.78 s | -2.12 s |
511+
| `Liquid::Strainer#invoke` total | 5.97 s / 179,266 calls | 5.45 s / 122,397 calls | -0.52 s / -56,869 calls |
512+
| `Liquid::StandardFilters#replace` calls | 87,991 | 48,577 | -39,414 |
513+
| `Liquid::StandardFilters#replace` total | 0.58 s | 0.33 s | -0.25 s |
514+
| new `BookChapterTransform#book_chapter_transform` | -- | 0.14 s / 718 calls | +0.14 s |
515+
| `Liquid::BlockBody#render` total | 16.14 s | 14.43 s | -1.71 s |
516+
| `Liquid::Context#stack` total | 15.50 s | 13.78 s | -1.72 s |
517+
| `Liquid::Variable#render` total | 8.96 s | 7.82 s | -1.14 s |
518+
519+
The Liquid framework drops (`BlockBody#render`, `Context#stack`, `Variable#render`) again outweigh the filter-dispatch drop -- they capture the eliminated `{%- unless -%}` / `{%- if -%}` blocks plus the chained `| replace:` pipeline AST nodes. The new filter does ~190 µs per call across 718 invocations, covering the same work the eliminated 39 k Liquid replaces did. Output byte-identical to baseline (`diff -rq` clean on `_site/`, `_site-offline/`, `_site-pdf/`).
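
For orientation, the anchor handling from item 3 can be pictured roughly as below. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the combined regex is one possible consolidation of the 13 removed replaces rather than the plugin's actual pattern, and the sample URL is made up. In the real template the anchor is still derived in Liquid; the Ruby derivation here only mirrors that logic for illustration.

```ruby
# Illustrative sketch only; names and sample inputs are assumptions.

# Mirror of the Liquid derivation: slashes become dashes, one leading and
# one trailing dash are dropped, and "ch-" is prepended.
def chapter_anchor(url)
  path = url.tr("/", "-")
  path = path[1..-1] if path.start_with?("-")
  path = path[0...-1] if path.end_with?("-")
  "ch-#{path}"
end

# Heading ids on h2..h6 and h7-stub (with or without class="no_toc"),
# plus intra-chapter hrefs. h1 is left out, as in the removed chain.
ID_OR_HREF = /(<h(?:[2-6]|7-stub)(?: class="no_toc")? id="|href="#)/

# Insert "<anchor>-" right after each matched id=" or href="# prefix.
def prefix_anchors(html, anchor)
  html.gsub(ID_OR_HREF) { "#{Regexp.last_match(1)}#{anchor}-" }
end

anchor = chapter_anchor("/guide/intro/")
puts prefix_anchors('<h2 id="see-also">See also</h2>', anchor)
```

Prefixing every id with the chapter anchor is what keeps identical kramdown slugs such as `see-also` from colliding once all chapters share one document.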
520+
521+
#### Cumulative
522+
523+
The three landed optimizations together (chapter precompute, SEO precompute, chapter-body transform) shrank ruby-prof's instrumented build wall from ~41.7 s (immediately post-html-compress baseline) down to 34.78 s, a drop of about 7 s. The cumulative profile table, comparing the post-html-compress baseline to the post-chapter-transform state:
524+
525+
| Metric | Post-html-compress | Post-CT | Delta |
526+
|---|---|---|---|
527+
| Total instrumented wall | 39.30 s | 34.78 s | -4.52 s |
528+
| `Liquid::Strainer#invoke` total | 8.90 s / 191,365 calls | 5.45 s / 122,397 calls | -3.45 s / -68,968 calls |
529+
| `where_exp` calls | 37 | 0 | -37 |
530+
| `markdownify` calls | 1,802 | 128 | -1,674 |
531+
| `absolute_url` filter calls | 1,675 | 1 | -1,674 |
532+
| `replace` calls | 87,991 | 48,577 | -39,414 |
533+
| `Liquid::BlockBody#render` total | 18.38 s | 14.43 s | -3.95 s |
534+
| `Liquid::Context#stack` total | 18.19 s | 13.78 s | -4.41 s |
535+
536+
What's left of the per-filter table is approximately what kramdown / Rouge actually parse and emit: the 128 remaining `markdownify` calls are the per-chapter `chapter.content | markdownify` in `book-chapter-body.html` plus `book.html`'s part subtitle / intro markdown. Each of those is unique input, so Jekyll's converter cache rarely hits and the kramdown parse itself dominates. Further savings on this axis would need either (a) reusing the already-rendered `_site/<page>.html` instead of re-parsing source markdown for the book, or (b) accepting kramdown's parse cost as the floor and looking elsewhere -- the next-biggest non-library hotspot is `Offlinify#rewrite_html!` at ~2 s of self-time, already heavily optimised (see `_plugins/offlinify.md`).
503537

504538
## Site integrity check
505539

docs/_includes/book-chapter-body.html

Lines changed: 39 additions & 155 deletions
@@ -99,168 +99,52 @@
9999
{%- endunless -%}
100100

101101
{%- comment -%}
102-
Strip the `src="<baseurl>/` prefix that `relative_url` injects when
103-
`jekyll build --baseurl /<repo>` is passed (the CI deploy path uses
104-
this for Pages project sites without a custom domain). With empty
105-
baseurl the prefix collapses to `src="/`, matching the historical
106-
leading-slash strip exactly. Once stripped, image paths inside
107-
book.html are root-of-_site/-relative, which is what both pdfify's
108-
source lookup and pagedjs's render-time fetch expect.
109-
{%- endcomment -%}
110-
{%- assign src_baseurl_strip = 'src="' | append: site.baseurl | append: '/' -%}
111-
112-
{%- assign body = body
113-
| replace: src_baseurl_strip, 'src="'
114-
| replace: p1_search, p1_replace
115-
| replace: p2_search, p2_replace
116-
| replace: p3i12_search, p3i12_replace
117-
| replace: p3i8_search, p3i8_replace
118-
| replace: p3i4_search, p3i4_replace
119-
| replace: p3_search, p3_replace
120-
| replace: p4i16_search, p4i16_replace
121-
| replace: p4i12_search, p4i12_replace
122-
| replace: p4i8_search, p4i8_replace
123-
| replace: p4i4_search, p4i4_replace
124-
| replace: p4_search, p4_replace
125-
| replace: '</span> <span', '</span><span class="w"> </span><span' -%}
126-
127-
{%- assign stripped = body | strip -%}
128-
{%- unless stripped == "" -%}
129-
130-
{%- comment -%}
131-
1.5a: heading depth shift. Source `# Title` becomes the chapter
132-
title; in the book it lives one level below the part divider's
133-
H1, so cascade every heading down by one. `<h6` becomes
134-
`<h7-stub` since there's no real h7; in practice no chapter
135-
content uses h5 or h6, so the stub never appears, but the rule
136-
keeps the cascade consistent.
102+
Body transformation: the `src="<baseurl>/` strip (so pagedjs's
103+
render-time `<img>` fetches resolve against `_site/`), the 12
104+
inter-span whitespace patterns (so pagedjs's page splitter doesn't
105+
mash code-line tokens together at page breaks), the 0-3 heading
106+
shift cascades (1.5a base, 1.6b sub-page, 1.9 chaptered-part
107+
extra), and the per-heading anchor-id prefix injection all run in
108+
one Ruby method via the `book_chapter_transform` filter (see
109+
`_plugins/book-chapter-transform.rb` for the per-step rationale).
137110

138-
`skip_base_heading_shift` skips this pass entirely -- used when
139-
the containing part or chapter divider is silent
140-
(`no_outline_entry`) and the content should occupy the outline
141-
level the divider would have used. 1.6b sub-page and 1.9 extra
142-
shifts still cascade on top when their conditions hold.
143-
{%- endcomment -%}
144-
{%- unless include.skip_base_heading_shift -%}
145-
{%- assign body = body
146-
| replace: '<h6', '<h7-stub'
147-
| replace: '</h6>', '</h7-stub>'
148-
| replace: '<h5', '<h6'
149-
| replace: '</h5>', '</h6>'
150-
| replace: '<h4', '<h5'
151-
| replace: '</h4>', '</h5>'
152-
| replace: '<h3', '<h4'
153-
| replace: '</h3>', '</h4>'
154-
| replace: '<h2', '<h3'
155-
| replace: '</h2>', '</h3>'
156-
| replace: '<h1', '<h2'
157-
| replace: '</h1>', '</h2>' -%}
158-
{%- endunless -%}
111+
`heading_shift_n` is the total number of cascade levels to apply,
112+
precomputed in Liquid from the three flags. The plugin then bumps
113+
every heading by exactly N in a single regex pass, capping at
114+
`h7-stub` when the shifted level would exceed 6.
159115

160-
{%- comment -%}
161-
1.6b: extra heading shift for sub-pages so they nest one level
162-
deeper than their parent index in the PDF outline. A sub-page's
163-
chapter title (source `#`) ends up as `<h3>` instead of `<h2>`,
164-
sub-page sections as `<h4>` instead of `<h3>`, and so on. The
165-
shift cascades through the same rule set; no real content reaches
166-
h7-stub or beyond after this second pass.
167-
{%- endcomment -%}
168-
{%- if is_sub_page -%}
169-
{%- assign body = body
170-
| replace: '<h6', '<h7-stub'
171-
| replace: '</h6>', '</h7-stub>'
172-
| replace: '<h5', '<h6'
173-
| replace: '</h5>', '</h6>'
174-
| replace: '<h4', '<h5'
175-
| replace: '</h4>', '</h5>'
176-
| replace: '<h3', '<h4'
177-
| replace: '</h3>', '</h4>'
178-
| replace: '<h2', '<h3'
179-
| replace: '</h2>', '</h3>' -%}
180-
{%- endif -%}
116+
`chapter_anchor` is the per-chapter id prefix derived from the
117+
chapter URL (or supplied by the caller via
118+
`chapter_anchor_override`). It's still computed in Liquid because
119+
the article element below also reads it; the plugin just injects
120+
it into every heading-id and `href="#..."` in the body.
121+
{%- endcomment -%}
122+
{%- assign heading_shift_n = 0 -%}
123+
{%- unless include.skip_base_heading_shift -%}{%- assign heading_shift_n = heading_shift_n | plus: 1 -%}{%- endunless -%}
124+
{%- if is_sub_page -%}{%- assign heading_shift_n = heading_shift_n | plus: 1 -%}{%- endif -%}
125+
{%- if include.extra_heading_shift -%}{%- assign heading_shift_n = heading_shift_n | plus: 1 -%}{%- endif -%}
181126

182-
{%- comment -%}
183-
1.9 extra shift: chaptered-part chapters nest under the chapter-
184-
divider's H2, so every chapter body needs one more level of
185-
demotion on top of 1.5a (and 1.6b, if the chapter is a sub-page).
186-
Without this, a class index's source H1 lands at H2 -- the same
187-
outline depth as the chapter divider it should sit beneath -- so
188-
`AmbientProperties class` appears as a sibling of `VBRUN Package`
189-
rather than a child. The shift cascades through the same rule set;
190-
real content stops at source-H4 (member sub-page section), which
191-
after 1.5a + 1.6b + this pass ends up at h7-stub and is excluded
192-
from the outline anyway.
193-
{%- endcomment -%}
194-
{%- if include.extra_heading_shift -%}
195-
{%- assign body = body
196-
| replace: '<h6', '<h7-stub'
197-
| replace: '</h6>', '</h7-stub>'
198-
| replace: '<h5', '<h6'
199-
| replace: '</h5>', '</h6>'
200-
| replace: '<h4', '<h5'
201-
| replace: '</h4>', '</h5>'
202-
| replace: '<h3', '<h4'
203-
| replace: '</h3>', '</h4>'
204-
| replace: '<h2', '<h3'
205-
| replace: '</h2>', '</h3>' -%}
127+
{%- if include.chapter_anchor_override -%}
128+
{%- assign chapter_anchor = include.chapter_anchor_override -%}
129+
{%- else -%}
130+
{%- assign url_path = include.chapter.url | replace: '/', '-' -%}
131+
{%- assign anchor_first_char = url_path | slice: 0, 1 -%}
132+
{%- if anchor_first_char == '-' -%}
133+
{%- assign url_len = url_path.size | minus: 1 -%}
134+
{%- assign url_path = url_path | slice: 1, url_len -%}
206135
{%- endif -%}
207-
208-
{%- comment -%}
209-
1.5b: chapter anchor + per-heading id uniqueness. Derive a stable
210-
chapter anchor from the chapter URL (strip leading and trailing
211-
slashes, replace inner slashes with dashes); the article element
212-
carries the bare anchor as its id so Phase 2 cross-references can
213-
target `#ch-<anchor>` and land at the chapter's first page. Every
214-
heading id and intra-chapter `href="#..."` inside the chapter body
215-
gets that anchor prepended so identical kramdown-generated slugs
216-
(`see-also`, `example`, ...) don't collapse to one outline
217-
destination.
218-
{%- endcomment -%}
219-
{%- if include.chapter_anchor_override -%}
220-
{%- assign chapter_anchor = include.chapter_anchor_override -%}
221-
{%- else -%}
222-
{%- assign url_path = include.chapter.url | replace: '/', '-' -%}
223-
{%- assign anchor_first_char = url_path | slice: 0, 1 -%}
224-
{%- if anchor_first_char == '-' -%}
225-
{%- assign url_len = url_path.size | minus: 1 -%}
226-
{%- assign url_path = url_path | slice: 1, url_len -%}
227-
{%- endif -%}
228-
{%- assign anchor_last_char = url_path | slice: -1, 1 -%}
229-
{%- if anchor_last_char == '-' -%}
230-
{%- assign url_len = url_path.size | minus: 1 -%}
231-
{%- assign url_path = url_path | slice: 0, url_len -%}
232-
{%- endif -%}
233-
{%- assign chapter_anchor = 'ch-' | append: url_path -%}
136+
{%- assign anchor_last_char = url_path | slice: -1, 1 -%}
137+
{%- if anchor_last_char == '-' -%}
138+
{%- assign url_len = url_path.size | minus: 1 -%}
139+
{%- assign url_path = url_path | slice: 0, url_len -%}
234140
{%- endif -%}
141+
{%- assign chapter_anchor = 'ch-' | append: url_path -%}
142+
{%- endif -%}
235143

236-
{%- assign h2_class_prefix = '<h2 class="no_toc" id="' | append: chapter_anchor | append: '-' -%}
237-
{%- assign h2_prefix = '<h2 id="' | append: chapter_anchor | append: '-' -%}
238-
{%- assign h3_class_prefix = '<h3 class="no_toc" id="' | append: chapter_anchor | append: '-' -%}
239-
{%- assign h3_prefix = '<h3 id="' | append: chapter_anchor | append: '-' -%}
240-
{%- assign h4_class_prefix = '<h4 class="no_toc" id="' | append: chapter_anchor | append: '-' -%}
241-
{%- assign h4_prefix = '<h4 id="' | append: chapter_anchor | append: '-' -%}
242-
{%- assign h5_class_prefix = '<h5 class="no_toc" id="' | append: chapter_anchor | append: '-' -%}
243-
{%- assign h5_prefix = '<h5 id="' | append: chapter_anchor | append: '-' -%}
244-
{%- assign h6_class_prefix = '<h6 class="no_toc" id="' | append: chapter_anchor | append: '-' -%}
245-
{%- assign h6_prefix = '<h6 id="' | append: chapter_anchor | append: '-' -%}
246-
{%- assign h7_class_prefix = '<h7-stub class="no_toc" id="' | append: chapter_anchor | append: '-' -%}
247-
{%- assign h7_prefix = '<h7-stub id="' | append: chapter_anchor | append: '-' -%}
248-
{%- assign href_prefix = 'href="#' | append: chapter_anchor | append: '-' -%}
144+
{%- assign body = body | book_chapter_transform: site.baseurl, heading_shift_n, chapter_anchor -%}
249145

250-
{%- assign body = body
251-
| replace: '<h2 class="no_toc" id="', h2_class_prefix
252-
| replace: '<h2 id="', h2_prefix
253-
| replace: '<h3 class="no_toc" id="', h3_class_prefix
254-
| replace: '<h3 id="', h3_prefix
255-
| replace: '<h4 class="no_toc" id="', h4_class_prefix
256-
| replace: '<h4 id="', h4_prefix
257-
| replace: '<h5 class="no_toc" id="', h5_class_prefix
258-
| replace: '<h5 id="', h5_prefix
259-
| replace: '<h6 class="no_toc" id="', h6_class_prefix
260-
| replace: '<h6 id="', h6_prefix
261-
| replace: '<h7-stub class="no_toc" id="', h7_class_prefix
262-
| replace: '<h7-stub id="', h7_prefix
263-
| replace: 'href="#', href_prefix -%}
146+
{%- assign stripped = body | strip -%}
147+
{%- unless stripped == "" -%}
264148

265149
{%- comment -%}
266150
Pick the article class and running-header text. When the caller
