Remove book.html from _site, where it didn't belong.

KubaO · KubaO · commit f35538aa1011 · 2026-05-17T16:07:52.000+02:00
diff --git a/WIP.md b/WIP.md
@@ -428,7 +428,7 @@ Python scripts are reserved for non-render concerns: one-off content conversion
 
 From `docs/`:
 
-- `bundle exec jekyll build` (or `build.bat`) — builds three trees in a single Jekyll run: the online copy at `_site/`, a `file://`-browsable copy at `_site-offline/`, and the sparse pagedjs source at `_site-pdf/`. The offline pass (`_plugins/offlinify.rb`, activated by `also_build_offline: true` in `_config.yml`) adds ~3-5s and the PDF pass (`_plugins/pdfify.rb`, activated by `also_build_pdf: true`) adds <1s on top of the normal ~13s build. The PDF plugin copies `_site/book.html` (the concatenated chapter document rendered via `_layouts/book-combined.html`) verbatim into `_site-pdf/`, along with `assets/css/print.css`, `assets/css/rouge.css`, and every relative `<img src=>` target -- just what pagedjs needs to render the book PDF. After Jekyll's WRITE phase, the offline plugin walks `_site/`, copies binary assets verbatim into `_site-offline/`, and for each HTML and CSS file rewrites every root-absolute `href` / `src` / `url()` to a page-relative path with the resolved file extension (`/FAQ` → `../../FAQ.html`, `/Tutorials/CEF/` → `../../Tutorials/CEF/index.html`). It also patches the offline copy of `assets/js/just-the-docs.js` in two places — `navLink()` to match the active nav entry by resolved DOM `link.href` rather than `document.location.pathname` (the upstream pathname-vs-attribute compare returns no match under `file://`, leaving the sidebar with no `.active` class so the nav appears collapsed on every navigation), and `initSearch()` to read the lunr index from `window.SEARCH_DATA` rather than fetching `search-data.json` over `XMLHttpRequest` (XHR to `file://` resources is blocked by browsers; classic `<script src=>` is not). To support that, the plugin (a) generates `_site-offline/assets/js/search-data.js` once per build by wrapping the rendered `search-data.json` in `window.SEARCH_DATA = {...};`, and (b) injects two `<script>` tags per page right before `just-the-docs.js`: one that sets `window.OFFLINE_SITE_ROOT` to the per-page relative prefix to the offline site root, and one that loads `search-data.js`. The patched `initSearch()` rewrites every `doc.url` from a root-absolute permalink (`/tB/Core/Const`) to a page-relative path (`<OFFLINE_SITE_ROOT>tB/Core/Const.html`) so search-result clicks land on the actual file regardless of which page the user is on.
+- `bundle exec jekyll build` (or `build.bat`) — builds three trees in a single Jekyll run: the online copy at `_site/`, a `file://`-browsable copy at `_site-offline/`, and the sparse pagedjs source at `_site-pdf/`. The offline pass (`_plugins/offlinify.rb`, activated by `also_build_offline: true` in `_config.yml`) adds ~3-5s and the PDF pass (`_plugins/pdfify.rb`, activated by `also_build_pdf: true`) adds <1s on top of the normal ~13s build. The PDF plugin copies `_site/book.html` (the concatenated chapter document rendered via `_layouts/book-combined.html`) verbatim into `_site-pdf/`, along with `assets/css/print.css`, `assets/css/rouge.css`, and every relative `<img src=>` target -- just what pagedjs needs to render the book PDF. After the copy, the plugin deletes `_site/book.html`: the concatenated document is a build artifact for the PDF render path alone, not a public page on the online site. The companion `offline_exclude: [..., book.html]` entry in `_config.yml` keeps `offlinify.rb` from copying it into `_site-offline/`. The two safeguards are independent -- the exclude pattern works regardless of whether offlinify walks `_site/` before or after pdfify's delete, and pdfify's delete works regardless of whether offlinify is enabled. After Jekyll's WRITE phase, the offline plugin walks `_site/`, copies binary assets verbatim into `_site-offline/`, and for each HTML and CSS file rewrites every root-absolute `href` / `src` / `url()` to a page-relative path with the resolved file extension (`/FAQ` → `../../FAQ.html`, `/Tutorials/CEF/` → `../../Tutorials/CEF/index.html`). It also patches the offline copy of `assets/js/just-the-docs.js` in two places — `navLink()` to match the active nav entry by resolved DOM `link.href` rather than `document.location.pathname` (the upstream pathname-vs-attribute compare returns no match under `file://`, leaving the sidebar with no `.active` class so the nav appears collapsed on every navigation), and `initSearch()` to read the lunr index from `window.SEARCH_DATA` rather than fetching `search-data.json` over `XMLHttpRequest` (XHR to `file://` resources is blocked by browsers; classic `<script src=>` is not). To support that, the plugin (a) generates `_site-offline/assets/js/search-data.js` once per build by wrapping the rendered `search-data.json` in `window.SEARCH_DATA = {...};`, and (b) injects two `<script>` tags per page right before `just-the-docs.js`: one that sets `window.OFFLINE_SITE_ROOT` to the per-page relative prefix to the offline site root, and one that loads `search-data.js`. The patched `initSearch()` rewrites every `doc.url` from a root-absolute permalink (`/tB/Core/Const`) to a page-relative path (`<OFFLINE_SITE_ROOT>tB/Core/Const.html`) so search-result clicks land on the actual file regardless of which page the user is on.
 - `bundle exec jekyll serve` (or `serve.bat`) — local server at `localhost:4000`. Note that `_site-offline/` is also produced on the initial build, but live-reload only updates `_site/`; manual rebuild needed for offline updates.
 - `check.bat` — link check (offline Lychee against `_site/`).
 - `book.bat` — renders the PDF from `_site-pdf/book.html` via `pagedjs-cli` into `_pdf/book.pdf`. Run `build.bat` first to populate `_site-pdf/`.
diff --git a/docs/_config.yml b/docs/_config.yml
@@ -146,8 +146,10 @@ also_build_pdf: true
 # Patterns for files Jekyll produces in _site/ that have no purpose
 # in the offline tree -- Pages / crawler metadata, jekyll-redirect-
 # from output, Windows batch scripts Jekyll picks up from the source
-# directory. The online _site/ keeps them; offlinify strips them
-# from _site-offline/.
+# directory, and the concatenated `book.html` that exists only to
+# feed `_plugins/pdfify.rb` (which copies it to _site-pdf/ and deletes
+# the _site/ copy). The online _site/ keeps the metadata files;
+# offlinify strips them from _site-offline/.
 #
 # Patterns are File.fnmatch-style with FNM_PATHNAME, matched against
 # each file's path relative to the site root. `*` does NOT cross
@@ -157,6 +159,7 @@ offline_exclude:
   - CNAME
   - robots.txt
   - sitemap.xml
+  - book.html
 
 # Excludes for both the build (Jekyll won't try to process these as
 # source) and the watcher (`jekyll serve` won't trigger a rebuild
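
Reviewer note on the `offline_exclude` hunk above: the comment says patterns are `File.fnmatch`-style with `FNM_PATHNAME`, matched against paths relative to the site root. A minimal sketch of what that implies for the new `book.html` entry follows. The helper name `offline_excluded?` is hypothetical (it is not taken from `offlinify.rb`), and the pattern list contains only the entries visible in this hunk; the real list may be longer.

```ruby
# Patterns visible in the offline_exclude hunk; the full list in
# _config.yml may contain more entries than the diff context shows.
OFFLINE_EXCLUDE = ["CNAME", "robots.txt", "sitemap.xml", "book.html"].freeze

# Hypothetical helper (not the plugin's real API): true when rel_path,
# a path relative to the site root, matches any exclude pattern.
# FNM_PATHNAME stops `*` and `?` from matching a `/`.
def offline_excluded?(rel_path, patterns = OFFLINE_EXCLUDE)
  patterns.any? { |pat| File.fnmatch(pat, rel_path, File::FNM_PATHNAME) }
end
```

Under these assumptions, `offline_excluded?("book.html")` is true while `offline_excluded?("Tutorials/book.html")` is false: the bare pattern only matches the root-level file. Likewise `File.fnmatch("*.html", "Tutorials/book.html", File::FNM_PATHNAME)` is false, because `*` does not cross a directory separator when `FNM_PATHNAME` is set.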
diff --git a/docs/_plugins/pdfify.md b/docs/_plugins/pdfify.md
@@ -33,7 +33,7 @@ After Jekyll's WRITE phase completes, the hook fires `Pdfify.run(site, source_ro
 
 2. **Wipe and recreate `<dest_root>/`.** Unlike `offlinify.rb`, which empties the directory contents but keeps the directory itself in place to keep the jekyll-watcher happy, `pdfify.rb` deletes the whole tree. The PDF pass doesn't need watcher friendliness — nobody runs `jekyll serve` and refreshes a `_site-pdf/` page in their browser. The wipe is to ensure no stale images linger after source pages are deleted or renamed.
 
-3. **Copy `book.html`** verbatim into the destination. The byte-equivalence to `_site/book.html` is intentional — both files are produced by the same Liquid pass under the same `book-combined` layout, so pagedjs would see identical input whether it ran against `_site/book.html` or `_site-pdf/book.html`. The plugin doesn't rewrite anything inside `book.html`; relative paths like `Features/Images/foo.png` resolve correctly because the destination tree mirrors the source layout exactly.
+3. **Copy `book.html`** verbatim into the destination. The plugin doesn't rewrite anything inside `book.html`; relative paths like `Features/Images/foo.png` resolve correctly because the destination tree mirrors the source layout exactly.
 
 4. **Copy `REQUIRED_CSS`.** Two files in fixed positions:
 
@@ -46,7 +46,9 @@ After Jekyll's WRITE phase completes, the hook fires `Pdfify.run(site, source_ro
 
 5. **Extract and copy every relative `<img src=>` target.** Scan `book.html` with `IMG_SRC_RE` (see [What gets copied](#what-gets-copied) for the regex), deduplicate, and copy each one. Missing source files increment a `skipped` counter that lands in the summary log line.
 
-6. **Log the summary:**
+6. **Delete `<source_root>/book.html`.** The concatenated document exists in `_site/` only as a hand-off between Jekyll's render pass and this plugin — it's not a public page on the online site. The companion exclusion in `_config.yml` (`offline_exclude: [..., book.html]`) keeps `offlinify.rb` from copying it into `_site-offline/`; the delete here clears it from `_site/` itself. The two safeguards are independent: the exclude pattern fires whether `offlinify.rb` walks `_site/` before or after pdfify's delete (and still applies when `also_build_pdf: false`, when pdfify never runs at all), and pdfify's delete fires whether or not offlinify is enabled. No hook-ordering assumption is required.
+
+7. **Log the summary:**
 
    ```
    Pdfify: wrote .../_site-pdf -- copied 84 file(s) (86 image(s), 5 missing)
@@ -78,7 +80,7 @@ Matches `src="..."` (or single-quoted) where the URL is **page-relative** — do
 
 Captures: group 1 is the quote character (so the trailing quote in the pattern matches the same character), group 2 is the URL.
 
-Each match has its `?query` and `#fragment` stripped — images don't need them, and they would confuse the `File.file?` existence probe — then the path is deduplicated via a `Set` and copied if present in `<source_root>/`. The destination layout mirrors the source paths exactly, so the same `<img src="Features/Images/foo.png">` resolves correctly relative to `book.html` in both `_site/` and `_site-pdf/`.
+Each match has its `?query` and `#fragment` stripped — images don't need them, and they would confuse the `File.file?` existence probe — then the path is deduplicated via a `Set` and copied if present in `<source_root>/`. The destination layout mirrors the source paths exactly, so an `<img src="Features/Images/foo.png">` reference inside `_site-pdf/book.html` resolves to `_site-pdf/Features/Images/foo.png` — the same shape the source `_site/book.html` had against `_site/` before pdfify deleted it.
 
 ## File layout
 
diff --git a/docs/_plugins/pdfify.rb b/docs/_plugins/pdfify.rb
@@ -41,6 +41,16 @@
 # skipped. The output tree mirrors the source paths exactly so book.html
 # can stay byte-identical -- no URL rewriting is needed.
 #
+# After the copy, `<site.dest>/book.html` is deleted: the concatenated
+# document is a build artifact for this plugin alone, not a public page
+# on the online site. The `offline_exclude` entry in _config.yml keeps
+# it out of the offline tree independently. The two safeguards do not
+# rely on each other: the exclude pattern fires whether `offlinify.rb`
+# walks _site/ before or after pdfify's delete (and works even when
+# `also_build_pdf: false`, when pdfify never runs at all), and pdfify's
+# delete fires whether or not offlinify is enabled. No hook ordering
+# is assumed.
+#
 # === Compatibility ===
 #
 # Reads `site.dest` and `site.config['also_build_pdf']`. Writes a fresh
@@ -111,6 +121,16 @@ def self.run(site, source_root, dest_root)
       end
     end
 
+    # book.html exists in source/ (the online _site/) only as a
+    # build artifact for this plugin -- it's not a page on the
+    # published site and it isn't part of the offline tree (the
+    # `offline_exclude` entry in _config.yml keeps offlinify from
+    # copying it). Remove it now that we've consumed it, so a stale
+    # copy doesn't sit under _site/ between builds and so a serve-
+    # mode `localhost:4000/book.html` correctly 404s instead of
+    # leaking the concatenated document.
+    book_src.delete
+
     Jekyll.logger.info "Pdfify:", "wrote #{dest_root} -- copied #{copied} file(s) (#{image_paths.size} image(s)#{skipped.zero? ? "" : ", #{skipped} missing"})"
   end
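
Reviewer note on the "no hook ordering is assumed" claim: the sketch below models the two passes on a throwaway directory and runs them in either order. It is an illustration under stated assumptions, not the plugins' code: `pdf_pass`, `offline_pass`, and `simulate` are hypothetical names, the PDF pass here copies only `book.html` (the real one also copies CSS and images), and the offline pass does no URL rewriting.

```ruby
require "fileutils"
require "pathname"
require "tmpdir"

# Exclude entries visible in the _config.yml hunk (assumed complete
# for this sketch only).
EXCLUDE = ["CNAME", "robots.txt", "sitemap.xml", "book.html"].freeze

# Simplified PDF pass: copy book.html into the PDF tree, then delete
# the copy under the online site.
def pdf_pass(site, pdf)
  book_src = Pathname.new(site) / "book.html"
  return unless book_src.file?
  FileUtils.mkdir_p(pdf)
  FileUtils.cp(book_src, File.join(pdf, "book.html"))
  book_src.delete
end

# Simplified offline pass: mirror every file under site/ except paths
# matching an exclude pattern.
def offline_pass(site, offline)
  root = Pathname.new(site)
  root.glob("**/*").select(&:file?).each do |src|
    rel = src.relative_path_from(root).to_s
    next if EXCLUDE.any? { |pat| File.fnmatch(pat, rel, File::FNM_PATHNAME) }
    dest = File.join(offline, rel)
    FileUtils.mkdir_p(File.dirname(dest))
    FileUtils.cp(src, dest)
  end
end

# Build a tiny site, run the passes in the given order, and return the
# sorted top-level listings of [_site, _site-offline, _site-pdf].
def simulate(order)
  Dir.mktmpdir do |tmp|
    site, offline, pdf = %w[_site _site-offline _site-pdf].map { |d| File.join(tmp, d) }
    FileUtils.mkdir_p(site)
    %w[index.html book.html robots.txt].each { |f| File.write(File.join(site, f), f) }
    order.each { |pass| pass == :pdf ? pdf_pass(site, pdf) : offline_pass(site, offline) }
    [site, offline, pdf].map { |dir| Dir.exist?(dir) ? Dir.children(dir).sort : [] }
  end
end
```

In this model `simulate([:pdf, :offline])` and `simulate([:offline, :pdf])` return the same listings: `book.html` ends up only in the PDF tree. With `simulate([:offline])` alone (the `also_build_pdf: false` case) `book.html` stays in `_site/` but is still absent from the offline tree, which matches the independence described in the comments above.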