
Commit e23ff28

garrytan and claude authored
fix(v1.4.1.0): /make-pdf — page numbers, entity escape, Linux fonts (garrytan#1098)
* fix(make-pdf): single-source page numbers via CSS, honor --no-page-numbers end-to-end

  Two page-number sources were stacking in every PDF: Chromium's native footer and our @page @bottom-center CSS. The CLI flag --page-numbers/--no-page-numbers also never reached the CSS layer, because RenderOptions didn't carry it. Passing --footer-template likewise dropped the "custom footer replaces stock footer" semantic.

  - orchestrator.ts: browseClient.pdf() gets pageNumbers:false unconditionally. CSS is the single source of truth. Chromium native numbering always off.
  - render.ts: RenderOptions gains pageNumbers + footerTemplate. render() computes showPageNumbers = pageNumbers !== false && !footerTemplate and passes to printCss(), preserving the prior footerTemplate-suppresses-stock semantic.
  - print-css.ts: PrintCssOptions.pageNumbers wraps @bottom-center in a conditional matching the existing showConfidential pattern.
  - types.ts: PreviewOptions.pageNumbers so preview path compiles and matches CLI.
  - render.test.ts: 7 regression tests covering printCss({pageNumbers}) in isolation AND the full render() data flow incl. footerTemplate path.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): decode HTML entities in titles and TOC to prevent double-escape

  A markdown title like "# Herbert & Garry" rendered as "Herbert &amp;amp; Garry" in <title>, cover block, and TOC entries. marked emits "&amp;" (correct HTML), but extractFirstHeading and extractHeadings only stripTags, leaving the entity intact. That string then flows through escapeHtml, producing the double-encode.

  - render.ts: new decodeTextEntities helper, distinct from decodeTypographicEntities (which runs on in-pipeline HTML and intentionally preserves &amp;). Covers named entities (lt/gt/quot/apos/39/x27/amp) AND numeric (decimal + hex) so inputs like "&#169;" or "&#x2014;" don't create the same partial-fix bug. Amp-last ordering prevents double-decode on "&amp;lt;" et al.
  - Apply in both extractFirstHeading and extractHeadings. extractHeadings feeds buildTocBlock → escapeHtml, so the TOC site had the same bug.
  - render.test.ts: 8 tests covering the contract: parameterized across &, <, >, ©, — chars; single-escape in <title>/cover; TOC double-escape check; numeric entity decode; smartypants-interacts-with-quotes contract (no raw equality).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): Liberation Sans font fallback for Linux rendering

  On Linux (Docker, CI, servers), neither Helvetica nor Arial exist. Our CSS stacks were falling through to DejaVu Sans, whose wider letterforms look like Verdana, not the intended Helvetica/Faber look. Liberation Sans is the standard metric-compatible Arial clone (SIL OFL 1.1, apt package fonts-liberation).

  - print-css.ts: all four font stacks (body + @top-center + @bottom-center + @bottom-right CONFIDENTIAL) gain "Liberation Sans" between Helvetica and Arial. File-header docblock updated to reflect the new stack.
  - .github/docker/Dockerfile.ci: explicit apt-get install fonts-liberation + fontconfig with retry, fc-cache -f, and a verify step that fails the build loud if the font disappears. Playwright's install-deps happens to pull this in today but the dep is implicit and could silently regress.
  - SKILL.md.tmpl: one-sentence note pointing Linux users at fonts-liberation.
  - SKILL.md: regenerated via bun run gen:skill-docs --host all (only make-pdf's generated file changed; verified clean diff scope).
  - render.test.ts: 2 assertions: Liberation Sans in body stack AND in at least one @page margin-box rule (proves all four intended stacks got touched, not just one).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.4.1.0)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: anonymize test fixtures, drop VC-partner framing

  - CHANGELOG + render.test.ts fixtures use "Faber & Faber" instead of a personal name. Same regression coverage (ampersand in <title>, cover, TOC, body), neutral subject.
  - make-pdf/SKILL.md.tmpl description drops the "send to a VC partner, a book agent, a judge, or Rick Rubin's team" line. "Not a draft artifact — a finished artifact" stands on its own without the audience posturing.
  - SKILL.md regenerated.

  No functional changes. All 58 make-pdf tests still pass.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 97584f9 commit e23ff28

11 files changed

Lines changed: 285 additions & 24 deletions

File tree

.github/docker/Dockerfile.ci

Lines changed: 15 additions & 1 deletion
@@ -72,6 +72,18 @@ RUN npm i -g @anthropic-ai/claude-code
 # Playwright system deps (Chromium) — needed for browse E2E tests
 RUN npx playwright install-deps chromium
 
+# Linux has neither Helvetica nor Arial. make-pdf's print CSS stacks fall back
+# to Liberation Sans (metric-compatible Arial clone, SIL OFL 1.1) so PDFs don't
+# render in DejaVu Sans. playwright install-deps happens to pull this in today,
+# but the dep is implicit and could change — install explicitly so upgrades
+# can't silently regress rendering.
+RUN for i in 1 2 3; do \
+      apt-get update && apt-get install -y --no-install-recommends fonts-liberation fontconfig && break || \
+      (echo "fonts-liberation install retry $i/3"; sleep 10); \
+    done \
+    && fc-cache -f \
+    && rm -rf /var/lib/apt/lists/*
+
 # Pre-install dependencies (cached layer — only rebuilds when package.json changes)
 COPY package.json /workspace/
 WORKDIR /workspace
@@ -84,7 +96,9 @@ RUN npx playwright install chromium \
 
 # Verify everything works
 RUN bun --version && node --version && claude --version && jq --version && gh --version \
-    && npx playwright --version
+    && npx playwright --version \
+    && fc-match "Liberation Sans" | grep -qi "Liberation" \
+    || (echo "ERROR: fonts-liberation not installed — make-pdf PDFs will render in DejaVu Sans" && exit 1)
 
 # At runtime: checkout overwrites /workspace, but node_modules persists
 # if we move it out of the way and symlink back

CHANGELOG.md

Lines changed: 55 additions & 0 deletions
@@ -1,5 +1,60 @@
 # Changelog
 
+## [1.5.1.0] - 2026-04-20
+
+## **Three visible bugs in v1.4.0.0 /make-pdf, all fixed.**
+
+Page footers showed "6 of 8" twice on every page because Chromium's native footer and our print CSS were both rendering numbers. A markdown title containing `&` rendered as `Faber &amp;amp; Faber` in `<title>` and TOC entries, because the extractors stripped tags but forgot to decode entities. On Linux (Docker, CI, servers), body text fell through to DejaVu Sans because neither Helvetica nor Arial is installed by default, and nothing in the font stack caught that. This release fixes all three and extends the fix beyond the obvious symptom each time.
+
+### The numbers that matter
+
+All three bugs were caught and expanded in review before any code was written. The plan went through `/plan-eng-review` (Claude), then `/codex` (outside voice), then implementation. Source: `.github/docker/Dockerfile.ci` (Linux fonts), `make-pdf/test/render.test.ts` (17 new tests), `git log main..HEAD` (this branch).
+
+| Surface | Before (v1.4.0.0) | After (v1.5.1.0) |
+|---------|-------------------|-----------------|
+| Page footer | "6 of 8" stacked twice | "6 of 8" once |
+| `# Faber & Faber` in `<title>` | `Faber &amp;amp; Faber` | `Faber &amp; Faber` |
+| TOC entry with `&` | Double-escaped | Single-escaped |
+| `&#169;` (copyright) in H1 | Broken | Decodes to `©` |
+| `--no-page-numbers` CLI flag | Silently did nothing | Actually suppresses page numbers |
+| `--footer-template` | Layered CSS page numbers on top | Custom footer wins cleanly |
+| Linux PDF body font | DejaVu Sans (wrong) | Liberation Sans (metric-compatible Helvetica clone) |
+
+| Review layer | Findings | Outcome |
+|--------------|----------|---------|
+| `/plan-eng-review` (Claude) | 1 architectural gap | expanded Bug 1 scope to include CSS-side conditional |
+| `/codex` (outside voice) | 11 findings | 11 incorporated (data flow, TOC site, decoder collision, footer semantic, test contract, scope boundaries, font dependency) |
+| Cross-model agreement rate | ~30% | Codex found 7 issues Claude's eng review missed by staying too high-altitude |
+
+The agreement rate is the tell. One reviewer was not enough on this diff. Codex caught that my original "one-line fix" for Bug 1 would have left the `--no-page-numbers` CLI flag silently dead, because `RenderOptions` didn't carry `pageNumbers` and the orchestrator's `render()` call didn't pass it. Without the second opinion, the CLI flag ships broken again.
+
+### What this means for anyone generating PDFs
+
+Page numbers are now controlled by one flag from CLI to CSS, with the custom-footer semantic restored. Titles, cover pages, and TOC entries render HTML entities correctly, including numeric entities like `&#169;`. Linux environments no longer need to know about fonts-liberation — the Dockerfile installs it explicitly and a build-time `fc-match` check fails the image if the font disappears. Run `bun run dev make-pdf <file.md> --cover --toc` on Mac, and now also inside Docker, and the output looks the same.
+
+### Itemized changes
+
+#### Fixed
+
+- **Page numbers no longer render twice on every page.** Chromium's native footer used to layer on top of our `@page @bottom-center` CSS. Now CSS is the single source of truth; Chromium native numbering is off unconditionally.
+- **`--no-page-numbers` works end-to-end.** The CLI flag now reaches the CSS layer via `RenderOptions.pageNumbers`. Previously it died at the orchestrator and the CSS kept rendering numbers regardless.
+- **`--footer-template` cleanly replaces the stock footer.** Passing a custom footer now also suppresses the CSS page numbers, preserving the original "custom footer wins" semantic that existed before Bug 1 collided with it.
+- **HTML entities in titles, cover pages, and TOC entries render correctly.** A markdown heading like `# Faber & Faber` renders as `Faber &amp; Faber` in `<title>` (single-escaped) instead of `Faber &amp;amp; Faber` (double-escaped). Covers both extractor call sites: `extractFirstHeading` (title + cover) and `extractHeadings` (TOC).
+- **Numeric HTML entities decode too.** `&#169;` in an H1 now renders as `©` in the PDF title. Decimal and hex numeric entities both supported.
+- **Linux PDFs render in Liberation Sans instead of DejaVu Sans.** Font stacks in all four print-CSS slots (body, running header, page number, CONFIDENTIAL label) now include `"Liberation Sans"` between Helvetica and Arial. Metric-compatible, SIL OFL 1.1, installs via `fonts-liberation`.
+
+#### Changed
+
+- `.github/docker/Dockerfile.ci` installs `fonts-liberation` + `fontconfig` explicitly with retries, runs `fc-cache -f`, and verifies `fc-match "Liberation Sans"` in the final build step. Previously relied on Playwright's `install-deps` pulling it in transitively, which could silently regress on upgrade.
+- `SKILL.md.tmpl` documents the Linux font dependency for users who install outside CI/Docker.
+
+#### For contributors
+
+- New helper `decodeTextEntities` in `render.ts` (distinct from existing `decodeTypographicEntities`, which intentionally preserves `&amp;` in pipeline HTML where `&amp;amp;` can be legitimate). Use the new one when extracting plain text destined for `<title>`, cover, or TOC.
+- `PrintCssOptions.pageNumbers` wraps the `@bottom-center` rule in a conditional matching the existing `showConfidential` pattern. Thread `pageNumbers` through `RenderOptions` and forward from `orchestrator.ts` into both `render()` call sites (generate + preview).
+- 17 new tests in `make-pdf/test/render.test.ts`: `printCss` pageNumbers isolation (3), `render()` data flow with footerTemplate (4), parameterized entity contracts across `&`, `<`, `>`, `©`, `—` (5), `<title>` exact single-escape assertion, TOC single-escape, numeric entity decode, smartypants-interacts contract, Liberation Sans body + @page box coverage (2).
+- Known test gaps (small, future PR): hex numeric entity path, amp-last ordering with double-encoded input, SKILL.md Linux note content assertion. Orchestrator → `browseClient.pdf({pageNumbers: false})` and orchestrator → `render()` forwarding are covered transitively via the CSS end-to-end tests, not asserted directly.
+
 ## [1.5.0.0] - 2026-04-20
 
 ## **Your sidebar agent now defends itself against prompt injection.**

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.5.0.0
+1.5.1.0

make-pdf/SKILL.md

Lines changed: 7 additions & 5 deletions
@@ -5,11 +5,9 @@ version: 1.0.0
 description: |
   Turn any markdown file into a publication-quality PDF. Proper 1in margins,
   intelligent page breaks, page numbers, cover pages, running headers, curly
-  quotes and em dashes, clickable TOC, diagonal DRAFT watermark. Output you'd
-  send to a VC partner, a book agent, a judge, or Rick Rubin's team. Not a
-  draft artifact — a finished artifact. Use when asked to "make a PDF",
-  "export to PDF", "turn this markdown into a PDF", or "generate a document".
-  (gstack)
+  quotes and em dashes, clickable TOC, diagonal DRAFT watermark. Not a draft
+  artifact — a finished artifact. Use when asked to "make a PDF", "export to
+  PDF", "turn this markdown into a PDF", or "generate a document". (gstack)
   Voice triggers (speech-to-text aliases): "make this a pdf", "make it a pdf", "export to pdf", "turn this into a pdf", "turn this markdown into a pdf", "generate a pdf", "make a pdf from", "pdf this markdown".
 triggers:
   - markdown to pdf
@@ -470,6 +468,10 @@ left-aligned body, Helvetica throughout, curly quotes and em dashes, optional
 cover page and clickable TOC, diagonal DRAFT watermark when you need it.
 Copy-paste from the PDF produces clean words, never "S a i l i n g".
 
+On Linux, install `fonts-liberation` for correct rendering — Helvetica and Arial
+aren't present by default, and Liberation Sans is the standard metric-compatible
+fallback. CI and Docker builds install it automatically via Dockerfile.ci.
+
 ## MAKE-PDF SETUP (run this check BEFORE any make-pdf command)
 
 ```bash

make-pdf/SKILL.md.tmpl

Lines changed: 7 additions & 5 deletions
@@ -5,11 +5,9 @@ version: 1.0.0
 description: |
   Turn any markdown file into a publication-quality PDF. Proper 1in margins,
   intelligent page breaks, page numbers, cover pages, running headers, curly
-  quotes and em dashes, clickable TOC, diagonal DRAFT watermark. Output you'd
-  send to a VC partner, a book agent, a judge, or Rick Rubin's team. Not a
-  draft artifact — a finished artifact. Use when asked to "make a PDF",
-  "export to PDF", "turn this markdown into a PDF", or "generate a document".
-  (gstack)
+  quotes and em dashes, clickable TOC, diagonal DRAFT watermark. Not a draft
+  artifact — a finished artifact. Use when asked to "make a PDF", "export to
+  PDF", "turn this markdown into a PDF", or "generate a document". (gstack)
 voice-triggers:
   - "make this a pdf"
   - "make it a pdf"
@@ -39,6 +37,10 @@ left-aligned body, Helvetica throughout, curly quotes and em dashes, optional
 cover page and clickable TOC, diagonal DRAFT watermark when you need it.
 Copy-paste from the PDF produces clean words, never "S a i l i n g".
 
+On Linux, install `fonts-liberation` for correct rendering — Helvetica and Arial
+aren't present by default, and Liberation Sans is the standard metric-compatible
+fallback. CI and Docker builds install it automatically via Dockerfile.ci.
+
 {{MAKE_PDF_SETUP}}
 
 ## Core patterns

make-pdf/src/orchestrator.ts

Lines changed: 7 additions & 1 deletion
@@ -94,6 +94,8 @@ export async function generate(opts: GenerateOptions): Promise<string> {
     confidential: opts.confidential,
     pageSize: opts.pageSize,
     margins: opts.margins,
+    pageNumbers: opts.pageNumbers,
+    footerTemplate: opts.footerTemplate,
   });
   progress.end("Rendering HTML", `${rendered.meta.wordCount} words`);
 
@@ -136,7 +138,10 @@ export async function generate(opts: GenerateOptions): Promise<string> {
     marginLeft: opts.marginLeft ?? opts.margins ?? "1in",
     headerTemplate: opts.headerTemplate,
     footerTemplate: opts.footerTemplate,
-    pageNumbers: opts.pageNumbers !== false && !opts.footerTemplate,
+    // CSS is the single source of truth for page numbers (see print-css.ts
+    // @bottom-center). Chromium's native numbering always off to avoid double
+    // footers. The CSS layer honors pageNumbers + footerTemplate via render().
+    pageNumbers: false,
     tagged: opts.tagged !== false,
     outline: opts.outline !== false,
     printBackground: !!opts.watermark,
@@ -183,6 +188,7 @@ export async function preview(opts: PreviewOptions): Promise<string> {
     watermark: opts.watermark,
     noChapterBreaks: opts.noChapterBreaks,
     confidential: opts.confidential,
+    pageNumbers: opts.pageNumbers,
   });
   progress.end("Rendering HTML", `${rendered.meta.wordCount} words`);
 
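
The orchestrator hunks split page-number control across two layers: the browser PDF call always receives `pageNumbers: false`, while the CSS layer decides from the CLI inputs. The sketch below illustrates that resolution in one place. `resolvePageNumbers`, `FooterInputs`, and `FooterDecision` are hypothetical names introduced only for illustration; the commit itself keeps the two decisions in `orchestrator.ts` and `render.ts`.

```typescript
// Hypothetical helper, not part of the commit: shows how the two layers
// described in the diff would resolve the same CLI inputs.
interface FooterInputs {
  pageNumbers?: boolean;   // from --page-numbers / --no-page-numbers
  footerTemplate?: string; // from --footer-template
}

interface FooterDecision {
  chromiumNative: boolean; // value handed to the browser PDF call
  cssPageNumbers: boolean; // whether the @bottom-center rule is emitted
}

function resolvePageNumbers(opts: FooterInputs): FooterDecision {
  return {
    // Native numbering is always off so it cannot stack with the CSS footer.
    chromiumNative: false,
    // Same expression as showPageNumbers in render.ts.
    cssPageNumbers: opts.pageNumbers !== false && !opts.footerTemplate,
  };
}

const cases: FooterInputs[] = [
  {},
  { pageNumbers: false },
  { footerTemplate: "<span>Custom</span>" },
];
for (const c of cases) {
  console.log(JSON.stringify(c), "->", JSON.stringify(resolvePageNumbers(c)));
}
```

One consequence of the `!opts.footerTemplate` test: an empty-string template is falsy, so under this expression it would not suppress the CSS page numbers.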

make-pdf/src/print-css.ts

Lines changed: 19 additions & 8 deletions
@@ -5,8 +5,11 @@
  * Mirror those CSS rules here. The HTML references were approved via
  * /plan-design-review with explicit design decisions locked in the plan:
  *
- *   - Helvetica only (system font, no bundled webfonts — dodges the
- *     per-glyph Tj bug that breaks copy-paste extraction).
+ *   - Helvetica first, with Liberation Sans as a metric-compatible Linux
+ *     fallback (Helvetica and Arial aren't installed on most Linux distros;
+ *     Liberation Sans ships via the fonts-liberation package and Playwright's
+ *     install-deps). No bundled webfonts — dodges the per-glyph Tj bug that
+ *     breaks copy-paste extraction.
  *   - All paragraphs flush-left. No first-line indent, no justify, no
  *     p+p indent. text-align: left everywhere. 12pt margin-bottom.
  *   - Cover page has the same 1in margins as every other page. No flexbox
@@ -15,8 +18,8 @@
  *   - `@page :first` suppresses running header/footer but does NOT override
  *     the 1in margin.
  *   - No <link>, no external CSS/fonts — everything inlined.
- *   - CJK fallback: Helvetica, Arial, Hiragino Kaku Gothic ProN, Noto Sans
- *     CJK JP, Microsoft YaHei, sans-serif.
+ *   - CJK fallback: Helvetica, Liberation Sans, Arial, Hiragino Kaku Gothic
+ *     ProN, Noto Sans CJK JP, Microsoft YaHei, sans-serif.
  */
 
 export interface PrintCssOptions {
@@ -37,6 +40,11 @@ export interface PrintCssOptions {
 
   // Margins (default 1in)
   margins?: string;
+
+  // Whether to render "N of M" page numbers in the @page @bottom-center rule.
+  // Default true. Set false to suppress CSS numbering (used when the caller
+  // supplies a custom Chromium footerTemplate, or when --no-page-numbers).
+  pageNumbers?: boolean;
 }
 
 /**
@@ -69,17 +77,20 @@ export function printCss(opts: PrintCssOptions = {}): string {
 function pageRules(size: string, margin: string, opts: PrintCssOptions): string {
   const runningHeader = escapeCssString(opts.runningHeader ?? "");
   const showConfidential = opts.confidential !== false;
+  const showPageNumbers = opts.pageNumbers !== false;
 
   return [
     `@page {`,
     ` size: ${size};`,
     ` margin: ${margin};`,
     runningHeader
-      ? ` @top-center { content: "${runningHeader}"; font-family: Helvetica, Arial, sans-serif; font-size: 9pt; color: #666; }`
+      ? ` @top-center { content: "${runningHeader}"; font-family: Helvetica, "Liberation Sans", Arial, sans-serif; font-size: 9pt; color: #666; }`
+      : ``,
+    showPageNumbers
+      ? ` @bottom-center { content: counter(page) " of " counter(pages); font-family: Helvetica, "Liberation Sans", Arial, sans-serif; font-size: 9pt; color: #666; }`
       : ``,
-    ` @bottom-center { content: counter(page) " of " counter(pages); font-family: Helvetica, Arial, sans-serif; font-size: 9pt; color: #666; }`,
     showConfidential
-      ? ` @bottom-right { content: "CONFIDENTIAL"; font-family: Helvetica, Arial, sans-serif; font-size: 8pt; color: #aaa; letter-spacing: 0.05em; }`
+      ? ` @bottom-right { content: "CONFIDENTIAL"; font-family: Helvetica, "Liberation Sans", Arial, sans-serif; font-size: 8pt; color: #aaa; letter-spacing: 0.05em; }`
       : ``,
     `}`,
     ``,
@@ -96,7 +107,7 @@ function rootTypography(): string {
   return [
     `html { lang: en; }`,
     `body {`,
-    ` font-family: Helvetica, Arial, "Hiragino Kaku Gothic ProN", "Noto Sans CJK JP", "Microsoft YaHei", sans-serif;`,
+    ` font-family: Helvetica, "Liberation Sans", Arial, "Hiragino Kaku Gothic ProN", "Noto Sans CJK JP", "Microsoft YaHei", sans-serif;`,
     ` font-size: 11pt;`,
     ` line-height: 1.5;`,
     ` color: #111;`,
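
The `pageRules` change wraps the `@bottom-center` margin box in the same kind of conditional already used for the CONFIDENTIAL label. A reduced sketch of that pattern follows. `sketchPageRules`, `SketchOptions`, and `SANS_STACK` are hypothetical names for illustration, and filtering out empty entries is a simplification made here, not something the diff shows.

```typescript
// Simplified, hypothetical stand-in for pageRules(); the real function takes
// more options and emits more declarations than shown here.
const SANS_STACK = 'Helvetica, "Liberation Sans", Arial, sans-serif';

interface SketchOptions {
  pageNumbers?: boolean;
  confidential?: boolean;
}

function sketchPageRules(opts: SketchOptions = {}): string {
  // Both flags default to "on": only an explicit false turns a box off.
  const showPageNumbers = opts.pageNumbers !== false;
  const showConfidential = opts.confidential !== false;

  const lines = [
    "@page {",
    showPageNumbers
      ? `  @bottom-center { content: counter(page) " of " counter(pages); font-family: ${SANS_STACK}; }`
      : "",
    showConfidential
      ? `  @bottom-right { content: "CONFIDENTIAL"; font-family: ${SANS_STACK}; }`
      : "",
    "}",
  ];

  // Dropping the empty placeholders is a choice made for this sketch only.
  return lines.filter((line) => line !== "").join("\n");
}

console.log(sketchPageRules({ pageNumbers: false }));
```

Keeping the font stack in a single constant is one way to make sure every margin box picks up the Liberation Sans fallback together; the commit instead repeats the stack inline in each rule.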

make-pdf/src/render.ts

Lines changed: 37 additions & 2 deletions
@@ -34,6 +34,11 @@ export interface RenderOptions {
   // Page layout
   pageSize?: "letter" | "a4" | "legal" | "tabloid";
   margins?: string;
+
+  // Footer behavior. pageNumbers defaults to true. When footerTemplate is set,
+  // CSS page numbers are suppressed so the custom Chromium footer wins cleanly.
+  pageNumbers?: boolean;
+  footerTemplate?: string;
 }
 
 export interface RenderResult {
@@ -74,6 +79,10 @@ export function render(opts: RenderOptions): RenderResult {
   const derivedDate = opts.date ?? formatToday();
 
   // 5. Build CSS
+  // CSS is the single source of truth for page numbers (Chromium native
+  // numbering is always off in orchestrator). If the caller supplied a custom
+  // footerTemplate, suppress CSS page numbers too so their footer wins.
+  const showPageNumbers = opts.pageNumbers !== false && !opts.footerTemplate;
   const cssOptions: PrintCssOptions = {
     cover: opts.cover,
     toc: opts.toc,
@@ -83,6 +92,7 @@ export function render(opts: RenderOptions): RenderResult {
     runningHeader: derivedTitle,
     pageSize: opts.pageSize,
     margins: opts.margins,
+    pageNumbers: showPageNumbers,
   };
   const css = printCss(cssOptions);
 
@@ -278,7 +288,7 @@ function extractHeadings(html: string): Array<{ level: number; text: string }> {
   let match;
   while ((match = re.exec(html)) !== null) {
     const level = parseInt(match[1].slice(1), 10);
-    const text = stripTags(match[2]).trim();
+    const text = decodeTextEntities(stripTags(match[2]).trim());
     if (text) headings.push({ level, text });
   }
   return headings;
@@ -314,7 +324,32 @@ function wrapChaptersByH1(html: string): string {
 
 function extractFirstHeading(html: string): string | null {
   const m = html.match(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i);
-  return m ? stripTags(m[1]).trim() : null;
+  return m ? decodeTextEntities(stripTags(m[1]).trim()) : null;
+}
+
+/**
+ * Decode HTML entities in plain text extracted from rendered HTML. Distinct
+ * from decodeTypographicEntities (which runs on in-pipeline HTML and preserves
+ * &amp; because &amp;amp; can be legitimate there). This runs on text destined
+ * for <title>, cover, and TOC entries where &amp; MUST become & or escapeHtml
+ * produces &amp;amp;.
+ *
+ * Amp-last ordering: input "&amp;#169;" decodes to "&#169;" in the named pass,
+ * then the numeric pass decodes "&#169;" to "©". Decoding &amp; first would
+ * produce "&#169;" and the numeric pass would consume it — different end state
+ * but risks double-decode on inputs like "&amp;lt;".
+ */
+function decodeTextEntities(s: string): string {
+  return s
+    .replace(/&lt;/g, "<")
+    .replace(/&gt;/g, ">")
+    .replace(/&quot;/g, '"')
+    .replace(/&#39;/g, "'")
+    .replace(/&apos;/g, "'")
+    .replace(/&#x27;/g, "'")
+    .replace(/&#(\d+);/g, (_, n) => String.fromCodePoint(parseInt(n, 10)))
+    .replace(/&#x([0-9a-fA-F]+);/g, (_, n) => String.fromCodePoint(parseInt(n, 16)))
+    .replace(/&amp;/g, "&");
 }
 
 function stripTags(html: string): string {
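
The double-escape and its fix can be reproduced in isolation. In the sketch below, `decodeTextEntities` is copied from the hunk above, while `escapeHtml` is an assumed minimal implementation: the commit refers to a function of that name but does not show its body.

```typescript
// Assumed minimal escapeHtml; the real one in render.ts may differ.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Copied from the diff: named and numeric entities first, &amp; last.
function decodeTextEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&apos;/g, "'")
    .replace(/&#x27;/g, "'")
    .replace(/&#(\d+);/g, (_, n) => String.fromCodePoint(parseInt(n, 10)))
    .replace(/&#x([0-9a-fA-F]+);/g, (_, n) => String.fromCodePoint(parseInt(n, 16)))
    .replace(/&amp;/g, "&");
}

// Heading text as it looks after tags are stripped from rendered HTML.
const extracted = "Faber &amp; Faber";

// Before the fix: escaping the still-encoded text doubles the entity.
console.log(escapeHtml(extracted)); // Faber &amp;amp; Faber

// After the fix: decode to plain text first, then escape exactly once.
console.log(escapeHtml(decodeTextEntities(extracted))); // Faber &amp; Faber

// Decimal and hex numeric entities both decode to the same character.
console.log(decodeTextEntities("&#169;&#xA9;")); // ©©

// Because &amp; is replaced last, a double-encoded input loses one level only.
console.log(decodeTextEntities("&amp;lt;")); // &lt;
```

With this replacement order, `&amp;lt;` ends as the literal text `&lt;` and not as `<`, which is the double-decode the ordering is meant to avoid.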

make-pdf/src/types.ts

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ export interface PreviewOptions {
   watermark?: string;
   noChapterBreaks?: boolean;
   confidential?: boolean;
+  pageNumbers?: boolean;
   allowNetwork?: boolean;
   title?: string;
   author?: string;
