
Commit ed43fe0

feat: vector viewport overhaul — fit/zoom/rotation/cursor/font cache + hybrid stamp extraction
- Layout: removed #pdf-container padding so the canvas fills the viewport edge-to-edge; fitToViewport() no longer subtracts a 20px margin
- Centralized fit math in computeFitZoom(); fitWidth/fitPage/fitToViewport share one helper across vector + legacy modes
- Reactive cursor (js/ui/cursor.js): single createMemo derived from app state; tools no longer touch canvas.style.cursor (~200 lines deleted)
- Single wheel handler in navigation-events.js for zoom + pan + page-nav-at-edges; discrete zoom steps with trackpad accumulation
- clampAndCenter() invariant: page is centered on any axis where it fits
- ResizeObserver on #pdf-container re-anchors the world point at canvas center on layout changes (panel toggle, window resize, etc.)
- Document-scoped font cache: FontRegistry keyed by global font ObjectId, amortized across pages within a document; DocumentHandleCache in the Tauri layer keeps the parsed lopdf::Document alive across IPC commands
- Rust extract_text + extract_text_batch commands; rayon par_iter for extract_draw_commands_batch / extract_text_spans_batch / page_dimensions_all
- Page rotation in vector mode: open-pdf-render reads /Rotate (with /Parent inheritance), applies the rotation matrix to draw commands, and combines it with the user's per-page rotation map; cache key includes rotation
- Hybrid stamp extraction: pdf-lib XObject path first (no annotation bake-in), PDF.js render+crop fallback only when pdf-lib can't decode
- Annotations key off viewport.pageNum (visible) instead of doc.currentPage (intended) so they don't flash on the wrong page during slow extracts
- updateActiveThumbnail() syncs selectedPages so the thumbnail panel highlight follows any navigation method
- Status bar zoom +/- buttons, zoom input, and ribbon View buttons wired to viewport zoom helpers instead of the legacy doc.scale path
- Status bar prev/next go through goToPage() so the thumbnail tracks
- setPage() preserves zoom on same-document page changes (detects a new file via its path); only fits on a genuinely new document
- Text highlights on a dedicated #text-highlight-canvas with CSS mix-blend-mode: multiply (so they don't hide the underlying text)
- Cursor stays "grabbing" through middle-click drag (handled by the reactive cursor system)
- Version 1.42.0 → 1.43.0
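The fit, clamp, and re-anchor rules in the list above come down to a few lines of arithmetic. The following sketch restates them in Rust for illustration only: the app's helpers (`computeFitZoom()`, `clampAndCenter()`, the ResizeObserver callback) are JavaScript, and the names `Size`, `FitMode`, `compute_fit_zoom`, `clamp_and_center`, and `reanchor_on_resize` are hypothetical. It assumes the pan offset is the position of the page's top-left corner in canvas pixels and, matching the removed padding, subtracts no margin.

```rust
/// Width and height in CSS pixels (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Size {
    pub w: f64,
    pub h: f64,
}

/// Fit modes corresponding to fitWidth / fitPage.
#[derive(Clone, Copy, Debug)]
pub enum FitMode {
    Width,
    Page,
}

/// One helper shared by every fit mode. No margin is subtracted, so the
/// page fills the viewport edge-to-edge along the constraining axis.
pub fn compute_fit_zoom(page: Size, viewport: Size, mode: FitMode) -> f64 {
    let zx = viewport.w / page.w;
    let zy = viewport.h / page.h;
    match mode {
        FitMode::Width => zx,
        FitMode::Page => zx.min(zy),
    }
}

/// One axis of the invariant: center the page if it fits, otherwise keep
/// the offset within [view_len - page_len, 0] so no empty gutter appears.
fn clamp_axis(offset: f64, page_len: f64, view_len: f64) -> f64 {
    if page_len <= view_len {
        (view_len - page_len) / 2.0
    } else {
        offset.clamp(view_len - page_len, 0.0)
    }
}

/// Apply the invariant to each axis independently.
pub fn clamp_and_center(offset: (f64, f64), page: Size, zoom: f64, viewport: Size) -> (f64, f64) {
    (
        clamp_axis(offset.0, page.w * zoom, viewport.w),
        clamp_axis(offset.1, page.h * zoom, viewport.h),
    )
}

/// On a resize, keep the page ("world") point that was under the old
/// canvas center under the new canvas center.
pub fn reanchor_on_resize(offset: (f64, f64), zoom: f64, old_vp: Size, new_vp: Size) -> (f64, f64) {
    let world_x = (old_vp.w / 2.0 - offset.0) / zoom;
    let world_y = (old_vp.h / 2.0 - offset.1) / zoom;
    (new_vp.w / 2.0 - world_x * zoom, new_vp.h / 2.0 - world_y * zoom)
}
```

Handling each axis separately is what lets a page that is wider than the viewport still be centered vertically, or the reverse.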
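For the "discrete zoom steps with trackpad accumulation" item, one way to reconcile a mouse wheel (large deltas per notch) with a trackpad (many tiny deltas) is to accumulate deltas and step through a fixed zoom ladder only once a threshold is crossed. This is a hypothetical sketch: the ladder values, the 100-unit threshold, and the `WheelZoom` type are assumptions, and the real logic lives in `navigation-events.js`.

```rust
/// Hypothetical zoom ladder (1.0 = 100%).
const ZOOM_STEPS: [f64; 9] = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0, 3.0, 4.0];
/// Accumulated wheel delta that triggers one step (assumed value).
const STEP_THRESHOLD: f64 = 100.0;

pub struct WheelZoom {
    index: usize, // current position in ZOOM_STEPS
    accum: f64,   // wheel delta not yet turned into a step
}

impl WheelZoom {
    pub fn new(index: usize) -> Self {
        Self { index: index.min(ZOOM_STEPS.len() - 1), accum: 0.0 }
    }

    pub fn zoom(&self) -> f64 {
        ZOOM_STEPS[self.index]
    }

    /// Feed one wheel event; negative `delta_y` (scrolling up, as with the
    /// DOM's WheelEvent.deltaY) zooms in. Returns the zoom afterwards.
    pub fn on_wheel(&mut self, delta_y: f64) -> f64 {
        if delta_y == 0.0 {
            return self.zoom();
        }
        // Drop a leftover partial delta when the direction reverses.
        if self.accum != 0.0 && self.accum.signum() != delta_y.signum() {
            self.accum = 0.0;
        }
        self.accum += delta_y;
        // A large mouse-wheel delta may cover several steps at once.
        while self.accum.abs() >= STEP_THRESHOLD {
            if self.accum < 0.0 {
                self.index = (self.index + 1).min(ZOOM_STEPS.len() - 1);
                self.accum += STEP_THRESHOLD;
            } else {
                self.index = self.index.saturating_sub(1);
                self.accum -= STEP_THRESHOLD;
            }
        }
        self.zoom()
    }
}
```

Resetting the accumulator on a direction change is a design choice here: it keeps a leftover partial delta from one gesture from swallowing the start of the next.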
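The rotation item combines two sources: the page's `/Rotate` entry, which the PDF specification makes inheritable from ancestor page-tree nodes via `/Parent`, and the user's per-page rotation map. The sketch below models the page tree with a hypothetical `PageNode` struct rather than lopdf objects. It normalizes the sum to 0/90/180/270 and builds the matching affine matrix in a y-down (canvas-style) device space. The function names, the fallback for non-multiples of 90, and the coordinate convention are assumptions, not the crate's code.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a page-tree node (hypothetical, not lopdf types).
pub struct PageNode {
    pub rotate: Option<i64>,   // this node's own /Rotate entry, if any
    pub parent: Option<usize>, // index of its /Parent node, if any
}

/// Walk up the /Parent chain until a node defines /Rotate; default 0.
/// Assumes the tree is acyclic (a malformed cyclic PDF would loop here).
pub fn inherited_rotate(nodes: &[PageNode], mut idx: usize) -> i64 {
    loop {
        let node = &nodes[idx];
        if let Some(r) = node.rotate {
            return r;
        }
        match node.parent {
            Some(p) => idx = p,
            None => return 0,
        }
    }
}

/// Document rotation plus the user's rotation, normalized to 0..360.
pub fn effective_rotation(nodes: &[PageNode], page: usize, user: &HashMap<usize, i64>) -> i64 {
    let doc_rot = inherited_rotate(nodes, page);
    let user_rot = user.get(&page).copied().unwrap_or(0);
    (doc_rot + user_rot).rem_euclid(360)
}

/// Affine matrix [a, b, c, d, e, f] (x' = a*x + c*y + e, y' = b*x + d*y + f)
/// that rotates a w-by-h page clockwise in y-down device space and shifts
/// it back into non-negative coordinates.
pub fn rotation_matrix(deg: i64, w: f64, h: f64) -> [f64; 6] {
    match deg.rem_euclid(360) {
        90 => [0.0, 1.0, -1.0, 0.0, h, 0.0],
        180 => [-1.0, 0.0, 0.0, -1.0, w, h],
        270 => [0.0, -1.0, 1.0, 0.0, 0.0, w],
        // 0, or an invalid non-multiple of 90: leave the page unrotated.
        _ => [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    }
}
```

Because `effective_rotation` folds both inputs into one normalized value, that single number is what a render cache key would need to carry alongside the page index.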
1 parent 96d0b68 commit ed43fe0

37 files changed

Lines changed: 2470 additions & 395 deletions

open-pdf-render/Cargo.lock

Lines changed: 1 addition & 0 deletions

open-pdf-render/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ lopdf = "0.34"
 tiny-skia = "0.11"
 image = { version = "0.25", default-features = false, features = ["jpeg", "png"] }
 ttf-parser = "0.25"
+rayon = "1"

open-pdf-render/examples/inspect_page.rs

Lines changed: 409 additions & 0 deletions

open-pdf-render/src/fonts.rs

Lines changed: 80 additions & 28 deletions
@@ -1,5 +1,6 @@
 use std::collections::HashMap;
-use lopdf::{Dictionary, Document, Object};
+use std::sync::Arc;
+use lopdf::{Dictionary, Document, Object, ObjectId};
 use crate::encoding;
 use crate::font_parser::{self, ParsedFont};

@@ -17,9 +18,22 @@ pub struct FontEntry {
     pub to_unicode: HashMap<u8, char>,
 }

-/// Registry that caches font lookups per font name.
+/// Registry that caches parsed fonts by their global PDF ObjectId.
+///
+/// IMPORTANT: this is keyed by `ObjectId`, NOT by the page-local font name
+/// (e.g. "F1"), because the same name can refer to different fonts on
+/// different pages. ObjectId is the only stable global identifier for an
+/// indirect font dictionary inside a single PDF document.
+///
+/// Inline (non-referenced) font dictionaries cannot be cached because they
+/// have no stable identity — they get parsed on every lookup. This is rare
+/// in practice; almost all PDFs share fonts via indirect references.
+///
+/// FontRegistry lives on `DocumentHandle` and survives across page renders,
+/// so the expensive glyph-outline extraction (which dominates per-page cost
+/// for text-heavy pages) only runs the first time a font is encountered.
 pub struct FontRegistry {
-    fonts: HashMap<String, FontEntry>,
+    fonts: HashMap<ObjectId, Arc<FontEntry>>,
 }

 impl FontRegistry {
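The doc comment's argument about page-local names is easy to demonstrate. In this std-only sketch two pages both call their font `F1`, but the names resolve to different objects. `ObjectId` is modeled as `(u32, u16)`, the same shape lopdf uses; `DocFonts`, `PageFonts`, `NameKeyedCache`, and `IdKeyedCache` are hypothetical stand-ins, and a BaseFont string stands in for the parsed `FontEntry`.

```rust
use std::collections::HashMap;

/// Object number + generation, the same shape as lopdf's `ObjectId`.
type ObjectId = (u32, u16);
/// Document-wide font objects: global id -> BaseFont (stands in for the
/// expensive parsed FontEntry).
type DocFonts = HashMap<ObjectId, &'static str>;
/// One page's /Resources /Font map: local name -> indirect reference.
type PageFonts = HashMap<&'static str, ObjectId>;

/// Old scheme: cache keyed by the page-local name ("F1").
#[derive(Default)]
pub struct NameKeyedCache(HashMap<String, &'static str>);

impl NameKeyedCache {
    pub fn get(&mut self, name: &str, page: &PageFonts, doc: &DocFonts) -> Option<&'static str> {
        if let Some(&hit) = self.0.get(name) {
            return Some(hit); // BUG: may be a different page's "F1"
        }
        let font = *doc.get(page.get(name)?)?;
        self.0.insert(name.to_string(), font);
        Some(font)
    }
}

/// New scheme: resolve the name to its global id first, then key by id.
#[derive(Default)]
pub struct IdKeyedCache(HashMap<ObjectId, &'static str>);

impl IdKeyedCache {
    pub fn get(&mut self, name: &str, page: &PageFonts, doc: &DocFonts) -> Option<&'static str> {
        let id = *page.get(name)?;
        if let Some(&hit) = self.0.get(&id) {
            return Some(hit);
        }
        let font = *doc.get(&id)?;
        self.0.insert(id, font);
        Some(font)
    }
}
```

With the name-keyed cache, the second page silently renders with the first page's font; the id-keyed cache returns the correct one.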
@@ -29,21 +43,45 @@ impl FontRegistry {
         }
     }

-    /// Look up a font by name from the page Resources/Font dictionary.
-    /// Caches results so each font is only parsed once.
-    pub fn get_font<'a>(
-        &'a mut self,
+    /// Look up a font by its page-local name. Resolves the name through the
+    /// page Resources/Font dictionary to a global `ObjectId`, then returns
+    /// the cached `FontEntry` (or parses + caches it on miss).
+    ///
+    /// Returns `Arc<FontEntry>` so the caller can hold the entry across the
+    /// borrow that filled the cache without lifetime contortions.
+    pub fn get_font(
+        &mut self,
         name: &str,
         doc: &Document,
         resources: &Dictionary,
-    ) -> Option<&'a FontEntry> {
-        if self.fonts.contains_key(name) {
-            return self.fonts.get(name);
+    ) -> Option<Arc<FontEntry>> {
+        let (font_id_opt, font_dict) = Self::resolve_font_dict_with_id(name, doc, resources)?;
+
+        // Cache hit (only possible for indirectly-referenced fonts)
+        if let Some(font_id) = font_id_opt {
+            if let Some(entry) = self.fonts.get(&font_id) {
+                return Some(entry.clone());
+            }
+        }
+
+        // Cache miss — do the expensive parse
+        let entry = Arc::new(Self::build_font_entry(&font_dict, doc));
+
+        // Cache only when we have a stable global ObjectId
+        if let Some(font_id) = font_id_opt {
+            self.fonts.insert(font_id, entry.clone());
         }

-        // Look up font dictionary from Resources -> Font -> <name>
-        let font_dict = Self::resolve_font_dict(name, doc, resources)?;
+        Some(entry)
+    }

+    /// Build a FontEntry from a font dictionary. This is the expensive
+    /// per-font work — all of the calls inside (extract_and_parse_font,
+    /// try_system_font, parse_truetype) walk every glyph in the embedded
+    /// font and build a 64K-entry Unicode→GID cmap. Caching the result via
+    /// `get_font()` saves all of this work on subsequent lookups within the
+    /// same document.
+    fn build_font_entry(font_dict: &Dictionary, doc: &Document) -> FontEntry {
         // Extract base font name
         let base_font = font_dict
             .get(b"BaseFont")
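The hit/miss flow of the new `get_font()` can be exercised without lopdf. In this hypothetical sketch, `Registry`, `FontRef`, and the one-field `FontEntry` are simplified stand-ins, a lookup table replaces `Document`, and a counter replaces the expensive `build_font_entry()` work so the caching is observable. It also shows the practical payoff of returning `Arc`: two entries can be held at the same time, whereas each reference returned by the old `get_font<'a>(&'a mut self, ...) -> Option<&'a FontEntry>` kept `self` mutably borrowed for as long as it lived.

```rust
use std::collections::HashMap;
use std::sync::Arc;

type ObjectId = (u32, u16);

/// How a page's /Font entry refers to its font: an indirect reference
/// (stable, cacheable) or an inline dictionary (no identity).
pub enum FontRef {
    Indirect(ObjectId),
    Inline(&'static str),
}

/// Simplified stand-in for the parsed font (hypothetical).
#[derive(Debug)]
pub struct FontEntry {
    pub base_font: String,
}

#[derive(Default)]
pub struct Registry {
    fonts: HashMap<ObjectId, Arc<FontEntry>>,
    pub parse_count: usize, // how many times the "expensive" parse ran
}

impl Registry {
    fn build_font_entry(&mut self, base_font: &str) -> FontEntry {
        self.parse_count += 1;
        FontEntry { base_font: base_font.to_string() }
    }

    /// `lookup` maps an id to its BaseFont, standing in for resolving the
    /// indirect object through the document.
    pub fn get_font(&mut self, font: &FontRef, lookup: &HashMap<ObjectId, &str>) -> Option<Arc<FontEntry>> {
        match font {
            FontRef::Indirect(id) => {
                if let Some(entry) = self.fonts.get(id) {
                    return Some(Arc::clone(entry)); // cache hit: cheap refcount bump
                }
                let base = *lookup.get(id)?;
                let entry = Arc::new(self.build_font_entry(base));
                self.fonts.insert(*id, Arc::clone(&entry));
                Some(entry)
            }
            // Inline dictionaries have no stable key, so parse every time.
            FontRef::Inline(base) => Some(Arc::new(self.build_font_entry(base))),
        }
    }
}
```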
@@ -68,19 +106,19 @@ impl FontRegistry {

         // Check CIDToGIDMap for Type0 fonts
         let cid_to_gid_identity = if is_cid {
-            Self::check_cid_to_gid_identity(&font_dict, doc)
+            Self::check_cid_to_gid_identity(font_dict, doc)
         } else {
             false
         };

         // Extract encoding info
-        let (encoding_name, differences) = Self::extract_encoding(&font_dict, doc);
+        let (encoding_name, differences) = Self::extract_encoding(font_dict, doc);

         // Extract ToUnicode CMap (maps char codes to Unicode codepoints)
-        let to_unicode = Self::extract_to_unicode(&font_dict, doc);
+        let to_unicode = Self::extract_to_unicode(font_dict, doc);

         // Try to extract and parse embedded font data
-        let mut parsed = Self::extract_and_parse_font(&font_dict, doc);
+        let mut parsed = Self::extract_and_parse_font(font_dict, doc);

         // Check if the embedded font has usable glyph outlines for common character codes.
         // Some PDFs embed fonts with empty glyph entries for subset codes — fall back
@@ -103,21 +141,18 @@ impl FontRegistry {

         // For Type0 fonts with DescendantFonts, also check the descendant for embedded data
         if parsed.is_none() && is_cid {
-            parsed = Self::extract_descendant_font(&font_dict, doc);
+            parsed = Self::extract_descendant_font(font_dict, doc);
         }

-        let entry = FontEntry {
+        FontEntry {
             parsed,
             encoding_name,
             differences,
             base_font,
             is_cid,
             cid_to_gid_identity,
             to_unicode,
-        };
-
-        self.fonts.insert(name.to_string(), entry);
-        self.fonts.get(name)
+        }
     }

     /// Resolve a character code to a glyph ID using the font entry.
@@ -154,19 +189,36 @@ impl FontRegistry {
         None
     }

-    /// Resolve the Font dictionary for a given font name from resources.
-    fn resolve_font_dict(name: &str, doc: &Document, resources: &Dictionary) -> Option<Dictionary> {
+    /// Resolve the Font dictionary for a given font name from resources,
+    /// returning the global `ObjectId` if the font is referenced indirectly.
+    /// Inline (non-referenced) font dicts get `None` as the id and are
+    /// re-parsed on every lookup.
+    fn resolve_font_dict_with_id(
+        name: &str,
+        doc: &Document,
+        resources: &Dictionary,
+    ) -> Option<(Option<ObjectId>, Dictionary)> {
         let font_res = resources.get(b"Font").ok()?;
-        let font_res = Self::resolve_obj(font_res, doc)?;
-        let font_dict_parent = match font_res {
+        let font_res_resolved = Self::resolve_obj(font_res, doc)?;
+        let font_dict_parent = match font_res_resolved {
             Object::Dictionary(d) => d.clone(),
             _ => return None,
         };

         let font_obj = font_dict_parent.get(name.as_bytes()).ok()?;
-        let font_obj = Self::resolve_obj(font_obj, doc)?;
+
+        // The entry may be an indirect reference (which gives us a stable
+        // ObjectId for caching) or an inline dictionary (no stable id).
         match font_obj {
-            Object::Dictionary(d) => Some(d.clone()),
+            Object::Reference(id) => {
+                let resolved = doc.get_object(*id).ok()?.clone();
+                if let Object::Dictionary(d) = resolved {
+                    Some((Some(*id), d))
+                } else {
+                    None
+                }
+            }
+            Object::Dictionary(d) => Some((None, d.clone())),
             _ => None,
         }
     }
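The branch in `resolve_font_dict_with_id` is where the cache key comes from: only an indirect reference yields an id. The sketch below mirrors that match with a hypothetical, heavily reduced `Obj` enum and a `Doc` object table in place of lopdf's `Object` and `Document`; dictionaries are simplified to key/value string pairs, and only one level of indirection is followed.

```rust
use std::collections::HashMap;

type ObjectId = (u32, u16);
type Dict = Vec<(String, String)>; // simplified dictionary: key/value pairs

/// Hypothetical, heavily reduced stand-in for lopdf's `Object`.
#[derive(Clone, Debug, PartialEq)]
pub enum Obj {
    Reference(ObjectId),
    Dictionary(Dict),
    Integer(i64),
}

/// Object table standing in for `Document`.
pub struct Doc {
    pub objects: HashMap<ObjectId, Obj>,
}

/// (Some(id), dict) for an indirect font, (None, dict) for an inline one,
/// and None for a dangling reference or a non-dictionary entry.
pub fn resolve_with_id(entry: &Obj, doc: &Doc) -> Option<(Option<ObjectId>, Dict)> {
    match entry {
        Obj::Reference(id) => match doc.objects.get(id)? {
            Obj::Dictionary(d) => Some((Some(*id), d.clone())),
            _ => None,
        },
        Obj::Dictionary(d) => Some((None, d.clone())),
        _ => None,
    }
}
```

Only the first shape produces a key that a document-scoped cache can reuse across pages; the inline shape yields `None` and is rebuilt on every lookup.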
