Skip to content

Commit fa72190

Browse files
andiwandclaude
andauthored
PDF: text colour, strongly-typed render mode, image clip fix (#571)
* font: drop out-of-range glyph ids when re-encoding to PUA `glyph_for_code` can fall back to "code as GID" (ISO 32000-1 9.6.6.4) and yield an index past `numGlyphs`. A single cmap entry referencing such a glyph makes the OTS sanitizer reject the *entire* font, so every glyph renders as a tofu box rather than just the unmappable code. Skip those entries in both the CFF (`wrap_to_otf`) and SFNT (`reencode_to_pua`) paths, with tests covering the drop. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * PDF: text colour, strongly-typed render mode, image clip fix - Carry the non-stroking (fill) and stroking device colours on each TextElement and paint runs in their colour (interned `.k` class) when it is not the default black — fill modes use the fill colour, the stroke-only modes (Tr 1/5) the stroking colour. The colour rides the visible layer in the dual-layer (PUA glyph) cases. - Replace the bare `int rendering_mode` with a strongly-typed `TextRenderingMode` enum (ISO 32000-1 Table 106) across the graphics state, page element and renderer, removing the magic 0/1/3/5/7 comparisons. - Wrap clipped `<image>` elements in a transform-free `<g clip-path>`: a clip-path on the image itself resolves in the image's post-transform unit-square space and clips the whole image away. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent dcc7230 commit fa72190

12 files changed

Lines changed: 140 additions & 38 deletions

File tree

src/odr/internal/font/cff_transform.cpp

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -153,9 +153,15 @@ std::string cff::wrap_to_otf(const CffFont &font,
153153
pua[pua_code_point(glyph)] = glyph;
154154
}
155155
// Real-Unicode entries: caller guarantees BMP, non-PUA keys, so these never
156-
// collide with the PUA range filled above.
156+
// collide with the PUA range filled above. A glyph id the font does not have
157+
// is dropped: `glyph_for_code` can fall back to "code as GID" (ISO 32000-1
158+
// 9.6.6.4) and yield an out-of-range index, and a single cmap reference past
159+
// `numGlyphs` makes the OTS sanitizer reject the *entire* font (so every
160+
// glyph would render as a tofu box, not just the unmappable code).
157161
for (const auto &[code, glyph] : extra) {
158-
pua[code] = glyph;
162+
if (glyph < glyphs) {
163+
pua[code] = glyph;
164+
}
159165
}
160166

161167
std::uint16_t advance_width_max = 0;

src/odr/internal/font/sfnt_transform.cpp

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -282,9 +282,15 @@ void font::reencode_to_pua(sfnt::SfntFont &font,
282282
map[pua_code_point(glyph)] = glyph;
283283
}
284284
// Real-Unicode entries: caller guarantees BMP, non-PUA keys, so these never
285-
// collide with the PUA range filled above.
285+
// collide with the PUA range filled above. A glyph id the font does not have
286+
// is dropped: `glyph_for_code` can fall back to "code as GID" (ISO 32000-1
287+
// 9.6.6.4) and yield an out-of-range index, and a single cmap reference past
288+
// `numGlyphs` makes the OTS sanitizer reject the *entire* font (so every
289+
// glyph would render as a tofu box, not just the unmappable code).
286290
for (const auto &[code, glyph] : extra) {
287-
map[code] = glyph;
291+
if (glyph < font.glyph_count()) {
292+
map[code] = glyph;
293+
}
288294
}
289295
font.set_cmap(std::move(map));
290296
}

src/odr/internal/html/pdf_file.cpp

Lines changed: 48 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -181,7 +181,12 @@ std::string svg_path_fragment(const pdf::PathElement &path,
181181
/// or "" when it carries no pass-through bytes. The image fills the unit square
182182
/// in user space (ISO 32000-1 8.10.5); the transform maps that square — through
183183
/// a vertical flip (the image's first row is its top, SVG draws y-down) and the
184-
/// CTM — into the page box. `clip_id` installs a clip via `clip-path`.
184+
/// CTM — into the page box. `clip_id`, when non-empty, installs a clip via a
185+
/// wrapping `<g clip-path>`. The clip geometry is in the page viewBox
186+
/// (`userSpaceOnUse`), but the `<image>` carries its own `transform`, so a
187+
/// `clip-path` placed *on the image* would be resolved in the image's
188+
/// post-transform unit-square space and clip the whole image away. The `<g>`
189+
/// carries no transform, so the clip is read in the viewBox where it lives.
185190
std::string svg_image_fragment(const pdf::ImageElement &image,
186191
const util::math::Transform2D &to_box,
187192
const std::string &clip_id) {
@@ -194,13 +199,18 @@ std::string svg_image_fragment(const pdf::ImageElement &image,
194199
const util::math::Transform2D m = flip * image.transform * to_box;
195200

196201
std::ostringstream f;
202+
// The clip wraps the image in a transform-free `<g>` rather than sitting on
203+
// the `<image>`: see the function comment.
204+
if (!clip_id.empty()) {
205+
f << "<g clip-path=\"url(#" << clip_id << ")\">";
206+
}
197207
f << R"(<image width="1" height="1" preserveAspectRatio="none" transform="matrix()"
198208
<< m.a << ',' << m.b << ',' << m.c << ',' << m.d << ',' << round2(m.e)
199209
<< ',' << round2(m.f) << ")\"";
210+
f << " href=\"" << file_to_url(image.data, image.mime) << "\"/>";
200211
if (!clip_id.empty()) {
201-
f << " clip-path=\"url(#" << clip_id << ")\"";
212+
f << "</g>";
202213
}
203-
f << " href=\"" << file_to_url(image.data, image.mime) << "\"/>";
204214
return std::move(f).str();
205215
}
206216

@@ -621,7 +631,29 @@ class HtmlServiceImpl final : public HtmlService {
621631
// Tr 3 (invisible) and Tr 7 (clip-only) paint nothing; keep them
622632
// selectable via the transparent `.i` class.
623633
const bool invisible =
624-
text.rendering_mode == 3 || text.rendering_mode == 7;
634+
text.rendering_mode == pdf::TextRenderingMode::invisible ||
635+
text.rendering_mode == pdf::TextRenderingMode::clip;
636+
637+
// The run's visible paint colour, folded onto the visible span as an
638+
// interned colour class — but only when it is not the default black, so
639+
// the overwhelmingly common black run adds nothing. The per-font
640+
// `.fvN`/`.gvN` classes declare `color:#000`; this class is emitted
641+
// after them in <head> (equal specificity), so it overrides. Invisible
642+
// runs (Tr 3/7) stay transparent via `.i`, so they take no colour
643+
// class. The fill modes paint with the non-stroking colour, the
644+
// stroke-only modes (Tr 1/5) with the stroking colour.
645+
std::string color_suffix;
646+
if (!invisible) {
647+
const pdf::GraphicsState::Color &paint =
648+
(text.rendering_mode == pdf::TextRenderingMode::stroke ||
649+
text.rendering_mode == pdf::TextRenderingMode::stroke_clip)
650+
? text.stroke_color
651+
: text.fill_color;
652+
if (std::string css = device_color_to_css(paint);
653+
css != "rgb(0,0,0)") {
654+
color_suffix = ' ' + styles.intern("k", "color:" + std::move(css));
655+
}
656+
}
625657

626658
// Placement and spacing are shared by both layers of a run; build them
627659
// once on `base`.
@@ -740,17 +772,18 @@ class HtmlServiceImpl final : public HtmlService {
740772
std::string classes = std::move(base);
741773
classes += ' ';
742774
classes += font_class(font, invisible, /*nested=*/false);
775+
classes += color_suffix;
743776
page_out.items.push_back(
744777
SpanOut{std::move(classes), escape_text(text.text), {}, {}});
745778
} else {
746779
// Dual layer (a glyph lost its scalar to an earlier one): a
747780
// transparent selectable Unicode span with the PUA glyph layer
748781
// nested inside, the latter folded into the combined `.gvN` /
749-
// `.giN` class.
750-
page_out.items.push_back(
751-
SpanOut{base + " i", escape_text(text.text),
752-
font_class(font, invisible, /*nested=*/true),
753-
escape_text(glyph_run(*text.font, text.codes))});
782+
// `.giN` class. The colour rides the visible (nested) layer.
783+
page_out.items.push_back(SpanOut{
784+
base + " i", escape_text(text.text),
785+
font_class(font, invisible, /*nested=*/true) + color_suffix,
786+
escape_text(glyph_run(*text.font, text.codes))});
754787
}
755788
} else if (font != 0) {
756789
// The visible glyph layer: PUA code points in the embedded font,
@@ -763,16 +796,17 @@ class HtmlServiceImpl final : public HtmlService {
763796
// Unicode (for copy/search) with the glyph layer nested inside.
764797
// The nested child overlays the run origin and inherits the
765798
// placement via the combined `.gvN` / `.giN` class.
766-
page_out.items.push_back(
767-
SpanOut{base + " i", escape_text(text.text),
768-
font_class(font, invisible, /*nested=*/true),
769-
std::move(glyph_text)});
799+
page_out.items.push_back(SpanOut{
800+
base + " i", escape_text(text.text),
801+
font_class(font, invisible, /*nested=*/true) + color_suffix,
802+
std::move(glyph_text)});
770803
} else {
771804
// Display-only run: nothing is extractable (the `no_unicode` case),
772805
// so the glyph layer stands alone and carries the placement itself
773806
// (`base`), `.g` (unselectable) and the combined paint+font class.
774807
std::string glyph_classes = base + " g ";
775808
glyph_classes += font_class(font, invisible, /*nested=*/false);
809+
glyph_classes += color_suffix;
776810
page_out.items.push_back(SpanOut{
777811
std::move(glyph_classes), std::move(glyph_text), {}, {}});
778812
}
@@ -783,6 +817,7 @@ class HtmlServiceImpl final : public HtmlService {
783817
if (invisible) {
784818
classes += " i";
785819
}
820+
classes += color_suffix;
786821
page_out.items.push_back(
787822
SpanOut{std::move(classes), escape_text(text.text), {}, {}});
788823
}

src/odr/internal/pdf/pdf_graphics_state.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -250,7 +250,7 @@ void GraphicsState::execute(const GraphicsOperator &op) {
250250
break;
251251
case GraphicsOperatorType::set_text_rendering_mode:
252252
current().text.rendering_mode =
253-
static_cast<int>(op.arguments.at(0).as_integer());
253+
static_cast<TextRenderingMode>(op.arguments.at(0).as_integer());
254254
break;
255255
case GraphicsOperatorType::set_text_rise:
256256
current().text.rise = op.arguments.at(0).as_real();

src/odr/internal/pdf/pdf_graphics_state.hpp

Lines changed: 25 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,21 @@ enum class ColorSpace {
1818
device_cmyk,
1919
};
2020

21+
/// Text rendering mode (`Tr`, ISO 32000-1 Table 106): how shown glyphs are
22+
/// painted and/or added to the clipping path. The low two bits select the paint
23+
/// (fill / stroke / both / none); the `_clip` modes additionally add the glyph
24+
/// outlines to the clip. The integer values are the `Tr` operands.
25+
enum class TextRenderingMode {
26+
fill = 0,
27+
stroke = 1,
28+
fill_stroke = 2,
29+
invisible = 3,
30+
fill_clip = 4,
31+
stroke_clip = 5,
32+
fill_stroke_clip = 6,
33+
clip = 7,
34+
};
35+
2136
/// One segment of a subpath, in user space (the CTM is already applied at
2237
/// construction time, ISO 32000-1 8.5.2.1). A line carries only `end`; a cubic
2338
/// Bézier carries its two control points as well.
@@ -72,16 +87,16 @@ struct GraphicsState {
7287
};
7388

7489
struct Text {
75-
double char_spacing{0}; // Tc
76-
double word_spacing{0}; // Tw
77-
double horizontal_scaling{100}; // Tz, in percent (100 = normal)
78-
double leading{0}; // TL
79-
std::string font; // Tf resource name
80-
double size{}; // Tf size
81-
int rendering_mode{0}; // Tr
82-
double rise{0}; // Ts
83-
util::math::Transform2D matrix; // Tm
84-
util::math::Transform2D line_matrix; // Tlm
90+
double char_spacing{0}; // Tc
91+
double word_spacing{0}; // Tw
92+
double horizontal_scaling{100}; // Tz, in percent (100 = normal)
93+
double leading{0}; // TL
94+
std::string font; // Tf resource name
95+
double size{}; // Tf size
96+
TextRenderingMode rendering_mode{TextRenderingMode::fill}; // Tr
97+
double rise{0}; // Ts
98+
util::math::Transform2D matrix; // Tm
99+
util::math::Transform2D line_matrix; // Tlm
85100
std::array<double, 2> glyph_width{};
86101
std::array<double, 4> glyph_bounding_box{};
87102
};

src/odr/internal/pdf/pdf_page_element.hpp

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -21,12 +21,18 @@ struct TextElement {
2121
util::math::Transform2D transform;
2222
/// Resolved font, or `nullptr` when the `/Font` resource name was unknown.
2323
Font *font{nullptr};
24-
double size{0}; // Tf size
25-
double char_spacing{0}; // Tc
26-
double word_spacing{0}; // Tw
27-
double horizontal_scaling{100}; // Tz, percent
28-
double rise{0}; // Ts
29-
int rendering_mode{0}; // Tr
24+
double size{0}; // Tf size
25+
double char_spacing{0}; // Tc
26+
double word_spacing{0}; // Tw
27+
double horizontal_scaling{100}; // Tz, percent
28+
double rise{0}; // Ts
29+
TextRenderingMode rendering_mode{TextRenderingMode::fill}; // Tr
30+
/// Non-stroking (fill) and stroking colours in force when the run was shown,
31+
/// as device colours. The renderer paints the run in whichever its rendering
32+
/// mode selects — the fill colour for the common fill modes, the stroking
33+
/// colour for the stroke-only modes (Tr 1/5) — defaulting to black.
34+
GraphicsState::Color fill_color;
35+
GraphicsState::Color stroke_color;
3036
/// Raw character codes shown by this segment (one `Tj`, or one string of a
3137
/// `TJ` array).
3238
std::string codes;

src/odr/internal/pdf/pdf_page_extractor.cpp

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -270,6 +270,8 @@ void show(std::vector<PageElement> &out, GraphicsState &state,
270270
element.horizontal_scaling = text.horizontal_scaling;
271271
element.rise = text.rise;
272272
element.rendering_mode = text.rendering_mode;
273+
element.fill_color = state.current().other_color;
274+
element.stroke_color = state.current().stroke_color;
273275
ResolvedText resolved = resolve_text(marked, font, codes);
274276
element.text = std::move(resolved.text);
275277
element.no_unicode = resolved.no_unicode;
@@ -409,7 +411,11 @@ void paint_path(std::vector<PageElement> &out, GraphicsState &state, bool fill,
409411
element.dash_array.push_back(on_off * scale);
410412
}
411413
element.dash_phase = s.general.dash.phase * scale;
412-
out.push_back(std::move(element));
414+
// Construct the variant alternative in place: building a temporary variant
415+
// and move-constructing it (`push_back(std::move(element))`) trips a spurious
416+
// GCC 14 `-Wmaybe-uninitialized` on the string-bearing `ImageElement`
417+
// alternative while analyzing the union move, which `-Werror` makes fatal.
418+
out.emplace_back(std::in_place_type<PathElement>, std::move(element));
413419
// Install a pending `W`/`W*` now (it uses the just-painted path) so it scopes
414420
// the *following* content, then drop the path.
415421
state.commit_clip();

test/src/internal/font/cff_font.cpp

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -441,3 +441,18 @@ TEST(CffFontTest, WrapsToLoadableOtf) {
441441
// The synthesized hmtx carries the CFF advance widths.
442442
EXPECT_EQ(wrapped.advance_width(1), 250);
443443
}
444+
445+
TEST(CffFontTest, WrapDropsExtraEntriesPastGlyphCount) {
446+
using namespace odr::internal::font;
447+
const CffFont cff{build_cff()}; // 2 glyphs: valid ids 0..1
448+
// 'A' -> 1 is in range; 'B' -> 5 is past the glyph count. An out-of-range
449+
// glyph reference makes the OTS sanitizer reject the whole cmap (and the
450+
// font), so it must be dropped rather than written into the cmap.
451+
const std::string otf = cff::wrap_to_otf(cff, {{U'A', 1}, {U'B', 5}});
452+
453+
ASSERT_TRUE(sfnt::SfntFont::is_sfnt(otf));
454+
const sfnt::SfntFont wrapped{std::make_unique<std::istringstream>(otf)};
455+
EXPECT_EQ(wrapped.glyph_for_code_point('A'), 1);
456+
EXPECT_EQ(wrapped.glyph_for_code_point('B'), 0); // dropped: not mapped
457+
EXPECT_EQ(wrapped.glyph_for_code_point(pua_code_point(1)), 1);
458+
}

0 commit comments

Comments
 (0)