Commit 71bf8d7
feat(converter): render EMF+ images via embedded bitmaps (SD-2503) (#3214)
* feat(converter): render EMF+ images via embedded bitmaps (SD-2503)
EMF+ payloads use GDI+ drawing records that rtf.js doesn't implement, so
prior to this change every EMF+ image rendered as the "Unable to render
EMF+ image" placeholder.
Most real-world EMF+ files generated by Office (cover slides, charts,
illustrations) embed a complete PNG/JPEG inside an EmfPlusObject(Image)
record with BitmapDataType=Compressed. Walk the EMR_COMMENT records in
the EMF stream, parse the inner EMF+ records, reassemble continuation
series, and return the embedded image directly.
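A minimal sketch of the walking step, assuming the MS-EMF record framing: EMR_COMMENT is type 70, and the "EMF+" identifier is the DWORD 0x2B464D45 at offset 12. The names `emfPlusRecords` and `EmfPlusRecord` are hypothetical, not the converter's actual exports. The sketch returns each inner record's bytes after Type/Flags/Size and leaves the remaining header fields to the caller, since the EmfPlusObject layout depends on the ContinueBit.

```typescript
// Hypothetical sketch; names are illustrative, not the converter's real exports.
const EMR_COMMENT = 70;                // MS-EMF record type for comments
const EMR_EOF = 14;                    // MS-EMF end-of-file record
const EMFPLUS_IDENTIFIER = 0x2b464d45; // "EMF+" read as a little-endian DWORD

interface EmfPlusRecord {
  type: number;     // e.g. 0x4008 = EmfPlusObject
  flags: number;    // for EmfPlusObject: ContinueBit, ObjectType, ObjectID
  body: Uint8Array; // bytes after Type/Flags/Size; DataSize etc. left to the caller
}

function* emfPlusRecords(buf: ArrayBuffer): Generator<EmfPlusRecord> {
  const view = new DataView(buf);
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const type = view.getUint32(offset, true);
    const size = view.getUint32(offset + 4, true);
    if (size < 8 || offset + size > view.byteLength) break; // malformed: stop walking
    if (
      type === EMR_COMMENT &&
      size >= 16 &&
      view.getUint32(offset + 12, true) === EMFPLUS_IDENTIFIER
    ) {
      // EMF+ records follow Type, Size, DataSize and CommentIdentifier (16 bytes).
      const end = offset + size;
      let p = offset + 16;
      while (p + 12 <= end) {
        const recSize = view.getUint32(p + 4, true);
        if (recSize < 12 || p + recSize > end) break; // truncated inner record
        yield {
          type: view.getUint16(p, true),
          flags: view.getUint16(p + 2, true),
          body: new Uint8Array(buf, p + 8, recSize - 8),
        };
        p += recSize;
      }
    }
    if (type === EMR_EOF) break;
    offset += size;
  }
}
```

A generator keeps the walk lazy, so a caller can stop at the first usable image without scanning the rest of the metafile.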
Per MS-EMFPLUS § 2.3.5.1 the EmfPlusObject header layout depends on the
ContinueBit:
ContinueBit=1: Type(2) Flags(2) Size(4) TotalObjectSize(4) DataSize(4) ObjectData
ContinueBit=0: Type(2) Flags(2) Size(4) DataSize(4) ObjectData
TotalObjectSize is present on every continued record, not only the
first. The strict spec terminates a continued series with a
ContinueBit=0 record. As a defense against off-spec encoders that
leave ContinueBit=1 on the final record, the parser also flushes
early once TotalObjectSize bytes have accumulated.
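A minimal sketch of the reassembly rule. It assumes each EmfPlusObject record has already been decoded, per the header layout above, into its ContinueBit, optional TotalObjectSize, and ObjectData. `ObjectChunk` and `ContinuationAssembler` are hypothetical names. The sketch also trims the joined bytes to TotalObjectSize as a guard against writers that overshoot their declared size.

```typescript
// Hypothetical decoded form of one EmfPlusObject record (header fields already parsed).
interface ObjectChunk {
  continued: boolean;       // ContinueBit (Flags & 0x8000)
  totalObjectSize?: number; // only present when continued is true
  objectData: Uint8Array;
}

class ContinuationAssembler {
  private parts: Uint8Array[] = [];
  private received = 0;
  private expected = 0;
  private active = false;

  /** Feeds one record; returns the whole object once complete, otherwise null. */
  push(chunk: ObjectChunk): Uint8Array | null {
    if (!this.active) {
      if (!chunk.continued) return chunk.objectData; // self-contained record
      this.active = true;
      this.expected = chunk.totalObjectSize ?? 0;
    }
    this.parts.push(chunk.objectData);
    this.received += chunk.objectData.length;
    // Spec-compliant end: ContinueBit=0. Defensive end: TotalObjectSize bytes received.
    const done = !chunk.continued || (this.expected > 0 && this.received >= this.expected);
    return done ? this.flush() : null;
  }

  private flush(): Uint8Array {
    const joined = new Uint8Array(this.received);
    let offset = 0;
    for (const part of this.parts) {
      joined.set(part, offset);
      offset += part.length;
    }
    // Trim any overshoot beyond the declared TotalObjectSize.
    const result =
      this.expected > 0 && joined.length > this.expected
        ? joined.subarray(0, this.expected)
        : joined;
    this.parts = [];
    this.received = 0;
    this.expected = 0;
    this.active = false;
    return result;
  }
}
```

Handling both end conditions in one place means that a spec-compliant series and an off-spec series whose final record still carries ContinueBit=1 produce the same output.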
Pure-vector and pixel-format EMF+ images still fall back to the
placeholder — a full GDI+ rasterizer is out of scope here.
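The split between extractable and fallback images comes down to two type fields in the reassembled ObjectData plus a magic-byte check. A minimal sketch, assuming the MS-EMFPLUS EmfPlusImage/EmfPlusBitmap field order (Version, Type, then Width, Height, Stride, PixelFormat, Type). `classifyImage`, `isImageObject`, and `ImagePayload` are hypothetical names.

```typescript
// Hypothetical names; constants are MS-EMFPLUS enumeration values.
const OBJECT_TYPE_IMAGE = 5;           // EmfPlusObjectType: Image
const IMAGE_DATA_TYPE_BITMAP = 1;      // EmfPlusImage.Type: 1 = Bitmap (2 = Metafile)
const BITMAP_DATA_TYPE_PIXEL = 0;      // EmfPlusBitmap.Type: raw pixels
const BITMAP_DATA_TYPE_COMPRESSED = 1; // EmfPlusBitmap.Type: encoded image stream
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

type ImagePayload =
  | { kind: "compressed"; mimeType: "image/png" | "image/jpeg"; bytes: Uint8Array }
  | { kind: "pixel" }        // raw pixels: placeholder unless decoded separately
  | { kind: "unsupported" }; // metafile (vector), unknown type, other codecs

/** ObjectType lives in bits 8..14 of an EmfPlusObject's Flags (bit 15 is ContinueBit). */
function isImageObject(flags: number): boolean {
  return ((flags >> 8) & 0x7f) === OBJECT_TYPE_IMAGE;
}

/** Classifies reassembled EmfPlusImage ObjectData. */
function classifyImage(objectData: Uint8Array): ImagePayload {
  // EmfPlusImage: Version(4) Type(4); EmfPlusBitmap: Width, Height, Stride, PixelFormat, Type (4 each).
  if (objectData.length < 28) return { kind: "unsupported" };
  const view = new DataView(objectData.buffer, objectData.byteOffset, objectData.byteLength);
  if (view.getUint32(4, true) !== IMAGE_DATA_TYPE_BITMAP) return { kind: "unsupported" };
  const bitmapType = view.getUint32(24, true);
  if (bitmapType === BITMAP_DATA_TYPE_PIXEL) return { kind: "pixel" };
  if (bitmapType !== BITMAP_DATA_TYPE_COMPRESSED) return { kind: "unsupported" };
  const bytes = objectData.subarray(28);
  if (bytes.length >= 8 && PNG_SIGNATURE.every((b, i) => bytes[i] === b)) {
    return { kind: "compressed", mimeType: "image/png", bytes };
  }
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return { kind: "compressed", mimeType: "image/jpeg", bytes };
  }
  return { kind: "unsupported" };
}
```

For the compressed case, the bytes only need base64 encoding under their MIME type to become a data URI. The other two results are where the placeholder applies.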
Tests use synthetic in-memory EMF+ buffers and cover PNG/JPEG
extraction, spec-compliant 2-record and 3-record continuation
reassembly, off-spec early flush, the non-Image fallback path, and
rejection of pixel-format bitmaps.
Closes #3172
* feat(converter): render raw-pixel EmfPlusBitmap via canvas + review nits
Address review feedback on #3214:
1. Raw-pixel EmfPlusBitmap support — the m3 proposal.docx reproducer
from #3172 stores its image as raw pixels (BitmapDataType=Pixel),
which the prior extractor rejected. Now decode 24bppRGB / 32bppRGB /
32bppARGB / 32bppPARGB pixel data, draw onto a canvas, and export as
PNG (mirroring the tiff-converter pattern). Indexed formats and
missing-canvas environments still fall back to the placeholder.
EMF+ pixel formats store each pixel as a little-endian DWORD, i.e.
B,G,R[,A] in memory; the converter swaps them to canvas-native
R,G,B,A. For PARGB it also un-premultiplies alpha so straight-alpha
consumers render correctly.
Negative height = top-down rows; positive height = bottom-up
(classic Windows DIB), reversed before write.
A MAX_PIXEL_BITMAP_PIXELS guard bounds canvas allocation at 100M
pixels (~400 MB RGBA), matching tiff-converter.
2. Slice reassembled chunks to TotalObjectSize so an off-spec writer
that overshoots its declared size doesn't tack trailing bytes onto
the data URI.
3. Tighten EMR_COMMENT recordSize check to >= 20 to match isEmfPlus's
existing minimum.
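A minimal sketch of the pixel conversion in item 1. It assumes the EmfPlusBitmap header (Width, Height, Stride, PixelFormat) has already been read. `decodePixelsToRgba` is a hypothetical name, and the PixelFormat constants are the GDI+ enumeration values. Rows are read top-down here (storage row 0 = visual top) and the bottom-up flip for positive Height is not modeled. The sketch stops before the canvas step: in a browser, a caller would wrap the result in `new ImageData(rgba, width, height)`, `putImageData` it onto a canvas, and call `toDataURL("image/png")`.

```typescript
// GDI+ PixelFormat enumeration values (as used by MS-EMFPLUS).
const PixelFormat24bppRGB = 0x00021808;
const PixelFormat32bppRGB = 0x00022009;
const PixelFormat32bppARGB = 0x0026200a;
const PixelFormat32bppPARGB = 0x000e200b;
const MAX_PIXEL_BITMAP_PIXELS = 100_000_000; // ~400 MB of RGBA

/** Hypothetical helper: converts EMF+ raw pixels (B,G,R[,A] per pixel) to straight RGBA. */
function decodePixelsToRgba(
  data: Uint8Array,
  width: number,
  height: number,
  stride: number,
  pixelFormat: number,
): Uint8ClampedArray | null {
  const w = Math.abs(width);
  const h = Math.abs(height);
  const rowBytes = Math.abs(stride);
  if (w === 0 || h === 0 || w * h > MAX_PIXEL_BITMAP_PIXELS) return null;

  let bytesPerPixel = 0;
  if (pixelFormat === PixelFormat24bppRGB) bytesPerPixel = 3;
  else if (
    pixelFormat === PixelFormat32bppRGB ||
    pixelFormat === PixelFormat32bppARGB ||
    pixelFormat === PixelFormat32bppPARGB
  ) bytesPerPixel = 4;
  if (bytesPerPixel === 0) return null; // indexed or other formats: placeholder
  if (rowBytes < w * bytesPerPixel || data.length < rowBytes * (h - 1) + w * bytesPerPixel) {
    return null; // stride too small or pixel data truncated
  }

  const hasAlpha = pixelFormat === PixelFormat32bppARGB || pixelFormat === PixelFormat32bppPARGB;
  const premultiplied = pixelFormat === PixelFormat32bppPARGB;
  const out = new Uint8ClampedArray(w * h * 4);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      const src = y * rowBytes + x * bytesPerPixel; // top-down: storage row y = visual row y
      const dst = (y * w + x) * 4;
      const b = data[src], g = data[src + 1], r = data[src + 2];
      const a = hasAlpha ? data[src + 3] : 255; // 32bppRGB's 4th byte is unused
      // Un-premultiply PARGB; Uint8ClampedArray clamps any value above 255.
      const scale = premultiplied && a > 0 && a < 255 ? 255 / a : 1;
      out[dst] = Math.round(r * scale);
      out[dst + 1] = Math.round(g * scale);
      out[dst + 2] = Math.round(b * scale);
      out[dst + 3] = a;
    }
  }
  return out;
}
```

The pixel-count guard runs before any allocation, so an oversized bitmap is rejected cheaply instead of requesting a multi-gigabyte buffer.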
Tests: 6 new pixel-bitmap tests, using a vi.spyOn canvas mock, cover
the core 32bppARGB path, bottom-up row flipping, 24bppRGB byte order,
32bppPARGB un-premultiplication, the no-canvas fallback, and the
indexed-format fallback. 20/20 tests pass in this file and 300/300
across the helpers directory.
* fix(converter): treat EMF+ raw pixels as top-down regardless of Height sign
MS-EMFPLUS § 2.2.2.2 is silent on what Height/Stride sign means for
storage direction. The earlier reading "positive Height = bottom-up"
borrowed from the classic Windows DIB convention, but every GDI+
producer (and therefore every Office-generated EMF+ file) lays pixel
memory out top-down regardless of Height sign. Rendering the SD-2503
reproducer with the bottom-up assumption produced an upside-down
cover image.
Drop the row-reversal entirely; storage row 0 is the visual top in
all cases. Update the corresponding test and JSDoc to reflect the
empirical convention.
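The convention this fix settles on reduces to a single row-offset rule. A minimal sketch with hypothetical function names. The second function reproduces the dropped DIB-style reading only for comparison: for a positive Height it maps visual row 0 to the last stored row, which is why the cover image came out upside-down.

```typescript
/** Hypothetical helper: byte offset in BitmapData of the row drawn at `visualRow`. */
function storageRowOffset(visualRow: number, height: number, stride: number): number {
  const rows = Math.abs(height);
  if (!Number.isInteger(visualRow) || visualRow < 0 || visualRow >= rows) {
    throw new RangeError(`visualRow ${visualRow} is outside 0..${rows - 1}`);
  }
  // Observed GDI+ behaviour: storage row 0 is the visual top for either Height sign.
  return visualRow * Math.abs(stride);
}

/** The dropped DIB-style reading, kept only for comparison: positive Height meant bottom-up. */
function dibStyleRowOffset(visualRow: number, height: number, stride: number): number {
  const rows = Math.abs(height);
  const storageRow = height > 0 ? rows - 1 - visualRow : visualRow;
  return storageRow * Math.abs(stride);
}
```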
---------
Co-authored-by: Caio Pizzol <caio@superdoc.dev>

Parent: 6d65d9a
2 files changed, 966 additions, 6 deletions
Directory: packages/super-editor/src/editors/v1/core/super-converter/v3/handlers/wp/helpers