Commit 71bf8d7

feat(converter): render EMF+ images via embedded bitmaps (SD-2503) (#3214)
* feat(converter): render EMF+ images via embedded bitmaps (SD-2503)

EMF+ payloads use GDI+ drawing records that rtf.js doesn't implement, so prior to this change every EMF+ image rendered as the "Unable to render EMF+ image" placeholder.

Most real-world EMF+ files generated by Office (cover slides, charts, illustrations) embed a complete PNG/JPEG inside an EmfPlusObject(Image) record with BitmapDataType=Compressed. Walk the EMR_COMMENT records in the EMF stream, parse the inner EMF+ records, reassemble continuation series, and return the embedded image directly.

Per MS-EMFPLUS § 2.3.5.1 the EmfPlusObject header layout depends on the ContinueBit:

  ContinueBit=1: Type(2) Flags(2) Size(4) TotalObjectSize(4) DataSize(4) ObjectData
  ContinueBit=0: Type(2) Flags(2) Size(4) DataSize(4) ObjectData

TotalObjectSize is present on every continued record (not only the first). The strict spec terminates a continued series with a ContinueBit=0 record; the parser also flushes early once TotalObjectSize bytes have been accumulated as a defense against off-spec encoders that leave ContinueBit=1 on the final record.

Pure-vector and pixel-format EMF+ images still fall back to the placeholder; a full GDI+ rasterizer is out of scope here.

Tests use synthetic in-memory EMF+ buffers and cover PNG/JPEG extraction, spec-compliant 2-record and 3-record continuation reassembly, off-spec early flush, the non-Image fallback path, and rejection of pixel-format bitmaps.

Closes #3172

* feat(converter): render raw-pixel EmfPlusBitmap via canvas + review nits

Address review feedback on #3214:

1. Raw-pixel EmfPlusBitmap support: the m3 proposal.docx reproducer from #3172 stores its image as raw pixels (BitmapDataType=Pixel), which the prior extractor rejected. Now decode 24bppRGB / 32bppRGB / 32bppARGB / 32bppPARGB pixel data, draw onto a canvas, and export as PNG (mirroring the tiff-converter pattern). Indexed formats and missing-canvas environments still fall back to the placeholder.

   EMF+ pixel formats store channels in DWORD-little-endian order (B,G,R[,A]); the converter swaps to canvas-native R,G,B,A. PARGB un-premultiplies alpha so straight-alpha consumers render correctly. Negative height = top-down rows; positive height = bottom-up (classic Windows DIB), reversed before write. A MAX_PIXEL_BITMAP_PIXELS guard bounds canvas allocation at 100M pixels (~400 MB RGBA), matching tiff-converter.

2. Slice reassembled chunks to TotalObjectSize so an off-spec writer that overshoots its declared size doesn't tack trailing bytes onto the data URI.

3. Tighten EMR_COMMENT recordSize check to >= 20 to match isEmfPlus's existing minimum.

Tests: 6 new pixel-bitmap tests using a vi.spyOn canvas mock cover the core 32bppARGB path, bottom-up row flipping, 24bppRGB byte order, 32bppPARGB un-premultiplication, the no-canvas fallback, and the indexed-format fallback. 20/20 in this file, 300/300 across the helpers directory.

* fix(converter): treat EMF+ raw pixels as top-down regardless of Height sign

MS-EMFPLUS § 2.2.2.2 is silent on what Height/Stride sign means for storage direction. The earlier reading "positive Height = bottom-up" borrowed from the classic Windows DIB convention, but every GDI+ producer (which means every Office-generated EMF+) lays pixel memory out top-down regardless of Height sign. Rendering the SD-2503 reproducer with the bottom-up assumption produced an upside-down cover image.

Drop the row-reversal entirely; storage row 0 is the visual top in all cases. Update the corresponding test and JSDoc to reflect the empirical convention.

---------

Co-authored-by: Caio Pizzol <caio@superdoc.dev>
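
For readers following the pixel-format notes above, here is a minimal sketch of the channel handling as described in the message: swap B,G,R[,A] to R,G,B,A, un-premultiply PARGB, and treat storage row 0 as the visual top whatever the sign of Height. It is not the converter's actual code. The function name, the `PixelFormat` string union, and the assumption that `stride` is a positive byte count per row are all hypothetical.

```typescript
// Hypothetical names throughout: the commit message does not show the real identifiers.
type PixelFormat = '24bppRGB' | '32bppRGB' | '32bppARGB' | '32bppPARGB';

// Convert raw EMF+ pixel data to straight-alpha RGBA in canvas byte order.
// Assumes `stride` is the positive number of bytes per stored row (may include padding).
function emfPlusPixelsToRgba(
  src: Uint8Array,
  width: number,
  height: number,
  stride: number,
  format: PixelFormat,
): Uint8ClampedArray {
  const bytesPerPixel = format === '24bppRGB' ? 3 : 4;
  // The sign of Height is ignored: storage row 0 is always the visual top.
  const rows = Math.abs(height);
  const out = new Uint8ClampedArray(width * rows * 4);

  for (let y = 0; y < rows; y++) {
    for (let x = 0; x < width; x++) {
      const s = y * stride + x * bytesPerPixel; // source offset (B,G,R[,A])
      const d = (y * width + x) * 4; // destination offset (R,G,B,A)

      let b = src[s];
      let g = src[s + 1];
      let r = src[s + 2];
      // 24bppRGB has no fourth byte; 32bppRGB has one but it is not alpha.
      const hasAlpha = format === '32bppARGB' || format === '32bppPARGB';
      const a = hasAlpha ? src[s + 3] : 255;

      // PARGB stores colour already multiplied by alpha; divide it back out.
      if (format === '32bppPARGB' && a > 0 && a < 255) {
        r = Math.min(255, Math.round((r * 255) / a));
        g = Math.min(255, Math.round((g * 255) / a));
        b = Math.min(255, Math.round((b * 255) / a));
      }

      out[d] = r;
      out[d + 1] = g;
      out[d + 2] = b;
      out[d + 3] = a;
    }
  }
  return out;
}
```

In a browser, the returned array could be wrapped in an `ImageData`, written with `putImageData`, and exported with `canvas.toDataURL('image/png')`, which is one way to realise the "draw onto a canvas, and export as PNG" step. A size guard like the MAX_PIXEL_BITMAP_PIXELS check mentioned above would belong before the allocation.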
1 parent 6d65d9a commit 71bf8d7

2 files changed

Lines changed: 966 additions & 6 deletions
