Parse image dimensions by kettanaito · Pull Request #445 · mwilliamson/mammoth.js

kettanaito · 2025-05-27T09:17:35Z

Closes Image width and height #187

Motivation

This change parses the image's cx and cy values, which describe the image's final dimensions in the document. As I mentioned in the issue, these are the final values (i.e. after any image resizing). DOCX doesn't store the original image dimensions. If consumers need those, they have to infer them from the original image buffer using tools like sharp. Most of the time, however, you don't care about the original dimensions, since you parse the document to render it in some way.

Roadmap

Discuss which units to use when exposing the image's cx and cy. They are in EMU (English Metric Units) by default. We can convert them to pixels, but we have to be consistent with how Mammoth treats other measurements here.
Add tests.
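
To make the unit question concrete, here is a minimal sketch of the EMU-to-pixel conversion. The format fixes 914400 EMU per inch; the 96 DPI default and the `emuToPixels` name are assumptions for illustration, not something this PR or Mammoth defines.

```javascript
// 914400 EMU per inch is fixed by the format.
var EMU_PER_INCH = 914400;

// Hypothetical helper: converts an EMU value (usually a string attribute)
// to whole pixels. The 96 DPI default is an assumption.
function emuToPixels(emu, dpi) {
    if (emu === undefined || emu === null || emu === "") {
        return null;
    }
    var value = Number(emu);
    if (!isFinite(value)) {
        return null;
    }
    var resolution = dpi === undefined ? 96 : dpi;
    return Math.round(value * resolution / EMU_PER_INCH);
}

var widthInPixels = emuToPixels("952500");
var widthAt72Dpi = emuToPixels("914400", 72);
```

Returning `null` for missing or non-numeric input keeps the "no dimensions available" case distinguishable from a real zero.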

kettanaito · 2025-05-27T09:19:06Z

-        return combineResults(blips.map(readBlip.bind(null, element)));
+        var dimensions =
+            element.first('wp:extent') ||
+            picture


Do you implement safe nested property access in .getElementsByTagName()? If not, why aren't we checking if the referenced tags exist?

kettanaito · 2025-05-27T09:19:37Z

+                .attributes['a:ext']
+
+        return combineResults(blips.map((blip) => {
+            return readBlip(element, blip, dimensions)


I'm unwrapping the map function because there's no benefit to keeping it inlined. It only confuses the argument order.
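
A small sketch of the argument-order issue, using a hypothetical stand-in for readBlip that only reports what it received. Array.prototype.map passes (value, index, array) to its callback, so with the bound version a third `dimensions` parameter would silently receive the index.

```javascript
// Hypothetical stand-in for readBlip; it only echoes its arguments.
function readBlip(element, blip, dimensions) {
    return { element: element, blip: blip, dimensions: dimensions };
}

var element = "drawing";
var blips = ["blip-a", "blip-b"];
var dimensions = { cx: "100", cy: "200" };

// map calls the bound function as readBlip(element, blip, index, array),
// so `dimensions` ends up being the array index.
var viaBind = blips.map(readBlip.bind(null, element));

// An explicit callback forwards exactly the intended arguments.
var viaCallback = blips.map(function(blip) {
    return readBlip(element, blip, dimensions);
});
```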

kettanaito · 2025-05-27T09:20:23Z

-            return readImage(blipImageFile, altText);
+            return readImage(blipImageFile, {
+                altText,
+                height: dimensions.attributes['cy'],


Image dimensions are defined on a higher node than blips, so I assume dimensions apply to all blips. Let me know if I'm wrong in this assumption.

kettanaito · 2025-05-27T09:21:04Z

            .getElementsByTagName("a:blip");
-
-        return combineResults(blips.map(readBlip.bind(null, element)));
+        var dimensions =


Some vendors (Microsoft, Google Docs) store image dimensions in wp:extent, while others, like Apple Pages, set them on the pic:pic node instead. We have to parse both scenarios (they describe the same thing and are mutually exclusive).
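
A rough sketch of the fallback with explicit checks for missing nodes. The `el` factory is a hypothetical stand-in for a parsed XML element, not Mammoth's own type, and `firstAtPath` and `readExtent` are illustrative names. The sketch assumes the picture-level a:ext sits under pic:spPr and a:xfrm, as it does in DrawingML.

```javascript
// Hypothetical stand-in for a parsed XML element (not Mammoth's own type).
function el(name, attributes, children) {
    return {
        name: name,
        attributes: attributes || {},
        children: children || [],
        // Returns the first direct child with the given name, or undefined.
        first: function(childName) {
            return this.children.filter(function(child) {
                return child.name === childName;
            })[0];
        }
    };
}

// Follows a path of child names, yielding undefined once a node is missing.
function firstAtPath(element, names) {
    return names.reduce(function(current, name) {
        return current ? current.first(name) : undefined;
    }, element);
}

// `container` directly holds wp:extent; `picture` is pic:pic, if found.
function readExtent(container, picture) {
    var extent = container.first("wp:extent") ||
        firstAtPath(picture, ["pic:spPr", "a:xfrm", "a:ext"]);
    if (!extent) {
        return null;
    }
    return { cx: extent.attributes.cx, cy: extent.attributes.cy };
}

var inlineWithExtent = el("wp:inline", {}, [
    el("wp:extent", { cx: "952500", cy: "476250" })
]);
var pictureWithExt = el("pic:pic", {}, [
    el("pic:spPr", {}, [
        el("a:xfrm", {}, [el("a:ext", { cx: "1905000", cy: "952500" })])
    ])
]);
```

With this shape, a drawing that has neither node yields `null` instead of throwing, and every blip under the same drawing can share the one result.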

kettanaito · 2025-05-27T09:21:58Z


    function readDrawingElement(element) {
-        var blips = element
+        var picture = element


I'm lifting up the picture node reference so we don't have to look it up multiple times. If this is premature, let me know.

Parse image dimensions

a2027af

kettanaito mentioned this pull request May 27, 2025

Image width and height #187

Open

kettanaito wants to merge 1 commit into mwilliamson:master from kettanaito:feat/image-dimensions
