Skip to content

Parse image dimensions#445

Draft
kettanaito wants to merge 1 commit into
mwilliamson:masterfrom
kettanaito:feat/image-dimensions
Draft

Parse image dimensions#445
kettanaito wants to merge 1 commit into
mwilliamson:masterfrom
kettanaito:feat/image-dimensions

Conversation

@kettanaito
Copy link
Copy Markdown

@kettanaito kettanaito commented May 27, 2025

Motivation

This change parses out the image's cx and cy values that describe the end image dimensions in the document. As I've mentioned in the issue, these values are end values (e.g. after image resizing). DOCX doesn't store the original image dimensions. If consumers need those, they would have to infer them from the original image buffer using tools like sharp. Most of the time, however, you don't care about the original dimensions since you parse the document to render it in some way.

Roadmap

  • Discuss which units to use when exposing the image's cx and cy? They are in EMU by default. We can convert them to pixels, but we have to be consistent with how Mammoth treats other measurements here.
  • Add tests.

Comment thread lib/docx/body-reader.js
return combineResults(blips.map(readBlip.bind(null, element)));
var dimensions =
element.first('wp:extent') ||
picture
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you implement safe nested property access in .getElementsByTagName()? If not, why aren't we checking if the referenced tags exist?

Comment thread lib/docx/body-reader.js
.attributes['a:ext']

return combineResults(blips.map((blip) => {
return readBlip(element, blip, dimensions)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unwrapping the map function because there's no benefit to keeping it inlined. It only confuses the argument order.

Comment thread lib/docx/body-reader.js
return readImage(blipImageFile, altText);
return readImage(blipImageFile, {
altText,
height: dimensions.attributes['cy'],
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image dimensions are defined on a higher node than blips, so I assume dimensions apply to all blips. Let me know if I'm wrong in this assumption.

Comment thread lib/docx/body-reader.js
.getElementsByTagName("a:blip");

return combineResults(blips.map(readBlip.bind(null, element)));
var dimensions =
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some vendors (Microsoft, Google Docs) store image dimensions in wp:extent while others, like Apple Pages, set the on the pic:pic node instead. We have to parse both scenarios (they describe the same thing and are mutually exclusive).

Comment thread lib/docx/body-reader.js

function readDrawingElement(element) {
var blips = element
var picture = element
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm lifting up the picture node reference so we don't have to look it up multiple times. If this is premature, let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Image width and height

1 participant