Problem
include_images=false does not currently suppress image generation when image_export_mode is embedded or referenced.
In the conversion manager, image_export_mode != placeholder enables page image generation, and referenced additionally enables picture image generation, without checking include_images.
That makes the two options conflict semantically: callers can explicitly ask not to include images, but still get generated/persisted image assets because the export mode is referenced.
Expected
include_images=false should be the controlling option for generated image assets. If it is false, conversion should not enable page or picture image generation regardless of image_export_mode.
Evidence
In a downstream docling-jobkit/docling-serve workload on a 157-page PDF:
include_images=false with image_export_mode=referenced still emitted 486 image assets, about 42.8 MB of assets, and about 130 MB JSON.
include_images=false with image_export_mode=placeholder emitted 0 assets and about 14.2 MB JSON.
The table/chunk text outputs were stable across the two runs, so the extra assets came from image-generation/export semantics rather than document content.
Proposed fix
Gate page and picture image generation on include_images before applying image_export_mode.
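A minimal sketch of the proposed gating, assuming the current behaviour is as described above. The names `ImageExportMode`, `ImageGenerationFlags`, and `resolve_image_generation` are hypothetical and used only for illustration; they are not the conversion manager's actual code, and the real option and field names may differ.

```python
from dataclasses import dataclass
from enum import Enum


class ImageExportMode(str, Enum):
    # Hypothetical stand-in for the three export modes named in this issue.
    PLACEHOLDER = "placeholder"
    EMBEDDED = "embedded"
    REFERENCED = "referenced"


@dataclass(frozen=True)
class ImageGenerationFlags:
    # Hypothetical container for the two generation switches discussed above.
    generate_page_images: bool
    generate_picture_images: bool


def resolve_image_generation(
    include_images: bool, image_export_mode: ImageExportMode
) -> ImageGenerationFlags:
    """Decide which image assets to generate, checking include_images first."""
    if not include_images:
        # Proposed change: include_images=false wins regardless of export mode.
        return ImageGenerationFlags(
            generate_page_images=False, generate_picture_images=False
        )
    # Behaviour as described in this issue: any non-placeholder mode enables
    # page images, and referenced additionally enables picture images.
    return ImageGenerationFlags(
        generate_page_images=image_export_mode != ImageExportMode.PLACEHOLDER,
        generate_picture_images=image_export_mode == ImageExportMode.REFERENCED,
    )
```

With this ordering, include_images=false combined with image_export_mode=referenced resolves to no page or picture image generation, which is the expected behaviour described above. With include_images=true the resolution by export mode is unchanged.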