
include_images=false still enables image generation for non-placeholder export modes #150

@drukpa1455

Description


Problem

include_images=false does not currently suppress image generation when image_export_mode is embedded or referenced.

In the conversion manager, any image_export_mode other than placeholder enables page image generation, and referenced additionally enables picture image generation. Neither check consults include_images.
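The reported logic can be sketched roughly as follows. This is a simplified, hypothetical reconstruction, not the conversion manager's actual code. ImageExportMode is a local stand-in for the export-mode option, and PipelineFlags and current_flags are illustrative names. The two boolean fields are assumed to mirror the page/picture image generation switches that the conversion manager sets.

```python
from dataclasses import dataclass
from enum import Enum


class ImageExportMode(str, Enum):
    # Local stand-in for the image_export_mode values named in this issue.
    PLACEHOLDER = "placeholder"
    EMBEDDED = "embedded"
    REFERENCED = "referenced"


@dataclass
class PipelineFlags:
    # Hypothetical container for the two image-generation switches.
    generate_page_images: bool = False
    generate_picture_images: bool = False


def current_flags(include_images: bool, mode: ImageExportMode) -> PipelineFlags:
    """Sketch of the reported behavior: include_images is never consulted."""
    flags = PipelineFlags()
    if mode != ImageExportMode.PLACEHOLDER:
        # embedded or referenced: page images are generated
        flags.generate_page_images = True
        if mode == ImageExportMode.REFERENCED:
            # referenced additionally generates picture images
            flags.generate_picture_images = True
    return flags


# Even with include_images=False, referenced mode turns both switches on.
print(current_flags(False, ImageExportMode.REFERENCED))
# prints: PipelineFlags(generate_page_images=True, generate_picture_images=True)
```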

That makes the two options conflict semantically: callers can explicitly ask not to include images, but still get generated/persisted image assets because the export mode is referenced.

Expected

include_images=false should be the controlling option for generated image assets. If it is false, conversion should not enable page or picture image generation regardless of image_export_mode.

Evidence

In a downstream docling-jobkit/docling-serve workload on a 157-page PDF:

  • include_images=false with image_export_mode=referenced still emitted 486 image assets (about 42.8 MB in total) and about 130 MB of JSON.
  • include_images=false with image_export_mode=placeholder emitted 0 assets and about 14.2 MB of JSON.

The table/chunk text outputs were stable across the two runs, so the extra assets came from image-generation/export semantics rather than document content.

Proposed fix

Gate page and picture image generation on include_images before applying image_export_mode.
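The gating order proposed above might look like the sketch below. As before, the enum, dataclass, and function names are hypothetical stand-ins, not docling's API. The point is only that include_images is checked before image_export_mode is applied.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ImageExportMode(str, Enum):
    # Local stand-in for the image_export_mode values named in this issue.
    PLACEHOLDER = "placeholder"
    EMBEDDED = "embedded"
    REFERENCED = "referenced"


@dataclass
class PipelineFlags:
    # Hypothetical container for the two image-generation switches.
    generate_page_images: bool = False
    generate_picture_images: bool = False


def gated_flags(include_images: bool, mode: ImageExportMode) -> PipelineFlags:
    """Sketch of the proposed fix: include_images is checked first."""
    flags = PipelineFlags()
    if not include_images:
        # Caller opted out of images: no generation, whatever the export mode.
        return flags
    if mode != ImageExportMode.PLACEHOLDER:
        flags.generate_page_images = True
        if mode == ImageExportMode.REFERENCED:
            flags.generate_picture_images = True
    return flags


# Print every combination; every include_images=False row should be all False.
for include, mode in product([False, True], ImageExportMode):
    flags = gated_flags(include, mode)
    print(f"include_images={include!s:<5} mode={mode.value:<11} "
          f"page={flags.generate_page_images} picture={flags.generate_picture_images}")
```

The early return makes include_images the single controlling switch. When it is true, the existing export-mode behavior is left unchanged.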

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests