Feature Proposal: Add Image Position References to Document Chunks

## Summary

Enhance the chunking pipeline to include image-to-chunk mapping metadata so that image positions within a document can be tracked accurately. Today, chunks do not indicate whether an image is present, so we cannot know exactly where an image lives inside the document.

---

## Current Limitation

In the existing implementation:

* Images are not present in the chunks part of the output; they are only present in `documents.content.type_content.pictures`.

### There is no precise information about:

-   Where exactly the image lives in the document.
-   Which chunks come before and after the image.
-   How to handle images in a RAG pipeline, which is limited as a result.

---

## Impact

This lack of granularity prevents:

* Accurate image-to-chunk association
* Reconstruction of documents with correct image placement
* Effective use in RAG pipelines where contextual alignment matters
* Downstream processing requiring image provenance or positioning

---


## Proposed Enhancement

* Images are serialized using a static placeholder:
```markdown
chunk_i-1  ![Picture] chunk_i+1
```
* Chunk metadata includes a `has_image` flag:

  ```json
  {
    "has_image": true
  }
  ```
* The resulting chunk output looks like this:

```json
{
  "text": "text \n![Picture]\n text",
  "headings": [
    "heading"
  ],
  "page_numbers": [2],
  "metadata": {
    "origin": {
      "mimetype": "application/pdf",
      "binary_hash": 16467438883613526983,
      "filename": "file.pdf",
      "uri": null
    },
    "has_image": true
  }
}
```
* With this, each chunk carries enough information to tell whether it contains an image and where that image sits in the text.
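
As a rough illustration of the serialization described above, here is a minimal Python sketch. The `Chunk` class, the `build_chunk` helper, and the `(kind, value)` input shape are hypothetical and only stand in for whatever the real chunking pipeline uses; only the `![Picture]` placeholder and the `has_image` field come from this proposal.

```python
from dataclasses import dataclass, field

# Static placeholder taken from the proposal.
IMAGE_PLACEHOLDER = "![Picture]"


@dataclass
class Chunk:
    """Hypothetical chunk container, not the pipeline's real type."""
    text: str
    metadata: dict = field(default_factory=dict)


def build_chunk(items):
    """Join (kind, value) items into one chunk.

    `kind` is "text" or "picture" (an assumed input shape). Pictures are
    replaced by the static placeholder and flip `has_image` to True.
    """
    parts = []
    has_image = False
    for kind, value in items:
        if kind == "picture":
            parts.append(IMAGE_PLACEHOLDER)
            has_image = True
        else:
            parts.append(value)
    return Chunk(text="\n".join(parts), metadata={"has_image": has_image})


chunk = build_chunk(
    [("text", "text before"), ("picture", None), ("text", "text after")]
)
print(chunk.text)
print(chunk.metadata)
```

Chunks without pictures would still carry `has_image: false` in this sketch, so consumers can rely on the key always being present; whether the real output should do the same is an open design choice.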

---
## Benefits

* Enables precise multimodal alignment in RAG systems
* Supports document reconstruction with layout fidelity
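
On the consumer side, a RAG or reconstruction step could use the proposed `has_image` flag and the `![Picture]` placeholder to locate images inside a chunk. The sketch below assumes chunks arrive as plain dictionaries shaped like the example output in this proposal; `image_offsets` is a hypothetical helper, not an existing API.

```python
# Static placeholder taken from the proposal.
IMAGE_PLACEHOLDER = "![Picture]"


def image_offsets(chunk):
    """Return the character offset of every placeholder in chunk["text"].

    Chunks whose metadata lacks `has_image` or sets it to False are
    skipped without scanning the text.
    """
    if not chunk.get("metadata", {}).get("has_image", False):
        return []
    text = chunk["text"]
    offsets = []
    start = text.find(IMAGE_PLACEHOLDER)
    while start != -1:
        offsets.append(start)
        # Continue searching after the placeholder just found.
        start = text.find(IMAGE_PLACEHOLDER, start + len(IMAGE_PLACEHOLDER))
    return offsets


chunk = {
    "text": "text \n![Picture]\n text",
    "metadata": {"has_image": True},
}
print(image_offsets(chunk))
```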
---
