Issue concerning formula detection/processing.

I apologise in advance if this is filed in the wrong repository; I was using both langchain-opendataloader-pdf and opendataloader-pdf[hybrid]. If it belongs elsewhere, kindly let me know and I will re-file it in the right repository.

### Bug

With `hybrid_mode="full"` set on the client side (as recommended when the backend is started with `--enrich-formula`), all pages are expected to be routed to the hybrid backend. However, the extracted output is identical to what is produced without `hybrid_mode="full"`, which suggests that pages are still being processed locally and that the `--enrich-formula` enrichment has no effect.

This is especially visible on pages containing mathematical formulas, where the output still contains fragmented `SymbolMT`-encoded characters instead of properly reconstructed formula content with the appropriate "formula" label. At the end of this issue is a screenshot of the formula for further context.
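The symptom described above can be checked mechanically on two saved JSON outputs (one produced with `hybrid_mode="full"`, one without). The following is only a minimal sketch: it assumes the output is a nested structure of dicts and lists in which elements carry a `type` key and, where applicable, a `font` key. Those key names, and the helper names `walk`, `summarize` and `same_output`, are assumptions for illustration and not a confirmed schema or API.

```python
import json
from typing import Any, Iterator


def walk(node: Any) -> Iterator[dict]:
    """Yield every dict found anywhere inside a nested JSON structure."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def summarize(doc: Any) -> dict:
    """Count formula-labelled elements and elements whose font mentions SymbolMT."""
    formulas = 0
    symbol_fragments = 0
    for element in walk(doc):
        # "type" and "font" are assumed key names, not a confirmed schema.
        if str(element.get("type", "")).lower() == "formula":
            formulas += 1
        if "SymbolMT" in str(element.get("font", "")):
            symbol_fragments += 1
    return {"formula": formulas, "symbolmt": symbol_fragments}


def same_output(a: Any, b: Any) -> bool:
    """Return True when two parsed JSON outputs are structurally identical."""
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)
```

If `same_output` returns `True` for the two runs and `summarize` reports zero `formula` elements alongside a non-zero `symbolmt` count, that would be consistent with the pages not having gone through formula enrichment.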

The following command was used to start the hybrid backend instance:
```
opendataloader-pdf-hybrid --enrich-formula --port 5002
```

The loader was configured with the following parameters:

```
from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader

# sample_pdf is the path to the test PDF (defined elsewhere in the script)
loader = OpenDataLoaderPDFLoader(
    file_path=[str(sample_pdf)],
    format="json",
    quiet=True,
    split_pages=True,
    use_struct_tree=True,
    table_method="cluster",
    include_header_footer=False,
    sanitize=False,

    # Image Handling
    image_output="external",
    image_dir="/home/<redacted>/opendata_test/imagestore",
    image_format="png",

    # Hybrid Extractions
    hybrid="docling-fast",
    hybrid_mode="full",
    hybrid_url="http://localhost:5002",
    hybrid_timeout="10000",
    hybrid_fallback=True,
)
```
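Since `hybrid_fallback=True` is set, one thing worth ruling out is that the backend at `hybrid_url` is simply unreachable and the client is falling back to local processing (assuming the option behaves as its name suggests). The sketch below only checks that something accepts TCP connections on the configured host and port; it does not exercise any opendataloader endpoint, and `backend_reachable` is a hypothetical helper name.

```python
import socket
from urllib.parse import urlparse


def backend_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `backend_reachable("http://localhost:5002")` from the same environment as the loader shows whether the port used above is open. A `True` result only confirms that the port accepts connections, not that pages are actually routed there.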
...

### Version

Python 3.13.9
langchain-opendataloader-pdf 2.0.0
opendataloader-pdf           2.0.1
langchain-text-splitters     1.1.1

Only the following commands were used to install packages:
```
uv pip install "opendataloader-pdf[hybrid]"
uv pip install -U langchain-opendataloader-pdf
uv pip install -U langchain-text-splitters
```

...

### Java version

openjdk 17.0.16 2025-07-15
OpenJDK Runtime Environment (build 17.0.16+8-Ubuntu-0ubuntu122.04.1)
OpenJDK 64-Bit Server VM (build 17.0.16+8-Ubuntu-0ubuntu122.04.1, mixed mode, sharing)

...

### Image 

<img width="596" height="99" alt="Image" src="https://github.com/user-attachments/assets/4c9ab008-4623-40e6-a28b-8717a98bf263" />
