| title | Preprocessors |
|---|---|
| id | experimental-preprocessors-api |
| description | Pipelines wrapped as components. |
| slug | /experimental-preprocessors-api |
Infers and rewrites header levels in Markdown text to normalize hierarchy.
First header → Always becomes level 1 (#)
Subsequent headers → Level increases if no content between headers, stays same if content exists
Maximum level → Capped at 6 (######)
### Usage example
```python
from haystack import Document
from haystack_experimental.components.preprocessors import MarkdownHeaderLevelInferrer
# Create a document with uniform header levels
text = "## Title
Section
More Content" doc = Document(content=text)
# Initialize the inferrer and process the document
inferrer = MarkdownHeaderLevelInferrer()
result = inferrer.run([doc])
# The headers are now normalized with proper hierarchy
print(result["documents"][0].content)
> # Title
Section
More Content ```
def __init__()Initializes the MarkdownHeaderLevelInferrer.
@component.output_types(documents=list[Document])
def run(documents: list[Document]) -> dictInfers and rewrites the header levels in the content for documents that use uniform header levels.
Arguments:
documents: list of Document objects to process.
Returns:
dict: a dictionary with the key 'documents' containing the processed Document objects.